Git

Rohan Roy
Jan 10

Overview

Git is a distributed version control system that manages and tracks changes in projects. Unlike systems such as CVS or Subversion, which store a base version of each file plus the changes made to it over time, Git stores data as snapshots of the whole project. Because the full history lives on the local disk, most operations run locally, which makes them fast.

Git ensures data integrity through checksumming: everything is hashed with SHA-1 before it is stored and is then referred to by that hash, so file contents cannot change without Git detecting it. Nearly every action in Git only adds data to the database, so it is hard to do anything that cannot be undone, and once a snapshot is committed it is very difficult to lose. Files exist in one of three states: modified, staged, or committed. The basic workflow is to modify files in the working tree, stage the changes that should go into the next commit, and then commit, which stores the staged snapshot permanently in the Git directory.

"Git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it," as the Pro Git book on the official Git website puts it. Let's break that down:

1. Content-Addressable Filesystem:

At its core, Git is a content-addressable filesystem: every piece of data stored in Git is referenced by an identifier derived from the content itself. This identifier is a SHA-1 hash by default (newer versions of Git can also use SHA-256 for better security).
In Git, everything is stored as objects (blobs, trees, commits, and tags), and each object is identified by its hash. Any modification to the content results in a different hash, so each version of the content gets its own identifier and changes cannot go unnoticed.
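
A minimal sketch of content addressing, assuming Git is installed and a POSIX shell is available. The throwaway repository and the sample strings are illustrative choices, not taken from the article. It stores a string as a blob with the plumbing command git hash-object and reads it back by hash with git cat-file.

```shell
# Work in a throwaway repository (illustrative; any empty directory works).
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Store a piece of content as a blob; Git prints the hash derived from it.
first=$(printf 'hello\n' | git hash-object -w --stdin)

# Hashing the same content again yields the same identifier...
again=$(printf 'hello\n' | git hash-object --stdin)

# ...while any change to the content yields a different one.
changed=$(printf 'hello!\n' | git hash-object --stdin)

echo "first:   $first"
echo "again:   $again"
echo "changed: $changed"

# The hash is the key used to look the object up again.
git cat-file -t "$first"   # prints the object type
git cat-file -p "$first"   # prints the stored content
```

The file name never enters the picture here: the identifier depends only on the content, which is why identical content always maps to the same object.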

2. Version Control System (VCS) Interface:

On top of this content-addressable filesystem, Git provides a user interface that is designed to handle version control tasks. These tasks include creating branches, merging changes, committing updates, and more.
The VCS functionality is built around the concept of tracking changes to files and directories over time. Git's interface allows users to navigate through different versions (or snapshots) of the project, manage branches, collaborate with others, and maintain a history of changes.
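
The version control tasks mentioned above can be sketched with a handful of everyday commands. This is only an illustration under stated assumptions: the file name app.txt, the branch name feature, and the identity values are made up, and the merge here happens to be a simple fast-forward.

```shell
# Throwaway repository with a local, made-up identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

# First snapshot on the initial branch.
printf 'base\n' > app.txt
git add app.txt
git commit -q -m "Initial commit"
initial_branch=$(git symbolic-ref --short HEAD)

# Create a branch and commit an update on it.
git checkout -q -b feature
printf 'feature work\n' >> app.txt
git commit -q -am "Add feature work"

# Switch back and merge the branch into the initial branch.
git checkout -q "$initial_branch"
git merge -q feature

# Walk the history of snapshots.
git log --oneline
```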

Why This Matters:

- Efficiency: Because Git references content by hash, it only needs to store each unique piece of content once, no matter how many files or commits contain it.
- Integrity: The use of cryptographic hashes protects the integrity of the data. Corruption or accidental changes show up as a mismatch between an object's content and its hash.
- Flexibility: The underlying structure allows Git to provide powerful branching and merging capabilities, which are essential for modern software development practices like feature branching and collaborative development.
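
A small sketch of the efficiency and integrity points, assuming Git and standard POSIX tools (awk, sort, wc, tr). The file names and identity values are hypothetical. Two files with identical content end up pointing at one blob, and git fsck re-checks the stored objects.

```shell
# Throwaway repository with a local, made-up identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

# Two files, same content.
printf 'shared content\n' > a.txt
cp a.txt b.txt
git add a.txt b.txt
git commit -q -m "Add two identical files"

# Each index entry lists mode, blob hash, stage number, and path.
git ls-files -s

# Count the distinct blob hashes behind the two paths.
unique_blobs=$(git ls-files -s | awk '{print $2}' | sort -u | wc -l | tr -d ' ')
echo "paths: 2, unique blobs: $unique_blobs"

# Verify the connectivity and validity of the objects in the database.
git fsck
```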

In summary, Git’s design as a content-addressable filesystem with a VCS interface allows it to efficiently manage and track changes in a reliable and flexible way, making it one of the most popular version control systems in use today.

Files in a Git project can reside in one of three states:
- Modified: the file has changed but has not been committed to the database yet.
- Staged: a modified file has been marked in its current version to go into the next commit snapshot.
- Committed: the data is safely stored in the local database.

Accordingly, a Git project has three main sections:
- Working Tree: a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk to use or modify.
- Staging Area: a file, generally contained in the Git directory, that stores information about what will go into the next commit. Its technical name in Git parlance is the “index”, but the phrase “staging area” works just as well.
- Git Directory: where Git stores the metadata and object database for the project. This is the most important part of Git, and it is what is copied when one clones a repository from another computer.
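
The three states can be watched in a throwaway repository. This is a sketch, assuming Git is installed; notes.txt and the identity values are made-up names. git status --short prints one line per changed file.

```shell
# Throwaway repository with a local, made-up identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

printf 'first draft\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Modified: the working tree differs from the last commit.
printf 'second draft, with more detail\n' > notes.txt
modified=$(git status --short)
echo "modified:  [$modified]"

# Staged: the new version is recorded in the index for the next commit.
git add notes.txt
staged=$(git status --short)
echo "staged:    [$staged]"

# Committed: the snapshot is stored in the Git directory.
git commit -q -m "Revise notes"
committed=$(git status --short)
echo "committed: [$committed]"
```

In the short format, the first status column describes the staging area and the second describes the working tree, so the M moves from the second column to the first when the file is staged, and the output is empty once everything is committed.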

Settings

Git uses the git config command to set configuration variables that control its behavior. These variables can be stored in three locations: the system-wide file (/etc/gitconfig), the user-specific file (~/.gitconfig), and the repository-specific file (.git/config). Each level overrides the previous one, so repository-specific settings take precedence. The --system, --global, and --local options select which file a command reads from or writes to. When writing, --local is the default; when reading without one of these options, Git consults all three files and the most specific value wins.

You can view all your Git settings and their sources using the command:

git config --list --show-origin

This command lists all configuration variables, showing where each one is set, whether it's at the system, global, or local level.
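
To see the precedence rules in action without touching real settings, the sketch below points HOME at a temporary directory first. That redirection, and the choice of core.editor with the values nano and vim, are assumptions made purely for the demonstration.

```shell
# Use a throwaway HOME so the real ~/.gitconfig is left untouched.
HOME=$(mktemp -d)
export HOME
unset XDG_CONFIG_HOME

repo=$(mktemp -d)
cd "$repo"
git init -q .

git config --global core.editor nano   # written to $HOME/.gitconfig
git config --local core.editor vim     # written to .git/config

# Reading a single level returns that level's value.
global_value=$(git config --global core.editor)

# Reading without a level option returns the value that takes effect.
effective_value=$(git config core.editor)

echo "global:    $global_value"
echo "effective: $effective_value"

# Show every stored value together with the file it comes from.
git config --show-origin --get-all core.editor
```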

After installing Git, the first step is to set your user name and email, as these are included in every commit you make. Use:

git config --global user.name "John Doe" 
git config --global user.email johndoe@example.com

This sets the information globally. To use a different name or email for a specific project, run the same commands inside that repository without the --global option.
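
A sketch of a per-project override, again using a temporary HOME so no real configuration is changed. The address john@work.example is a hypothetical placeholder. The commit picks up the global name and the repository-level email.

```shell
# Use a throwaway HOME so the real ~/.gitconfig is left untouched.
HOME=$(mktemp -d)
export HOME
unset XDG_CONFIG_HOME

# Global identity, as in the commands above.
git config --global user.name "John Doe"
git config --global user.email johndoe@example.com

# Inside one repository, override only the email (hypothetical address).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email john@work.example

# An empty commit is enough to see which identity gets recorded.
git commit -q --allow-empty -m "Initial commit"
author=$(git log -1 --format='%an <%ae>')
echo "$author"
```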

By default, Git names the initial branch "master" when creating a new repository. However, starting with Git version 2.28, you can set a different name for the initial branch. To set "main" as the default branch name, use:

git config --global init.defaultBranch main

This command ensures that all new repositories you create will use "main" as the default branch name instead of "master."
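
A quick way to check the setting, sketched under the assumption that the installed Git is version 2.28 or later. HOME is redirected to a temporary directory so the real global configuration stays as it is.

```shell
# Use a throwaway HOME so the real ~/.gitconfig is left untouched.
HOME=$(mktemp -d)
export HOME
unset XDG_CONFIG_HOME

git config --global init.defaultBranch main

# Create a new repository and ask which branch HEAD refers to.
repo=$(mktemp -d)
git init -q "$repo"
initial_branch=$(git -C "$repo" symbolic-ref --short HEAD)
echo "initial branch: $initial_branch"
```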

Git was initially designed as a toolkit for version control, featuring "plumbing" commands that perform low-level operations and can be combined in UNIX-style scripts. These commands are foundational but not user-friendly. In contrast, "porcelain" commands are higher-level, designed to be more intuitive for users. This distinction highlights Git's flexibility for both advanced users and those seeking simplicity.
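
To make the plumbing and porcelain distinction concrete, here is a sketch that builds a commit using only plumbing commands, roughly the work that git add and git commit do in one step each. The file name, commit message, and identity values are made up for the example.

```shell
# Throwaway repository with a local, made-up identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

printf 'hello\n' > greeting.txt

# 1. Write the file content into the object database as a blob.
blob=$(git hash-object -w greeting.txt)

# 2. Register the blob in the index under a path and file mode.
git update-index --add --cacheinfo 100644 "$blob" greeting.txt

# 3. Write the index out as a tree object.
tree=$(git write-tree)

# 4. Wrap the tree in a commit object (message is read from stdin).
commit=$(echo "Add greeting" | git commit-tree "$tree")

# 5. Point the current branch at the new commit.
git update-ref HEAD "$commit"

# The porcelain view of the result.
git log --oneline
```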

When you run git init in a directory, Git creates a .git directory, where it stores and manages nearly all of its data. Backing up or cloning the repository comes down to copying this directory, so understanding its contents and structure is useful when working with Git. A freshly initialized .git directory typically contains:

config description HEAD hooks/ info/ objects/ refs/

The index file is not there at first; Git creates it the first time a file is staged.
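
This can be checked in a throwaway repository. The sketch below assumes Git is installed; file.txt is a made-up name. The exact listing can vary slightly between Git versions, so no particular output is shown.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .

# List what git init created (-F marks directories with a trailing slash).
ls -F .git

# Right after init there is no index yet.
if [ -e .git/index ]; then index_before=present; else index_before=absent; fi
echo "index before staging: $index_before"

# Staging a file creates it, as a regular file rather than a directory.
printf 'hi\n' > file.txt
git add file.txt
if [ -f .git/index ]; then index_after=file; else index_after=missing; fi
echo "index after staging:  $index_after"
```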

Within the .git directory, four entries are central:

- HEAD: points to the currently checked-out branch (or directly to a commit when HEAD is detached).
- Index file: stores the staging area information.
- Objects directory: contains all the content for the repository's database.
- Refs directory: holds pointers to commit objects (e.g., branches, tags, remotes).

These elements are core to Git's functionality, managing the content and structure of your repository.
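
The four components can be inspected directly. This is a sketch that assumes Git's default on-disk layout; the tag name v0.1, the file name, and the identity values are made up for the example.

```shell
# Throwaway repository with a local, made-up identity and one commit.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
printf 'hi\n' > file.txt
git add file.txt
git commit -q -m "Initial commit"
git tag v0.1

# HEAD is a small text file naming the current branch.
head_content=$(cat .git/HEAD)
branch=$(git symbolic-ref --short HEAD)
echo "HEAD contains: $head_content"

# Refs: each branch and tag is a name that resolves to a commit hash.
git for-each-ref --format='%(refname) %(objectname)'

# Index: the staged entries (mode, blob hash, stage number, path).
git ls-files -s

# Objects: how many loose objects the database currently holds.
git count-objects
```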
