Git under the hood (minimal)
Git is a content-addressable, immutable, distributed database optimized for tracking filesystem snapshots.
Core ideas:
Everything is identified by a cryptographic hash (SHA-1 or SHA-256)
Data is immutable: new content ⇒ new hash ⇒ new object
Commits store complete tree snapshots, not diffs
History forms a Merkle DAG of commits
Main object types:
blob → raw file content
tree → directory (maps filenames to blob/tree hashes)
commit → metadata + pointer to a tree + parent commit(s)
tag → annotated tag: name, tagger, message + pointer to an object (usually a commit)
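As a minimal sketch of how the object types above are encoded, the Python below computes SHA-1 object ids for a blob and a one-entry tree. The hash covers a header ("<type> <size>" plus a NUL byte) followed by the body. The helper names hash_object and tree_body are illustrative and not part of Git, and sorting entries by plain name is a simplification of Git's real ordering.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 object id: hash of '<type> <size>\\0' + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize tree entries given as (mode, name, hex object id).

    Each entry is '<mode> <name>\\0' followed by the raw 20-byte id.
    Sorting by name is a simplification: Git compares directory names
    as if they ended in '/'.
    """
    out = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return out


blob_id = hash_object("blob", b"hello world\n")
tree_id = hash_object("tree", tree_body([("100644", "hello.txt", blob_id)]))
print(blob_id)  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(tree_id)
```

The blob id should match what `git hash-object` reports for a file containing "hello world" and a trailing newline.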
Storage layout:
Objects live under .git/objects/<2-char>/<38-char> (SHA-1; the file name has 62 chars with SHA-256)
Loose objects are zlib-compressed individually
Packfiles group and delta-compress objects for efficiency
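The loose-object layout can be sketched in a few lines of Python. This is a simplified model, assuming SHA-1 and a throwaway directory instead of a real .git; write_loose and read_loose are hypothetical helper names, and packfiles are not covered.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose(objects_dir: Path, obj_type: str, body: bytes) -> str:
    """Store one object as a zlib-compressed file and return its id."""
    raw = f"{obj_type} {len(body)}".encode() + b"\x00" + body
    oid = hashlib.sha1(raw).hexdigest()
    # First 2 hex chars name the directory, the remaining 38 name the file.
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return oid


def read_loose(objects_dir: Path, oid: str) -> tuple[str, bytes]:
    """Load an object by id and split it into (type, body)."""
    raw = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    assert int(size) == len(body), "size in header must match body length"
    return obj_type, body


objects_dir = Path(tempfile.mkdtemp()) / "objects"
oid = write_loose(objects_dir, "blob", b"hello world\n")
print(oid[:2], oid[2:])
print(read_loose(objects_dir, oid))
```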
Hashes and integrity:
Object id = hash("<type> <size>\0" + content)
Commit includes hash(tree) and hash(parent)
Chain of hashes = tamper-evident Merkle DAG
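To illustrate why the chain is tamper-evident, here is a small Python sketch: changing one file in an old snapshot changes its blob id, hence the tree id, hence the commit id, hence every descendant commit id. The commit text follows Git's general layout (tree, parent, author, committer, blank line, message), but the author line is a fixed placeholder and the trees are flat, so the ids are for illustration only.

```python
import hashlib


def oid(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>\\0' + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree(files: dict[str, bytes]) -> str:
    """Id of a flat tree whose entries point at the blobs in `files`."""
    body = b""
    for name in sorted(files):
        blob_id = oid("blob", files[name])
        body += f"100644 {name}".encode() + b"\x00" + bytes.fromhex(blob_id)
    return oid("tree", body)


def commit(tree_id: str, parents: list[str], message: str) -> str:
    """Id of a commit; author/committer lines are fixed placeholders."""
    who = "Example <example@example.com> 0 +0000"
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message, ""]
    return oid("commit", "\n".join(lines).encode())


c1 = commit(tree({"a.txt": b"one\n"}), [], "first")
c2 = commit(tree({"a.txt": b"two\n"}), [c1], "second")

# Tamper with the first snapshot: every id downstream changes.
c1_bad = commit(tree({"a.txt": b"ONE\n"}), [], "first")
c2_bad = commit(tree({"a.txt": b"two\n"}), [c1_bad], "second")
print(c1 != c1_bad, c2 != c2_bad)  # True True
```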
References:
.git/refs/heads/<branch> → latest commit hash
.git/refs/tags/<tag> → tagged commit
HEAD → current branch ref (symbolic)
Detached HEAD → points directly to a commit
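Resolving a reference can be sketched as following "ref: <path>" lines until a hash is reached. The Python below builds a throwaway directory that mimics the layout above; the hashes are made-up placeholders, resolve is a hypothetical helper, and refs stored in a packed-refs file are ignored for simplicity.

```python
import tempfile
from pathlib import Path


def resolve(git_dir: Path, name: str = "HEAD") -> str:
    """Follow symbolic refs ('ref: <path>') until a commit hash is reached."""
    content = (git_dir / name).read_text().strip()
    while content.startswith("ref: "):
        content = (git_dir / content[len("ref: "):]).read_text().strip()
    return content


# Throwaway layout standing in for .git; hashes are placeholders.
git_dir = Path(tempfile.mkdtemp())
(git_dir / "refs" / "heads").mkdir(parents=True)
(git_dir / "refs" / "heads" / "main").write_text("a" * 40 + "\n")

# Attached HEAD: a symbolic ref to the current branch.
(git_dir / "HEAD").write_text("ref: refs/heads/main\n")
attached = resolve(git_dir)

# Detached HEAD: the file holds a commit hash directly.
(git_dir / "HEAD").write_text("b" * 40 + "\n")
detached = resolve(git_dir)

print(attached)
print(detached)
```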
Index (staging area):
.git/index maps paths → blob hashes + metadata
Bridge between working directory and next commit
Enables comparing three states: working dir, index, HEAD
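A rough Python model of the three states: each one is treated as a map from path to blob id, and comparing adjacent pairs yields the staged and unstaged changes. The blob ids are short placeholders, and the real index also stores file modes and stat metadata that this sketch leaves out.

```python
def changed(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Paths whose blob id differs between two path -> blob-id maps."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


# Placeholder blob ids; real ones would be 40-char SHA-1 hex strings.
head = {"README": "aaa1", "main.py": "bbb1"}
index = {"README": "aaa1", "main.py": "bbb2", "new.txt": "ccc1"}
workdir = {"README": "aaa2", "main.py": "bbb2", "new.txt": "ccc1"}

staged = changed(head, index)       # roughly what `git diff --cached` lists
unstaged = changed(index, workdir)  # roughly what `git diff` lists
print(staged)    # ['main.py', 'new.txt']
print(unstaged)  # ['README']
```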
Graph model:
Commits form a DAG: node = commit, edge = parent
Merge = commit with multiple parents
Rebase = rewrite DAG by creating new commits
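The graph model can be sketched with a small in-memory store. This is a simplified model, not Git's encoding: the Commit class, the add and ancestors helpers, and the tree labels ("t0", "t1", ...) are all hypothetical, and the id is a hash over tree, parents, and message only.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    tree: str                 # placeholder snapshot id
    parents: tuple[str, ...]  # ids of parent commits
    message: str

    @property
    def id(self) -> str:
        # Simplified id: hash over tree, parents, and message.
        data = "\n".join([self.tree, *self.parents, self.message])
        return hashlib.sha1(data.encode()).hexdigest()


store: dict[str, Commit] = {}


def add(tree: str, parents: list[str], message: str) -> str:
    """Create a commit, keep it in the store, and return its id."""
    c = Commit(tree, tuple(parents), message)
    store[c.id] = c
    return c.id


def ancestors(start: str) -> set[str]:
    """Commits reachable from `start` via parent edges (including itself)."""
    seen, stack = set(), [start]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(store[cid].parents)
    return seen


base = add("t0", [], "base")
main = add("t1", [base], "work on main")
feat = add("t2", [base], "work on feature")

merge = add("t3", [main, feat], "merge feature")  # merge: two parents
rebased = add("t3", [main], "work on feature")    # rebase: new commit, new id

print(len(store[merge].parents))   # 2
print(rebased != feat)             # True
print(feat in ancestors(rebased))  # False
```

The original feature commit stays in the store untouched; the rebase only adds a new commit whose parent is the tip of main.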
Remotes:
Remote = peer repository (not a master)
Fetch/push sync missing objects by comparing hashes
Transfer uses delta-compressed packfiles; the server keeps no per-client state between sessions
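The idea of syncing only missing objects can be sketched as a graph walk that stops at commits the receiver already has. This models commits only, with placeholder ids; the real protocol negotiates with want/have messages and ships trees and blobs in a packfile as well. The function name missing_commits is hypothetical.

```python
def missing_commits(remote: dict[str, list[str]], tip: str, have: set[str]) -> list[str]:
    """Commits reachable from `tip` on the remote that the local side lacks.

    The walk stops at any commit already in `have`, since its whole
    history is then assumed to be present locally.
    """
    missing: list[str] = []
    seen: set[str] = set()
    stack = [tip]
    while stack:
        cid = stack.pop()
        if cid in have or cid in seen:
            continue
        seen.add(cid)
        missing.append(cid)
        stack.extend(remote[cid])  # continue with the parents
    return missing


# Placeholder ids standing in for commit hashes: c1 <- c2 <- c3 <- c4
remote = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c3"]}
local_have = {"c1", "c2"}

to_send = missing_commits(remote, "c4", local_have)
print(to_send)  # ['c4', 'c3']
```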
Architectural patterns:
Content-addressable storage → object immutability
Composite → trees containing blobs/subtrees
Merkle DAG → commit integrity and verification
Symbolic references → HEAD and branches
Staging buffer → index as write cache
Eventual consistency → decentralized sync
Mental model:
Git = immutable key-value store + DAG of snapshots + symbolic refs.