Git has a reputation for being confusing. Users stumble over terminology and phrasing that misguides their expectations. This is most apparent in commands that “rewrite history” such as git cherry-pick or git rebase. In my experience, the root cause of this confusion is an interpretation of commits as diffs that can be shuffled around. However, commits are snapshots, not diffs!

I believe that Git becomes understandable if we peel back the curtain and look at how Git stores your repository data. After we investigate this model, we’ll explore how this new perspective helps us understand commands like git cherry-pick and git rebase.

If you want to go really deep, you should read the Git Internals chapter of the Pro Git book.

I’ll be using the git/git repository checked out at v2.29.2 as an example. Follow along with my command-line examples for extra practice.

Object IDs are hashes

The most important part to know about Git objects is that Git references each by its object ID (OID for short), providing a unique name for the object. We will use the git rev-parse <ref> command to discover these OIDs. Each object is essentially a plain-text file and we can examine its contents using the git cat-file -p <oid> command.

You might also be used to seeing OIDs given as a shorter hex string. This string is given as something long enough that only one object in the repository has an OID that matches that abbreviation. If we request the type of an object using an abbreviated OID that is too short, then we will see the list of OIDs that match

$ git cat-file -t e0c03
error: short SHA1 e0c03 is ambiguous
hint: The candidates are:
hint: e0c03f27484 commit 2016-10-26 - contrib/buildsystems: ignore irrelevant files in Generators/
hint: e0c03653e72 tree
hint: e0c03c3eecc blob
fatal: Not a valid object name e0c03

What are these types: blob, tree, and commit? Let’s start at the bottom and work our way up.

Blobs are file contents

At the bottom of the object model, blobs contain file contents. To discover the OID for a file at your current revision, run git rev-parse HEAD:<path>. Then, use git cat-file -p <oid> to find its contents.

$ git rev-parse HEAD:README.md
eb8115e6b04814f0c37146bbe3dbc35f3e8992e0

$ git cat-file -p eb8115e6b04814f0c37146bbe3dbc35f3e8992e0 | head -n 8
[![Build status](<https://github.com/git/git/workflows/CI/PR/badge.png>)](<https://github.com/git/git/actions?query=branch%3Amaster+event%3Apush>)

Git - fast, scalable, distributed revision control system
=========================================================

Git is a fast, scalable, distributed revision control system with an
unusually rich command set that provides both high-level operations
and full access to internals.

If I edit the README.md file on my disk, then git status notices that the file has a recent modified time and hashes the contents. If the contents don’t match the current OID at HEAD:README.md, then git status reports the file as “modified on disk.” In this way, we can see if the file contents in the current working directory match the expected contents at HEAD.

Trees are directory listings

Note that blobs contain file contents, but not the file names! The names come from Git’s representation of directories: trees. A tree is an ordered list of path entries, paired with object types, file modes, and the OID for the object at that path. Subdirectories are also represented as trees, so trees can point to other trees!

We will use diagrams to visualize how these objects are related. We use boxes for blobs and triangles for trees.

$ git rev-parse HEAD^{tree}
75130889f941eceb57c6ceb95c6f28dfc83b609c

$ git cat-file -p 75130889f941eceb57c6ceb95c6f28dfc83b609c  | head -n 15
100644 blob c2f5fe385af1bbc161f6c010bdcf0048ab6671ed    .cirrus.yml
100644 blob c592dda681fecfaa6bf64fb3f539eafaf4123ed8    .clang-format
100644 blob f9d819623d832113014dd5d5366e8ee44ac9666a    .editorconfig
100644 blob b08a1416d86012134f823fe51443f498f4911909    .gitattributes
040000 tree fbe854556a4ae3d5897e7b92a3eb8636bb08f031    .github
100644 blob 6232d339247fae5fdaeffed77ae0bbe4176ab2de    .gitignore
100644 blob cbeebdab7a5e2c6afec338c3534930f569c90f63    .gitmodules
100644 blob bde7aba756ea74c3af562874ab5c81a829e43c83    .mailmap
100644 blob 05f3e3f8d79117c1d32bf5e433d0fd49de93125c    .travis.yml
100644 blob 5ba86d68459e61f87dae1332c7f2402860b4280c    .tsan-suppressions
100644 blob fc4645d5c08bd005238fc72cfa709495d8722e6a    CODE_OF_CONDUCT.md
100644 blob 536e55524db72bd2acf175208aef4f3dfc148d42    COPYING
040000 tree a58410edddbdd133cca6b3322bebe4fb37be93fa    Documentation
100755 blob ca6ccb49866c595c80718d167e40cfad1ee7f376    GIT-VERSION-GEN
100644 blob 9ba33e6a141a3906eb707dd11d1af4b0f8191a55    INSTALL

Trees provide names for each sub-item. Trees also include information such as Unix file permissions, object type (blob or tree), and OIDs for each entry. We cut the output to the top 15 entries, but we can use grep to discover that this tree has a README.md entry that points to our earlier blob OID.