Git is a Directed Acyclic Graph and What the Heck Does That Mean?

The Git version-control system frustrates me because, while it is an excellent productivity tool that lets scores of developers collaborate on a single code base without clobbering each other, it asks its users to understand how it works at its deepest levels. It’s like if you were going to write a letter and first had to read a treatise on UTF-8 encoding. I want to make Git less mysterious.

A core concept is the way that commits relate to each other. Once you can visualize it, you can troubleshoot and recover from those weird Git situations you inevitably find yourself in. The wildly accurate and incredibly unhelpful description of that relationship is “a directed acyclic graph.” (Although, when I sat down to write this, my dashboard contained a link to this cheerful deep-dive with illustrations: Spinning Around In Cycles With Directed Acyclic Graphs.)

There’s a much simpler way to understand this.

Whom do you follow?

I’ve read about the custom in small towns in Spain for waiting in line. Or rather, waiting in not-line. You walk into a shop and ask, “Who’s last?” Then, rather than standing in a constrained spot on the floor to remember your position in the queue, you merely need to remember “I follow her” and, temporarily, “I’m last.” When the next person enters the shop, you’ll answer his query to indicate that you were last. He’ll remember that he follows you, and now you’re not last anymore.

This is just like the way a Git commit history is structured. Each commit remembers which commit came before it. (A merge commit points to two or more previous commits.)

Think of commits like people waiting in a shop, each pointing to the person he or she follows.
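If you prefer code to shopkeeping, here is a minimal sketch of the same idea in Python. The Commit class and the first_parent_history function are names I made up for illustration; this is not how Git stores anything, only the shape of the relationship.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class Commit:
    message: str
    # Whom do you follow? Empty for the very first commit,
    # more than one entry for a merge commit.
    parents: Tuple["Commit", ...] = ()


def first_parent_history(tip: Commit) -> Iterator[Commit]:
    """Walk backward from a commit, following only the first parent."""
    current = tip
    while True:
        yield current
        if not current.parents:
            return
        current = current.parents[0]


first = Commit("open the shop")
second = Commit("add a counter", (first,))
third = Commit("hang a sign", (second,))

# Nobody knows who comes after them; each commit only knows who came before.
history = [c.message for c in first_parent_history(third)]
print(history)  # ['hang a sign', 'add a counter', 'open the shop']
```

Notice that there is no "next" field anywhere. The only way to read the history is to start from the most recent arrival and ask each commit whom it follows.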

So wait, what is a commit then? It’s a snapshot of the state of your files, plus some metadata: who made the commit and when, a comment, and a pointer to the previous commit(s). That pointer is part of what makes a commit unique. This is why actions that change the parent, such as rebase and cherry-pick, create a new commit with a new ID. It is also why rewriting history with those actions on commits you’ve already pushed to a shared repository can cause grief for your collaborators and is a stern no-no. If I’ve built on top of commit A, such that my commit B includes a pointer to A as its parent, and you rebase and replace A with a new commit, my B is orphaned, pointing to a parent that is no longer part of your history.
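To see why a different parent means a different ID, here is a deliberately simplified sketch. It assumes the ID is a SHA-1 hash over a few text fields, parent IDs included. The commit_id function and its field layout are my own invention for illustration, not Git's actual object format, and the snapshot and base names are placeholders.

```python
import hashlib
from typing import Sequence


def commit_id(snapshot: str, author: str, message: str,
              parents: Sequence[str]) -> str:
    """Hash everything that makes up a commit, parent IDs included."""
    lines = [f"snapshot {snapshot}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", "", message]
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


# You make commit A; I build commit B on top of it.
a = commit_id("files-v1", "you", "Commit A", parents=[])
b = commit_id("files-v2", "me", "Commit B", parents=[a])

# Same files, same author, same message, but a different parent:
# the result is a different commit with a different ID.
a_rewritten = commit_id("files-v1", "you", "Commit A",
                        parents=["some-other-base"])

# For B to follow the rewritten A, B needs a new ID as well.
b_replayed = commit_id("files-v2", "me", "Commit B", parents=[a_rewritten])

print(a != a_rewritten, b != b_replayed)
```

My original B still exists and still names the original A as its parent, which is exactly the situation the stern warnings are about.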

The drawings are wrong

This epiphany struck me like a lightning bolt. The way people explain Git and the way people take notes about that explanation don’t match up.

People who use words like “directed graph” draw commits with each arrow pointing to the preceding commit.

Time marches from left to right; arrows point to the left.

People learning about Git, especially those with a background in an older version control system, draw commits with each arrow pointing forward in time.

Arrows advance to the next point in time; arrows point to the right.

It’s a subtle point, but it means the difference between anticipating what will happen when you merge or rebase, and not. Those history-changing actions are changing where the arrows point; it doesn’t really matter what position the dots are in. Because the dots are not waiting in line by standing in a spot on the floor; they’re relaxing in a quaint shop, remembering which dot precedes them.

The correct mental model is to think of the arrows as identifying a commit’s parent: arrows pointing to the left.

Merges and rebases are just updating pointers.

A rebase updates pointers.

The more natural but unhelpful way to imagine the arrows is as depicting the flow of time, but then you have to imagine a rebase as violating the laws of physics.

Changing the flow of time, itself?

So think of those arrows as pointers to parents, instead.

That said, there is still a shortcut I’m taking in the above drawings, and it is important to call that out to keep from leading you astray. Recall that a commit includes information about its parent or parents. Therefore, changing a parent really means creating a new commit. It will get a new ID. And anything that uses the old ID as its parent will be cast adrift unless it, too, gets a new ID reflecting its amended lineage.

That’s why you’ll find stern warnings not to rewrite shared history. You would be replacing commits someone else might have built upon.
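Here is a rough sketch of what "changing a parent really means creating a new commit" looks like across a whole chain. It assumes a toy in-memory store keyed by a shortened hash; add_commit, replay, and the store layout are hypothetical names of mine, and real Git does a great deal more (including reapplying the actual file changes).

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory store: commit ID -> (parent ID or None, message)
Store = Dict[str, Tuple[Optional[str], str]]


def add_commit(store: Store, parent: Optional[str], message: str) -> str:
    """Create a commit whose ID depends on its parent and its message."""
    cid = hashlib.sha1(f"{parent}\n{message}".encode("utf-8")).hexdigest()[:7]
    store[cid] = (parent, message)
    return cid


def replay(store: Store, tip: str, old_base: str, new_base: str) -> str:
    """Copy the commits after old_base (up to tip) on top of new_base."""
    pending: List[str] = []
    current: Optional[str] = tip
    while current is not None and current != old_base:
        parent, message = store[current]
        pending.append(message)
        current = parent
    new_parent = new_base
    for message in reversed(pending):  # oldest first
        new_parent = add_commit(store, new_parent, message)
    return new_parent


store: Store = {}
root = add_commit(store, None, "root")
main_tip = add_commit(store, root, "main work")
f1 = add_commit(store, root, "feature 1")
f2 = add_commit(store, f1, "feature 2")

new_tip = replay(store, f2, old_base=root, new_base=main_tip)

print(new_tip != f2)          # the replayed commit has a new ID
print(store[f2] == (f1, "feature 2"))  # the old commit is untouched
```

Nothing was edited in place. The old commits are still sitting there with their old parents; the "rebase" only made copies with new lineage, and anyone who built on the old ones is still pointing at them.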

HEAD is a label

The people-waiting-in-a-shop metaphor also helps you think about HEAD and what happens when you check out a branch.

Branches are drawn as if they were a garden path of stepping stones, leading off to adventure, but a branch is really just a label on a particular commit. That commit has a pointer to its parent, which has a pointer to its parent, and so on until you get back to a commit that is the common ancestor of two lineages, yours and the one that can be traced back from a label called “master”.
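That walk back through the parents can be sketched in a few lines. The commit IDs and the fork_point helper below are made up for illustration, and the sketch assumes every commit has a single parent (no merges), which real histories do not promise.

```python
from typing import Dict, Optional, Set

# Hypothetical history: commit ID -> parent ID (None for the first commit)
parent_of: Dict[str, Optional[str]] = {
    "a1": None,
    "b2": "a1",
    "c3": "b2",  # carries the "master" label
    "d4": "b2",
    "e5": "d4",  # carries the "myfeature" label
}

# A branch is nothing more than a name stuck on one commit.
branches: Dict[str, str] = {"master": "c3", "myfeature": "e5"}


def lineage(commit: Optional[str]) -> Set[str]:
    """Collect a commit and everything reachable through parent pointers."""
    seen: Set[str] = set()
    while commit is not None:
        seen.add(commit)
        commit = parent_of[commit]
    return seen


def fork_point(ours: str, theirs: str) -> Optional[str]:
    """Walk back from ours until we reach a commit theirs also descends from."""
    their_lineage = lineage(branches[theirs])
    commit: Optional[str] = branches[ours]
    while commit is not None and commit not in their_lineage:
        commit = parent_of[commit]
    return commit


print(fork_point("myfeature", "master"))  # b2
```

The branch names only tell the walk where to start. Everything else comes from commits remembering whom they follow.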

Calling one branch “master” is just a convention, as is calling the remote repo you stuck on GitHub “origin”. The names don’t mean anything to Git, only to the people using it.

A commit can have lots of labels, for instance, its SHA1 hash identifier, master, the branch you just created, and HEAD. The branch you have checked out, and HEAD along with it, scoot along automatically as you add commits.

When you check out a branch (git checkout myfeature), or any commit (git checkout a30bef), Git makes the files in your working directory look like they did when that commit was made, and it makes the HEAD label point at that commit.

Checking out a branch is like passing a HEAD label to someone else.
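Here is a toy model of labels being handed around. TinyRepo and its fake commit IDs (c1, c2, ...) are hypothetical, and working-directory files are left out entirely; the point is only to watch which labels move when you commit and when you check something out.

```python
from typing import Dict, Optional


class TinyRepo:
    def __init__(self) -> None:
        self.parent_of: Dict[str, Optional[str]] = {}
        self.branches: Dict[str, str] = {}   # label -> commit ID
        self.head = "master"                 # HEAD names a branch...
        self.detached = False                # ...or holds a commit ID directly
        self._count = 0

    def head_commit(self) -> Optional[str]:
        if self.detached:
            return self.head
        return self.branches.get(self.head)

    def commit(self) -> str:
        self._count += 1
        cid = f"c{self._count}"              # fake ID for illustration
        self.parent_of[cid] = self.head_commit()
        if self.detached:
            self.head = cid
        else:
            self.branches[self.head] = cid   # only the checked-out label moves
        return cid

    def branch(self, name: str) -> None:
        current = self.head_commit()
        if current is None:
            raise ValueError("nothing committed yet")
        self.branches[name] = current

    def checkout(self, target: str) -> None:
        if target in self.branches:
            self.head, self.detached = target, False
        elif target in self.parent_of:
            self.head, self.detached = target, True
        else:
            raise KeyError(target)


repo = TinyRepo()
repo.commit()                  # c1, master -> c1
repo.commit()                  # c2, master -> c2
repo.branch("myfeature")       # a second label on c2
repo.checkout("myfeature")     # pass HEAD to the other label
repo.commit()                  # c3, myfeature -> c3; master stays on c2

print(repo.branches)  # {'master': 'c2', 'myfeature': 'c3'}
```

Creating the branch added a label and nothing else. Checking it out moved HEAD, and the next commit dragged only that label forward.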

“Detached head,” that rather scary-sounding state you get into when you check out a commit by ID, is just saying “HEAD points directly at a commit instead of at a branch label.” If you dig into your repository’s .git folder, you can find its HEAD text file. When you have a branch checked out, that text file contains a reference to the branch, e.g., ref: refs/heads/master. When you check out some arbitrary commit by ID, the text file contains that ID, and that signals Git to tell you it is in a detached head state. Checking out a named branch again gets things back to normal.
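The two shapes that HEAD file can take are easy to tell apart. This sketch works on the text of the file rather than reading your repository, so you can try it anywhere; describe_head is a name I made up, and the 40-character ID in the example is a placeholder, not a real commit.

```python
from typing import Tuple


def describe_head(head_text: str) -> Tuple[str, str]:
    """Classify the contents of a .git/HEAD file.

    Returns ("branch", name) when HEAD refers to a branch,
    or ("detached", commit_id) when it holds a commit ID directly.
    """
    text = head_text.strip()
    ref_prefix = "ref: "
    heads_prefix = "refs/heads/"
    if text.startswith(ref_prefix):
        ref = text[len(ref_prefix):]
        if ref.startswith(heads_prefix):
            ref = ref[len(heads_prefix):]
        return ("branch", ref)
    return ("detached", text)


print(describe_head("ref: refs/heads/master\n"))
# ('branch', 'master')

placeholder_id = "a30bef" + "0" * 34  # stand-in for a full commit ID
print(describe_head(placeholder_id + "\n")[0])
# detached
```

To try it on a real repository, pass it the contents of the HEAD file inside your .git folder.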

Pointers, not the arrow of time

The pictures to come away with are:

  • Arrows point to the preceding commit, not the subsequent one.
  • Waiting in a shop, you need to know only whom you follow and whether or not you’re the most recent to arrive.
  • Branches are labels on commits.
  • Checking out a branch passes the HEAD label to that commit.

Do these images resonate? Are there parts that are still confusing?