Git log spelunker

(A proposal)

Below is a graph of git activity on a fairly large project. Does it look intimidating to you? It does to me, despite using git every day for years.

The problem is that these graphs just show everything. Traversing this graph is like using Google Maps at maximum zoom to plan a trip across Europe. There is no concept of zooming in or out; no distinction between the highways and the dirt tracks.

Instead, I want to pass a set of ‘interesting’ commits to a program and have it show me their topological relationship. Do they point to the same commit? Is one an ancestor commit of the other? Have the branches diverged? If they have diverged, where did they diverge? Given two different commits, there are only a few options:

On the left, the two commits have diverged from a common ancestor. In the center, one commit builds on top of another. On the right, the commits are totally unrelated (a more unusual situation).
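As a rough sketch of how a program could tell these cases apart, here is a small Python example. It assumes the history is already loaded into a plain dictionary mapping each commit id to its list of parent ids; the `parents` layout, the function names, and the returned labels are all my own illustration, not part of any existing tool.

```python
from typing import Dict, List, Set


def ancestors(parents: Dict[str, List[str]], commit: str) -> Set[str]:
    """Return `commit` plus every commit reachable by following parent links."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def relationship(parents: Dict[str, List[str]], a: str, b: str) -> str:
    """Classify how commit `a` relates to commit `b` (labels are hypothetical)."""
    if a == b:
        return "same"
    anc_a = ancestors(parents, a)
    anc_b = ancestors(parents, b)
    if a in anc_b:
        return "ancestor"      # a is an ancestor of b
    if b in anc_a:
        return "descendant"    # a builds on top of b
    if anc_a & anc_b:
        return "diverged"      # they share history but neither contains the other
    return "unrelated"         # no common history at all


# A made-up toy history; the short ids only echo the ones used in the text.
history = {
    "root": [],
    "ae3": ["root"],
    "mid": ["ae3"],
    "84f": ["mid"],
    "f33": ["ae3"],
    "lone": [],
}
print(relationship(history, "84f", "f33"))
print(relationship(history, "84f", "ae3"))
print(relationship(history, "84f", "lone"))
```

Walking the full ancestor set is the simplest thing that works; a real tool would probably want something smarter for very large repositories.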

These are 10,000-foot views of your repository. A cloud stands for a set of commits. For example, in the center image, the ancestors of commit 84f include commit ae3 via some unknown commit cloud. It could be a single path or many. It could be via 1 commit or 1,000. The commits in the set could be internally tangled, or in a clean straight line. To begin with, I don’t care about such details — I just care about the high-level topology of the graph.

Let’s be a little more precise. We draw an edge from set A to set B iff there exists a commit a ∈ A and a commit b ∈ B such that b is a parent of a. This means that there are more ways that the central diagram could have been drawn:

The diagram on the left is the most linear; the diagram on the right is more tangled. We get an idea of the important tangles without seeing all the tangles.
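The edge rule can be sketched in a few lines of Python. This assumes the commits have already been grouped into named sets (single commits are just sets of size one); the `clouds` dictionary and the function name `cloud_edges` are hypothetical, chosen only for this illustration.

```python
from typing import Dict, List, Set, Tuple


def cloud_edges(
    parents: Dict[str, List[str]],
    clouds: Dict[str, Set[str]],
) -> Set[Tuple[str, str]]:
    """Return (A, B) for every pair of distinct sets where some commit in A
    has a parent in B."""
    # Map each commit to the name of the set that contains it.
    owner = {commit: name for name, members in clouds.items() for commit in members}
    edges: Set[Tuple[str, str]] = set()
    for child, child_parents in parents.items():
        for parent in child_parents:
            src, dst = owner.get(child), owner.get(parent)
            # Ignore commits outside the diagram and links inside one set.
            if src is not None and dst is not None and src != dst:
                edges.add((src, dst))
    return edges


# Made-up example: 84f is a merge whose second parent is ae3 itself,
# so the diagram gets a direct 84f -> ae3 edge as well as the path via the cloud.
history = {"84f": ["c1", "ae3"], "c1": ["c2"], "c2": ["ae3"], "ae3": []}
groups = {"84f": {"84f"}, "cloud": {"c1", "c2"}, "ae3": {"ae3"}}
print(sorted(cloud_edges(history, groups)))
```

Because edges are collected into a set, a thousand parent links between two clouds still show up as a single arrow, which is exactly the information-hiding I am after.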

Now that I have my 10,000-foot view, I want to zoom. I want to zoom in by gradually exposing more structure, or zoom out by hiding details. This corresponds to adding or removing commits from the set of ‘interesting commits’ that the program shows us.

There are multiple ways we can expand this set. Let’s say I start with two diverged commits, as in the original diagram on the left. I now want to see where they diverged, which means adding their common merge-base to the diagram:

(Technically, there can be multiple merge-bases. In this example there is just one.)
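Git itself can answer this question with `git merge-base --all`, but to make the idea concrete, here is a naive Python sketch over the same kind of commit-to-parents dictionary. I am assuming a merge-base is a common ancestor that is not a proper ancestor of any other common ancestor; the function names are mine, and this brute-force version is only meant for small examples.

```python
from typing import Dict, List, Set


def ancestors(parents: Dict[str, List[str]], commit: str) -> Set[str]:
    """Return `commit` plus every commit reachable by following parent links."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def merge_bases(parents: Dict[str, List[str]], a: str, b: str) -> Set[str]:
    """Return the common ancestors of a and b that no other common ancestor descends from."""
    common = ancestors(parents, a) & ancestors(parents, b)
    dominated: Set[str] = set()
    for commit in common:
        # Everything strictly below a common ancestor cannot be a merge-base.
        dominated |= ancestors(parents, commit) - {commit}
    return common - dominated


# Simple divergence: one merge-base.
simple = {"base": [], "a1": ["base"], "a": ["a1"], "b": ["base"]}
print(merge_bases(simple, "a", "b"))

# Criss-cross merge: both x and y are merge-bases.
crisscross = {"r": [], "x": ["r"], "y": ["r"], "a": ["x", "y"], "b": ["x", "y"]}
print(sorted(merge_bases(crisscross, "a", "b")))
```

The second example is the ‘multiple merge-bases’ case: both tips merged both sides, so neither `x` nor `y` is a better common ancestor than the other.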

Another way to zoom is to expose the ‘cloud lining’: the subset of commits at the edges of a cloud. These are the commits that have parents or children in other parts of the diagram. Now we expose the cloud lining between 84f and f33:

Another way to zoom is to expose neighboring commits. Each vertex in the graph has a ‘neighborhood’: the set of commits that are parents or children of commits at that vertex. Let’s expose the neighborhood of f33.
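Both of these zoom operations reduce to simple set queries. Here is a sketch in Python, again over a hypothetical commit-to-parents dictionary. For simplicity I treat ‘other parts of the diagram’ as ‘anything outside the cloud’, which is an assumption on my part; the function names are also mine.

```python
from typing import Dict, List, Set


def children_of(parents: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the parent links so we can also walk from a commit to its children."""
    children: Dict[str, List[str]] = {}
    for child, child_parents in parents.items():
        for parent in child_parents:
            children.setdefault(parent, []).append(child)
    return children


def neighborhood(parents: Dict[str, List[str]], vertex: Set[str]) -> Set[str]:
    """Commits outside `vertex` that are a parent or child of some commit in it."""
    children = children_of(parents)
    near: Set[str] = set()
    for commit in vertex:
        near.update(parents.get(commit, []))
        near.update(children.get(commit, []))
    return near - vertex


def lining(parents: Dict[str, List[str]], cloud: Set[str]) -> Set[str]:
    """Commits inside `cloud` that have a parent or child outside it."""
    children = children_of(parents)
    return {
        commit
        for commit in cloud
        if any(
            other not in cloud
            for other in parents.get(commit, []) + children.get(commit, [])
        )
    }


# Made-up straight-line history: ae3 <- c1 <- c2 <- c3 <- 84f
history = {"ae3": [], "c1": ["ae3"], "c2": ["c1"], "c3": ["c2"], "84f": ["c3"]}
print(sorted(lining(history, {"c1", "c2", "c3"})))
print(neighborhood(history, {"ae3"}))
```

In the straight-line example, the lining of the middle cloud is just its two end commits, and the interior commit `c2` stays hidden.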

At this point I want to know how large those clouds are. Let’s turn on the commit count:

There are 25 commits in the cloud between ae3 and 372. With other commit viewers, we are forced to look at all 25 commits, but they are often of no interest to us. I think the information-hiding above makes it much easier to see how the graph is structured. (Making the cloud size logarithmic in the number of commits is a nice way to give an impression of the number of commits without being precise about it.)

Enough zooming in — let’s zoom out. One way to do this is to select two adjacent vertices (or select the arrow between them) and collapse the two vertices into one cloud. Let’s select the cloud of 2 commits and the commit 4ee, and collapse them:

Instead of collapsing pairs of vertices, we can select big clumps of vertices and collapse them all at once. Let’s do that:

There is one restriction on what we are allowed to collapse: the resulting graph must still be a DAG. For example, we cannot collapse the 25 cloud with the 18,634 cloud, because the merged cloud would become an ancestor of itself via the 6 cloud:

But we could include the 6 cloud in the selection and collapse all three:
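Here is a sketch of how the collapse and its DAG check might work, in Python. It operates on the diagram's own edges rather than on individual commits. The vertex labels (`c25`, `c6`, `c18634`) and the exact edges between them are made up to mirror the example; I am assuming the 6 cloud sits on a path between the other two. The cycle check is Kahn's topological-sort algorithm, which is my choice and not something the proposal depends on.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def has_cycle(edges: Set[Edge]) -> bool:
    """Kahn's algorithm: a graph is acyclic iff every vertex can be peeled off
    once all of its incoming edges have been removed."""
    nodes = {vertex for edge in edges for vertex in edge}
    indegree = {vertex: 0 for vertex in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    ready = [vertex for vertex in nodes if indegree[vertex] == 0]
    removed = 0
    while ready:
        vertex = ready.pop()
        removed += 1
        for src, dst in edges:
            if src == vertex:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return removed != len(nodes)


def collapse(edges: Set[Edge], selection: Set[str], merged_name: str) -> Set[Edge]:
    """Merge the selected vertices into one; refuse if the result is not a DAG."""
    def rename(vertex: str) -> str:
        return merged_name if vertex in selection else vertex

    renamed = {(rename(src), rename(dst)) for src, dst in edges}
    # Links that now start and end inside the merged cloud are hidden.
    new_edges = {(src, dst) for src, dst in renamed if src != dst}
    if has_cycle(new_edges):
        raise ValueError("collapsing this selection would create a cycle")
    return new_edges


diagram = {("c25", "c6"), ("c6", "c18634"), ("c25", "c18634")}

try:
    collapse(diagram, {"c25", "c18634"}, "merged")
except ValueError as error:
    print("rejected:", error)

# Including the 6 cloud in the selection is fine: everything becomes one vertex.
print(collapse(diagram, {"c25", "c6", "c18634"}, "merged"))
```

A friendlier tool might offer to extend the selection automatically with whatever vertices are needed to keep the graph acyclic, rather than just refusing.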

This process of zooming in and out gives us an effective way to navigate through an extremely large graph, in the same way that zooming in Google Maps allows us to travel between villages in different continents without seeing every T-junction along the way.

What’s more, since this tool only exploits the DAG structure of git commits, the same tool would be useful for navigating other DAG structures. Complex chains of program dependencies are one example. Dependency relationships in task management are another. The entire git repository (commits, trees, and blobs) also forms a DAG. What would this tool look like if we included trees and blobs too? Would it help us solve problems? (‘What is the path from this commit to this blob?’)

What do you think? Would a tool like this help you in your day-to-day work with git? If so, I’d be interested in collaborating on it with you!

(Addendum: sorry for the mixed metaphors of ‘spelunking’ and ‘flying’. I couldn’t decide which was better.)