Visualize git diff

With History flow

TL;DR

This viz (visualization) is the output of running git diff across all the commits of a file, from its very first commit onward. The viz technique used here is History flow, by the IBM Collaborative User Experience Research Group.

history flow is a tool for visualizing dynamic, evolving documents and the interactions of multiple collaborating authors.

You can find the examples here, the concept of History flow here, and the code here.


TS;WM

Git is the version control system behind most major projects around the world. It has been there all along, witnessing the project evolve. It tracks every single file of the project and remembers every line of code removed, updated and added. It knows pretty much everything that has ever happened to a project.

Git has plenty of powerful commands that let you do just about anything you ever wanted to do. For example, git status is probably the most used command of all. What follows git status is the monster, git diff. Often the use of git diff stays at the surface level, where the current change is compared against the previous one. But like I said, it's a monster: run across every commit in the log, this little command can be used to reconstruct your entire file history, with ownership information. When all these versions are placed side by side, it reads like a tale that narrates what happened to a file. And if there is a file (or files) that represents the whole project, then this command narrates the story of the whole project.
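To make the "entire file history with ownership information" part concrete, here is a minimal sketch of how the list of commits for one file could be collected. It is not the code behind this viz. It assumes plain `git log --follow` with a tab-separated `--format`, and the names `Commit`, `parse_log` and `file_history` are made up for illustration.

```python
import subprocess
from typing import List, NamedTuple


class Commit(NamedTuple):
    sha: str
    author: str
    timestamp: int  # author date as a UNIX timestamp


def parse_log(output: str) -> List[Commit]:
    """Parse lines of the form '<sha>\t<author>\t<unix time>' into Commit records."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, rest = line.split("\t", 1)
        author, timestamp = rest.rsplit("\t", 1)
        commits.append(Commit(sha, author, int(timestamp)))
    commits.reverse()  # git log prints newest first; we want oldest first
    return commits


def file_history(path: str, repo: str = ".") -> List[Commit]:
    """List every commit that touched `path`, oldest first (hypothetical helper)."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--follow",
         "--format=%H%x09%an%x09%at", "--", path],
        check=True, capture_output=True, text=True,
    )
    return parse_log(result.stdout)
```

Calling `file_history("package.json")` inside a repository would give the ordered commits, with an author and a timestamp for each one. Diffing each consecutive pair of versions is the next step.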

For example, package.json in any Node / JavaScript project can be deemed the representative file of the project. It has fields for version, author information, dependencies, scripts; almost everything that defines a project at a global level. On the other hand, a project takes iterations and time to evolve and become stable. If package.json is the representative file of the project, then it should also reflect this process of evolution. Hence this viz retrieves all the versions of the file and charts them in front of us.

Essentially, git diff becomes a flow of changes over time when it's performed over the lifetime of a file. While I was thinking about what kind of viz could meaningfully address this use case, I came across History flow by the IBM Collaborative User Experience Research Group.

history flow is a tool for visualizing dynamic, evolving documents and the interactions of multiple collaborating authors.

It's a perfect match. So in a nutshell, git diff is applied to a particular file from its first commit onward, and the result is dumped as the input used to render the viz. No rocket science there. Please read the concept of History flow before proceeding. It's super easy, intuitive and important.

This is how React's package.json looks when the commits are spaced by timestamp.


Each commit is represented by a thin colored line. Two consecutive commits are connected to represent the changes from one commit to the next. Now, a particular commit consists of many contribution hunks. You can think of a hunk as a chunk of code added or updated in a commit. A color is assigned to each contributor. Hence a hunk in a commit line gets its color from the contributor responsible for creating it, and a commit line becomes multi-colored.

This ugly example has only 5 commits. The first commit is obviously the creation of the file by a single contributor. In the second commit another guy came and made some changes. It's not the same guy (contributor), because a new color appears in the second commit. Now, there is also a flow between commits. You can think of a flow as the position mapping of a chunk of content from the previous version to the current version. The transparent area between flows is the gap. A convergent gap indicates deletion of content. A divergent gap indicates addition of content.
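The idea of hunks, flows and gaps can be sketched in a few lines. This is only an illustration under my own assumptions: it uses Python's `difflib.SequenceMatcher` on lists of lines in place of parsing real git diff output, and the function names `carry_ownership` and `flows` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def carry_ownership(prev_lines: List[str], prev_owners: List[str],
                    curr_lines: List[str], author: str) -> List[str]:
    """Return the owner of every line in the current version.

    Unchanged lines keep their previous owner; lines added or rewritten
    in this commit belong to the commit's author.
    """
    owners: List[str] = []
    matcher = SequenceMatcher(a=prev_lines, b=curr_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            owners.extend(prev_owners[i1:i2])
        elif tag in ("insert", "replace"):
            owners.extend([author] * (j2 - j1))
        # "delete" adds nothing: those lines are gone from the current version
    return owners


def flows(prev_lines: List[str], curr_lines: List[str]) -> List[Tuple[int, int, int, int]]:
    """Map each unchanged block from its previous position to its current one.

    Each tuple is (prev_start, prev_end, curr_start, curr_end).
    """
    matcher = SequenceMatcher(a=prev_lines, b=curr_lines, autojunk=False)
    return [(i1, i2, j1, j2)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag == "equal"]
```

Between two neighbouring flows, a wider span on the previous side than on the current side is a convergent gap (deletion), and the opposite is a divergent gap (addition).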

The picture below shows the commits to D3's package.json file, spaced by time.

The thin, faded grey vertical lines are extensions of the commit lines and are there for reference. Every commit gets a line like this. The commit lines vary in length with the size of the file, but a commit line plus its extension always has the same total length for every commit. Because of this, the extension lines together show the distribution of the commits over time more clearly. This distribution is meaningful only when the commits are spaced by timestamp. It can be inferred from the picture that maintenance of the project (the commit distribution) was decently spread out, apart from a few exceptions (towards the right, the reference lines are less clustered).


The current implementation supports spacing commits at equal distance (ordinal axis) or by time. Community View and Latest Commit are the two modes supported for each choice of spacing. Community View mode shows all the users' contributions as an overview by applying the same level of color encoding to every hunk. In Latest Commit mode, the latest hunks in each commit are highlighted by fading the rest of the hunks.
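A rough sketch of the two spacing choices and the two modes, as plain functions. None of this is taken from the actual implementation: the function names, the `"community"` / `"latest"` mode strings and the faded opacity value of 0.2 are all assumptions made for illustration.

```python
from typing import List, Sequence

FADED_OPACITY = 0.2  # assumed value; the real viz may fade differently


def ordinal_positions(count: int, width: float) -> List[float]:
    """Place `count` commits at equal distances across `width`."""
    if count <= 1:
        return [0.0] * count
    return [i * width / (count - 1) for i in range(count)]


def time_positions(timestamps: Sequence[int], width: float) -> List[float]:
    """Place commits across `width` in proportion to their timestamps."""
    if not timestamps:
        return []
    first, last = min(timestamps), max(timestamps)
    span = last - first
    if span == 0:
        return [0.0] * len(timestamps)
    return [(t - first) * width / span for t in timestamps]


def hunk_opacity(is_new_in_commit: bool, mode: str) -> float:
    """Community view colors every hunk equally; latest view fades older hunks."""
    if mode == "community":
        return 1.0
    return 1.0 if is_new_in_commit else FADED_OPACITY
```

With timestamps 0, 10 and 100 on a 100-unit axis, the ordinal layout puts the middle commit at 50 while the time layout puts it at 10, which is why clusters of commits only show up in the time-spaced views.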

The sections below describe all these combinations applied to package.json of the NPM repo.

Equally spaced commits with community view. All the commits are spaced at equal distance, ignoring the timestamp. This particular instance conveys some commit patterns for the file. For instance, up to a point only one committer (blue) was predominantly contributing. After almost half of the total commits, other contributors started to contribute actively. Another observation: only a few colors are visible, suggesting that only a handful of people maintain the project. See the interactive version here.

Equally spaced commits with latest commit view. Only the latest hunks in a commit are vibrantly colored, while the older hunks fade away. This gives an overview of how big each change was. This instance conveys that a major rewrite of the dependencies happened three times (3 large blue lines) at the initial stage of the project.

Time spaced commits with community view. As mentioned previously, the commits are placed based on timestamp. This view gives an overview of how frequently the file / repo is maintained or updated. In this particular example the commits are made at almost regular intervals. Heavy clustering of the grey lines denotes a significant number of commits per unit of time.

Time spaced commits with latest commit view. In this view the latest hunks are highlighted and the commits are spaced by the time they were made. One interesting observation from this view, when package.json is the subject file, is the release cycle. Look at the top of the image: the small blue and red dots forming a line actually show the version updates of the project. For NPM the releases are very consistent.


This viz does not tell you anything about a file / repo for certain. For example, if a file gets little or no change over time, one person might infer that it has not received many contributions, while another might infer that the module is stable and requires no changes. It's all contextual. Again, since the beginning I have assumed that package.json is the representative file for a Node project, which does not have to be true all the time. The viz itself makes a few assumptions which, if not met, might convey wrong information. For instance, the viz uses a palette of 24 colors. If the number of contributors exceeds 24, already assigned colors are reused for new users. Hence, from a viz perspective, you won't be able to distinguish more than 24 contributors.
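The 24-color limit is easy to see in a small sketch. Which already assigned color a 25th contributor actually receives is not stated above, so the simple wrap-around used here is an assumption, and `assign_colors` is a hypothetical name.

```python
from typing import Dict, Iterable

PALETTE_SIZE = 24  # the palette size mentioned above


def assign_colors(contributors: Iterable[str]) -> Dict[str, int]:
    """Give each contributor a palette index in order of first appearance.

    Assumption: once the palette runs out, indices wrap around, so
    contributor 25 shares a color with contributor 1.
    """
    colors: Dict[str, int] = {}
    for name in contributors:
        if name not in colors:
            colors[name] = len(colors) % PALETTE_SIZE
    return colors
```

With 25 distinct names, the first and the last end up with the same index, so their hunks become indistinguishable in the chart.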


You can browse all of the examples here. You can browse the code here. Let me know if you want to know about the code that powers the viz.