Insanely Simple Pull Requests

Code Reviews in Git: Commit by Commit

This is a repost from a blog entry I did internally at Amazon way back when. I had just put out a PR for a large feature in one of our front-end internal tools. Since it was a sizable code delta, I took the opportunity to expound on one of my favorite topics: reviewing pull requests commit by commit. I had gotten the sense from some folks that optimizing for this review style wasn’t common practice which then motivated the original post. I’m sharing this style here in the hopes that it might be useful to some folks in the wider dev community.

So what is reviewing commit by commit? It is having the faith that the PR author structured their commits intelligently. It is reading through said commits one at a time to understand the overall PR and intent of the change. It is essentially a golden moment in a dev’s life when they get to review a well structured pull request. Each commit makes sense in isolation, and strung together, the PR happens smoothly and feels much like reading a good book.

Have a look at this rather large PR (295 commits) from a former colleague of mine that spans from December 6, 2015 to February 17, 2016.

You’ll see that the review spans 148 files and 3343 line deltas. That’s a pretty large change, and I imagine reviewing it sounds like a terrifying proposition to many. You can see it now. Weeks of long nights alone in front of the dim glow of your faithful machine, you trace and weave your way through an unending mess of tangled logic and broken promises. Commits aren’t structured well, yet the overall diff is a sea of green and red, inviting you to your doom.

Fortunately you’re safe this time around and the above PR won’t result in a loss of sanity. Quite the opposite, in fact. Hopefully you too find it a stellar example of great communication thru code! Commenting on this PR, of course, is slightly more laborious as comments are now per commit, but trust me when I say the streamlined review is well worth the effort.

Pros: Quick Reviews

Typically the advantages of reviewing CbC come from an increased signal to noise ratio. When done right, individual commits are the smallest, semantically coherent units of code change. You might have to open many tabs while reviewing, but each tab can be reviewed with incredible speed.

When weighing the pros and cons of CbC consider the following.

Consider code additions. Typically you add some method/module/class/data, you link to it, you use it in a few call sites, and voila, you have new functionality! Git commits enable us to capture all of this at a very fine grain. Taken separately, these individual commits can be reviewed incredibly quickly as the definition and use is together in the same window pane.

Consider refactored code. If you want to change the indentation of a method like so, this can add noise when also adding logic to the same area of code. When reviewing CbC, the indentation changes and code additions are separate, allowing you to see through the noise and cut straight to the important changes when it comes time to review the logic/design.

Consider commits that are intended to have no logical changes like this whitespace change or this simple renaming refactor. These commits take almost no time at all to review and can be checked off almost as fast as you can scroll.

The pros are many, and the speed at which you can review an entire change list should not be understated. The original PR of mine that inspired this post touched 27 files and there were very few non-trivial changes. My coworker at the time and I were able to go through the entire feature PR in about 30–40 minutes with plenty of time spent on healthy discussion!

Cons: It Takes Some Work

Clearly it’s easy to merge all commits into one gigantic feature. It takes work to tease apart semantically coherent chunks of code. If you want to send out a good PR using CbC, you’ll want to actively use a few best practices while developing.

Fortunately, git provides you with many power tools to assist with this, and you’ll find regularly staging your commits becomes second nature over time.

Tip #1 — Use `git mv`

Nothing breaks my heart more than a renamed file that also has changes in it. In a PR, it looks like a huge RED change and a huge GREEN change. Your reviewer will have to look at the entire file just to figure out what changed. Prefer breaking up the change into multiple commits. The first commit does the move, and the second commit can then perform whatever changes you want.

Tip #2 — Use `git add -p`

Nobody codes perfect, semantically coherent commits in order. We typically write our libraries in step with our calling code, or we add multiple distinct methods at the same time to the same file. Typically we’re used to coding whatever, whenever, and staging commits is an afterthought. When it finally does come time to stage changes for a commit, it then becomes difficult to tease apart proper commits using the granularity of just files alone. This is where git add -p comes in. It interactively asks you which parts of your files you would like to stage, allowing you to stage changes at a very fine grain. It’s fast, it’s effective, it’s wonderful.

Tip #3 — Use `git rebase -i HEAD~N`

Sometimes you might commit multiple commits, only to find that you wrote a bug 4 commits back. In most cases, you can commit the bug fix and do an interactive rebase to squash (merge) the bug fix commit into the commit that introduced said bug. This effectively removes the bug from your commit history! This is a necessity when supporting CbC, as it takes strain off any reviewer that might otherwise get snagged on that bug when reviewing. It’s always a bummer to read something, get confused, identify a bug, and write a snarky comment, only to find out the bug was fixed in a later commit!

Note: You can also use interactive rebasing to edit any commit, change any commit message, etc… With great power comes great responsibility though. DON’T edit the commit history of anything that’s on the remote! This breaks other people’s histories and makes for painful merges.

Also note: git rebase … requires a clean git repo. You can use the power of git stash to temporarily save your in-progress changes, perform the rebase, and then git stash pop to get your changes back.

Tip #4 — Use descriptive commit messages

A descriptive commit message can communicate the relative importance or non-importance of a commit’s contents. It also conveys intent and captures the semantics of the commit. As you can see in the PR, there aren’t too many surprises when comparing the commit messages to the commit content. This only speeds up the review process. Commit messages should start with a capital, not end in punctuation, and be 50 characters or fewer.

Tip #5 — Plan your moves and indentation refactors

The command git add -p won’t be able to save you from indenting a whole file and then making changes to a part of it. If you know you’re going to do large scale formatting changes, plan accordingly.

In Summary

That’s about it! These are the majority of practices that I abide by, and it really takes no time for these to become habit. There are also some other benefits when developing this way.

  • Done well, most commits are buildable/runnable. This allows for quickly finding bugs via manually checking out commits or thru git bisect.
  • These are open source best practices and interacting with the FOSS community becomes a lot easier.
  • Post code review, looking at a commit history can be very insightful when trying to identify why something changed and in what context it did so. Great commit histories can make great paper trails.

Thanks for reading! Please don’t hesitate to let me know if you have any favorite git strategies or patterns. And finally, h/t and thanks to Daniel Gasienica for being the inspiration for this post.