Stacked Diffs: Keeping Phabricator Diffs Small

In my previous post, I argued that Large Diffs Are Hurting Your Ability To Ship. Now I’m going to show you how to keep your diffs small using Phabricator.

Don’t use Phabricator? This workflow has been ported to Github in Stacked Pull Requests: Keeping Github Diffs Small.

Here’s the basic strategy:

  1. Break up a large change into a series of individual commits
  2. Stack the commits on top of one another.
  3. Create Phabricator Diffs for each commit.

Let’s walk through a real world example.

1: Set the “default-relative-commit”

The first step is to set your default-relative-commit to HEAD^(see the appendix below for why we need to do this). Let’s go to our project’s root directory and set our local default-relative-commit by echoing HEAD^ to the “.git/arc/default-relative-commit” file:

2: Create the First Commit

Time to make the first bit of our change. Starting on master, we’ll checkout a branch for the new feature, add a new class to our project, make our first commit, and sent it out for review:

If we look at our git log we can now see the new branch with the new commit:

And if we checkout Phabricator, we’ll see our diff is up for review:

3: Commits 2 Through N

We continue coding, making an additional logical change. For the sake of this example, let’s say the class uses functionality in the NewClass.java file we added in the first commit:

Here’s what our git log looks like now:

Running `arc diff` now sends up our second commit (and only our second commit) to differential:

We now have two commits in differential, that can both be reviewed independently. When your reviewers are looking at Part 1, they can focus solely on the changes introduced by NewClass.java. When your reviewers are looking at Part 2, they can just focus on the additional functionality that is added by the DependentClass.java file. The benefits here are the same you get with abstraction, the reviewer has to hold fewer things in their head.

4: Address Feedback

Now let’s say a reviewer caught a mistake in your Part 1 diff and we need to go back and fix it.

Let’s do this using an interactive rebase. While we’re still on the my_feature branch we’ll run the following:

Doing this will drop us into a text editor where we can opt to edit commits between our current commit and the master commit. Since we want to edit the first commit, we’ll follow the instructions given by git and switch the pick associated with the first commit to edit. We’ll then save the file.

If you’re not familiar with an interactive rebase here’s essentially what’s happening. Git rewinds all of your progress, uncommiting all the changes you’ve made. Then it plays back each of those changes in the order you specify in the file above. If any of the commits are marked with edit, git will stop at the commit and let you make changes. In fact, this is exactly what git tells you once you start this rebase:

And if we take a look at our git log now, we’ll see that our HEAD is actually pointing to this commit:

So following git’s instructions, let’s make our fix, amend, upload the fixes to Differential, and continue the rebase:

Once our rebase is finished, we’ll be right back where we started, at the tip of the new_feature branch:

And we see that our diff is now waiting to get re-reviewed:

5: Land the Diff

Alright, Part 1 has been approved and is ready to land:

Before landing, let’s update our master and rebase our my_feature branch onto master (see the appendix below for why this is important). Once we’ve rebased, we’ll then land the commit by passing it’s SHA-1 hash to the `arc land` command.

Your diff has now landed! Our git log now looks like this:

You’ll notice that the Part 1 commit seems to appear twice. Let’s clean this up by rebasing our my_feature branch onto the new master. The new master should include the commit we just landed, so after the rebase we’ll see that our my_feature branch actually only include the Part 2 commit. This is pretty nice. After each land and then rebase, our branch will now only include the commits that haven’t yet been landed.

After doing this, we’ll see that our git log is nice and tidy:

6: Rinse and Repeat

If we’d had more than just two diffs in our stack we would just keep repeating the same processes in steps 3 and 4. Once all but the last diff in a stack has been landed, we can just run `arc land my_feature` to land the last commit.

Provisos, Pro-tips, and Parting Notes

  • I was pretty explicit and verbose with a lot of the commands above for the purposes of explaining exactly what was going on. As you get more comfortable with the work flow, I highly recommend setting up scripts and git aliases to help you do all of this. For example, since this work flow involves often doing an interactive rebase starting from master, I’ve aliased `git rebase -i master` to simply `git rim`.
  • If you’re using Harbormaster and Jenkins for CI, you’ll want to setup a staging area, other wise your stacked diffs will fail your integration tests (see appendix for how to do this).
  • If you’re ever unsure about what code is going to be sent up to Differential when you run `arc diff`, just run `arc diff -preview`.
  • If you ever screw up a rebase, git reflog is your salvation.
  • If someone has set the default-relative-commit in the .arcconfig file for your repo, you’re going to need to get rid of that config line. Otherwise, the change you made to .git/arc/default-relative-commit will be overridden every time you run arc diff.
  • Sometimes you’ll amend during a rebase and that will cause a conflict further up in the stack when you continue the rebase. When resolving these conflicts be sure to only use `git rebase -continue`. Don’t do an amend when resolving these conflicts. I’ve made that mistake before. It’s no fun.
  • When doing an interactive rebase, only edit one commit at a time. Doing more than one is an invitation for forget where you are in the stack and screw something up. I’ve made this mistake before. It’s no fun.
  • Always finish a rebase. If you walk away from your computer in the middle of a rebase, you may come back to it later, forget you’re in a rebase, and then try to change branches. This can seriously screw up what your repository looks like and confuse the hell out of you. I’ve made this mistake before. It’s no fun.

Special Note

I want to give a special thanks to all of the mentors at Facebook who taught me this workflow and to all the folks at Uber who have helped me refine it. Special thanks to Grayson Koonce for his massive help with editing this post.


Kurtis Nusbaum is a Mobile/Backend Developer at Uber and an avid long distance runner. He can be found on Facebook, Twitter, and LinkedIn. Here’s his github. If Kurtis seems like the kind of guy with whom you’d like to work, shoot him an e-mail at kcommiter@gmail.com.


Appendix

Default-Relative-Commit

The default-relative-commit (DRC) is what arc uses as the base for determining what to actually send up to Differential. Many people set this to origin/master instead of HEAD^. This causes a problem in the workflow we use above. If our DRC was set to origin/master, then in the example above when we arc diff on Part 2, the first commit would be included in what we send up to Differential. We set it to HEAD^ to in order to ensure that we only ever have one commit per diff that we send up.

Rebasing before landing

When landing an individual commit in your stack, there’s a chance there might be a conflict when landing. In order to avoid any conflict issues, we update master and rebase our branch to ensure a clean land.

Setting up a staging area

  1. find your repository in diffusion, e.g. https://code.mycomany.com/diffusion/GRREF/
  2. click “edit repository”
  3. copy the “remote uri” value
  4. edit the “staging area”, and paste in the remote uri value
  5. save!