Incrementally adopt a formatter in a large repo

Alex Eagle
Jan 4, 2018


Formatting code is a job for computers. We can make the rules so exact, and have such confidence that we understand the language grammar, that the formatter is safe to automate.

The big advantage of machine-formatting is that it makes us more productive! Some of the productivity comes from skipping a few keystrokes to make the code look good, but the majority is from time saved interacting with your co-workers. When humans try to make an objective decision about something so subjective, we spiral into unproductive discussions during code reviews.

However, we want to avoid a massive disruptive re-format of the whole repository. Such a re-format is a big cost — you pollute your blame layer in version control so that it always takes an extra hop to a previous change to understand authorship. Plus, all the pending pull requests at the time of the change will need an ugly rebase. The same thing can happen later if you upgrade or re-configure the formatter.

A formatter is nice for making code uniform to read. In practice, though, you read code from many different sources and in different languages, and formatting choices change over time, so we must accept some non-uniformity. Really, uniformity isn’t our objective. We just need our code formatted in some reasonable way, and we need to trust that it stays that way. If we give up the uniformity requirement, there’s a big advantage: we can introduce or upgrade a formatter *without* messing up the version control history.

All of the above means two things: we should use a formatter, and we should be able to check that our developers are using it *without* checking the entire codebase. That means the formatter should be incremental, at the granularity of changed regions (not whole files).

My project, Angular “ABC” (http://g.co/ng/abc), is all about scalability, so we want a formatting setup that still works when our repository gets large. The Angular project takes about 90s to check the format of everything, which is too long for such a common developer task. An incremental formatter helps here too: the cost of checking a change scales with the size of the change, and changes rarely grow as quickly as the repository does.

To be incremental, the formatter must work together with the version control system, finding the lines changed by the commit and then formatting only regions of the file.

There are a couple of options. The Angular team uses clang-format, a formatter that understands many languages (and was originally written for C++). clang-format has a git-clang-format command that interacts with git, so it can re-format just your modified regions. Prettier is a more widely adopted formatter, but it doesn’t yet support the changed-regions-only approach I recommend; see https://github.com/prettier/prettier/issues/3555

To set this up with clang-format, follow the instructions at https://github.com/angular/clang-format#checking-formatting, which enable automated formatting for modified code regions. This installs a git pre-commit hook for all developers in your repository. I think that this hook is a sufficient enforcement mechanism to make sure changes are formatted. If you also want a check in your CI, it has to check only the same modified regions, since each file may contain a mix of different formats.
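
If you do want such a CI check, one possible shape is sketched below in Python. It assumes that `git clang-format` is available on the PATH, that `origin/master` is the branch to compare against, and that the "nothing to do" messages match the ones printed by your installed version of git-clang-format; please verify all three locally. The function names are hypothetical.

```python
import subprocess

# Messages git-clang-format prints when there is nothing to reformat.
# Assumption: these match the installed version; check before relying on them.
CLEAN_MESSAGES = (
    "no modified files to format",
    "clang-format did not modify any files",
)


def is_clean(output: str) -> bool:
    """Return True when the --diff output proposes no formatting changes."""
    text = output.strip()
    return text == "" or any(text.startswith(msg) for msg in CLEAN_MESSAGES)


def check_changed_regions(base: str = "origin/master") -> int:
    """Run `git clang-format --diff <base>` and return a CI exit status.

    With --diff, git-clang-format prints the changes it would make to the
    lines modified since `base` instead of applying them.
    """
    result = subprocess.run(
        ["git", "clang-format", "--diff", base],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    if is_clean(result.stdout):
        return 0
    # Show the proposed diff so the developer can see what to fix.
    print(result.stdout)
    return 1
```

A CI step could call `sys.exit(check_changed_regions())` so that the job fails only when the modified regions are not formatted.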

Bazel BUILD files

Bazel has its own formatter, called Buildifier. A typical project has relatively few BUILD files, so we currently don’t have a way to incrementally re-format only the changed code. Instead, we always re-format everything.

To run Buildifier, first add the com_github_bazelbuild_buildtools repository to your WORKSPACE file, like the example. Then we recommend adding a buildifier script to your package.json, similar to the example. Since we don’t want to assume developers already have a Buildifier binary on their machine (and they might have a different version anyway), this script compiles Buildifier and then runs it. Bazel caches the compiled binary, so we only pay this cost the first time.

On the CI, we enforce formatting by running `buildifier -mode=check`; see the lint job in the example. Note that we run this before we install the node_modules, so that we don’t need to worry about it triggering on BUILD files that come from installed dependencies. (Also, it’s nice that the CI gives quicker feedback without having to install any npm packages.) Currently the example has the buildifier binary included in the docker image, but this is a recipe for version skew issues. We are working to enable Bazel caching on CircleCI so that a re-build is free and we can use the same command as in the package.json file in the example.
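
If the check has to run after dependencies are installed, another option is to collect the BUILD files yourself and skip `node_modules`. The Python sketch below illustrates that idea; it is not the setup from the example. The helper names are hypothetical, and it assumes a `buildifier` binary on the PATH rather than one compiled by Bazel.

```python
import os
from typing import List

# File names Buildifier is expected to format in this sketch.
BUILD_FILE_NAMES = {"BUILD", "BUILD.bazel", "WORKSPACE"}
# Directories whose BUILD files we do not own or do not want to check.
SKIPPED_DIRS = {"node_modules", ".git"}


def find_build_files(root: str) -> List[str]:
    """Return the sorted paths of Bazel files under root, skipping SKIPPED_DIRS."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIPPED_DIRS]
        for name in filenames:
            if name in BUILD_FILE_NAMES:
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def buildifier_check_command(files: List[str]) -> List[str]:
    """Build the argument list for a check-only Buildifier run."""
    return ["buildifier", "-mode=check"] + files
```

The resulting list can be passed to `subprocess.run`, and a non-zero return code can be treated as a lint failure.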
