How to only lint files a git pull request modifies.

Joey Gracey
Jan 23, 2018

I’ve been dealing with some poorly written Python code lately. The code works well, but it doesn’t follow a style guide, so readability isn’t great. It’s moderately big, so refactoring would be tedious, and even with autopep8 there are certain lint issues that aren’t readily solved with automation. Besides, we’re busy and can’t afford to spend time refactoring a big codebase.

Sound familiar? I’ve lost track of the number of projects I’ve joined where a style guide hasn’t been implemented simply because of the high up-front cost of adopting one in a more mature project. Surely there must be a compromise where we can adopt a style guide gradually and with grace?

Typically, code linters lint the entire project and report any violation in any file. That reliably catches every error in a project that is already compliant, but it makes gradual adoption very hard: it’s 100% buy-in, and you can’t reap the rewards until you’ve finished the process.

What if instead we state that “all new code should adopt these rules”? For each pull request, lint only the files that contain changes. Don’t bother linting files the PR has left alone: their violations aren’t a regression, and PRs need discipline about the tasks they complete. Over time, however, we can expect much of the codebase to be transformed, with perhaps a small task at the end to clean up the odds and ends. By adopting a style guide gradually instead of doing a Big Refactor Upfront™, we turn a big up-front cost into a much more manageable process and significantly de-risk the project.

Limitations to the approach:

Like most software development practices, this one draws its boundary at the file. If any part of a file is modified, we lint the whole file. This lets us catch unused variables and other inspections that need the context of the entire file. The choice is somewhat pragmatic: linting at a finer grain, such as the module or block level, usually requires more knowledge of the language’s semantics, though it could be advantageous in some circumstances.

Git Tricks to Know:

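git merge-base <current> <parent>: returns the best common ancestor of the two branches, i.e. the commit where the current branch forked off its parent. Every commit in the pull request comes after this point, so diffing against it isolates the pull request’s changes.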
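
To make the merge-base behaviour concrete, the throwaway script below builds a tiny repository in which a hypothetical `feature` branch forks from `master`, after which `master` gains another commit. Branch names, file names and the committer identity are made up for the illustration; the point is that `git merge-base` reports the fork point rather than the current tip of `master`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository for the demonstration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# First commit on master: this becomes the fork point.
echo "base" > base.txt
git add base.txt
git commit -q -m "base"
fork_point=$(git rev-parse HEAD)

# The feature branch (standing in for the pull request) adds its own commit...
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feature work"

# ...while master also moves on after the fork.
git checkout -q master
echo "more" >> base.txt
git commit -q -am "master moves on"
git checkout -q feature

# merge-base reports the common ancestor, not master's current tip.
merge_base=$(git merge-base HEAD master)
echo "fork point: $fork_point"
echo "merge base: $merge_base"
```

Both lines print the same commit hash, even though `master` has moved on since the branch was created.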

git diff --name-only --diff-filter=d [<ref>...]: returns the names of the files that differ between the given commits, or between a single commit and the working tree. --diff-filter=d filters out deleted files; every other kind of change counts as fair game: you touch it, you fix it.
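
A similar throwaway sketch shows what `--diff-filter=d` changes. It commits a modification, a deletion and an addition in one go (the file names are invented for the example) and compares the file list with and without the filter.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository for the demonstration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Starting point: three tracked files.
printf 'keep = 1\n' > keep.py
printf 'gone = "this file will be deleted"\n' > gone.py
printf 'untouched = 3\n' > untouched.py
git add .
git commit -q -m "initial"
base=$(git rev-parse HEAD)

# One commit that modifies keep.py, deletes gone.py and adds new.py.
printf 'keep = 10\n' > keep.py
git rm -q gone.py
printf 'brand_new = 4\n' > new.py
git add keep.py new.py
git commit -q -m "change things"

# Without the filter, the deleted file is listed as well.
all=$(git diff --name-only "$base" HEAD)
# --diff-filter=d drops deleted paths, leaving only files that still exist.
lintable=$(git diff --name-only --diff-filter=d "$base" HEAD)
echo "$all"
echo "---"
echo "$lintable"
```

The first list includes `gone.py`; the second, which is what you would hand to a linter, contains only `keep.py` and `new.py`.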

Putting it Together:

With these two commands, it’s easy to get the list of files a given branch modifies: pass the commit that git merge-base outputs directly to git diff:

git diff --name-only --diff-filter=d "$(git merge-base HEAD "$BRANCH")"
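
Wrapped as a small reusable bash function, the same pipeline might look like the sketch below. The name `changed_files` is hypothetical, and the sketch assumes it is run from the repository root, since `git diff --name-only` prints paths relative to the top of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the files changed on the current branch since it
# forked from the given base branch, excluding deleted files. Like the
# one-liner above, this diffs the merge base against the working tree, so
# uncommitted edits are included too.
# Usage: changed_files <base-branch>
changed_files() {
  local base
  base=$(git merge-base HEAD "$1") || return 1
  git diff --name-only --diff-filter=d "$base"
}
```

Git also has a shorthand for diffing against the merge base: `git diff --name-only --diff-filter=d "$BRANCH"...HEAD` compares the merge base with the `HEAD` commit. The difference is that the three-dot form ignores uncommitted changes, which is usually what a CI job wants anyway, while the function also picks up work in progress. Either way, changes that landed on the base branch after the fork are not reported, which a plain `git diff "$BRANCH"` would get wrong.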

The beautiful thing about a shell is the interoperability between programs. Most, if not all, command-line linters accept a list of files as arguments, so it’s sufficient to pass the output directly to eslint, flake8, and countless other linters:

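DIFF=$(git diff --name-only --diff-filter=d "$(git merge-base HEAD "$BRANCH")")
# flake8 with no paths lints the whole project, so skip it when nothing changed
if [ -n "$DIFF" ]; then flake8 $DIFF; fi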
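
The one-liner can be hardened a little. The sketch below is a hypothetical helper, `lint_changed` (it is not part of git-diff-lint), that assumes bash and the repository root as the working directory. It reads NUL-separated paths via `git diff -z` so that file names containing spaces survive, narrows the list with a git pathspec such as `'*.py'` so the linter never receives files it can’t parse, and skips the linter entirely when nothing matches.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lint only the files the current branch changed since
# it forked from a base branch. Assumes bash and the repository root as the
# working directory (git diff --name-only prints root-relative paths).
# Usage: lint_changed <base-branch> <pathspec> <linter> [linter args...]
lint_changed() {
  if [ "$#" -lt 3 ]; then
    echo "usage: lint_changed <base-branch> <pathspec> <linter> [args...]" >&2
    return 2
  fi
  local base_branch=$1 pathspec=$2
  shift 2

  local base
  base=$(git merge-base HEAD "$base_branch") || return 1

  # -z separates paths with NUL bytes, so names with spaces survive intact.
  local files=() f
  while IFS= read -r -d '' f; do
    files+=("$f")
  done < <(git diff -z --name-only --diff-filter=d "$base" -- "$pathspec")

  # With no paths, some linters (flake8, for one) fall back to linting the
  # whole project, so treat "nothing changed" as a pass instead.
  if [ "${#files[@]}" -eq 0 ]; then
    echo "No changed files match $pathspec; nothing to lint." >&2
    return 0
  fi

  # The remaining arguments are the linter command; changed files go last.
  "$@" "${files[@]}"
}
```

For example, `lint_changed origin/master '*.py' flake8` or `lint_changed origin/master '*.js' eslint`; the branch name and linters are placeholders for whatever your project uses.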

In Action:

I wrote git-diff-lint to compartmentalize this logic in a portable bash script. It’s agnostic to the linter and branch names and works nicely in a Jenkins linting task or as a pre-push hook. Check it out on GitHub to get the details on how it works. As always, PRs are welcome and appreciated.
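
As a rough idea of the pre-push use case, here is a hypothetical bash function (not taken from git-diff-lint) that a pre-push hook could call. Git hands a pre-push hook one line per ref being pushed on standard input; the sketch diffs each pushed commit against its merge base with an assumed base branch and lints the changed files that match a pathspec.

```shell
#!/usr/bin/env bash
# Hypothetical pre-push helper; a sketch, not the git-diff-lint script itself.
# Git passes a pre-push hook one line per pushed ref on stdin:
#   <local ref> <local sha> <remote ref> <remote sha>
# Usage: pre_push_lint <base-branch> <pathspec> <linter> [linter args...]
pre_push_lint() {
  if [ "$#" -lt 3 ]; then
    echo "usage: pre_push_lint <base-branch> <pathspec> <linter> [args...]" >&2
    return 2
  fi
  local base_branch=$1 pathspec=$2
  shift 2
  local local_ref local_sha remote_ref remote_sha base f status=0
  local files=()

  while read -r local_ref local_sha remote_ref remote_sha; do
    # Skip blank lines and branch deletions (an all-zero local sha).
    if [ -z "$local_sha" ] || [[ "$local_sha" =~ ^0+$ ]]; then
      continue
    fi
    base=$(git merge-base "$local_sha" "$base_branch") || return 1

    files=()
    while IFS= read -r -d '' f; do
      files+=("$f")
    done < <(git diff -z --name-only --diff-filter=d "$base" "$local_sha" -- "$pathspec")
    [ "${#files[@]}" -eq 0 ] && continue

    # Assumes the pushed commit is the one checked out, so the working-tree
    # copies the linter reads match what is being pushed. stdin comes from
    # /dev/null so the linter cannot swallow the remaining ref lines.
    "$@" "${files[@]}" < /dev/null || status=1
  done
  return "$status"
}
```

To wire it up, an executable `.git/hooks/pre-push` could `source` a file containing this function and then call something like `pre_push_lint origin/master '*.py' flake8`, with the base branch and linter as placeholders. Git aborts the push when the hook exits with a non-zero status.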
