Code Reviewing Data Science Work

Shanif Dhanani
3 min read · Jan 8, 2018


Code Reviews

“No man is an island.”

This is a pretty famous saying that I’ve picked up and internalized recently (and to be fair, this should probably say “no person is an island”).

It basically means that nothing we do is done in isolation, and certainly nothing that we accomplish is accomplished in isolation.

I’ve found this to be extremely true in my life, and I’d bet it’s true for anyone who has accomplished anything of note.

There are always others helping you through the trials and tribulations of daily life.

The life of a tech team is no different. All technical work of any consequence that I’ve ever done has happened within a team setting, usually formalized by an agile development process like Scrum or Kanban.

A crucial part of this process is the code review.

Code reviews are a practice that I first learned in startup life and have now fully come to appreciate, so I’ve built them into our startup’s culture.

For those who are unfamiliar, the idea of a code review is for the members of a tech team to review any new code before it gets merged into the trunk or master branch of a code repository.

The goal of code reviews is to ensure that any new code that’s being integrated into a repository is:

  • As free of bugs as possible
  • Secure
  • Consistent with established coding conventions, formatting, and syntax
  • Optimized
  • Clear, concise, legible, and maintainable

Doing code reviews is a great way to minimize future problems with a codebase (of which there will be many, regardless of who you are).

Code Reviews in Data Science

Code reviews are important for any team, but they’re traditionally done in the context of a software development team that’s building out a new product or feature.

However, doing them as part of a data science team really isn’t that much different. Most data science teams, in practice, may not be developing heavy-duty new algorithms, but they probably are writing code to gather and store data, pre-process data, create new models, evaluate new models, and deploy production systems.

Each of these steps is a possible point of failure and can introduce inefficiencies and bugs, so it’s useful to have reviews for this code.

In general, we, like many startups, use GitHub’s “pull request” functionality. That, combined with pre-merge hooks that require all of our automated checks to pass and leave no outstanding requests for change, ensures that the code that gets merged into master has passed through at least some sanity checks.
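To make that gating rule concrete, here is a minimal sketch of the decision in Python. It is not GitHub’s API and not our actual hook; the names `PullRequest`, `check_results`, `open_change_requests`, and `can_merge` are all hypothetical and exist only to illustrate the rule “every automated check passes and no change requests remain open.”

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PullRequest:
    """Hypothetical, simplified view of a pull request (not GitHub's data model)."""
    check_results: Dict[str, bool]   # check name -> whether it passed
    open_change_requests: int = 0    # reviews still requesting changes


def can_merge(pr: PullRequest) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons); reasons lists everything blocking the merge."""
    # Collect failing checks in a stable (alphabetical) order.
    failed = sorted(name for name, passed in pr.check_results.items() if not passed)
    reasons = [f"check failed: {name}" for name in failed]

    # Any reviewer who still wants changes also blocks the merge.
    if pr.open_change_requests > 0:
        reasons.append(f"{pr.open_change_requests} outstanding change request(s)")

    return (not reasons, reasons)
```

Returning the list of reasons, rather than a bare yes or no, means the author of the pull request can see exactly what still needs attention.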

When we do code reviews for our team, we focus on code syntax and formatting, logic, and efficiency, but we’re always on the lookout for any ways to improve our code.

We also make sure not to attack individuals. It’s all too easy for code reviews to turn into personal criticism and become contentious, and that’s something you really want to avoid. So it’s important to reinforce the goal of having better, more maintainable code for future developers (including our future selves), rather than critiquing the person who wrote it.

Generally, we take the approach that reviewers need to be able to provide concrete recommendations for how to fix a particular piece of code that they think needs fixing, either by providing actual code samples or enough detail that the original developer can easily make changes. We always assume that the developer is making their best efforts and we take the approach of assisting and guiding, rather than criticizing.

Overall, code reviews promote good code, they allow new developers to get up to speed with what’s happening, and they hold existing developers accountable to the standards that they’ve set for the whole team. There’s nothing like a new teammate telling you to make a string into a constant after pointing to the coding standard that you wrote saying that very same thing!
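For anyone who hasn’t run into that particular review comment, here is a small illustrative sketch in Python of what “make a string into a constant” looks like. The column name `churn_label` and both function names are made up for this example; they are not from our codebase.

```python
# Before review: the column name is a string literal buried in the logic.
def drop_missing_labels_before(rows):
    return [row for row in rows if row.get("churn_label") is not None]


# After review: the string lives in one named constant, so a rename
# (or a typo) only has to be fixed in a single place.
LABEL_COLUMN = "churn_label"


def drop_missing_labels_after(rows):
    return [row for row in rows if row.get(LABEL_COLUMN) is not None]
```

Both versions behave the same; the second is simply easier to maintain, which is exactly the kind of change a review is good at catching.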


Shanif Dhanani

Creating software for businesses that want to use their data with AI. Learn more at https://www.locusive.com.