Prioritizing Code Review: Why Research Teams Should Be Bug-Phobic, Too

Moran Brody
Riskified Tech

--

Code review is a common, deeply embedded practice in the programming world, and the horror stories shared online make a strong case that it is necessary for well-built code. In the research domain, unfortunately, the community pays much less attention to the advantages of a code review process.

In this blog post, I’ll share my view on research code review: why it is so important and how you can implement it properly in your team’s research work. It is based on my personal experience as a research team leader over the past two years, during which we moved from a small research team that did no code review to a large team where code review is an integral part of every research task.

Why code review?

Despite its value, the main reason both researchers and programmers try to avoid code review is that they would rather be writing code than spending time reviewing old code. Before diving into specific research-related reasons for code review, I want to start by explaining the general advantages of code review and how it ultimately saves rather than wastes time.

  • Detecting and fixing mistakes. First and foremost, code review helps catch mistakes that could otherwise produce completely different results. This is especially relevant for new employees, but let’s face it, mistakes are common and can happen to anyone. The basis of your code review process should be: (a) verifying that the data pulled from your database is what you expected and (b) checking every data wrangling step, including the handling of cases like nulls and duplicates. Research code needs testing and review, just like any application or software.
  • Staying consistent. Automated tools (such as lintr) can handle surface-level code aesthetics, but code review is what builds consistency in code across the team. If for some reason you don’t have a code style, read this great post on why code consistency is important and find a style guide online to get started (we use the Tidyverse Style Guide).
  • Sharing knowledge in day-to-day work. Yes, you probably have some kind of knowledge sharing platform, whether it’s a blog or just weekly team meetings. But I believe knowledge is best shared not through meetings and readings but through day-to-day work. Code review promotes learning horizontally across the team and, more importantly, improves the final research products. Having an additional team member wade through the bits and bytes of the project can help raise important questions about methodologies and results.
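As a rough illustration of checks (a) and (b) from the first point, here is a minimal sketch in plain Python; the same idea carries over to R or any other language. The function name summarize_rows, the field names, and the sample rows are all hypothetical and exist only for this example.

```python
from collections import Counter


def summarize_rows(rows, key_fields, required_fields):
    """Count the basic data issues a reviewer would want to see up front.

    rows: list of dicts, one per record pulled from the database.
    key_fields: fields that together should identify a record uniquely.
    required_fields: fields that should never be missing or empty.
    """
    # Count missing values (None or empty string) per required field.
    null_counts = {field: 0 for field in required_fields}
    for row in rows:
        for field in required_fields:
            value = row.get(field)
            if value is None or value == "":
                null_counts[field] += 1

    # Count how often each key appears; anything above 1 is a duplicate.
    key_counts = Counter(tuple(row.get(f) for f in key_fields) for row in rows)
    duplicate_keys = [key for key, n in key_counts.items() if n > 1]

    return {
        "n_rows": len(rows),
        "null_counts": null_counts,
        "duplicate_keys": duplicate_keys,
    }


# Hypothetical sample data: one duplicated order_id, one missing amount,
# one empty country.
orders = [
    {"order_id": 1, "amount": 120.0, "country": "US"},
    {"order_id": 2, "amount": None, "country": "DE"},
    {"order_id": 2, "amount": 80.0, "country": "DE"},
    {"order_id": 3, "amount": 45.5, "country": ""},
]

report = summarize_rows(
    orders, key_fields=["order_id"], required_fields=["amount", "country"]
)
print(report)
```

A reviewer could compare the row count, null counts, and duplicate keys against what the author expected before reading any of the wrangling code.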

To sum up, code review helps save time by effectively finding and fixing mistakes while research is still ongoing, rather than weeks or months after the fact. By reducing the time spent digging into past research projects, code review promises neater and more consistent results.

Why is code review important for research?

By now I hope you are already convinced of the many advantages code review has to offer, but I want to give you some extra reasons for its importance in the research world:

  • Coding is a key part of the research process. Research is versatile: it involves developing hypotheses, identifying the best methodologies, and doing a lot of literature review. But implementing all of the above and exploring your data is mostly done through code. I think it is fair to say coding takes 50–60% of researchers’ time. So, validating your team’s code means making sure a key element of their work is done properly.
  • Long research cycles. Full and thorough research can take weeks or even months. Imagine getting the most magnificent results after months of hard work, only to find a basic mistake that dramatically changes them. Code review throughout the research process makes this much less likely. You don’t have to wait for the research to be fully complete; code review can and should happen even in the early stages of a research project.
  • Research is the basis for decisions. From deciding how to improve customer satisfaction to deciding which new products and offerings the company should pursue, research answers questions that have an immediate and financial effect on the company. Thus, you always want to make sure your results are validated before they are shared with other departments for further development and implementation.

If research results play a key role in your company’s decision-making process and if coding is an essential part of your research work, code review is probably one of the best ways to ensure your team produces good research products that support your company’s growth.

Riskified’s code review story

We started the research code review process here at Riskified about two years ago, around the time I started my work as a team leader. In the past two years, Riskified has grown tremendously as a company, and the research team specifically has more than doubled. Our team supported the company’s transition from manual review to a fully automated landscape by training hundreds of models and creating new KPIs for monitoring and decision making. Thanks to our code review ethic, we’ve been able to keep faulty research products from degrading our production performance.

Our internal code review process also changed over time. We started off by having code review done by team leaders only. Today, however, every team member with more than eight months of experience takes part in reviewing their peers’ code. Along the way, we also decided to share our research products only after code review is finalized, which substantially reduced the number of times we had to change numbers and estimates after communicating them. Today, we make sure every research task is code reviewed, mainly while the research is ongoing.

Code review played an essential role in supporting our growing research department. It helped us ease the onboarding of new researchers while improving our products and analyses. Code review also enabled us to quickly identify cases where similar logic and code are being used across tasks. We were then able to formalize these cases as functions that could be shared as packages across the whole team. In fact, the majority of our “riskiverse” internal R packages originated from our code review process.

Here are a few specific practices from our code review process:

  1. We follow a formal style guide. Since we mainly work with R, we decided to follow the tidyverse style guide.
  2. We have a monthly code review committee meeting, in which we set guidelines and decide on coding best practices. For example, we review and approve the use of new external code libraries and resolve cases in which the writer and the reviewer couldn’t reach an agreement. For production code, it is especially important to ensure that we only use packages whose licenses allow for their use.
  3. Our code review process includes actually running the code and making sure results are valid. Just browsing over the code is not enough to effectively find bugs.
  4. Each piece of code gets at least two rounds of review. The first review is followed by fixes and an additional review to make sure all is in place. There may be some additional back-and-forth, but we try to limit our review to a maximum of four iterations.
  5. Every task is reviewed and every team member takes part in the process. For new reviewers, we have several weeks of review mentoring by a senior reviewer.
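
One way to make practice 3 concrete is to rerun the analysis and compare the rerun numbers against the ones the author reported. The sketch below is only an illustration in plain Python; compare_results, the metric names, and the values are hypothetical and do not describe our actual tooling or results.

```python
import math


def compare_results(reported, rerun, rel_tol=1e-6):
    """Return metrics whose rerun value is missing or differs from the report.

    reported: dict of metric name -> value stated by the author.
    rerun: dict of metric name -> value the reviewer got by running the code.
    rel_tol: relative tolerance, to ignore floating-point noise.
    """
    mismatches = {}
    for name, reported_value in reported.items():
        rerun_value = rerun.get(name)
        if rerun_value is None or not math.isclose(
            reported_value, rerun_value, rel_tol=rel_tol
        ):
            mismatches[name] = (reported_value, rerun_value)
    return mismatches


# Hypothetical numbers: the second metric does not reproduce.
reported = {"metric_a": 0.912, "metric_b": 0.0041}
rerun = {"metric_a": 0.912, "metric_b": 0.0047}

mismatches = compare_results(reported, rerun)
print(mismatches)
```

An empty result means the reported numbers reproduced within tolerance; anything else is a concrete item to discuss with the author in the next review round.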

Our code review process continues to evolve, but is certainly here to stay. Every time we catch a mistake or refine an analysis, we’re setting the foundation for better products and research projects down the road. And most importantly, everyone in the team is continually learning and honing their skills by engaging with one another.

Closing

Thanks to its many advantages, code review is an important part of our day-to-day research work at Riskified. I hope this blog post has convinced you to make it a key part of your research work as well.
