3 Reasons to Version Control Analytics Code

Kelsey Pericak
Published in Super.com
3 min read · Apr 12, 2022
Photo by Brooke Cagle on Unsplash

Before I joined Snapcommerce in 2021, my exposure to version control and peer review was largely limited to data science and automation use cases. I had become oh-so-familiar with sharing SQL and Python scripts over email and through communal file folders. “Streamlined” is not the adjective I would use to describe that old-school process. Since then, I have come to recognize and appreciate the strength of code reviews by Analysts using tools such as GitHub.

Version control, supported and driven by collaborative reviewing, means that code changes are tracked and shared with one or more peers to be read, revised (if required) and approved. It is common practice on our Analytics team, where we use it for:

  • Data modelling: Building new and updating existing tables in our database
  • Making dashboards: Creating self-serve reporting with visualizations and metrics
  • Programming: Automating scripts (such as web scrapers) and analyzing or preparing data (such as significance testing, feature engineering, exploratory analyses, etc.)

From lessons learned by our team, here are 3 reasons to start version controlling analytics code:

1. High quality

Your code’s structure, logic, performance and output can be reviewed by peers to reach a consistent and high level of quality. As the well-known saying goes, “two heads are better than one,” and this rings true for version control and collaborative coding. With quality also comes confidence. Analysts can feel more comfortable using their code’s output for decision making, presenting to stakeholders, and replicating similar reports when their work has already been shared with and reviewed by peers.

Imagine onboarding to a new project and being asked to present your findings within one short week. Without teamwork and knowledge sharing, feelings of uncertainty could arise. A peer review surfaces issues early and gives reassurance about quality, ultimately improving the final report’s insights and your recommendations.
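To make the review step concrete, here is a minimal Python sketch of what a reviewer actually reads: a line-by-line diff between two versions of a query. The query text, table name, and file name are hypothetical, and the standard-library `difflib` module only stands in for the diff view that a tool such as GitHub would render.

```python
import difflib

# Hypothetical "before" and "after" versions of an analytics query.
old_query = """\
SELECT user_id, SUM(amount) AS revenue
FROM orders
GROUP BY user_id
"""

new_query = """\
SELECT user_id, SUM(amount) AS revenue
FROM orders
WHERE status = 'completed'
GROUP BY user_id
"""


def review_diff(before: str, after: str, name: str = "revenue.sql") -> str:
    """Return a unified diff, similar in spirit to a pull request view."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(lines)


diff_text = review_diff(old_query, new_query)
print(diff_text)
```

The added filter shows up as a single line prefixed with `+`, so a peer can question the logic of that one change without rereading the whole script.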

2. Historical logs

Most version control tools host a central repository of the master or production code online. Having one location for the most up-to-date code and the details of each commit enables Analysts to onboard and familiarize themselves with the code in a self-serve manner. A shared repository also lets multiple Analysts pull and push updates to one location and co-work on the same projects.

Another core benefit is the ability to share work (be it the code or the output) with viewers who are interested but not actively contributing from a technical perspective.

3. Ownership

When querying a table or viewing a dashboard, it’s not always obvious who made the asset. Who should you ask to learn more? How do you direct your update requests? Version control tools attribute every change to its author, so Analysts and Scientists take ownership of their work, and this quantifiable ownership extends to the peer reviewers as well. Having more than one owner or collaborator is also ideal for project transfers, vacation coverage, and bandwidth constraints.
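As a rough illustration of how that ownership can be looked up, the Python sketch below summarizes a file's commit history. It assumes Git is the version control tool and that the log text comes from `git log --format="%an|%ad" --date=short -- <path>`; the sample names, dates, and the `summarize_owners` helper are made up for this example.

```python
from collections import Counter

# Sample text in the shape produced by:
#   git log --format="%an|%ad" --date=short -- <path>
# The names and dates are invented for illustration.
SAMPLE_LOG = """\
Alex|2022-03-30
Sam|2022-03-12
Alex|2022-02-01
"""


def summarize_owners(log_text: str) -> dict:
    """Report who last changed a file and who has changed it most often."""
    entries = []
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        author, date = line.rsplit("|", 1)
        entries.append((author.strip(), date.strip()))
    if not entries:
        return {"last_author": None, "last_date": None, "contributors": []}
    counts = Counter(author for author, _ in entries)
    # git log lists the newest commit first by default,
    # so the first entry is the most recent change.
    last_author, last_date = entries[0]
    return {
        "last_author": last_author,
        "last_date": last_date,
        "contributors": counts.most_common(),
    }


summary = summarize_owners(SAMPLE_LOG)
print(summary)
```

The last author is a sensible first person to ask about an asset, and the list of contributors suggests who could cover for them during a vacation or a project transfer.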

Peer reviewing with version control allows our team to take clearer ownership, produce higher quality work, and collaborate more easily. This practice, due to its many benefits, should become more common amongst data teams.

For more technology blogs, visit the Snapcommerce Medium page.

Kelsey Pericak
Super.com

Director of Analytics & Data Science | Master of Management in Artificial Intelligence