How to get Data Science to truly work in production

Max Leander · Analytics Vidhya · Apr 19, 2020

This is the first article in a series on best practices when building and designing machine learning systems. Read the next article in the series here: https://medium.com/@max.y.leander/create-reproducible-machine-learning-experiments-using-sacred-f8176ea3d42d

Lately, there has been a boom in tools trying to solve the problems that arise when engineering entire machine learning systems. This is understandable if we believe a study cited by Andrew Ng in one of the editions of his great newsletter, The Batch: according to it, 55% of companies have yet to successfully deploy a machine learning model to production. So what can be done to increase the success rate?

Every Data Science project goes through a number of stages, from analyzing data and gathering insights, through building ML models and pipelines, to deploying these as part of a complete software system. I tend to view the need for best practices in Data Science as a Maslow’s hierarchy of needs, where each stage of the project requires an increased focus on additional quality attributes.

According to this hierarchy, it is clear that reproducibility is the most fundamental quality of any Data Science project, and should be present right from the start.

One of the main problems when putting machine learning in production is understanding when and why a model is behaving in unexpected ways. An ML system can be difficult to debug. Software Engineers rely on best practices in their work: tried and tested methods to prevent, pinpoint and handle errors when they arise. But these methods are just not as applicable to software that wasn’t manually crafted, but automatically generated through a training process involving tons of data and machine learning algorithms.

In order to understand why a model is behaving unexpectedly, it is paramount to know how it came into being. A Data Science project typically involves dozens of different data sources, and experiments spanning numerous types of models, transformations and hyperparameters. The human choices that went into training the model finally selected for production can guide our understanding of the choices that the model subsequently makes in a live setting. In other words, to understand the behaviour of a model, we need to track the experiments that led to its existence.
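
Even without a dedicated tool, the essentials of each experiment can be captured in a small metadata record. The sketch below is one possible minimal approach in plain Python; the helper name, file layout and field names are illustrative assumptions, not a prescribed format:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def track_run(config: dict, data_path: str, metrics: dict, out_dir: str = "runs") -> Path:
    """Write a small record with everything needed to explain and reproduce a run."""
    data_hash = hashlib.sha256(Path(data_path).read_bytes()).hexdigest()
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%S")
    record = {
        "timestamp": timestamp,
        "config": config,          # hyperparameters, model type, feature set, ...
        "data_path": data_path,
        "data_sha256": data_hash,  # fingerprint of the exact data the model saw
        "metrics": metrics,        # validation scores for this run
    }
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    run_file = out / f"run_{timestamp}.json"
    run_file.write_text(json.dumps(record, indent=2))
    return run_file
```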

You could say that tracked experiments are to the Data Scientist what Git commits are to the traditional Software Engineer. Just as a software developer needs to investigate the commit history to understand a software bug, the ML practitioner needs to investigate the experiment history to understand the production model, especially when it’s behaving in funny ways. If reliability is the most fundamental quality to the Software Engineer, reproducibility is the most fundamental to the Data Scientist.

In order to ensure reproducibility, we mainly need to do the following three things:

  • Control randomness (see the sketch after this list)
  • Keep track of data sources and configuration
  • Use isolated & reproducible environments
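
As a concrete illustration of the first point, here is a minimal sketch of pinning the usual sources of randomness in a Python project; the helper name and the PyTorch branch are my own assumptions, not taken from any particular library:

```python
import os
import random

import numpy as np


def set_global_seed(seed: int = 42) -> None:
    """Pin the common sources of randomness so repeated runs give the same results."""
    os.environ["PYTHONHASHSEED"] = str(seed)  # only fully effective if set before the interpreter starts
    random.seed(seed)     # Python's built-in RNG
    np.random.seed(seed)  # NumPy's global RNG (used by scikit-learn when no random_state is given)
    try:
        import torch
        torch.manual_seed(seed)           # PyTorch CPU RNG
        torch.cuda.manual_seed_all(seed)  # PyTorch GPU RNGs, if any
    except ImportError:
        pass  # PyTorch not installed; nothing to do
```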

In the next post, I will show how to use Sacred, an open-source framework for tracking experiments, which can be used to accomplish the first two points above.
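
To give a flavour of what that looks like, here is a rough, minimal Sacred experiment (the experiment name and configuration values are made up for illustration); the next post goes through this properly:

```python
from sacred import Experiment

ex = Experiment("my_first_experiment")


@ex.config
def config():
    # every variable defined here becomes part of the tracked configuration
    learning_rate = 0.01
    n_estimators = 100


@ex.automain
def run(learning_rate, n_estimators):
    # Sacred injects the config values and records them for the run,
    # together with an automatically generated random seed and the source code
    print(f"training with learning_rate={learning_rate}, n_estimators={n_estimators}")
```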

Stay tuned!
