Common problems taking ML from lab to production

A look into disappearing data and degraded performance preventing ML models from shipping

Originally posted on Kaskada’s “Machine Learning Insights” blog here


Surprising but true: 80% of your models never make it out of the lab and into production, and when they do, they more often than not become stale and hard to update. Today, we'll cover two common problems you might have hit recently: disappearing data and degraded performance. They're so common that it doesn't matter how large your company is, whether you work for a "tech" (or "non-tech") company, or how many teams of people are dedicated to shipping your product.

Disappearing data

Data is constantly changing. Our data warehouses, data lakes, streaming data sources, and so on are constantly growing. New features in the product create new telemetry; you've bought a new data source to supplement a new model; an existing database has just gone through a migration; someone accidentally began initializing a counter at 0 instead of 1 in the last deploy… the list could go on. What could possibly go wrong?

Any one of the above changes brings challenges in ML. First, let's tackle data availability between your online and offline data sources. You've made it all the way through feature engineering, model training, and cross-validation, iterated several times, and you're finally ready to productionize your model. It turns out, however, that the data you had access to during training and validation is somehow different from what's available in production.

Now what? More often than not, your model is shelved, sometimes for months or even forever. It may take multiple sprints to make another feature available in production at the moment of prediction. One way to solve these challenges is to use a development environment that supports read-only connections directly to your production data pipeline instead of pulling from offline sources. This, however, may just move the problem upstream: you'll need to make the business case for adding new data to the pipeline before you can even experiment to find out whether it's worth the data engineering effort.
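Before a model is shelved, it's worth making the train/serve gap explicit. A minimal sketch of that check, assuming hypothetical feature names (in practice these would come from your training dataset's columns and your production data source's schema):

```python
# Hypothetical feature sets -- substitute your training dataframe's
# columns and the schema your production service can actually serve.
training_features = {"user_age", "days_since_signup",
                     "avg_session_length", "lifetime_spend"}
production_features = {"user_age", "days_since_signup",
                       "avg_session_length"}

# Features the model was trained on but production can't provide at
# the moment of prediction -- each one is a blocker for shipping.
missing_in_production = sorted(training_features - production_features)
print(missing_in_production)  # ['lifetime_spend']
```

Running a comparison like this early in feature engineering, rather than after cross-validation, is what saves those multiple sprints.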

Another possible solution is to build the data pipeline yourself, but you're likely to make rookie mistakes, as data engineering is challenging and isn't your specialty. A third, emerging solution is to develop and register your features in a feature store. There are different approaches to feature stores; some offload the computation and the productionizing of APIs for you, and those can help solve this particular challenge: you iterate on your model experimentally in a local environment, while your data engineering team gets served APIs that can be used directly in production.

Degraded performance

Ok, you've solved the disappearing data problem, but now that the model is live, you aren't saving as much money as you projected with the update. What happened? Oftentimes, this means that the statistical properties of the data your model relies on have changed, such as the average, minimum, or maximum value the model expects.

To verify, you could take a new validation set from a smaller, more recent time window and see whether your results differ from what you expected. That was it: somehow the price of toilet paper has skyrocketed! But how do you ensure this degradation doesn't happen in the future? One way is to normalize your data so that you're training and predicting on normalized data. However, it can be tricky to find the right value to normalize with, and normalization may obscure the relationship to your target that you once had.
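The key detail with normalization is that the statistics must be fit on the training data and then reused, unchanged, at prediction time. A minimal z-score sketch with made-up prices (scikit-learn's StandardScaler wraps the same idea in a fit/transform interface):

```python
# Illustrative training prices -- not real data.
train_prices = [2.0, 2.5, 3.0, 2.5]

# Fit the normalization parameters on the training window only.
mean = sum(train_prices) / len(train_prices)
var = sum((x - mean) ** 2 for x in train_prices) / len(train_prices)
std = var ** 0.5

def normalize(x, mean=mean, std=std):
    # Reuse the *training-time* mean and std at prediction time;
    # refitting on live data would silently absorb the very drift
    # you want to detect.
    return (x - mean) / std

print(normalize(2.5))   # 0.0 -- an in-distribution price
print(normalize(12.0))  # a price spike shows up as a large z-score
```

If live values start producing large z-scores, that's your toilet-paper moment showing up in the numbers instead of in the quarterly savings report.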

Another solution is to use event-based data, with your training set separated in time from your validation set. Keep a held-out set of your data to validate the accuracy of the model against the most recent events. This can catch the problem early, so that you know if the shape of your data is changing faster than you expect.
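The split above is by time, not at random. A small sketch with synthetic events (the dates and values are made up for illustration):

```python
from datetime import datetime, timedelta

# Synthetic event records: (timestamp, feature value) pairs.
events = [(datetime(2021, 1, 1) + timedelta(days=i), float(i))
          for i in range(100)]

# Everything before the cutoff trains the model; the held-out window
# after it validates against the most recent behavior.
cutoff = datetime(2021, 3, 15)
train = [e for e in events if e[0] < cutoff]
holdout = [e for e in events if e[0] >= cutoff]

# Every training event strictly precedes every held-out event.
assert max(t for t, _ in train) < min(t for t, _ in holdout)
print(len(train), len(holdout))  # 73 27
```

A random shuffle would leak future events into training and hide exactly the kind of drift this split is designed to expose.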

An additional step is saving the parameters that your model depends on alongside the version of the API you've released, and writing monitoring tests against new values using something like Great Expectations. For instance, maybe you're looking for 90% of new data to fall within a certain range of your parameters. This lets you know when performance has likely degraded below expectations and it's time to retrain.

These are only two of several common problems you might encounter when trying to take your ML model out of the lab and into production. Is there a problem you keep running into that you’d like to talk about or one you’d like help solving? Send me a note and we’ll get into it!




Charna Parkey, Ph.D.

Data science lead @kaskadainc. Startup leader. Applied scientist. Engineer. Language & culture speaker. EE-PhD in DSP. Formerly #6 @textio.
