Source Paper: Hidden Technical Debt in Machine Learning Systems

The One Image Every Data Scientist Should Be Aware Of

Berk Orbay
Published in DataBulls
3 min read · Jan 26, 2022


Welcome to the unpleasant truth of data science, machine learning, artificial intelligence, deep learning or whatever you call it. That sweet task of model building is only a tiny fraction of the workload once you take your model live.

Complexity quickly increases when you have multiple models for a single problem. It increases even faster when you have multiple models for multiple problems, or for multiple customers, each of which needs its own deployment with small changes. This is especially important if you provide Artificial Intelligence (AI) solutions under a Software-as-a-Service (SaaS) model.

The pathway to DevOps/MLOps hell is paved with good intentions and a blind eye turned to technical debt. Although there is excellent research on the topic, the widely popular paper Hidden Technical Debt in Machine Learning Systems, presented at NIPS 2015 by Google engineers, covers the gist of it as plainly as possible.

Here is a heads-up. Data science is increasingly becoming a software engineering field. As models become more powerful and more general purpose, the work becomes more about plugging them into a system than about playing with models to come up with solutions. You must learn the moving parts as much as you learn new models.

Still, this is no reason to throw in the towel. Start small, build things, blow things up, start over. After several iterations, you will find a good balance between progress and stability. Also, keep in mind that maintaining such systems requires sizable teams with specialists. If you have a small team (or you are on your own), just cover the basics (i.e. I/O standardization, logging, warnings).
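To make "cover the basics" a bit more concrete, here is a minimal Python sketch of what I/O standardization, logging and warnings could look like around a single prediction call. Everything in it is hypothetical and only for illustration: the names `PredictionRequest`, `PredictionResponse` and `predict`, the `model_service` logger, and the choice to fill missing features with 0.0 are my assumptions, not something taken from the paper.

```python
import logging
import warnings
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Hypothetical logger name, used only in this sketch.
logger = logging.getLogger("model_service")


@dataclass(frozen=True)
class PredictionRequest:
    """Standardized input: every model receives the same request shape."""
    customer_id: str
    features: Dict[str, float]


@dataclass(frozen=True)
class PredictionResponse:
    """Standardized output: every model returns the same response shape."""
    customer_id: str
    score: float
    model_version: str


def predict(
    request: PredictionRequest,
    model: Callable[[Sequence[float]], float],
    expected_features: Sequence[str],
    model_version: str = "v0",
) -> PredictionResponse:
    """Run one prediction with a fixed input/output contract."""
    # Warn (instead of failing silently) when the input drifts from the contract.
    missing = sorted(set(expected_features) - set(request.features))
    if missing:
        # Assumption for this sketch: missing features default to 0.0.
        warnings.warn(f"Missing features filled with 0.0: {missing}", RuntimeWarning)

    # Build the feature row in a fixed order so every caller feeds the model the same way.
    row = [request.features.get(name, 0.0) for name in expected_features]
    score = float(model(row))

    # Log enough to trace which model served which customer.
    logger.info(
        "customer=%s model=%s score=%.4f", request.customer_id, model_version, score
    )
    return PredictionResponse(request.customer_id, score, model_version)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # The built-in sum stands in for a real model here.
    demo = PredictionRequest(customer_id="acme", features={"x1": 1.0})
    print(predict(demo, model=sum, expected_features=["x1", "x2"], model_version="v1"))
```

The idea is simply that each new model or customer plugs into the same request and response types, so the small per-customer changes stay in configuration rather than in glue code.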

Bonus: You can also read The Unreasonable Effectiveness of Data, again published by Google engineers and scientists, even though some of the challenges it describes have been overcome in recent years. I keep forgetting the names and links of these papers, so it is also good for me that I wrote this post :)



Berk Orbay
DataBulls

My current main interests are #OR and #RL. You may reach me on LinkedIn.