The Risk in Machine Learning

Published in Data Folks Indonesia · 2 min read · Apr 5, 2023

In this article, we will discuss the potential risks of developing machine learning models and how MLOps is part of the solution for mitigating them.

Practical MLOps by Valohai introduces three main types of risk that an MLOps workflow can address:

Loss of Knowledge

The problem arises when a group of data scientists is working on a machine learning project and a few of them suddenly have to leave before the project is done. In software development, this exposure is known as the “bus factor”.
However, machine learning magnifies the problem compared with software development. Code may be self-documenting, but machine learning involves not only code but also data: which sources the dataset came from, which transformations were applied, and so on. This adds a layer of confusion that you may be unable to backtrack through.
You can read the code and run a bunch of tests to understand its logic. However, a machine learning model also needs its dataset in order to run. To generate that dataset, you may collect data from multiple sources, then clean, preprocess, and engineer features across multiple scripts and notebooks, and eventually lose track of which step should run first.
Proper documentation is not only needed when you collaborate with other data scientists. The problem also appears when you have to maintain (e.g., retrain or fine-tune) a model that was developed, say, six months ago. Without proper documentation and versioning, you may run the existing scripts and still worry that something will break.
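
One lightweight way to keep this knowledge from getting lost is to declare the dataset steps in a single ordered list and record a small manifest every time the dataset is built. The sketch below is only an illustration using the Python standard library, not a method from the Valohai book. The step functions, the `age` field, and the source name `crm_export.csv` are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(rows):
    """Return a short SHA-256 hash of the rows, so any data change is detectable."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def drop_missing(rows):
    """Cleaning step (hypothetical): drop rows with a missing age."""
    return [r for r in rows if r["age"] is not None]


def add_is_adult(rows):
    """Feature engineering step (hypothetical): derive a boolean feature from age."""
    return [{**r, "is_adult": r["age"] >= 18} for r in rows]


def run_pipeline(rows, steps, source):
    """Apply the steps in their declared order and record what each one produced."""
    manifest = {
        "source": source,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "input_hash": fingerprint(rows),
        "steps": [],
    }
    for step in steps:
        rows = step(rows)
        manifest["steps"].append(
            {
                "name": step.__name__,
                "rows_out": len(rows),
                "output_hash": fingerprint(rows),
            }
        )
    return rows, manifest


# Toy data standing in for a real export.
raw = [{"id": 1, "age": 34}, {"id": 2, "age": None}, {"id": 3, "age": 15}]
dataset, manifest = run_pipeline(
    raw, [drop_missing, add_is_adult], source="crm_export.csv"
)
print(json.dumps(manifest, indent=2))
```

Stored next to the trained model, a manifest like this answers "which steps ran, in what order, on which data" six months later, even if the person who wrote the notebooks has left.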

Failures in Production

Machine learning applications such as recommendation systems, face detection, and speech recognition require real-time inference. Developing new models, and improving existing ones, requires an automated way of shipping model updates. Teams work hard to avoid shipping broken code or a broken model to production; sometimes all development is paused so that everyone can focus on the deployment. MLOps helps prevent failures in production through CI/CD, which includes code tests, code quality checks, and load testing, to ensure that unexpected behavior doesn’t happen in production.
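
As a rough illustration of what such a pipeline might check before promoting a model, the sketch below runs an accuracy check and a latency check and collects any failures. It is a minimal example under stated assumptions: `predict` is a stand-in for a real model, and the holdout examples and thresholds are made up for illustration.

```python
import time


def predict(features):
    """Stand-in for a real model: a hypothetical rule on a single feature."""
    return 1 if features["score"] >= 0.5 else 0


def accuracy(model, examples):
    """Fraction of labeled examples the model predicts correctly."""
    correct = sum(model(x) == y for x, y in examples)
    return correct / len(examples)


def mean_latency_ms(model, examples, repeats=100):
    """Average time per prediction in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for x, _label in examples:
            model(x)
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / (repeats * len(examples))


def release_gate(model, examples, min_accuracy, max_latency_ms):
    """Return a list of failed checks; an empty list means the model may ship."""
    failures = []
    acc = accuracy(model, examples)
    if acc < min_accuracy:
        failures.append(f"accuracy {acc:.2f} is below {min_accuracy:.2f}")
    latency = mean_latency_ms(model, examples)
    if latency > max_latency_ms:
        failures.append(f"latency {latency:.3f} ms is above {max_latency_ms} ms")
    return failures


# Hypothetical labeled holdout set: (features, expected label).
holdout = [
    ({"score": 0.9}, 1),
    ({"score": 0.2}, 0),
    ({"score": 0.7}, 1),
    ({"score": 0.4}, 1),
]

failures = release_gate(predict, holdout, min_accuracy=0.7, max_latency_ms=50.0)
print("OK to deploy" if not failures else failures)
```

In a real CI/CD job, a non-empty `failures` list would typically make the job exit with a non-zero status so that the deployment step never runs.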

Regulatory and Ethical Risks

Machine learning can be applied in many ways and in many industries. In some cases, such as friend recommendations or similar-item suggestions in wholesale, the consequences are not that serious if something goes wrong. In other industries, such as finance and medicine, the use of machine learning is highly regulated and subject to strict implementation practices. If the model has an unknown bias that the data scientists aren’t aware of, it could cost the company trust and reputation. Hence, the role of MLOps is to make sure that everything is reproducible, that you can trace back how the model was trained, and that a model interpretability report runs automatically for each trained model.
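
To make the idea concrete, the sketch below stores everything needed to repeat a training run (seed and data hash) together with a very simple automated report: the approval rate per group. This is an assumption-laden toy, not a full interpretability method. The "training" is just a seeded threshold on a hypothetical `income` field, and the `group` column is made up for illustration.

```python
import hashlib
import json
import random
from collections import defaultdict


def data_hash(rows):
    """Short hash of the training data, to prove which data a model saw."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def train_threshold_model(rows, seed):
    """Toy 'training': the mean income of a seeded random half of the rows."""
    rng = random.Random(seed)  # local generator, so the run is repeatable
    sample = rng.sample(rows, k=len(rows) // 2)
    return sum(r["income"] for r in sample) / len(sample)


def approval_rate_by_group(rows, threshold, group_key):
    """Share of rows at or above the threshold, per group."""
    totals, approved = defaultdict(int), defaultdict(int)
    for r in rows:
        totals[r[group_key]] += 1
        approved[r[group_key]] += r["income"] >= threshold
    return {g: approved[g] / totals[g] for g in totals}


def build_training_record(rows, seed, group_key):
    """Train, then bundle the inputs and the audit report into one record."""
    threshold = train_threshold_model(rows, seed)
    return {
        "seed": seed,
        "data_hash": data_hash(rows),
        "threshold": threshold,
        "approval_rate_by_group": approval_rate_by_group(rows, threshold, group_key),
    }


# Hypothetical applicants from two groups.
rows = [
    {"group": "A", "income": 40}, {"group": "A", "income": 60},
    {"group": "A", "income": 70}, {"group": "A", "income": 90},
    {"group": "B", "income": 20}, {"group": "B", "income": 30},
    {"group": "B", "income": 45}, {"group": "B", "income": 80},
]

record = build_training_record(rows, seed=42, group_key="group")
print(json.dumps(record, indent=2))
```

Because the seed and data hash are saved with the result, rerunning the same code on the same data yields the same record, and a large gap between groups in the report is a signal to investigate before the model reaches production.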