How can MLOps solve the problem of production in machine learning?

Safoine EL KHABICH
Dec 1, 2021

MLOps combines machine learning, application development, and IT operations. Source: Neal Analytics

In the first post, we discussed the meaning of the term MLOps, and I shared an article on the NewVantage Partners survey to answer the question of why we need MLOps as a discipline.

The summary of the survey is that “91.5% of companies reporting ongoing investment in AI. Only 14.6% have deployed AI into production”, but that still leaves our main question unanswered.

📌 How can MLOps solve the problem of productionization (deploying ML into a real-world environment)?

To answer our main question, we first need to understand why the majority of ML systems fail to reach deployment. Let’s point out the four biggest obstacles in the ML system lifecycle and how MLOps addresses them:

1️⃣ Experimenting & Tracking: The ML lifecycle involves a lot of experimentation, from data to model architectures to hyperparameters. Keeping all those experiments tracked, organized, and easy to browse is vital for drawing valid comparisons and conclusions and for finding the best-performing version.

2️⃣ Automation: Machine learning often lacks automation; almost everything is done manually, from data collection and data processing to exporting models. The history of automation shows that humans add the least value when doing repetitive tasks: “if it is not automated, it’s broken.”

3️⃣ ML packaging: ML systems depend on a wide variety of packages and libraries, so moving a model from one system to another can be long and stressful because of compatibility problems. Packaging ML helps with sharing, contributing, and easy deployment across different environments.

4️⃣ Monitoring and Observability: We need to continuously monitor the system to ensure it is operating effectively. Models in production can break because of data drift (the incoming data changing relative to the data the model was last trained on) or concept drift (the relationship between the inputs and the target changing).

📌 Now that we have gone through the main obstacles in the ML system lifecycle, let’s see how MLOps tries to solve each of these problems:

1️⃣💡 Experimenting & Tracking: A tool that tracks and stores all the parameters, saves the results of each experiment, and provides a comparison between runs can greatly speed up the cycle of improving the model.

⚡️ Tools example: MLflow, DVC, Weights & Biases, Neptune, Pachyderm
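
To make the idea of tracking concrete, here is a minimal sketch of what such a tool does at its core: store the parameters and results of each run, then compare them. It uses only the Python standard library and is not the API of any of the tools listed above; the `ExperimentLog` class, the file name, and the accuracy numbers are all made up for illustration.

```python
import json
import tempfile
from pathlib import Path


class ExperimentLog:
    """Append-only log of runs, one JSON object per line (hypothetical helper)."""

    def __init__(self, path):
        self.path = Path(path)

    def log_run(self, name, params, metrics):
        """Store the parameters and resulting metrics of one experiment."""
        record = {"name": name, "params": params, "metrics": metrics}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def runs(self):
        """Load every run logged so far."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def best_run(self, metric, higher_is_better=True):
        """Compare all runs on one metric and return the winner (or None)."""
        candidates = [r for r in self.runs() if metric in r["metrics"]]
        if not candidates:
            return None
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda r: r["metrics"][metric])


# Usage with made-up numbers: three runs, then ask which one did best.
log = ExperimentLog(Path(tempfile.mkdtemp()) / "runs.jsonl")
log.log_run("baseline", {"lr": 0.1, "depth": 3}, {"accuracy": 0.81})
log.log_run("deeper", {"lr": 0.1, "depth": 6}, {"accuracy": 0.86})
log.log_run("small-lr", {"lr": 0.01, "depth": 6}, {"accuracy": 0.84})

best = log.best_run("accuracy")
print(best["name"], best["params"])
```

Real tracking tools add much more on top of this (artifact storage, dashboards, collaboration), but the record-then-compare loop is the part that removes the guesswork from experimenting.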

2️⃣💡 Automation: Adopting DevOps automation practices (CI/CD, for example) in machine learning systems accelerates the whole lifecycle and reduces failures. Take continuous training as an example: whenever the data changes or new data becomes available, retraining is triggered automatically, and once it finishes, the resulting models are saved automatically to an artifact store. This saves time on repetitive tasks and also avoids errors.

⚡️ Tools example: CML, GitHub Actions, Jenkins
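
The continuous training idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: the "dataset" is a text file of numbers, the "model" is just their mean, and the artifact store is a local folder. The function names (`fingerprint`, `train`, `retrain_if_changed`) are hypothetical; in a real setup a CI/CD tool would run a step like this on every data change.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(data_path):
    """SHA-256 of the dataset file, used to detect that the data changed."""
    return hashlib.sha256(Path(data_path).read_bytes()).hexdigest()


def train(data_path):
    """Placeholder 'training': the model is just the mean of the numbers."""
    values = [float(x) for x in Path(data_path).read_text().split()]
    return {"mean": sum(values) / len(values)}


def retrain_if_changed(data_path, artifact_dir):
    """Retrain and store a new artifact only when the data fingerprint is new."""
    artifact_dir = Path(artifact_dir)
    artifact_dir.mkdir(parents=True, exist_ok=True)
    digest = fingerprint(data_path)
    artifact = artifact_dir / f"model-{digest[:12]}.json"
    if artifact.exists():
        return False  # same data as an earlier run, nothing to do
    artifact.write_text(json.dumps(train(data_path)))
    return True


workdir = Path(tempfile.mkdtemp())
data = workdir / "data.txt"
store = workdir / "artifacts"

data.write_text("1 2 3")
first = retrain_if_changed(data, store)   # new data: trains and saves
second = retrain_if_changed(data, store)  # unchanged data: skipped
data.write_text("1 2 3 4")
third = retrain_if_changed(data, store)   # data changed: trains again
print(first, second, third)  # prints: True False True
```

Naming the artifact after the data fingerprint also gives a cheap link between each saved model and the exact data it was trained on.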

3️⃣💡 ML packaging: Packaging machine learning means putting everything related to our model, from libraries to code and environment settings, into a container (containerization). If you don’t know already, containers can run largely unchanged across different machines and environments. A model can be developed and trained on a local machine, then easily ported to external clusters with additional resources such as GPUs or CPUs, or deployed to the cloud.

⚡️ Tools example: Docker (the most famous one)
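
A container image is only reproducible if we know exactly which libraries the model needs. As a small, hedged sketch of that first step (it is not a container build itself), the snippet below records the Python version and the installed packages as pinned `name==version` lines using the standard library. The `environment_snapshot` function is a hypothetical helper; a container build could then install from a list like this one.

```python
import platform
from importlib import metadata


def environment_snapshot():
    """Return the Python version and pinned 'name==version' lines."""
    pins = set()
    for dist in metadata.distributions():
        try:
            name, version = dist.metadata["Name"], dist.version
        except Exception:
            continue  # skip installs whose metadata cannot be read
        if name and version:
            pins.add(f"{name}=={version}")
    return {
        "python": platform.python_version(),
        "requirements": sorted(pins, key=str.lower),
    }


snapshot = environment_snapshot()
print("python", snapshot["python"])
for line in snapshot["requirements"][:5]:
    print(line)
```

The output depends on the machine it runs on, which is exactly the point: the snapshot captures the environment the model was developed in so it can be rebuilt elsewhere.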

4️⃣💡 Monitoring and Observability: Monitoring covers the processes and techniques that let us keep track of a model’s performance and reliability while it runs in a real-life environment (production). This can be done through logging; with this tracking we can detect when the model starts giving wrong results or errors, which can happen because of system changes or data drift / concept drift.

Note: Monitoring ML doesn’t only mean monitoring models in production; we can also monitor the model’s input/output distributions, training and retraining, evaluation and testing, and hardware metrics.

⚡️ Tools example: Grafana + Prometheus, Arize, Fiddler, Seldon Core
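
One simple way to watch an input distribution for data drift is the Population Stability Index (PSI), which compares how a feature is spread across bins at training time and in production. The sketch below is one possible check among many, not what any of the tools above does internally. The `psi` function, the synthetic data, and the thresholds (below 0.1 read as stable, above 0.25 read as significant drift) are assumptions based on a common rule of thumb.

```python
import math


def psi(reference, current, bins=10):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(int((x - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Synthetic feature values: training data, a similar sample, a shifted sample.
training = [i / 100 for i in range(1000)]
same = [i / 100 for i in range(0, 1000, 2)]
shifted = [x + 4.0 for x in training]

print(psi(training, same) < 0.1)      # similar distribution: True
print(psi(training, shifted) > 0.25)  # shifted distribution: True
```

In practice a check like this would run on a schedule against recent production inputs, with the result exported as a metric so that a dashboard or alert can pick it up.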

⚠️ ⚠️ While listing tool examples, I avoided mentioning cloud provider platforms. All three leading cloud providers (AWS, Azure, GCP) offer full MLOps platforms that address the problems we have mentioned.

  • AWS provides “Amazon Machine Learning and SageMaker”
  • Azure provides “Azure Machine Learning and ML Studio”
  • GCP provides “Google AI Platform and Google Cloud AutoML”

Safoine EL KHABICH

Machine Learning Engineer Intern at Beewant & M2 Data Science Student