The One with the Introduction to MLOps

Ashmi Banerjee
2 min read · Jun 12, 2022

A brief overview of Machine Learning Operations (MLOps)

Photo by Crystal Kwok on Unsplash

Making the leap from a proof-of-concept to deployment is a hurdle that many machine learning (ML) models struggle to clear.

While most industrial ML projects aim to develop highly performant and scalable ML systems in production, it is often difficult to automate and operationalise these systems so that they work reliably.

This issue is addressed by the paradigm of Machine Learning Operations (MLOps).

Source: NealAnalytics

What is MLOps and why do we need it?

MLOps is a set of engineering best practices drawn from Machine Learning (ML), software engineering (DevOps) and data engineering that aims to build efficient and reliable ML systems in production.

Applying these practices not only improves the quality of Machine Learning and Deep Learning models and automates their deployment in large-scale production environments, but also makes it easier to align them with business needs and regulatory requirements.

MLOps serves as a guideline for individuals, small teams, and even businesses to build efficient ML systems while optimising resources and costs.

Key Phases of MLOps

Source: MLOps Principles (ml-ops.org)

The key phases of an MLOps process can be broadly divided into Design (requirements engineering, use-case prioritisation etc.), Model development (data engineering, model development, model selection, validation, and testing) and Operations (model deployment, CI/CD pipelines, model monitoring, and retraining).
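To make the three phases a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a reference implementation: the toy mean-predictor model, the function names (train, validate, should_deploy, needs_retraining) and the MAX_ERROR threshold are all hypothetical and stand in for real training code, CI/CD gates and monitoring tools.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Model:
    """Toy model (assumption): always predicts the mean of its training targets."""
    prediction: float


def train(targets: list[float]) -> Model:
    # Model development: fit the (trivial) model on the training data.
    return Model(prediction=mean(targets))


def validate(model: Model, targets: list[float]) -> float:
    # Model development: mean absolute error on held-out data.
    return mean(abs(t - model.prediction) for t in targets)


def should_deploy(error: float, max_error: float) -> bool:
    # Operations: a CI/CD-style quality gate checked before deployment.
    return error <= max_error


def needs_retraining(model: Model, live_targets: list[float], max_error: float) -> bool:
    # Operations: monitoring; flag the model once its live error exceeds the budget.
    return validate(model, live_targets) > max_error


# Design: the business requirement, expressed as a hypothetical error budget.
MAX_ERROR = 1.0

model = train([10.0, 11.0, 12.0])
holdout_error = validate(model, [10.5, 11.5])
deployed = should_deploy(holdout_error, MAX_ERROR)

# Later, live data drifts away from the training distribution.
drifted = needs_retraining(model, [15.0, 16.0], MAX_ERROR)
```

In a real system each function would be replaced by a pipeline stage (for example a training job, an automated test suite and a monitoring service), but the control flow stays the same: a requirement is set during design, checked before deployment, and checked again in production to decide when to retrain.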

Conclusion

MLOps is gaining popularity in industries that are trying to build robust and scalable systems for their ML projects.

However, there is a significant shortage of skilled data engineers, DevOps engineers and similar roles, as MLOps is not yet part of the traditional Data Science curriculum. The role, responsibilities and skillset of an MLOps engineer have been summarised here.

Moreover, deploying and maintaining such systems efficiently is not only expensive, requiring a lot of compute resources (e.g. CPUs, GPUs, TPUs), but also poses many operational challenges.

The references and further readings on this topic have been summarised here.

If you like the article, please subscribe to get my latest ones.
To get in touch, either reach out to me on LinkedIn or via ashmibanerjee.com.
