MLOps: Market Map & Thesis

Rachit Kansal
6 min read · May 19, 2022


Machine learning operations (or MLOps) is the practice of developing, managing, and deploying machine learning models. Simply put, it encompasses everything around the black box we call the “machine learning algorithm.”

Every machine-learning-driven project typically includes three components: data, model, and code. All three are closely intertwined and, because ML applications are built iteratively and incrementally, need to be managed with well-defined structures, processes, and tools. Achieving this also requires collaboration among data engineers, data scientists, DevOps engineers, QA engineers, and data analysts.


A production-grade ML application requires many pieces that fit together coherently: data pipelines (collection and analysis), repeated experimentation, a deployment strategy, model lifecycle management, and model observability.

“MLOps is to AI/ML what DevOps was to Cloud” — an accelerant for widespread AI adoption

Just as DevOps empowered developers to develop cloud-scale applications at cloud-velocity, MLOps would empower engineers and scientists to build AI-driven applications at unprecedented scale and velocity. The idea is to provide the basic underlying infrastructure to increase the quality, quantity, and velocity of machine learning applications by eliminating the inefficient, complex, and hard-to-manage in-house solutions, manual processes, or custom scripts.

Before putting forward my thesis, here is a crisp infographic, courtesy of a16z, for thinking about the ML/AI space.


Opportunities in MLOps

Developing and serving machine learning applications poses unique challenges that render generic software engineering tools ineffective. As a result, this space is one of the fastest-growing enterprise software segments; over the last couple of years it has been hitting peak hype (Gartner).

As I continue following the ML/AI space closely, these are the major verticals where I see immense opportunity:

A. Monitoring & Observability: Machine learning models are often referred to as “black boxes”, and in some ways, rightly so. Data scientists usually spend weeks (or even months) and thousands of dollars to build their version of the “perfect” model (Algorithmia). However, even after such considerable effort, their models may be inaccurate, slow, or, worse, inexplicable. This may stem from a variety of issues such as diverging training and testing datasets, inconsistent data transformation pipelines, or model overfitting.

More commonly, though, with exabytes of data generated daily, the need for online training has escalated: a model might work seamlessly for a few weeks after deployment, then produce erratic results unless it is retrained on new data. (Read: data and concept drift)
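The drift problem above can be made concrete with a simple statistical check. The sketch below computes a population stability index (PSI) between a training sample and two live samples; the equal-width bucketing and the “alert above 0.2” reading are common rules of thumb, not the method of any particular vendor named here.

```python
import math
import random

def psi(expected, actual, buckets=10):
    """Population Stability Index between two samples of a numeric feature."""
    lo, hi = min(expected), max(expected)
    step = (hi - lo) / buckets
    edges = [lo + i * step for i in range(1, buckets)]

    def fractions(sample):
        counts = [0] * buckets
        for x in sample:
            counts[sum(x > e for e in edges)] += 1  # bucket index
        # floor the fractions to avoid log(0) on empty buckets
        return [max(c / len(sample), 1e-4) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

random.seed(0)
train = [random.gauss(0, 1) for _ in range(5000)]
live_ok = [random.gauss(0, 1) for _ in range(5000)]
live_shifted = [random.gauss(0.8, 1) for _ in range(5000)]  # drifted inputs

print(round(psi(train, live_ok), 3))       # small: distributions still agree
print(round(psi(train, live_shifted), 3))  # large: retraining likely needed
```

Monitoring products run checks like this continuously, per feature and per prediction, and alert when the index crosses a threshold.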

Finally, complex data pipelines, the non-deterministic nature of machine learning algorithms, and untraceable algorithmic data transformations make it very hard to debug models and track their performance.

Monitoring & observability startups: Arize, Fiddler, Arthur, Aporia, Truera, WhyLabs, Gantry, Superwise, Censius, Credo, Armilla, Robust Intelligence, CalypsoAI


B. Reusable AI (Feature Stores & ML marketplaces): I vividly remember the weeks I spent working on machine learning problems in college, in Kaggle competitions, and at Nutanix: most of my time went into picking the right features, creating new ones, or cleaning the existing ones. ML/AI model development is an art centered on feature engineering. A good model relies on extracting, transforming, and serving features efficiently, and this is where a good feature store comes into the picture. In addition, a good feature store improves data lineage, promotes reusability of feature pipelines, and monitors their health. In fact, the most sophisticated ML teams in the Valley (Google, Uber, Netflix, etc.) have been using some form of internal feature store for years now. Uber’s Michelangelo platform is one of the earliest, most robust, and most popular feature stores.
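The core feature-store contract can be sketched in a few lines: register a feature transformation once, then serve the same computation for both training and inference. This is a toy in-memory illustration with made-up feature names; real systems such as Tecton or Feast add offline/online storage, lineage, and monitoring on top.

```python
from datetime import datetime

class FeatureStore:
    """Toy in-memory feature store: a single registry of named feature functions."""
    def __init__(self):
        self._features = {}

    def register(self, name, fn):
        # Registering the transformation once keeps training and serving consistent.
        self._features[name] = fn

    def get_features(self, names, entity):
        return {n: self._features[n](entity) for n in names}

store = FeatureStore()
store.register("account_age_days",
               lambda user: (datetime(2022, 5, 19) - user["signup"]).days)
store.register("is_power_user",
               lambda user: user["sessions_30d"] > 20)

user = {"signup": datetime(2022, 1, 1), "sessions_30d": 35}
row = store.get_features(["account_age_days", "is_power_user"], user)
print(row)  # same feature values whether building a training set or serving live
```

The reusability argument is visible even here: any team that needs `account_age_days` pulls the registered pipeline instead of re-deriving it.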

Taking the same concept even further, there is also considerable value in marketplaces that provide pre-trained models. Instead of reusing features, scientists and engineers are now attempting to reuse entire models through an approach called transfer learning: a model trained on a large dataset is repurposed for a more specific use case using only a small amount of new data.

Reusable AI startups: Tecton AI, Feast, Hopsworks, FeatureForm, Splice Machine, HuggingFace, ModelZoo, Clarifai, BigML, OpenAI


C. Integration Tools: As enterprises scale from dozens of models to hundreds or even thousands, we expect the increased industrialization of AI. This will give rise to orchestration layers that sit on top of machine learning pipelines to help manage all the different tooling. These layers will integrate across multiple tools in the ML pipeline, whether open source, commercial, or homegrown, and provide an integrated environment (the so-called “single pane of glass”) to better track and manage ML pipelines.

  1. Deployment: While ML engineers may build sophisticated models, they don’t necessarily understand complex CI/CD pipelines or Kubernetes environments (Read: Kubeflow). Enterprise infrastructure has become a large beast, with complex platforms that are hard to tame: it takes considerable effort to efficiently provision and manage virtual machines, containers, or functions across multiple cloud providers. Hence, ML engineers and data scientists often rely on DevOps engineers to deploy their models to production systems (which is never simple!). In fact, the time it takes to move a model from development to deployment is only getting longer: 2021 saw longer average deployment times than 2020. There is therefore a gap in the market for platforms (and ML application frameworks) that abstract away deployment complexity and let data scientists focus on data science.
  2. Model Versioning & Experiment Tracking: Another issue unique to model development, as opposed to software development, is the non-deterministic nature of results. As you tune a model’s hyperparameters or make architectural modifications, its outcomes change drastically. Data scientists spend a considerable amount of time fine-tuning their models, using multiple datasets (training, testing, validation, etc.), and building customized data pipelines. This necessitates a robust versioning and experimentation platform (akin to “git” for machine learning) to track and iterate through multiple versions of a model.
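The “git for machine learning” idea in (2) reduces to a small contract: every run records its parameters, metrics, and a reproducible identifier, so any result can be compared and recovered later. A minimal in-memory sketch with illustrative names; tools like MLflow or Weights & Biases add UIs, artifact storage, and collaboration on top.

```python
import hashlib
import json

class ExperimentTracker:
    """Toy run tracker: log params/metrics per run, query the best one."""
    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        # A content hash of the params acts as a reproducible run id.
        run_id = hashlib.sha1(
            json.dumps(params, sort_keys=True).encode()).hexdigest()[:8]
        self.runs.append({"id": run_id, "params": params, "metrics": metrics})
        return run_id

    def best_run(self, metric):
        return max(self.runs, key=lambda r: r["metrics"][metric])

tracker = ExperimentTracker()
tracker.log_run({"lr": 0.1, "depth": 4}, {"auc": 0.81})
tracker.log_run({"lr": 0.01, "depth": 8}, {"auc": 0.87})
tracker.log_run({"lr": 0.05, "depth": 6}, {"auc": 0.84})

best = tracker.best_run("auc")
print(best["params"])  # hyperparameters of the winning run
```

Hashing the sorted parameters means the same configuration always maps to the same run id, which is what makes experiments comparable across people and machines.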

Integration startups: DataRobot, Kubeflow, Coiled, Seldon, BentoML, Cortex, IterativeAI, MLflow, Comet, ClearML, Weights & Biases


D. Low-Code & No-Code ML: A major battle the technology industry has been fighting over the past decade is the growing talent shortage and skill gap in software engineering. Now, machine learning adoption faces similar headwinds. Hence, there is an imminent need for tools that reduce this widening ML/AI skills gap. Such tools should lower the bar for ML application development while ensuring ease of use, and should help bring semi-skilled engineers (by ML standards) such as software developers, data engineers, and data analysts into the ML market. While this is not strictly within the realm of MLOps, I include it here because such tools can abstract away the complexity of MLOps.

Low-code/No-code startups: MyDataModels, Intersect Labs, Akkio, Obviously.ai, Levity, Nanonets, Clarifai, Lobe, Runway, MonkeyLearn, Teachable Machine


I feel that ML/AI, and more specifically MLOps, is one of the most exciting spaces in enterprise SaaS to invest in. If you are ideating in this space, building your own venture, or have thoughts about ML/AI or this article, I’d love to hear from you!

Rachit Kansal

Disclaimer: The opinions expressed in this blog are solely those of the writer and not of this platform. The writer is not a member of or associated with any of the firms mentioned in the blog. The blog does not represent the views of any of my prior or current workplaces.
