How Well Do You Leverage Machine Learning at Scale? Six Questions to Ask

Published in Cognizant AI · 7 min read · Jun 23, 2021

By Rajaram Venkataramani, Chief Architect

Moving up the MLOps maturity curve delivers faster, more accurate answers to business problems.

Whether your goal is to improve the customer experience, increase operational efficiency or safeguard the public from threats, artificial intelligence and machine learning (AI/ML) can uncover trends and insights that would otherwise be impossible to find. Deploying AI/ML quickly and at scale delivers the fastest, most accurate predictions and prescriptions for business challenges, while allowing you to adapt ML models rapidly as needs change.

Just as DevOps brings new applications to market more quickly by combining the development and operations functions, MLOps encourages collaboration between data scientists and operations professionals to speed the building, testing, deployment, and governance of AI/ML models.

The more mature your MLOps processes, the less time it will take to develop your AI models, the more accurate they will be and the more quickly you can retrain them when — not if — the data on which they depend changes.

Based on our work helping clients improve their MLOps processes, we have developed an MLOps maturity model that describes the requirements to reach each level and the benefits as your MLOps processes improve.

MLOps Maturity Model

Cognizant’s MLOps Maturity Model

Answering the following six questions can help you determine your current MLOps maturity level and what steps to take to improve. For each question you answered "yes," record the number of points for that question from Table 1 below and total your points. Then check Table 2 to see where your total score lands you on our MLOps Maturity Model.

1. Can your ML models self-heal?

As a business generates new or noisy data, ML models trained on earlier data can become less accurate. Restoring their accuracy and usefulness requires retraining them on current data sets. However, taking models down for manual retraining interrupts the flow of insights to business users, disrupts the customer experience, and can incur excessive expense and delay for organizations running hundreds or thousands of models. (A "yes" gives you 25 points.)

To improve your self-healing capabilities, begin by monitoring models in production using commercially available tools such as Fiddler or New Relic, which can detect when accuracy falls below a level that would affect the business. Automating the retraining of models to restore acceptable accuracy requires automating the data supply chain with tools such as Pachyderm. Commercial and open-source feature stores can abstract away the noisy data and present the most relevant variables that affect the model. However, to achieve self-healing you must create custom code to integrate these stores with your models.
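As a minimal sketch of the monitor-then-retrain loop described above: the accuracy floor of 0.80 and the strategy of folding recent labeled data back into the training set are illustrative assumptions, not recommendations from the article or any particular tool.

```python
# Minimal self-healing loop: check live accuracy against recently labeled
# data and refit when it drops below a floor. Threshold is an assumption.
import numpy as np
from sklearn.metrics import accuracy_score

ACCURACY_FLOOR = 0.80  # retrain when live accuracy falls below this

def monitor_and_heal(model, X_recent, y_recent, X_train, y_train):
    """Score the model on recent labels; refit on combined data if it drifted."""
    live_accuracy = accuracy_score(y_recent, model.predict(X_recent))
    if live_accuracy < ACCURACY_FLOOR:
        # Fold the recent data into the training set and retrain in place,
        # so the model heals without a manual takedown.
        X_combined = np.vstack([X_train, X_recent])
        y_combined = np.concatenate([y_train, y_recent])
        model.fit(X_combined, y_combined)
    return model, live_accuracy
```

In production, the monitoring tool (Fiddler, New Relic) would supply the accuracy signal and a pipeline tool (Pachyderm) would supply the fresh training data; this sketch only shows the control flow that glues them together.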

2. Have you developed key performance indicators (KPIs) for your production ML models, and do you monitor them against these KPIs?

Data scientists speak the language of Bayes' theorem, decision trees and standard deviations. Your business decision makers talk about sales, profits, cost reduction and refining production schedules and marketing budgets. These are the metrics your MLOps KPIs should track.

Tracking such KPIs reduces the risk that you'll spend too much on the wrong ML models, or, even worse, that a competitor will beat you to the business benefits of ML. As you prioritize your AI/ML investments, engage your business stakeholders early to agree not only on where to invest but on how to track and prove the business benefits. This helps ensure you are delivering value and builds support for the investment needed to drive greater MLOps maturity. (A "yes" gives you 20 points.)
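One way to bridge the two vocabularies is to report a business KPI alongside the model metric that drives it. The sketch below, for a hypothetical churn model, is illustrative: the revenue-per-customer figure and the 0.5 save rate are invented assumptions, not benchmarks.

```python
# Report a model metric (precision) next to the business KPI it drives
# (estimated retained revenue). All dollar figures are hypothetical.

def churn_kpis(predictions, actuals, revenue_per_customer=1200.0, save_rate=0.5):
    """Return a dashboard row pairing a data-science metric with a business one."""
    true_positives = sum(1 for p, a in zip(predictions, actuals) if p and a)
    flagged = sum(predictions)
    precision = true_positives / flagged if flagged else 0.0
    # Business framing: revenue retained by intervening on correctly
    # flagged churners, assuming half of interventions succeed.
    retained_revenue = true_positives * save_rate * revenue_per_customer
    return {"precision": precision, "retained_revenue_usd": retained_revenue}
```

The point is not the arithmetic but the contract: every model metric on the dashboard has a business-language counterpart that stakeholders agreed to up front.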

3. Have you automated the process of explaining how your ML models work?

Describing how the "black box" of ML arrives at its conclusions can be essential to building support for its use, especially in highly regulated industries such as life sciences. Rather than waiting four or five days for data scientists to scour logs and other information to explain just one outcome from one model, off-the-shelf tools such as Algorithmia and Alibi use algorithms to explain how a model generated its findings. You can also create custom lineage diagrams that link model findings back to features and data sets. Adopting a common automated explainability tool across your models can improve communication with business partners and increase compliance while assuring those models are providing the most accurate and relevant answers. (A "yes" gives you 15 points.)
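As a tool-agnostic stand-in for the dedicated explainability products named above, the sketch below uses permutation importance from scikit-learn to rank which inputs most influenced a model's answers. The data is synthetic and constructed so that one feature dominates.

```python
# Generic explainability sketch: permutation importance measures how much
# shuffling each input column degrades the model's score.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.1 * X[:, 1] > 0).astype(int)  # feature 0 dominates by design

model = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]  # most important first
```

A dedicated tool adds per-prediction explanations and audit trails on top of this; the value of standardizing, as the article argues, is that every model's explanation arrives in the same shape.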

4. Have you automated your model lifecycle management?

The lifecycle of an ML model spans its development, its listing in a registry, its deployment, training, retraining, validation and monitoring. Without an automated process to manage this lifecycle, organizations can waste money redeveloping models that already exist, and spend as long as six to eight weeks deploying each new model. This cycle can be cut to one week with automated lifecycle management tools such as Algorithmia and Seldon Core. Tools such as ClearML can help manage everything from collaborative experiments to data stores and model deployment. Most clients have already implemented Kubeflow or MLflow to manage the deployment of ML models on containerized infrastructure managed with the Kubernetes framework. In addition, leading hyperscalers such as Google Cloud, Microsoft Azure and AWS are also moving toward providing 360-degree ML model management solutions. (A "yes" gives you 15 points.)
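To make the lifecycle stages concrete, here is a toy in-memory registry showing the bookkeeping those tools automate: versioning on re-registration, stage promotion, and an audit history. Real teams would use MLflow's registry or an equivalent; the class and stage names below are illustrative assumptions.

```python
# Toy model registry: tracks versions, lifecycle stage, and history, which
# is the bookkeeping that prevents redeveloping models that already exist.
from dataclasses import dataclass, field

@dataclass
class ModelRecord:
    name: str
    version: int = 1
    stage: str = "registered"  # registered -> staging -> production
    history: list = field(default_factory=list)

class ModelRegistry:
    def __init__(self):
        self._models = {}

    def register(self, name):
        """Register a model; re-registering an existing name bumps its version."""
        record = self._models.get(name)
        if record:
            record.version += 1
        else:
            record = self._models[name] = ModelRecord(name)
        record.history.append(("register", record.version))
        return record

    def promote(self, name, stage):
        """Move a model to a new lifecycle stage and record the transition."""
        record = self._models[name]
        record.stage = stage
        record.history.append(("promote", stage))
        return record
```

Because every model lookup goes through the registry first, a team can discover an existing model instead of rebuilding it, which is where the six-to-eight-week savings comes from.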

5. Have you automated the management of your data and feature life cycles?

Data is the raw material required to train AI models, so effective AI at scale requires integrating and joining all data, whether online (in real time) or offline, with repeatable processes and proper governance to assure its quality, accuracy and timeliness.

Training ML models on raw data, however, wastes time, money, and effort because the data sets are often too large, change too quickly and are “noisy,” meaning they are either corrupt or in a form the ML model cannot use.

Organizations thus extract selected information from the raw data to create features that can be used by an ML model. For example, in creating a model to underwrite home insurance, the raw data would include individual fields such as its address and attributes such as number of bedrooms or the presence of a garage. A feature would combine these individual attributes into a term from which an ML model could learn, such as “three-bedroom homes with a garage in Zip Code 01907.”
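The home-insurance example above can be sketched as a one-line transformation from raw record to learnable feature. The field names are assumptions matching the prose, not a real schema.

```python
# Turn raw home-insurance attributes into a single categorical feature,
# as in the article's example. Field names are illustrative assumptions.

def home_feature(record):
    """Combine raw attributes into one feature string an ML model can learn from."""
    garage = "with a garage" if record["has_garage"] else "without a garage"
    return f'{record["bedrooms"]}-bedroom homes {garage} in Zip Code {record["zip_code"]}'
```

In a real pipeline this transformation would live in a feature store so every team building an underwriting model computes the feature identically.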

Automating the creation of such features allows them to be easily shared across teams for use in multiple models, can increase model accuracy by 20 percent, and can cut model development time and run costs by a similar margin.

Tools that generate synthetic data, such as YData and Synthetic Data Vault, help automate such management by testing the robustness of your management tools and processes. Feature-engineering tools such as Featuretools can automatically create features from datasets. You can also tap open-source feature stores such as Feast and Hopsworks to speed the creation of features and the associated data pipelines, while Molecula can continuously extract and update features in real time, and Tecton stores, shares and provides other services for enterprise feature stores. (A "yes" gives you 15 points.)

6. Have you created an organization and processes with clear roles and responsibilities to manage the model lifecycle?

Because AI and ML models are comparatively new, many organizations still regard them as experimental and have not yet created groups and processes to manage them. Since ML models are a form of software, you can leverage the skills, workflows and teams you created to implement DevOps, continuous integration and continuous delivery of applications to provide the same quality control and tracking. If you do not yet have these processes for other forms of software such as applications, we suggest implementing them before trying to move to MLOps. (A "yes" gives you 10 points.)

Industrializing Machine Learning

Total the points for your "yes" answers using the weighting in Table 1, then find your total score in Table 2 to rank yourself on the MLOps maturity scale.

Table 1: Number of points for a “yes” answer to each question.

Table 2: Total Score Required for Each MLOps Maturity Level.

Just as with any other improvement to your business, raising your MLOps maturity level requires an investment in time, money and effort. However, many of the required tools and processes can be implemented in as little as six to eight weeks. Even better, once they are in place they will reduce your costs and increase your business agility for years to come, and help you compete as your rivals scale their use of AI/ML.

About the Author

Rajaram Venkataramani is Chief Architect — AI, Analytics and Cloud within Cognizant’s AI practice.
