What AI Can Learn from the Industrialization of Supermarkets
Relying on GitHub or similar code and data repositories alone, together with TensorFlow, PyTorch, or similar frameworks, is not sufficient to scale AI projects and deployments. Demand more from your AI platform vendor. Demand MLOps.
Supermarkets transformed the food supply chain. MLOps is the supermarket supply chain equivalent for AI. Without MLOps, the latent promise of AI cannot be realized. This article, targeted at business executives (the CIOs, the CEOs, the CFOs, and the like), first summarizes the risks of deploying AI without an MLOps platform, then describes how a modern MLOps platform can minimize those risks.
In a matter of about a hundred years, food distribution and supply matured from a corner store, a butcher shop, or at best a street market open for limited hours and carrying locally produced food, into a national and international supply chain of perishable foods available nearly 24x7 within driving distance of most neighborhoods.
Until the early 1900s, even in the industrialized world, people shopped for perishable food daily, perhaps, at the corner store or their street market. As quaint as it may sound now, it meant one could eat only what was available locally or what could be brought in within half a day's drive. Food was brought to the local street market to be sold the same day, or the next at best, without any refrigeration. Food varied by season and region, so the choice of items in extreme-weather regions could be very limited. Bad or stale food was sold and consumed often, and if it made somebody sick there was little to no traceability as to who the vendor was or where the food was produced. Some of the world's major disease outbreaks originated from such a food supply chain.
Then, around the early 1900s, the supermarket model of food supply and sales was invented. In this model, food could be shipped hundreds or thousands of miles, often in refrigerated containers, and delivered to supermarkets around the country through major distributors. Later, with the advent of computers and cash registers beginning in the 1940s, every food item purchased by a consumer carried a bar code identifying the item and its lineage, with full traceability: when and where it was produced, processed, packaged, shipped, and sold. Every food item is now neatly cataloged, organized, and arranged in the store like books in a library. First, consumers can quickly find the items they need and self-serve all the way to the checkout. Second, when there is a massive food recall, the FDA has a very efficient way to pull all the bad lots from supermarket shelves around the country. Moreover, the supermarket puts expiration dates on most fresh foods and pulls them off the shelf the night they expire, like clockwork, proactively.
So, what does this have to do with AI? Good question. AI is akin to producing and serving new kinds of dishes, with new kinds of ingredients, in restaurants and homes every day. Really! Think of the food ingredients as the AI equivalent of datasets, often arriving nightly. They are combined with recipes: one, perhaps, for inspecting and pre-processing the ingredients, and another for cooking the final dish in the kitchen. The dish is then served. These recipes are the AI models. While there are thousands of recipes, only a handful are consumed each day in every culture, with slight twists from door to door. In AI, too, only a handful of algorithms are actually used for a wide variety of problems, with slight twists.
So, when an AI dish (a prediction) starts to fail in the field (in front of the end consumer), one has to trace back and debug quickly. That means figuring out where the datasets came from, when and by whom they were pre-processed and packaged, and why pre-arranged expiration and renewal criteria were not already in place to proactively pull old, failing models and datasets.
One may ask why big data analytics or traditional machine learning did not face these challenges. That is because true AI is not just big data analytics on steroids. Big data analytics focuses on pattern analysis using rule-based statistical methods, for example to detect credit card or bank fraud, and that problem has for the most part already been solved. One can read the code behind the analytics and explain how it arrived at a decision. The coupling between the code and the data is weak, if it exists at all: new data arriving every day rarely requires the code to change. Hence the code does not go through constant experimentation and revision. Nor does one have to spend much time inspecting the dataset row by row or visualizing it through clustering plots daily, if at all; that, incidentally, is the first task that can be automated with AI (hold on to that thought). This is where traditional analytics and AI differ. Ironically, even for old-fashioned data analytics and fraud detection, advanced AI models are now being considered for even higher prediction accuracy.
AI is the application of deep neural networks and highly advanced machine learning models to either (a) mimic human visual and auditory cognition, such as reading X-ray images or inspecting car assemblies for defects, or (b) solve new problems altogether, such as discovering new drugs by combining vast amounts of clinical, genetic, and molecular data. What deep learning can do, traditional data analytics simply cannot. They solve different classes of problems. That also means a new set of problems arises in managing AI deployments. AI is a different beast.
AI requires continuous experimentation with new training models, often on new datasets arriving daily or weekly. In a matter of months there can be tens of models in a small organization and a few hundred across a midsize chemical or pharma research organization. Each revision of a model is paired with a revision of a dataset. One cannot review the model code by itself and quickly infer how it arrives at a decision or prediction; each model has to be paired with the data that was used to build it. Each model can have hundreds of hyper-parameters being tuned. The permutations of models, datasets, and hyper-parameters can therefore grow rapidly, and before you know it there is chaos. To explain how a model arrives at a prediction, one needs versioning of models and datasets, lineage, and governance, much like the traceability and lineage of the food distribution and supply chain. This is what an MLOps platform has to provide.
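The model-to-dataset-to-hyper-parameter pairing described above can be made concrete with even a tiny lineage record. The sketch below is illustrative only; the class and field names are hypothetical and do not reflect any particular MLOps product's API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class LineageRecord:
    """Pairs one model revision with the exact data and settings that produced it."""
    model_name: str
    model_version: str
    dataset_version: str   # e.g. a content hash or tag of the training data
    hyperparameters: dict  # the tuned values for this training run
    trained_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

class LineageRegistry:
    """In-memory stand-in for the versioning store an MLOps platform provides."""
    def __init__(self):
        self._records = {}

    def register(self, rec: LineageRecord) -> None:
        self._records[(rec.model_name, rec.model_version)] = rec

    def trace(self, model_name: str, model_version: str) -> LineageRecord:
        # Given a served model, recover which dataset and hyper-parameters built it.
        return self._records[(model_name, model_version)]

registry = LineageRegistry()
registry.register(LineageRecord("defect-detector", "v3",
                                dataset_version="cam-images-2024-05",
                                hyperparameters={"lr": 1e-4, "batch_size": 32}))
rec = registry.trace("defect-detector", "v3")
```

With such records in place, "which dataset built the model that just failed?" becomes a lookup rather than an archaeology project.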
This gets more interesting with the organizational dynamics of any company. Consider the following business and organizational scenarios:
1. A served model (Model A) using a certain class of imaging cameras ages and starts to fail. How quickly would you know that a newer model (Model B) exists that relies on data from modern high-resolution cameras? How do you know that Model B was trained on a new dataset that is not yet visible in the field, where Model A is doing live inference against the old cameras? Deployed prematurely, Model B can fail too, despite all the good intentions. Until the field data aligns with Model B's dataset, the alternative may be to tweak the hyper-parameters of the existing Model A on the current dataset and redeploy it as, say, Model C.
2. A star data scientist produced a few killer models over the past year. One of her models gets deployed and is used extensively. Then she leaves the company or moves to another department. Her replacement does not know which specific dataset was used to build her model. When her served model ages or fails in the field (and it will one day, since AI, like a human brain, requires continuous training), tracing that lineage to debug the failure and then upgrade the model can be a long and painful process. GitHub alone cannot provide lineage and cross-correlation of models and datasets. Without it, the replacement data scientist cannot reproduce the model's results.
3. A prediction model is developed and deployed this year for a health research problem on a certain demographic (Demographic A). Soon the same model gets exported to another region or country, to another demographic (Demographic B), where its predictions fail in a major way. How do you, as the manager of the new deployment in Demographic B, know what exact dataset and biases went into the model you just imported from Demographic A? Different demographic names alone cannot be sufficient, especially in medical research. One needs the specific datasets to explain, possibly even through advanced AI visualization (auto-encoders), the biases being carried over from Demographic A.
4. How will you enforce that deploying models to production is supervised and authorized only for a handful of designated people with the appropriate credentials? There also have to be checks and balances by peers to ensure the quality and security of deployments. In the age of constant hacking, unauthorized access to and tweaking of deployed models can have severe business risk and/or safety implications.
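The last scenario can be enforced with even a simple gate in the deployment path. A minimal sketch follows, with hypothetical user names and a hypothetical deploy function; no specific platform's API is implied:

```python
# Hypothetical set of designated people with deploy credentials.
AUTHORIZED_DEPLOYERS = {"alice", "bob"}

def deploy_model(model_id, user, approver):
    """Allow deployment only by an authorized user, with sign-off from a
    distinct authorized peer (a simple four-eyes check)."""
    if user not in AUTHORIZED_DEPLOYERS:
        raise PermissionError(f"{user} is not authorized to deploy models")
    if approver == user or approver not in AUTHORIZED_DEPLOYERS:
        raise PermissionError("deployment requires a distinct authorized peer's approval")
    # In a real platform this would trigger the rollout and write an audit log entry.
    return {"model": model_id, "deployed_by": user, "approved_by": approver}

record = deploy_model("fraud-v2", user="alice", approver="bob")
```

An attempt by an unauthorized user, or a self-approval, raises an error instead of silently shipping a model.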
To address these challenges, an MLOps platform is needed so that your data scientists and ML engineers can manage and monitor their datasets and models and enforce discipline in how they work together. MLOps is to AI what DevOps is to software development. An MLOps platform augments the current project-management process, for projects based on standard AI frameworks like TensorFlow or PyTorch with GitHub repos, and matures it into an industrial-grade, end-to-end supply chain like the supermarket example above.
The AI Supply Chain Pipeline: An MLOps platform must allow the linear code most data scientists rely on for pre-processing data, feature extraction, training, serving, and model monitoring to be broken up into components and stages that run as a pipeline. Different people in the organization may be responsible for different components. The components run in sequence, triggered by time or by events such as new data arriving or an increase in the error rate of the live predictions served by your inference endpoints. Each component run is versioned, with dates, time-stamps, and tags, and each component's artifacts and logs are available. This allows a pipeline of continuous deep learning to be automated, run nightly or on other triggers, and debugged quickly when needed, which in many projects is every day or every week.
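The staged, versioned pipeline described above can be sketched in a few lines. This is a toy model of the idea, not the API of any particular platform; the stage functions stand in for real pre-processing, feature extraction, and training code:

```python
import hashlib
import json
from datetime import datetime, timezone

class Pipeline:
    """Toy staged pipeline: components run in sequence, and every component
    run is recorded with a trigger, a timestamp, and a version tag."""
    def __init__(self, stages):
        self.stages = stages  # ordered list of (name, function) pairs
        self.runs = []        # run history: one entry per component execution

    def run(self, data, trigger="manual"):
        for name, fn in self.stages:
            data = fn(data)
            # Tag this component's output so the run can be traced later.
            tag = hashlib.sha1(
                json.dumps(data, sort_keys=True, default=str).encode()).hexdigest()[:8]
            self.runs.append({
                "stage": name,
                "trigger": trigger,  # e.g. "nightly", "new-data", "error-rate-spike"
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "tag": tag,
            })
        return data

# Hypothetical stages standing in for real pipeline components.
def preprocess(rows):
    return [r for r in rows if r is not None]   # drop bad records

def featurize(rows):
    return [{"x": r, "x2": r * r} for r in rows]

def train(features):
    return {"model": "linear", "n_samples": len(features)}

pipe = Pipeline([("preprocess", preprocess), ("featurize", featurize), ("train", train)])
model = pipe.run([1, None, 2, 3], trigger="new-data")
```

The run history (`pipe.runs`) is the debugging handle: when a nightly run misbehaves, each stage's trigger, timestamp, and output tag are on record.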
To summarize, an MLOps platform should meet the following business and government-regulation requirements, so that it can explain and ensure to executives:
1. Traceability: the ability to trace the lineage of a model serving in production all the way back to the specific hyper-parameters and the specific versions of the model's pipeline components and dataset.
2. Reproducibility: the ability to reproduce the prediction results of any model version consistently, by having that lineage correctly extracted.
3. Governance: the ability to audit and authorize the author of the data or the model, along with when and where, and to raise alert flags in case of unauthorized use.
4. Model Cataloging: with hundreds of models, there has to be a way to catalog their key attributes, usage profiles, and other important details so that broader teams and other organizations can find them.
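The cataloging requirement in particular can be illustrated with a tiny structure. The field names and entries below are made up for illustration; a real catalog would hold many more attributes:

```python
# Hypothetical model catalog entries; attributes and usage profiles are illustrative.
catalog = [
    {"name": "xray-classifier", "task": "vision", "owner": "radiology-ml",
     "dataset": "chest-xray-v7", "daily_inferences": 12000},
    {"name": "churn-predictor", "task": "tabular", "owner": "growth",
     "dataset": "crm-2024Q2", "daily_inferences": 800},
]

def find_models(catalog, **criteria):
    """Let other teams discover models by any catalogued attribute."""
    return [m for m in catalog
            if all(m.get(k) == v for k, v in criteria.items())]

hits = find_models(catalog, task="vision")
```

Even this crude lookup shows the point: with attributes catalogued, a team in another organization can find an existing model instead of rebuilding it.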
Most MLOps platforms offer only bits and pieces of these requirements. An MLOps platform that comprehensively meets these industry needs will enable the true AI revolution for years and decades to come. We at One Convergence Inc. are building such a platform. Built on a Kubernetes and Kubeflow foundation, our DKube MLOps platform meets all these needs for your AI projects. Please contact us at firstname.lastname@example.org for further details or visit www.oneconvergence.com/dkube.