The Modern MLOps Blueprint

Danny Farah
Slalom Data & AI
Apr 24, 2020 · 14 min read

What is MLOps?

With the growing complexity of models and frameworks, and the computational demands that come with them, organizations are finding it harder and harder to keep up with the evolving needs of the ML world. This is where MLOps and DataOps come into play: they can offer your organization the structure it needs to execute on its vision and mission.

MLOps is the fusion of traditional DevOps processes with data science and machine learning. Where DevOps is code-centric, ML processes are data-centric. Taking the learnings and methodologies of DevOps and applying them to data and ML yields the operating models termed MLOps and DataOps.

What would the ideal process look like?

This blueprint allows you to pick and choose the technologies and frameworks that best fit your needs at any stage of your organization’s path to ML enablement. Depending on your organization and its level of maturity, you might not need every aspect of MLOps, like CI/CD, just yet, or you might already have a CI/CD pipeline and want to standardize the approval and promotion process across teams. Whatever the case may be, the blueprint is modular and plug & play by design, allowing for staged rollouts and for planning current versus target state. A gap analysis can then pinpoint a path from the current to the target state architecture.

The ideal ML cycle involves stages that allow for verification and validation of the model being developed, both offline and in a staged environment where stakeholders can give feedback. Just as important, the cycle is iterative and could take multiple passes to reach the optimal model. Once a model is deployed, it still needs to be monitored and evaluated against the initial requirements so it can be updated to account for edge cases or sensitivity. This lets teams ensure they are meeting both model and business goals before putting anything fully into production.

Certain hypotheses cannot be validated offline; in these cases, online evaluation and validation methods such as A/B testing and multi-armed bandit optimization can be leveraged to build confidence in the model being developed. An example hypothesis would be: “deploying this model would increase sales by x amount.” This cannot be validated until after the model is deployed and the sales figures before and after are compared.

The Business Problem or Improvement Opportunity

The first step in any ML initiative is identifying a business problem or an improvement opportunity that may be solved using analytical techniques. Whether the problem or opportunity is descriptive, predictive or prescriptive, the first step is to identify what the problem actually is before jumping into solution development. The problem or opportunity can arise from a need of your business users, internal or external, or be identified as a key milestone in your roadmap. Once it is identified, work can begin on gathering more information about the kinds and sources of data that will be needed, and whether that data exists and is orchestrated in a way that enables the model to succeed. Each of these stages is more involved than this article can cover, but a high-level overview of each follows.

DataOps

DataOps is an operating model very similar to MLOps, but applied specifically to data. Without DataOps, any ML initiative would have a hard time getting off the ground. DataOps enables teams to minimize the turnaround time of data analytics cycles while maintaining data quality and integrity.

Identify & Collect

The first step in a DataOps-enabled pipeline is identifying the sources and source systems where the target data lives. Your organization might already have a data lake or data warehouse where the team’s data is centralized, or the data might live in silos across multiple technologies. Whichever the case may be, the target sources need to be identified and mapped out, and access rights need to be granted to those who require them.

Process

Once the source data is identified, it needs to be processed and transformed into a form that can be analyzed or explored. This stage is often referred to as Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT). Both have their place, depending on whether the data is at rest or in flight at the source. An example ETL use case would be data arriving as a stream that needs to be transformed before it is stored; an example of ELT would be data that has already landed and needs to be transformed for analytical purposes. ELT allows the transformation to be done by the data analyst or data scientist, while ETL is typically more software-heavy and might require intervention from a data engineer. The last component of processing is orchestration: the data needs to be orchestrated in a way that enables future analytical work.
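As a minimal sketch of the ELT case, assuming pandas and a hypothetical landed file raw_orders.csv, the analyst-side transform might look like this:

```python
import pandas as pd

# ELT-style transform: the data has already landed (here a
# hypothetical file, raw_orders.csv), so the analyst transforms
# it after the fact for analytical use.
raw = pd.read_csv("raw_orders.csv", parse_dates=["order_date"])

monthly = (
    raw.dropna(subset=["customer_id"])  # basic cleansing
       .assign(order_month=lambda d: d["order_date"].dt.strftime("%Y-%m"))
       .groupby(["customer_id", "order_month"], as_index=False)
       .agg(total_spend=("amount", "sum"),
            order_count=("order_id", "count"))
)
monthly.to_csv("orders_by_customer_month.csv", index=False)
```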

Store

A data warehouse or a data lake is the typical storage option for organizations looking to serve transformed data. The hardest choice is picking the technology that meets your organization’s access rights, security, and privacy needs. Most cloud providers offer similar services and tools, so the decision might come down to cost or configurability. Work with your engineering teams and business leaders to understand your organization’s specific needs and make the best choice for them.

Problem Formulation

Once a business problem or opportunity for improvement has been identified and the data has been landed and transformed, it is time to frame the problem in an analytical context. This step is more art than science, as there are multiple ways of achieving the same result in the machine learning world. The primary components to identify are the inputs and outputs of the model and how it will interface with existing or new applications.

Requirements

The first step of problem formulation is to come up with hypotheses and the requirements that would support them. Requirements allow data teams to benchmark their model against a baseline performance or expectation from the business. If most of the requirements work is done upfront, it becomes easier to evaluate later in the MLOps process whether a model is successful. This step is iterative, and your team will likely come back to update requirements as new findings arise. MLOps is by nature iterative and cyclic, so it is important to flesh out as many details as possible early on to avoid endless cycles with no outcomes, something we at Slalom like to call POC purgatory. Requirements are living, breathing things that evolve throughout the MLOps process. They can include metrics and KPIs such as the cost of mislabelling or misclassifying certain data points, which can be used to put a dollar cost on false positives or negatives, for example.
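As an illustration of putting a dollar cost on errors, here is a minimal sketch; the per-error costs and the counts are assumptions for illustration, not figures from any real requirement:

```python
# Illustrative only: translating misclassification counts into a
# dollar cost, using assumed per-error costs from the requirements.
COST_FALSE_POSITIVE = 5.0    # e.g. cost of a wasted follow-up
COST_FALSE_NEGATIVE = 50.0   # e.g. cost of a missed fraud case

def misclassification_cost(false_positives: int, false_negatives: int) -> float:
    """Dollar cost of a model's errors under the assumed cost matrix."""
    return (false_positives * COST_FALSE_POSITIVE
            + false_negatives * COST_FALSE_NEGATIVE)

# A model with 120 false positives and 8 false negatives:
print(misclassification_cost(120, 8))  # 1000.0
```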

Explore

This step involves a lot of back and forth with the users or stakeholders who identified the problem or opportunity, along with analysis of the underlying data to evaluate relationships between data points. It also involves thought experiments to model the system at hand and come up with plausible hypotheses. Once the data has been explored and a few hypotheses selected, it is time to elicit requirements for the model, as well as for the business, for future benchmarking needs.

Analytical Framework Selection

Once there is a deep understanding of the data, a few hypotheses to test, and a summary of the requirements of the ML initiative, a specific framework can be selected. The main driver of the selection is the underlying statistical model used to make the prediction or analysis, such as regression, classification or clustering. Another is whether the model needs to be supervised or unsupervised, which depends on whether you are trying to predict an outcome or find relationships within your data. Certain libraries and frameworks have features that enable specific use cases better than others, and this selection will have an impact down the line on scalability and modularity. The requirements can be used to help evaluate which framework to select for a given business problem. This is also a good stage to decide whether advanced ML techniques such as neural networks would benefit the solution.
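As a sketch of the supervised versus unsupervised distinction, using scikit-learn as one possible framework and toy data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((100, 4))            # toy feature matrix
y = (X[:, 0] > 0.5).astype(int)     # toy labels

# Supervised: there is a labelled outcome to predict.
classifier = LogisticRegression().fit(X, y)

# Unsupervised: we only want to find structure in the data.
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)
```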

Modelling

Modelling is the stage where the model is developed and evaluated against offline metrics. This is where the features are engineered from the data, and the model is trained, tested, and finally packaged for the next stage of MLOps: operationalization.

Extract

In machine learning, features are characteristics relating to the proposed hypothesis. The goal is to extract features that contain information about, or prove meaningful relationships with, the target business problem or improvement opportunity. This is yet another artistic, applied part of MLOps, where data scientists leverage subject matter experts in the domain, along with data mining techniques, to build a list of relevant features for further exploration. Domain experts usually have an intuitive understanding of the data and can help in selecting relevant features. If you need to work with unstructured data such as text, logs, images, or voice, a different set of feature engineering techniques might be required. Features can be implicit or explicit: an explicit feature is a data point that already exists and needs minimal transformation, while an implicit feature is not directly available but can be derived from one or more other data points.
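A minimal sketch of that explicit/implicit distinction, assuming pandas and a hypothetical customer table: age is an explicit feature, while tenure has to be derived:

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "age": [34, 51, 28],  # explicit feature: used almost as-is
    "signup_date": pd.to_datetime(["2018-01-15", "2019-06-30", "2020-02-01"]),
    "last_order_date": pd.to_datetime(["2020-03-01", "2020-04-10", "2020-04-20"]),
})

# Implicit feature: tenure is not stored anywhere, but can be
# derived from two existing data points.
customers["tenure_days"] = (customers["last_order_date"]
                            - customers["signup_date"]).dt.days
```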

Train

Training is fitting a model to your training data set while maintaining optimal performance and meeting the requirements. This stage is quantitative and relies on domain expertise in the data science world as well as a deep understanding of the statistical relationships within your data. Decisions such as how long to train a model, where that is an option, and whether longer training will improve performance also need to be evaluated. Once the model is trained and meets the basic requirements, it can move on to offline evaluation.

This part of the process usually takes place in a notebook environment, the well-known playground of data scientists and analysts. The environment could be running on localhost or in the cloud, but ideally the work should be shareable and reproducible. Required packages are imported, and experimentation with different machine learning algorithms and different sets of features takes place here. For the outcomes to be repeatable, scalable, unbiased, and robust to outliers, your team should also follow data science best practices such as train/test splits, cross-validation, and unbalanced data treatments, to name a few. During this stage, the data is split into train and test sets for later offline validation: the train set is used to fit the model, while the test set is withheld during training as a simulation of unforeseen data.
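A minimal sketch of these practices, using scikit-learn and generated stand-in data; in practice X and y would come from the feature extraction step:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Stand-in data; swap in the engineered features and labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Withhold a test set up front as a simulation of unforeseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = RandomForestClassifier(random_state=42)

# Cross-validate on the training data only; the test set stays untouched.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
model.fit(X_train, y_train)
```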

Optimize

Using offline evaluation, your data team can rapidly iterate on and optimize the model before going live. Certain hypotheses can also be supported or disproved altogether at this stage. Quantitative metrics such as accuracy, precision and recall, F1 score, RMSE, or MSE can be leveraged to evaluate the model, while qualitative analyses can be used to assess its output subjectively. The goal of this stage is to evaluate the performance of the model on the unseen test data set; this simulates running predictions on new data and can help identify gaps and edge cases early on.
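Continuing the training sketch above (model, X_test and y_test are assumed from it), the offline metrics might be computed like this:

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

# Offline evaluation on the withheld test set.
y_pred = model.predict(X_test)
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("F1       :", f1_score(y_test, y_pred))
```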

Continuous Integration

Continuous integration is a software development practice that ensures developers commit their code to a central repository, with changes tracked using version control systems. This makes it easier to add features, find bugs, and facilitate communication among the developers, engineers and scientists working on different pieces of the model. It also reduces integration problems: new features can be added and delivered quicker, with the program tested end to end each time.

Plan, Develop & Package

When the model is ready to be published, the underlying code, dependencies, model, and feature extractors are pushed to your central repository on a separate branch, which can then be merged back into the master or main branch. If this is the first pass of your model, you can create this master branch and commit the code there. The change is then queued to be tested before merging into the main branch, which contains the latest working version of the program in production. The push should automatically trigger the build process, which containerizes the program and prepares that container for deployment. Containerizing a deployment means packaging all the dependencies into one artifact that can be easily moved around and scaled across machines.
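As one sketch of the packaging idea, the trained model can be serialized with joblib so the build can bake it into the container image alongside the inference code; the file name, and the presence of a trained model object, are assumptions:

```python
import joblib

# Serialize the trained estimator ("model" comes from the
# training stage) into a single artifact for the build step.
joblib.dump(model, "model-v1.joblib")

# Inside the container, the serving code reloads the same artifact.
model = joblib.load("model-v1.joblib")
```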

Test

After planning, development and packaging are done and the container is ready to be shipped, the process continues with unit tests and integration tests. Unit tests make sure the building blocks of the code individually work according to their requirements; integration tests ensure the components work well together without error. Setting up these tests is the responsibility of the developers and engineers, and the test cases should be aligned with the business requirements.
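A minimal pytest-style sketch, testing a hypothetical tenure feature helper rather than any specific pipeline:

```python
# test_features.py: unit tests for one building block of the
# pipeline, the implicit tenure feature from earlier.
from datetime import datetime

def tenure_days(signup: datetime, last_order: datetime) -> int:
    return (last_order - signup).days

def test_tenure_days_for_active_customer():
    assert tenure_days(datetime(2019, 1, 1), datetime(2020, 1, 1)) == 365

def test_tenure_days_is_zero_on_signup_day():
    assert tenure_days(datetime(2020, 1, 1), datetime(2020, 1, 1)) == 0
```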

Continuous Delivery

Continuous delivery is a software development practice that ensures deployments can be done rapidly, shifting away from traditional release dates to several releases a day.

Staging

Before the model can be deployed to production, it is staged in a pre-production environment, providing one last opportunity to make sure all the model requirements are met. This stage can also be used to demo the model to business stakeholders and get further feedback before final deployment into the production environment, as well as to do more integration testing and load testing before going live. Latency and response time can be evaluated here and optimized if found to be suboptimal.
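As a rough sketch of a latency check against a staging deployment, assuming a hypothetical endpoint URL and payload that you would swap for your own:

```python
import statistics
import time
import requests

# Hypothetical staging endpoint; replace URL and payload with your own.
URL = "https://staging.example.com/model/predict"
payload = {"features": [0.1, 0.5, 0.9]}

latencies = []
for _ in range(50):
    start = time.perf_counter()
    requests.post(URL, json=payload, timeout=5)
    latencies.append(time.perf_counter() - start)

print("median latency:", statistics.median(latencies))
print("~p95 latency  :", sorted(latencies)[int(0.95 * len(latencies))])
```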

Have Model Goals Been Met?

Once the model is built, staged and ready to be deployed to production, it needs to be validated one last time. During this stage, feedback should be elicited from the stakeholders of the business problem to see if the model meets basic expectations of how it performs and operates. If the model does meet the requirements, it can move on to approval for promotion to production. If it is rejected, the feedback received can be used to iterate back to where the changes need to happen. Whether the change is in the data or in the interface of the model affects how far back in the cycle the team needs to go.

Approval

Once the model goals have been met, the model needs to be approved for serving and versioning. This is a manual step, usually done by a senior engineer or manager. The ideal process would require multiple approvers, for redundancy and to ensure a high level of quality in deliveries. Once the model is approved, it can go on to versioning and serving.

Release & Configure

Whether or not your organization has an established versioning system, you should consider using a standardized semantic versioning scheme (e.g. MAJOR.MINOR.PATCH). Once the model version has been enumerated appropriately, it can be served and exposed to traffic, or configured to split traffic for experimentation and hypothesis validation. You might not want to serve all of your production traffic to the model until you have a certain level of confidence in its performance; this is where live validation and monitoring come into play.
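A minimal sketch of the traffic-splitting idea; a real deployment would use the serving platform’s own traffic-splitting configuration rather than hand-rolled routing, and the version names here are hypothetical:

```python
import random

# Weighted traffic split: send 10% of requests to the new model
# version, 90% to the current one.
SPLIT = [("model-v2", 0.10), ("model-v1", 0.90)]

def route() -> str:
    r = random.random()
    cumulative = 0.0
    for version, weight in SPLIT:
        cumulative += weight
        if r < cumulative:
            return version
    return SPLIT[-1][0]  # guard against floating-point edge cases
```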

Validation & Monitoring

The final stages in the MLOps cycle evaluate how the model performs live in a production environment. This can inform whether the model needs further iterations or can be shelved while other models are worked on. Good monitoring can also reveal whether a model is getting stale or whether there is significant model drift.

Monitor

Once the model is served and lives in production for part or all of your traffic, key metrics and indicators can be analyzed to evaluate the validity of the initial hypothesis. Recall the earlier example of increasing sales by deploying a model; this is where that hypothesis can be validated or disproved. In that case the headline variable is the number of sales, but other metrics, such as the actual traffic split versus the configured one and the number of people receiving experience A versus B, can be used to evaluate the serving layer of the model. The model might be performing perfectly, but if traffic is not split as expected, the resulting analyses are invalidated. Having full visibility into your pipeline ensures timely and efficient diagnosis of different types of problems. Once the analyses are complete and the resulting metrics are validated, the model can be re-evaluated against the initial business requirements and goals to inform further cycles or priorities.
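As a small sketch of checking the configured versus actual split, with assumed request counts standing in for real serving logs:

```python
# Verify the observed traffic split matches the configured 10/90
# split before trusting the A/B comparison. Counts are assumed.
observed = {"model-v2": 1_180, "model-v1": 8_820}
configured = {"model-v2": 0.10, "model-v1": 0.90}
total = sum(observed.values())

for version, expected_share in configured.items():
    actual_share = observed[version] / total
    drift = abs(actual_share - expected_share)
    print(f"{version}: expected {expected_share:.0%}, "
          f"actual {actual_share:.1%}, drift {drift:.1%}")
```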

Validate

The final stage in the MLOps lifecycle is to validate that a model has met the business goals set in the initial hypothesis. If a model has met the goals, there might now be a need to change those goals, and so the cycle begins again at the requirements elicitation stage. If a model has not met the goals, the reason can be pinpointed and further development and retraining conducted to meet or exceed them. A cycle ends when a model has been through it a few times and meets all the goals and requirements, at which point your team can start all over again on another exciting and empowering project, using this blueprint for success.

MLOps and DataOps are the missing pieces from many organizations’ toolbelts for success, and creating a holistic roadmap toward the desired state can be very complicated. Each facet of MLOps and DataOps requires extensive knowledge and expertise, so rolling out an organization-wide strategy for competitive advantage can be challenging. Slalom can help you achieve organizational success in the context of ML and AI projects. For more information on how Slalom can help on your ML journey, visit slalom.com.

Thank you to Ali Arabi, Florian Anshelm, Raki Rahman and Firas Farah.

