Great Ways to Implement MLOps

Amudhan Subbiah
May 9, 2022


Machine learning development is complex in many ways, and many promising machine learning projects never make it into production because of these challenges.

Unlike traditional software development, a primary goal of a machine learning project is to optimize metrics through constant experimentation. The quality of a machine learning project depends primarily on the input data and the tuning parameters; sometimes we combine multiple models to achieve the target metrics.

Scope & Stages

This article does not cover the algorithmic complexity or the ethical AI aspects of machine learning projects; it focuses on the evolution of the model from the inception stage to the final stage. The key stages in the journey include:

Machine learning stages

  • Stage 1 — Problem Definition/ Scoping
  • Stage 2 — Gather Data / Raw Data Injection
  • Stage 3 — Prepare Data and Discover Model Features
  • Stage 4 — Train the model
  • Stage 5 — Evaluate the model
  • Stage 6 — Deploy & Monitor Model Performance

Most of the above stages, except Stage 1, can be automated through data pipelines.

Challenges

In each stage, we might encounter challenges such as scaling, and in stages like data preparation and training we face tuning problems. Between the training and deployment phases, there is a model exchange problem: we need to be sure that the right model is pushed to production. We need a mechanism to ensure that the best-trained model replaces the model we were monitoring, along with the exact parameter/coefficient values of the algorithms used in training. As part of governance, we might also need to explain how the model evolved, how it is interpreted, and who used it and when. In this article, we will look at one approach to resolving these challenges.

Solution — Approach 1

There are various tools that can help us resolve these challenges individually, but we need to integrate and customize them based on our requirements. Let's explore different MLOps lifecycle tools. In this article, we will use a technology stack built around Cookiecutter, Git and GitHub, DVC, MLflow, Flask with Gunicorn, Heroku, and EvidentlyAI as our first approach to implementing MLOps.

Before we jump into the detailed aspects of these tools, let's take a closer look at the stages involved, in sequence: where we will use each tool and which challenges it addresses at each stage.

High Level Solution Overview

Stage 1 — Defining Problem Statement

Identifying the problem statement is one of the most time-consuming tasks and involves decision modelling exercises. The DECISIONSFIRST™ MODELER is one of the best tools for performing decision modelling and discovering knowledge automation areas, but since this article focuses on MLOps, we will skip it for the time being.

To appreciate the MLOps lifecycle across the stages listed above, let us take a classical use case to understand the detailed steps involved. Fisher's Iris data set is one of the most widely used datasets for understanding different aspects of machine learning, but let's take a financial use case for a change.

About the Data Set

The main aim is to predict which customers will have their loan approved. It is a simple classification problem; the features describe the applicant (for this Kaggle data set they typically include attributes such as gender, marital status, education, applicant and co-applicant income, loan amount and term, credit history, and property area).

The loan prediction data set is available on Kaggle to explore.

About the Project Structure

Defining the right project structure is key to avoiding many cumbersome activities in machine learning projects. One of the best ways to create the project structure is to use Cookiecutter.

Please follow the below steps to create the project structure using the Cookiecutter tool.

pip install cookiecutter

cookiecutter -c v1 https://github.com/drivendata/cookiecutter-data-science

This will clone the standard data science project structure to the local machine, walking us through questions from the template's JSON file to customize the project. The project structure for our use case is shown below.
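A trimmed view of the layout the template generates (the comments follow the template's own descriptions):

loan_prediction/
├── data
│   ├── external       <- Data from third-party sources
│   ├── interim        <- Intermediate data that has been transformed
│   ├── processed      <- The final, canonical data sets for modelling
│   └── raw            <- The original, immutable data dump
├── models             <- Trained and serialised models
├── notebooks          <- Jupyter notebooks
├── reports            <- Generated analysis such as HTML reports and figures
├── requirements.txt   <- Dependencies for reproducing the environment
├── setup.py           <- Makes the project pip-installable
└── src                <- Source code for this project
    ├── data           <- Scripts to download or generate data
    ├── features       <- Scripts to turn raw data into features
    └── models         <- Scripts to train models and make predictions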

About the Code Version control

Once we have the project structure clearly defined, we can start tracking code changes using a version control tool like Git. To manage the code in a remote repository we can use GitHub, GitLab, or any equivalent. For our MLOps lifecycle approach 1, we will be using GitHub as the remote code storage.

About Virtual Environment

Every project needs a new virtual environment to isolate the dependencies. Hence, we also need to create and activate the virtual environment to install dependencies related to the project.

There are tools available to create virtual environments, such as conda, virtualenv, and venv. Let's install Anaconda and create the virtual environment loan_prediction using conda.

Create a virtual environment (from the Anaconda prompt)
conda create --name loan_prediction python=3.8

Activate the virtual environment
conda activate loan_prediction

Now we can install all the dependencies using pip install, and list the installed packages using pip list or conda list.

Deactivate the virtual environment
conda deactivate

Stage 2 — Gather data

DVC is one of the best tools to track data changes, and it runs on top of Git. We need to install DVC first, and if the data lives in remote storage, we need to specify it. In our case, the remote data registry is on Google Drive.

Install DVC

pip install "dvc[gdrive]"

Add Remote Storage

dvc remote add -d loan_mlops gdrive://1aAK_I-iz4f-fgTD6z93MY1fAcpQ8quNC

Once we have initialised the local project folder with Git for code versioning, we can initialise DVC using the dvc init command, as shown below.
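A minimal sequence to initialise DVC and put the raw data under its control might look like this (the file name train.csv is an assumption for illustration):

dvc init
dvc add data/raw/train.csv
git add .dvc data/raw/train.csv.dvc data/raw/.gitignore
git commit -m "Initialise DVC and track the raw loan data"
dvc push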

Stage 3 — Prepare data

To record and reproduce the ETL tasks that we have performed in our experiments, we need to modularise the Python scripts. These scripts are then transformed into stages. Each stage contains the command that creates it, its dependencies, and its outputs. This information is recorded in dvc.yaml.

dvc.yaml

These stages are integrated as steps in the data pipelines.

Use the dvc run command to create stages

We convert the Python scripts into stages iteratively until we reach the last stage; an example is sketched below.
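For example, a data preparation stage might be created like this (the script and file paths are assumptions based on the Cookiecutter layout):

dvc run -n prepare_data \
    -d src/data/make_dataset.py -d data/raw/train.csv \
    -o data/processed/train_processed.csv \
    python src/data/make_dataset.py

Each dvc run invocation appends a stage like the following to dvc.yaml:

stages:
  prepare_data:
    cmd: python src/data/make_dataset.py
    deps:
      - src/data/make_dataset.py
      - data/raw/train.csv
    outs:
      - data/processed/train_processed.csv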

Stage 4 — Train the Model

We use MLflow to track experiments and runs. A run is a single execution of the code that MLflow records while creating an ML model, capturing the parameters and metrics involved in producing the prediction output. An experiment is a collection of runs.

dvc.yaml

We have added our train_model stage to dvc.yaml; it uses the MLflow API to track the experiments and runs. The MLflow API uses the MLflow client, which calls the MLflow REST API to talk to the tracking server.
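As a minimal sketch of what the training script behind the train_model stage might do (the experiment name, target column, and model choice are assumptions for illustration):

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Point the client at the tracking server and group runs under one experiment.
mlflow.set_tracking_uri("http://0.0.0.0:5000")
mlflow.set_experiment("loan_prediction")

data = pd.read_csv("data/processed/train_processed.csv")
X = data.drop(columns=["Loan_Status"])
y = data["Loan_Status"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run():
    params = {"C": 1.0, "max_iter": 500}
    model = LogisticRegression(**params).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_params(params)                 # recorded in the metadata store
    mlflow.log_metric("accuracy", accuracy)   # recorded in the metadata store
    mlflow.sklearn.log_model(model, "model")  # serialised into the artifact store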

The MLflow tracking server uses two main stores: the metadata store and the artifact store. The metadata store contains information about parameters, metrics, tags and notes, artifacts, source code, source code version, and runs and experiments. The artifact store contains the actual serialised model and its dependency information. We use MySQL for the metadata store and the local file system for the artifact store.

conda activate loan_prediction

mlflow server --backend-store-uri mysql+pymysql://root@localhost/mlflow_tracking_database4 --default-artifact-root file:./mlruns -h 0.0.0.0 -p 5000

To record each experiment using MLflow, we need to run the above commands in our virtual environment to bring up the MLflow service.

Stage 5 — Evaluate the Model

Since we have recorded all the experiments and each experiment's runs, we now need to find the best performing model among these runs. We will tag and register the best performing model to the Production stage and the next best model to the Staging stage; outdated models can be archived. This evaluation and tagging can be done manually in the MLflow dashboard below, or we can select the best performing model in code once we decide on the evaluation metric (see the sketch after the commands below).

conda activate loan_prediction

mlflow server --backend-store-uri mysql+pymysql://root@localhost/mlflow_tracking_database4 --default-artifact-root file:./mlruns -h 0.0.0.0 -p 5000
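A minimal sketch of selecting and promoting the best run in code (the experiment and registered model names follow the earlier training sketch and are assumptions):

import mlflow
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri("http://0.0.0.0:5000")
client = MlflowClient()

# Pick the run with the highest accuracy in the experiment.
experiment = client.get_experiment_by_name("loan_prediction")
best_run = client.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["metrics.accuracy DESC"],
    max_results=1,
)[0]

# Register that run's model and promote it to the Production stage.
version = mlflow.register_model(f"runs:/{best_run.info.run_id}/model",
                                "loan_prediction_model")
client.transition_model_version_stage(name="loan_prediction_model",
                                      version=version.version,
                                      stage="Production")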

Stage 6 — Deploy the Model & Monitor Performance

Serving the Model — We will use Flask to serve the predicted results through a custom-built web application.

Flask Application for Auto loan Approval
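A minimal sketch of such a Flask app, assuming the model is loaded from the MLflow model registry (the registered model name, route, and input format are assumptions):

import mlflow.pyfunc
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load whichever model version is currently in the Production stage.
model = mlflow.pyfunc.load_model("models:/loan_prediction_model/Production")

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON object with the same columns the model was trained on.
    features = pd.DataFrame([request.get_json()])
    prediction = model.predict(features)
    return jsonify({"loan_approved": str(prediction[0])})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5001)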

Deploy the Model — Our Flask application is ready to be deployed on Heroku. We need to install the Heroku CLI and activate our loan_prediction virtual environment to deploy the application to Heroku.

conda activate loan_prediction
heroku login

This will direct us to the Heroku web page, where we provide our Heroku login ID and the password from the authenticator app. After a successful login, run the command below to create an authorization token:

heroku authorizations:create

In the GitHub repository settings, store this token as a secret so that the deployment workflow can authenticate with Heroku.
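A minimal GitHub Actions workflow using that secret might look like the sketch below; the workflow file name, the third-party heroku-deploy action, the secret names, and the app name are all assumptions, and any equivalent deployment step would work:

# .github/workflows/deploy.yml
name: Deploy to Heroku
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: akhileshns/heroku-deploy@v3.12.12
        with:
          heroku_api_key: ${{ secrets.HEROKU_API_KEY }}
          heroku_app_name: loan-prediction-mlops
          heroku_email: ${{ secrets.HEROKU_EMAIL }}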

It is recommended to use web servers that support concurrent request processing when developing and running production services. The built-in development servers of the Django and Flask web frameworks only process a single request at a time.

If we deploy the Flask or Django development server on Heroku, our dyno resources will be underutilized and the application will feel unresponsive.

Gunicorn is a pure-Python HTTP server for WSGI applications. It allows you to run any Python application concurrently by running multiple Python processes within a single dyno. It provides a perfect balance of performance, flexibility, and configuration simplicity.

Adding Gunicorn to our application
pip install gunicorn

Also ensure gunicorn is added to requirements.txt.

Create the Procfile for the Flask app with the below content for deployment:
web: gunicorn app:app

One of the best ways to install the application package is by using a setup.py and pip. Once we create setup.py, we just need to run pip install -e .

In our requirements.txt we have included the line -e . so that our own package is installed along with the dependencies during deployment.
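A minimal setup.py for this layout might look like this (the package metadata is an assumption):

from setuptools import find_packages, setup

setup(
    name="loan_prediction",
    version="0.1.0",
    description="Loan approval prediction with an MLOps workflow",
    packages=find_packages(),
)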

Model Monitoring — Our best model is now in a production environment, but over time the model's performance may degrade. Hence we have to monitor model performance so that we can retrain the model at the right time. EvidentlyAI is one of the best open-source tools available for monitoring; it measures model performance by running statistical tests for data drift and model drift.

Install EvidentlyAI

pip install evidently

Since we have deployed the model as a prediction service, it is hard to detect data drift directly, so we collect new data periodically to test for drift. We have placed the newly collected data in the raw directory to check for drift. The dashboard below is generated in the reports folder by monitoring the production model and examining the new dataset with the EvidentlyAI libraries to review data and model drift.
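A minimal sketch of generating such a report with the Evidently Dashboard API that was current at the time of writing (the file names are assumptions):

import pandas as pd
from evidently.dashboard import Dashboard
from evidently.dashboard.tabs import CatTargetDriftTab, DataDriftTab

# Reference data the model was trained on versus newly collected data.
reference = pd.read_csv("data/processed/train_processed.csv")
current = pd.read_csv("data/raw/new_loan_data.csv")

dashboard = Dashboard(tabs=[DataDriftTab(), CatTargetDriftTab()])
dashboard.calculate(reference, current)
dashboard.save("reports/data_drift_report.html")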

Data Drift

Here, data drift has been detected in 6 features, and target drift is also visible when examining target behaviour by features. For a detailed view of the drift, see the generated dashboard in the reports folder of the Git project.

Target Drift
Target Prediction Behavior By Feature

Conclusion

We have looked at the challenges, and the tools that address them, at each stage of model evolution from inception to production.

In the next article, we will be discussing solutions for other challenges like scaling and deployment for larger data sets using different sets of tools.

References:
1. https://raghav-menon.medium.com/version-control-using-git-and-github-part-1-90e6b09ed745
2. https://dvc.org/
3. https://mlflow.org/
4. https://evidentlyai.com/
