Machine Learning in Production: Your Comprehensive 101 Practical Guide

Karan Shingde
12 min read · Aug 26, 2023


From development to deployment, let’s build MLOps architecture through a captivating real-world project. Gain valuable insights into the tools and strategies that fuel successful ML deployments🚀!

This project has been featured in the DVC Aug’23 newsletter🎈🎉🗞️

Homer is so confused about MLOps!

Official👉 GitHub Repository 📖🪶

In this article I am going to share the best practices for MLOps projects. In the repository you can see the project structure, modular code, and the tricks and techniques for a better understanding of this pipeline. Instead of spending time on theory, let’s dive into the workflow.

MLOps Banner - Karan Shingde

📖The Practical Guide:

For this guide I am using a simple dataset: Customer Churn Prediction. Keeping things basic so everyone can understand. I have already published a blog for a BERT/NLP-based MLOps project. Check it out.

📜Pre-script:

This guide covers an end-to-end lifecycle for MLOps. I am not explaining each and every tool or concept in depth. MLOps is a series of tools, lifecycles, steps, and processes. Here are the prerequisites for this guide:

  1. Python programming with OOP concepts.
  2. DVC and Git/GitHub.
  3. Statistical concepts for ML and ML algorithms.
  4. MLflow and fine-tuning concepts.
  5. Fundamentals of FastAPI, PyTest, and .yaml are mandatory.
  6. Concepts of containerization and orchestration, such as Docker and Kubernetes.
  7. Microsoft Azure.

Learning MLOps is not about knowing these tools. Tools vary based on industry, company preference, and so on. Learning the lifecycle is far more important than just knowing the tools, although it is a plus if you already know the ones above.

🧬Lifecycle in Nutshell:

  • Project Goal: In this project, our main objective is to predict customer churn (a very common ML project).
  • Project Architecture: Build an effective, modular project structure that makes running/re-training the pipeline easy. Set up GitHub and DVC.
  • Data Pipeline: Implement data ingestion, transformation (imputation, class-imbalance handling, etc.), and generation of additional data samples.
  • Model Training: Train models on the training data, with experimentation tracked in MLflow. Perform hyperparameter tuning and save the best model along with its results.
  • Build an API: Write a prediction-pipeline script to validate the data class and embed the model in FastAPI.
  • Test the API: Test your prediction API before deploying it to production.
  • Containerization: Build a Docker image and orchestrate it using Kubernetes.
  • Set up CI/CD: Set up a CI/CD pipeline for continuous integration and deployment of your code in production.

Where is EDA? This guide focuses on building the ML pipeline rather than on the data analysis process. In the GitHub repo you can find a folder called ‘notebooks’ where I performed EDA. EDA is crucial for understanding the statistical behavior of data, and I highly recommend it, especially for tabular data, but I am not showing it in this guide.

🎯Project Goal

Create an end-to-end machine learning lifecycle: gather data, transform it, train ML models, serve the model in an API, test the API, and deploy the API to production using containerization and orchestration, all on customer churn prediction data and following the MLOps lifecycle.

📂Project Architecture

(A broad view of the folder structure; check GitHub for the detailed version)

Project Structure
  1. Set the project path so imports do not break:
  • In a .env file, mention your local project root directory (an example entry follows the snippet below).
  • Use this code in every script so any script can import any other as a module, and vice versa.
import sys
import os
from dotenv import load_dotenv
load_dotenv()
project_home_path = os.environ.get('PROJECT_HOME_PATH')
sys.path.append(project_home_path)
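
For reference, the .env file at the project root only needs one entry; the path below is a hypothetical example, so point it at your own project root:

PROJECT_HOME_PATH=/home/username/mlops-best-practices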

2. Use DVC to version control the data (see the DVC documentation; typical setup commands are sketched below). Set up a GitHub repository. Create a new Python virtual environment quickly:

python -m venv env
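
As a minimal sketch, typical DVC setup commands look like this (the data path is an assumption; adjust it to your repo layout):

# initialize DVC inside the Git repo
dvc init
# track the raw dataset (hypothetical path); DVC writes a small .dvc pointer file
dvc add data/raw.csv
# commit the pointer file to Git while DVC stores the data itself
git add data/raw.csv.dvc .gitignore
git commit -m "Track raw data with DVC"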

3. Make requirements.txt as follows:

wheel
pandas
imblearn
numpy
scikit-learn==1.2.2
dvc
xgboost
dill
catboost
lightgbm
mlflow==2.3.*
fastapi
uvicorn
python-dotenv
faker
pytest

4. Install requirements.txt:

pip install -r requirements.txt

5. Create configure.yaml

  • Make a configure.yaml file (a hypothetical example follows the code below).
  • Make a configure.py file and use it as a module for reading values from configure.yaml.
import os
import yaml

def read_configure():
    yaml_path = os.path.join(os.path.dirname(__file__), 'configure.yaml')
    with open(yaml_path, 'r') as yaml_file:
        configure = yaml.safe_load(yaml_file)
    return configure

configure = read_configure()

if __name__ == '__main__':
    print("You can import configure straight as a module")

🚧Data Pipeline

  1. Data Ingestion: The data ingestion pipeline uses Object-Oriented Programming (OOP) for clean separation of tasks. Synthetic data is generated using the Faker library and split into train and test sets. The datasets are then saved as artifacts for easy access and reproducibility. This OOP-based approach improves modularity and scalability in data processing.
  2. Data Transformation: A pivotal step in any data pipeline. For numerical data, missing values can be filled with means and then normalized using StandardScaler. Categorical data benefits from mode-based imputation and one-hot encoding. Additionally, addressing class imbalance is essential for robust model performance. By applying these techniques, data transformation creates a more reliable and balanced foundation for downstream models. A rough sketch of this step appears below.
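
Here is a minimal sketch of the transformation just described; the column lists and the use of SMOTE are assumptions for illustration, not necessarily what the repo does:

# a sketch of the transformation step, assuming hypothetical column lists
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from imblearn.over_sampling import SMOTE

numerical_cols = ["CreditScore", "Age", "Balance"]   # assumed subset
categorical_cols = ["Geography", "Gender"]           # assumed subset

numeric_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),   # fill missing values with means
    ("scaler", StandardScaler()),                  # then normalize
])
categorical_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),  # mode-based imputation
    ("onehot", OneHotEncoder(handle_unknown="ignore")),    # one-hot encode
])
preprocessor = ColumnTransformer([
    ("num", numeric_pipeline, numerical_cols),
    ("cat", categorical_pipeline, categorical_cols),
])

# rebalance the training split only, after fitting the preprocessor on it
# X_res, y_res = SMOTE().fit_resample(preprocessor.fit_transform(X_train), y_train)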

🏋️Model Training

  1. Select Algorithms: Make a dictionary of popular classification algorithms such as Scikit-Learn’s (Logistic Regression, Random Forest, etc.), XGBoost, CatBoost, and LightGBM. I tried all the possible algorithms and found that Random Forest works best in this case, so I commented out the other algorithms to save a huge amount of time. You can run the experiments on your own.
  2. Fine-Tuning: Perform hyperparameter tuning using RandomizedSearchCV and keep the best model with its hyperparameters. Before that, save the parameters for each model as a dictionary in a separate .py file (a hedged sketch of the tuning helper appears after the code below).
  3. Experimentation: Set up an MLflow pipeline to track the algorithms, parameters, and metrics. This demonstrates which ML algorithm or model works better in this scenario.
  4. Save Artifacts: Save the best model and the transformation pipeline in .pkl format. Save the results (best model name, Accuracy, Precision, Recall, F1-Score, and ROC-AUC score) in .json format.
import os
import sys
from dotenv import load_dotenv
load_dotenv()
project_home_path = os.environ.get('PROJECT_HOME_PATH')
sys.path.append(project_home_path)
import json
from datetime import datetime
from dataclasses import dataclass
from sklearn.model_selection import train_test_split

from sklearn.ensemble import (
    AdaBoostClassifier,
    RandomForestClassifier,
)
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.linear_model import LogisticRegression

from src.tracking.mlflow_tracking import ExperimentTracking
from src.exception import CustomException
from src.logger import logging
from src.utils import (
    save_object, evaluate_models, cross_validate_model, get_metrics
)

@dataclass
class ModelTrainerConfig:
    trained_model_file_path = os.path.join("artifacts", "model.pkl")

class ModelTrainer:
    def __init__(self):
        self.model_trainer_config = ModelTrainerConfig()

    def initiate_model_trainer(self, train_array, test_array):
        try:
            logging.info("Splitting training and test input data")
            X_train, y_train, X_test, y_test = (
                train_array[:, :-1], train_array[:, -1],
                test_array[:, :-1], test_array[:, -1],
            )

            # hold out a validation set from the training data
            X_train, X_valid, y_train, y_valid = train_test_split(
                X_train, y_train, test_size=0.2, random_state=42
            )

            models = {
                "Random Forest": RandomForestClassifier(),
                # "XGBoost": XGBClassifier(),
                # "LGBM": LGBMClassifier(),
                # "AdaBoost": AdaBoostClassifier(),
                # "Logistic Regression": LogisticRegression(),
            }

            # loading parameters for tuning
            with open('artifacts/params.json') as params_file:
                params = json.load(params_file)

            model_report: dict = evaluate_models(
                X_train, y_train, X_valid, y_valid, models, params
            )

            best_model_score = max(model_report.values())
            best_model_name = list(model_report.keys())[
                list(model_report.values()).index(best_model_score)
            ]
            best_model = models[best_model_name]
            best_model_params = best_model.get_params()

            if best_model_score < 0.6:
                raise CustomException("No best model found!")
            logging.info("Best found model on both training and testing dataset.")

            save_object(
                file_path=self.model_trainer_config.trained_model_file_path,
                obj=best_model,
            )

            # log metrics on the held-out test set
            predicted_test = best_model.predict(X_test)
            evaluation_metrics = get_metrics(y_test, predicted_test)
            best_model_artifacts = {'Best Model': best_model_name}
            best_model_artifacts.update(evaluation_metrics)

            # MLflow tracking
            experiment_name = 'customer-churn-prediction-experiment' + datetime.now().strftime("%d-%m-%Y")
            run_name = 'customer-churn-prediction' + datetime.now().strftime("%d-%m-%Y")

            exp_track = ExperimentTracking(
                model=best_model,
                experiment_name=experiment_name,
                run_name=run_name,
                run_metrics=evaluation_metrics,
                run_params=best_model_params,
            )
            exp_track.create_experiment()
            # exp_track.to_production(version=1, stage="Production")

            with open('artifacts/metrics.json', 'w') as metrics_file:
                json.dump(best_model_artifacts, metrics_file, indent=4)
            logging.info("Best model metrics saved")

            return best_model_artifacts

        except Exception as e:
            raise CustomException(e, sys)
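
The evaluate_models helper lives in src/utils. As a hedged sketch (not the repo’s exact implementation), it could wrap RandomizedSearchCV roughly like this, assuming F1 as the selection metric:

# a sketch of evaluate_models, assuming RandomizedSearchCV and F1 selection
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import f1_score

def evaluate_models(X_train, y_train, X_valid, y_valid, models, params):
    report = {}
    for name, model in models.items():
        search = RandomizedSearchCV(
            model, params.get(name, {}), n_iter=10, cv=3, n_jobs=-1
        )
        search.fit(X_train, y_train)
        model.set_params(**search.best_params_)  # keep the tuned hyperparameters
        model.fit(X_train, y_train)
        report[name] = f1_score(y_valid, model.predict(X_valid))
    return report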

We are using a DVC pipeline to run the whole workflow with one command.

stages:
  ml_pipeline:
    cmd: python src/components/data_ingestion.py
    deps:
      - src/components/data_transformation.py
      - src/tracking/params.py
      - src/components/model_trainer.py
    outs:
      - artifacts/train.csv
      - artifacts/test.csv
      - artifacts/raw.csv

Run the command:

dvc repro

Up to this point we worked as data scientists! But the real fun begins now.

Buckle up, we’re starting our thrilling MLOps journey!

Journey begins..

⚡Build an API

After completing the model development stage, we will serve our model as an API to predict real-time outcomes, using the FastAPI Python framework. Let’s go step by step:

  1. Predict Pipeline: Integrates the pre-trained model into real-time predictions served by FastAPI. It involves loading the model and preprocessor, processing input features, and quickly generating predictions. Structured input is handled through a dedicated class.
  2. Validation Pipeline: Ensures that all input data aligns with specific criteria and limitations. This validation guarantees that only accurate and valid information is processed further, improving the overall reliability of the workflow (a hedged sketch of this Pydantic model appears after the code).
  3. API Prediction Service: Combining the above pipelines, construct the FastAPI app, which predicts the outcome for inputs coming from the user. See the code:
from fastapi import FastAPI
import uvicorn
import pandas as pd
from pydantic import BaseModel
import sys
import os
from src.pipeline.predict_pipeline import PredictPipeline, CustomData
from src.exception import CustomException
from src.logger import logging
from src.pipeline.validation_pipeline import CustomDataModel


# Initiate FastAPI
app = FastAPI()
predictor = PredictPipeline()

@app.get("/")
def home():
    logging.info("Received a request at / endpoint.")
    return {"message": "MLOps best practices\n\n Source code: https://www.github.com/karan842/mlops-best-practices"}

@app.post("/")
def home_post():
    logging.error("Wrong method selected.")
    return {"message": "Wrong method selected! Please use GET method."}

@app.post("/predict")
async def predict_custom_data(custom_data: CustomDataModel):
    try:
        # convert the received Pydantic model to a dict
        custom_data_dict = custom_data.dict()

        # create a CustomData instance using the data from the Pydantic model
        custom_data_instance = CustomData(**custom_data_dict)

        # get the data as a DataFrame from the CustomData instance
        custom_data_df = custom_data_instance.get_data_as_data_frame()

        # make predictions using the PredictPipeline
        preds = predictor.predict(custom_data_df)

        # convert to the desired output format
        prediction_result = int(preds.item())

        logging.info("Prediction successful.")
        if prediction_result == 1:
            return {"Churn Prediction": "Yes"}
        else:
            return {"Churn Prediction": "No"}

    except Exception as e:
        logging.error("Something went wrong on /predict endpoint.")
        return {"Error": str(e)}

@app.get("/predict")
def predict_custom_data_get():
    logging.error("Wrong method selected.")
    return {"message": "Wrong method selected! Please use POST method."}

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=4040)
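
The CustomDataModel imported above comes from the validation pipeline. Here is a hedged sketch of what it might look like with Pydantic; the field names match the test payloads in the next section, but the value constraints are assumptions:

# a sketch of src/pipeline/validation_pipeline.py, with assumed constraints
from pydantic import BaseModel, Field

class CustomDataModel(BaseModel):
    CreditScore: int = Field(..., ge=300, le=900)  # assumed credit-score range
    Geography: str
    Gender: str
    Age: int = Field(..., ge=18, le=100)
    Tenure: int = Field(..., ge=0)
    Balance: float = Field(..., ge=0)
    NumOfProducts: int = Field(..., ge=1, le=4)
    HasCrCard: int = Field(..., ge=0, le=1)
    IsActiveMember: int = Field(..., ge=0, le=1)
    EstimatedSalary: float = Field(..., ge=0)

With a model like this in place, FastAPI automatically returns a 422 response for payloads that fail validation, which is exactly what the tests below check.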

🧪Test An API

Test the FastAPI app using PyTest. The objective is to test some components of the API. Remember, this is not a test of the ML model itself; it is more like testing a software application before pushing it into production, so don’t get confused.

import json
import os
import sys
file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), '..')
sys.path.append(file_path)
from src.utils import *
import pytest
from fastapi.testclient import TestClient
from app import app

client = TestClient(app)

def test_home():
    response = client.get("/")
    assert response.status_code == 200
    assert response.json() == {"message": "MLOps best practices\n\n Source code: https://www.github.com/karan842/mlops-best-practices"}

def test_predict_custom_data_valid():
    data = {
        "CreditScore": 700,
        "Geography": "France",
        "Gender": "Male",
        "Age": 35,
        "Tenure": 5,
        "Balance": 2500.0,
        "NumOfProducts": 2,
        "HasCrCard": 1,
        "IsActiveMember": 1,
        "EstimatedSalary": 50000.0
    }
    response = client.post("/predict", json=data)
    assert response.status_code == 200
    assert response.json() == {"Churn Prediction": "No"}

def test_predict_custom_data_invalid_input():
    data = {
        "CreditScore": "invalid_value",
        "Geography": "Spain",
        "Gender": "Female",
        "Age": 18,
        "Tenure": 0,
        "Balance": 1000.0,
        "NumOfProducts": 3,
        "HasCrCard": 0,
        "IsActiveMember": 0,
        "EstimatedSalary": 20000.0
    }
    response = client.post("/predict", json=data)
    assert response.status_code == 422  # 422 indicates a validation error

To run PyTest, just do:

pytest

🐳Containerization and 🎵Orchestration

Containerization ensures consistent and portable application deployment, while orchestration automates and optimizes container management, leading to streamlined operations, scalability, and reliability in modern software development.

  1. All About Docker: Building a Docker image is really fun! All you need to do is write a simple Dockerfile, build it, and push it. Let’s follow these steps:

Write a Dockerfile -

# syntax=docker/dockerfile:1
FROM python:3.8-slim
EXPOSE 4040
WORKDIR /churn-prediction
COPY requirements.txt ./requirements.txt
RUN pip install -r requirements.txt
COPY . .
CMD python app.py

Build the Docker image -

docker build -t image-name:tag .

Run the image as a container -

docker run -it -p 4040:4040 image-name:tag

Locate the URL and play with it. See more about Docker.

2. Orchestrate with Kubernetes: In this project we run Kubernetes (k8s) both locally and on Azure. If you are new to k8s, install Minikube and run a single-node cluster. Install MiniKube

Write the .yaml files to run on the Minikube cluster.

Deployment.yaml

# deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: churn-prediction
spec:
  replicas: 3
  selector:
    matchLabels:
      app: churn-prediction
  template:
    metadata:
      labels:
        app: churn-prediction
    spec:
      containers:
        - name: churn-prediction-container
          image: karan842/churn-prediction:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 80

Service.yaml

# service
apiVersion: v1
kind: Service
metadata:
  name: prediction-service
spec:
  selector:
    app: churn-prediction
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: NodePort

Run the following commands to apply the changes.

kubectl apply -f deployment.yaml

kubectl apply -f service.yaml

To run a Minikube cluster you need to follow a series of commands. Here is the document.
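
For reference, a typical local run looks roughly like this (the service name matches the Service.yaml above):

# start a single-node cluster and deploy the manifests
minikube start
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
# check that the pods are running, then open the NodePort service in a browser
kubectl get pods
minikube service prediction-service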

For deployment on Azure we obviously won’t use Minikube. We will use Azure Kubernetes Service (AKS) to run a full-fledged k8s application. To run your application on the cloud with k8s, follow these steps:

  1. Login to Azure portal
  2. Create a new resource group
  3. Create Azure Container Registry (ACR) and push your docker image. You will see this in below CI/CD part.
  4. Create Azure Kubernetes Service (AKS) and run your docker container on top of it.

Is it that simple?

No way; this is the lengthiest and most difficult part of the project. You need a very good understanding of DevOps architecture. In fact, MLOps is DevOps for machine learning! Once you have created your ML API, the ML part is finished. You then need to get your hands dirty with three main components: Docker, Kubernetes, and, above all, the CI/CD pipeline. Managing this workflow is a crucial part and an iterative process. Practice this a lot, especially Kubernetes!

Follow this video to understand AKS and ACR. Click here.

Here are the deployment.yaml and service.yaml files for AKS.

deployment.yaml for AKS

apiVersion: apps/v1
kind: Deployment
metadata:
  name: churn-prediction
spec:
  replicas: 3
  selector:
    matchLabels:
      app: churn-prediction
  template:
    metadata:
      labels:
        app: churn-prediction
    spec:
      containers:
        - name: churn-prediction
          image: mlchurnprediction.azurecr.io/churn-prediction:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 80

service.yaml for AKS

apiVersion: v1
kind: Service
metadata:
  name: churn-prediction-service
spec:
  selector:
    app: churn-prediction
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: LoadBalancer

🤝Set CI/CD Pipeline

Continuous Integration (CI) and Continuous Deployment (CD) are the most important parts of building any production-grade application.

  1. CI Pipeline: Ensures that code changes are regularly and automatically integrated into a shared repository. This practice involves automated testing and validation to detect issues early, promoting collaboration and reducing integration problems.
  2. CD Pipeline: Extends CI by automating the process of deploying validated code changes to production. Continuous Deployment involves automatically deploying every successful code change, while Continuous Delivery automates the deployment process but requires manual approval before releasing changes to production.

Here is that magical CI/CD file for our project, wired up with GitHub Actions:

name: MLOps best practices

on:
  push:
    branches: [ "master" ]

permissions:
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3
      - name: Set up Python 3.10
        uses: actions/setup-python@v3
        with:
          python-version: "3.10"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Lint with flake8
        run: |
          # stop the build if there are Python syntax errors or undefined names
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics

      - name: Test with pytest
        run: |
          pytest test

      - name: Build docker image
        run: |
          docker build -t churn-prediction:latest .

      - name: Log in to Azure Container Registry
        uses: azure/docker-login@v1
        with:
          login-server: mlchurnprediction.azurecr.io
          username: ${{ secrets.ACR_USERNAME }}
          password: ${{ secrets.ACR_PASSWORD1 }}

      - name: Tag the image with ACR URL
        run: |
          docker tag churn-prediction:latest mlchurnprediction.azurecr.io/churn-prediction:latest

      - name: Push image to Azure Container Registry
        run: |
          docker push mlchurnprediction.azurecr.io/churn-prediction:latest

  deploy:
    needs: build
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v2

      - name: Set up Azure CLI
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDINTALS }}

      - name: Configure kubectl
        run: |
          echo ${{ secrets.AZURE_KUBECONFIG }} > kubeconfig.yaml
          export KUBECONFIG=./kubeconfig.yaml

      - name: Install kubectl
        run: |
          sudo apt-get update && sudo apt-get install -y apt-transport-https
          curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
          echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
          sudo apt-get update
          sudo apt-get install -y kubectl

      - name: Deploy to AKS
        run: |
          kubectl apply -f k8s/

      - name: Wait for Deployment rollout
        run: |
          kubectl rollout status deployment/churn-prediction --timeout=5m

To learn more about setting up a CI/CD pipeline with GitHub Actions, click here. Also set the secrets that the workflow needs.

🎳 Future Enhancement

After all these steps we can launch our application or product for beta testing or A/B testing, whatever the project requires. MLOps is an iterative process and needs continuous observation of the model in production, with frequent retraining and data transformation for better results. Let’s quickly see what we can do next:

  1. Integrate Grafana/Prometheus into the project to monitor model health. This will help us find any data or concept drift in production (a minimal instrumentation sketch follows this list).
  2. Try different models and data transformation methods; you may find a better model than the current one. But always remember that building an ML model in a real-world project is not like a Kaggle competition. To build an end-to-end ML pipeline, start with a base model first and then iteratively improve it.
  3. Perform A/B testing or beta testing.
  4. Create an interactive UI, containerize it, and connect it with the deployed API. Orchestrate both containers using k8s and run the entire application.
  5. You can take this project further by adding other tools such as Helm and Terraform.
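
As a starting point for item 1, here is a minimal sketch of exposing Prometheus metrics from the FastAPI app; the counter name and endpoint body are assumptions for illustration:

# a sketch: expose a /metrics endpoint for Prometheus to scrape
from fastapi import FastAPI
from prometheus_client import Counter, make_asgi_app

app = FastAPI()
# hypothetical counter tracking how many predictions were served
PREDICTIONS_TOTAL = Counter("churn_predictions_total", "Predictions served")

@app.post("/predict")
async def predict():
    PREDICTIONS_TOTAL.inc()  # increment on every prediction request
    return {"Churn Prediction": "No"}  # placeholder response

# mount the Prometheus ASGI app; Grafana can chart whatever Prometheus scrapes
app.mount("/metrics", make_asgi_app())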

📚Summing it up

Well, if you have read all the way to here, I am sure you understand the lifecycle very well. The need for MLOps is increasing day by day; when I talk with industry experts on LinkedIn, they always suggest learning more about MLOps. Try to build more and more MLOps projects. You can work on a deep learning project too! That’s all from my side.

Connect with me -

Thanks for reading this guide. Share it with your friends, star the GitHub repository, and upvote this blog post.

Karan Shingde. Signing off!!


Karan Shingde

I write about AI/ML, LLMs, MLOps, and whatever I find valuable in my learning and research.