Fundamentals of MLOps — Part 4 | Tracking with MLFlow & Deployment with FastAPI

Tezan Sahu
Sep 5 · 17 min read

Over the last 3 posts, we developed a firm understanding of how to version ML artifacts & build ML Pipelines with great ease & sophistication. By now, you should be reasonably familiar with tracking your data using DVC & experimenting with various ML models for regression & classification tasks by leveraging the numerous pre-processing, feature engineering & fine-tuning techniques in PyCaret.

In this final blog of the series, we will look at how to log experiment results effectively, deploy ML models & finally serve them as an API endpoint so that users can interact with the model to get the desired predictions. So, let’s dive right in!


Experiment Tracking: Why does it Matter?

We have seen that developing ML models for any system is a highly iterative process, involving a lot of experimentation. These experiments could typically differ in:

Each of these experiments needs to be evaluated as well with appropriate metrics to come up with certain conclusions, which assist us in either designing further experiments or making a definitive inference. Thus, it is of utmost importance that we maintain a record, or track our experiments carefully.

Image Source: Machine Learning Model Management

Experiment tracking is a part of MLOps that refers to the practice of saving all experiment-related information (the metadata that will help in drawing meaningful conclusions) for each experiment performed while designing an ML Pipeline.

Experiment tracking helps us in accomplishing the following:

MLFlow for Tracking PyCaret Experiments

There are several tools & platforms that help in ML experiment tracking, including MLFlow, Weights & Biases, Neptune & TensorBoard. We will scratch the surface of one of these, MLFlow Tracking, primarily because it offers great flexibility in tracking experiments & is also integrated with PyCaret.

MLFlow Fundamentals

MLFlow is a library-agnostic open-source tool that offers various solutions to manage end-to-end ML workflows:

Image Source: Databricks Open Sources MLflow to Simplify Machine Learning Lifecycle

In this post, we will look at MLFlow Tracking. You are free to explore & use the other components as deemed necessary.

The MLflow Tracking component provides an API and UI for logging parameters, code versions, metrics, and output files while executing ML code during experiments & viewing the results.
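To make the logging API a little more concrete, here is a minimal sketch of recording one run by hand. It assumes the mlflow package is installed; the helper names flatten_params & log_run are hypothetical (they are not part of MLFlow), while start_run, log_params & log_metrics are MLFlow's own functions. Since MLFlow stores parameters as flat key-value pairs, the sketch flattens nested settings first.

```python
def flatten_params(params, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {"model": {"depth": 3}} -> {"model.depth": 3}."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_params(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def log_run(run_name, params, metrics):
    """Record one experiment run (hypothetical helper; needs mlflow installed)."""
    import mlflow  # imported lazily so the helper above works without mlflow

    with mlflow.start_run(run_name=run_name):
        mlflow.log_params(flatten_params(params))  # settings used for the run
        mlflow.log_metrics(metrics)                # numeric results of the run
```

Calling log_run("baseline", {"model": {"depth": 3}}, {"rmse": 12.5}) would then create one run with the parameter model.depth & the metric rmse (the values here are made up for illustration).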

Setting up MLFlow Tracking for PyCaret Experiments

The functionality to track experiments using MLFlow has been embedded into PyCaret 2.0 (the mlflow package is installed automatically while installing pycaret) & can be set up with a few simple steps:

Step 1: Set up the tracking server (the location — local or remote — where the logs & tracked data will be stored) by importing mlflow & calling the set_tracking_uri(...) function with the address of the tracking server as follows:

import mlflow

# To use a local folder to store the logs, prefix the full path with 'file:/' & use it
mlflow.set_tracking_uri('file:/<full-path-to-local-folder>')

# To use a remote storage location, provide the HTTP URI, for example:
mlflow.set_tracking_uri('https://my-tracking-server:5000')

If not set explicitly, a folder named mlruns will be created in the current directory & logs will be stored there.
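If you would rather not type the file URI for a local folder by hand, the standard library can build it for you. The following is a small sketch; the helper name local_tracking_uri is hypothetical & the folder name is only an example.

```python
from pathlib import Path


def local_tracking_uri(folder):
    """Return a file URI for a local folder (hypothetical helper).

    Path.as_uri() needs an absolute path, so the folder is resolved first.
    """
    return Path(folder).resolve().as_uri()


# The resulting string can be passed to mlflow.set_tracking_uri(...)
tracking_uri = local_tracking_uri("mlruns")
```

Note that as_uri() produces the 'file:///...' form of the URI, which refers to the same location as the 'file:/...' form shown above.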

Step 2: Start logging the experiments by using the following additional parameters while calling the setup(...) function:
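As a rough sketch of what this can look like, assuming the logging arguments of setup(...) in PyCaret 2.x (log_experiment, experiment_name & log_plots): the helper names below are hypothetical, & the data, target & experiment name are whatever you used in your own experiment.

```python
def tracking_setup_kwargs(experiment_name, log_plots=True):
    """Collect the logging-related arguments for setup(...) (hypothetical helper)."""
    return {
        "log_experiment": True,              # switch on MLFlow logging
        "experiment_name": experiment_name,  # name shown in the MLFlow UI
        "log_plots": log_plots,              # also store the plots as artifacts
    }


def start_tracked_experiment(data, target, experiment_name):
    """Run setup(...) with logging enabled (needs pycaret installed)."""
    from pycaret.regression import setup  # imported lazily; assumes PyCaret 2.x

    return setup(data=data, target=target, **tracking_setup_kwargs(experiment_name))
```

Keeping the logging arguments in one place makes it easy to reuse the same settings across the regression & classification modules.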

Now we can proceed as usual with PyCaret to select & train our desired models, tune the hyperparameters & evaluate them through the metrics & plots. All the metadata related to the experiment will automatically get logged to our tracking server, which can be viewed later using the MLFlow UI.

Navigating through the MLFlow UI

Having completed our experiment(s), we can proceed to view the logs in the MLFlow UI. This UI enables visualization, searching & comparison between experiments, along with the provision to download the various ML artifacts (models, data & metadata) for further analysis.

The following is a screenshot of the UI for a PyCaret experiment tracked using MLFlow. As you can see, the execution of each model in every step of the workflow has been recorded as a run, with details such as the run execution time, metrics & the source function.

Apart from the final model being used (Light GBM here), you can also find a clickable link to all the other models that were created during the compare_models() function call. Clicking on the respective models takes you to the page that displays all the details about the model including parameters, metrics, tags & also the plots & other downloadable artifacts (if created during the experiment). It also contains code snippets that can be used to load the saved model using mlflow & make predictions. You can also "Register" your models to the MLFlow Model Registry if needed.

An interesting aspect of tracking with MLFlow is that it also lets you download & use models that were discarded as the experiment progressed (like the ones not selected after running compare_models()), which can be useful for other related analyses.

MLFlow (Beyond PyCaret)

As we have seen above, MLFlow Tracking is already embedded into PyCaret & can be set up in just a couple of simple steps. However, MLFlow offers much more flexibility & can be easily integrated into several other ML & DL frameworks including Scikit-Learn, TensorFlow, Keras, PyTorch, FastAI, etc. for automatic logging purposes. You can refer to the documentation to understand how it can be used with these frameworks.

Model Deployment: What is it?

Having understood the various aspects of effectively experimenting with our ML models, we now come to the final stage of our pipeline — deploying our trained models to production & making inferences using them.

Model deployment can be considered as the process of exposing a trained ML model in a production environment to be used by the rest of the world for making inferences.

To get the most out of ML models, they must be smoothly deployed into production so that businesses can begin using them in practice.

There are several ways of deploying & serving ML models in production, with advantages varying depending on the individual use case:

Offline Predictions

Batch Prediction

Real-time Serving

Streaming

Different tools cater to each of these approaches. In this post, we will look into how our ML models can be deployed to the cloud & served for making real-time predictions.

Deploying a PyCaret Model to the Cloud

In the previous blog, we used PyCaret to develop a complete ML pipeline. We became familiar with using the save_model(...), load_model(...) & predict_model(...) functions. Now we will see the steps to finally deploy our trained & fine-tuned model to the cloud & use it later for making inferences.

For this exercise, I assume that you have completed the tutorial in Part 3 of this series & have your blended model for predicting the critical temperature of superconducting materials.

Model Finalization

While setting up an experiment in PyCaret, we split the entire training dataset into training & validation sets. During the pipeline, we train our model(s) on this training set & later evaluate it on the held-out validation set to verify our performance metrics. But once verified that our chosen model is performing as expected on the held-out validation set, we may want to train our model one final time on the whole dataset (training + validation). This can be accomplished using finalize_model(...).

To use this, you will have to run the setup(...) function from the Intermediate PyCaret section, load your saved blended model using blended_model = load_model("blended_expt2") & then retrain it on the entire dataset using final_blended = finalize_model(blended_model).

With this step in place, we can now proceed to deploy our model to the cloud.

Model Deployment on AWS

Although a trained model can be deployed locally as a .pkl file using the save_model(...) function we saw in the previous post, we can alternatively deploy our models to the cloud. PyCaret allows users to deploy their models to a variety of cloud platforms like AWS, Azure & GCP. In this blog, we will see how to deploy our trained model to AWS since we have created our free-tier AWS account & also have set up our S3 bucket in Part 2 while learning to use DVC.

The necessary library, boto3, would have been installed automatically when you installed dvc[s3] in Part 2. If not, you can install it now using pip install boto3. The environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY & AWS_DEFAULT_REGION should be configured as well (ignore if already done while following Part 2).
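Before deploying, it can help to confirm that these variables are actually visible to Python. Here is a minimal sketch using only the standard library; the helper name missing_aws_vars is hypothetical.

```python
import os

# The three variables named above
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_vars(env=None):
    """Return the names of the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

If missing_aws_vars() returns an empty list, the credentials are in place; otherwise the returned names tell you what still needs to be configured.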

Now, we can simply deploy the model to our S3 bucket using:

# Deploy model
deploy_model(
    model = final_blended,
    model_name = 'lightgbm_deploy_1',
    platform = 'aws',
    authentication = {'bucket' : 'mlopsdvc170100035'}
)
# Enter your respective bucket name in place of 'mlopsdvc170100035'

Making Inferences using a Deployed Model

Now, just as we used load_model(...) & predict_model(...) to make inferences using locally saved models (seen in Part 3), we can use the combination to make inferences using our model deployed on AWS:

loaded_model = load_model(
    'lightgbm_deploy_1',
    platform = 'aws',
    authentication = { 'bucket' : 'mlopsdvc170100035' }
)
predictions = predict_model(loaded_model, data=data_unseen)
predictions.head() # View some of the predictions

Now, we know that we can use the above snippet anywhere in our application to load the deployed model & make batch predictions using it. We will see this in action in the upcoming section when we serve our model for making inferences.

Real-time Serving with FastAPI

Once our model is deployed (locally or on the cloud), we can use it to make offline/batch predictions, as seen previously. However, we can also serve it on some platform to be used for making real-time predictions through HTTP requests. For this, we need to expose our model to the world through APIs (Application Programming Interfaces). This is what we will address in the current section.

Serving Models via RESTful APIs

For those who are not familiar with APIs or RESTful API, you are highly encouraged to refer to the Introduction to RESTful APIs in the Additional Resources section to get a good understanding of it. However, just as a quick overview, a REST API transfers to the client the state of a requested resource. The requested resource in our scenario will be an inference from our ML model. As a result, our server will send predictions to a client, which may be anything — from a web app to a mobile device.

What is REST API?

Some of the advantages of using RESTful APIs to serve ML models are:

There are several tools & frameworks in Python (like FastAPI, Flask, Django, etc.) that can be used to create a backend server to load & serve our model for making predictions. We will proceed with FastAPI for this blog as it is extremely powerful, yet very simple to use & has pretty good documentation.

Brief Introduction to FastAPI

As mentioned on its website, “FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.6+ based on standard Python type hints.”

Image Source: FastAPI — GitHub

Since its inception rather recently, it has been appreciated & adopted widely (including companies like Uber & Netflix). It offers several cool features including the use of OpenAPI specification, creation of automatic & interactive documentation for APIs, security & authentication using various schemes, editor support (in VSCode & PyCharm) & numerous ‘plug-ins’.

For more information, feel free to browse through the FastAPI Documentation.

Installation

FastAPI can be installed easily using pip install fastapi[all]. Along with fastapi, this installs the starlette & pydantic libraries, which form the backbone of fastapi. Moreover, it also automatically installs uvicorn, which will act as the server that runs the fastapi code.

uvicorn is a lightweight ASGI server, & can be installed standalone as well using pip install uvicorn[standard]. Covering ASGI servers is beyond the scope of this post, but you can read about them in the Additional Resources section.

Creating a FastAPI Server for our ML Model

With this background about FastAPI, we are set to write the code that will load our deployed model & serve it as an API endpoint using FastAPI. Basically, we will spin up a server & offer the API endpoint /predict for users to upload any CSV file containing the data to make predictions on. Following is a rough outline of the stuff that we will try to accomplish:

All this in less than 50 lines of code! (excluding the comments in the file)

Before diving into the code, we just need to install a couple of packages:

# To receive uploaded files (uploaded files are sent as "form data")
$ pip install python-multipart

# To load environment variables from .env file into the application
$ pip install python-dotenv

In the folder that you wish to have your server in, create a .env file as follows:

AWS_ACCESS_KEY_ID={your-access-key}
AWS_SECRET_ACCESS_KEY={your-secret-access-key}
AWS_DEFAULT_REGION=ap-south-1

Note: Never upload your .env file to GitHub. Always add it to .gitignore & create an alternative .env.example file containing only the variable names (not their values) so that people using the code know what environment variables are required by the application.

# Import Uvicorn & the necessary modules from FastAPI
import uvicorn
from fastapi import FastAPI, File, UploadFile, HTTPException

# Import the PyCaret Regression module
import pycaret.regression as pycr

# Import other necessary packages
from dotenv import load_dotenv
import pandas as pd
import os
import sys

# Load the environment variables from the .env file into the application
load_dotenv()

# Check if the environment variables for AWS access are available.
# If not, exit the program before trying to load the model from S3
if os.getenv("AWS_ACCESS_KEY_ID") is None or os.getenv("AWS_SECRET_ACCESS_KEY") is None:
    sys.exit("AWS credentials not found in the environment")

# Initialize the FastAPI application
app = FastAPI()


# Create a class to store the deployed model & use it for prediction
class Model:
    def __init__(self, modelname, bucketname):
        """
        To initialize the model
        modelname: Name of the model stored in the S3 bucket
        bucketname: Name of the S3 bucket
        """
        # Load the deployed model from Amazon S3
        self.model = pycr.load_model(
            modelname,
            platform = 'aws',
            authentication = { 'bucket' : bucketname }
        )

    def predict(self, data):
        """
        To use the loaded model to make predictions on the data
        data: Pandas DataFrame to perform predictions
        """
        # Return the column containing the predictions
        # (i.e. 'Label' in PyCaret 2.x) after converting it to a list
        predictions = pycr.predict_model(self.model, data=data).Label.to_list()
        return predictions


# Load the model that you had deployed earlier on S3.
# Enter your respective bucket name in place of 'mlopsdvc170100035'
model = Model("lightgbm_deploy_1", "mlopsdvc170100035")


# Create the POST endpoint with path '/predict'
@app.post("/predict")
async def create_upload_file(file: UploadFile = File(...)):
    # Handle the file only if it is a CSV
    if file.filename.endswith(".csv"):
        # Create a temporary file with the same name as the uploaded
        # CSV file to load the data into a pandas DataFrame.
        # basename() drops any directory part sent by the client
        temp_path = os.path.basename(file.filename)
        with open(temp_path, "wb") as f:
            f.write(file.file.read())
        data = pd.read_csv(temp_path)
        os.remove(temp_path)

        # Return a JSON object containing the model predictions
        return {
            "Labels": model.predict(data)
        }
    else:
        # Raise an HTTP 400 Exception, indicating Bad Request
        raise HTTPException(status_code=400, detail="Invalid file format. Only CSV Files accepted.")

Now, assuming the code above is saved as main.py, we can simply run uvicorn main:app --host=0.0.0.0 --port=8000 & see that our server is up & running on http://0.0.0.0:8000 in no time at all. Binding to 0.0.0.0 makes the server listen on all network interfaces, so it can be reached through the loopback address 127.0.0.1 as well as through the IP address of the machine.

To test our endpoint, we can go to the automatic interactive documentation available at http://127.0.0.1:8000/docs. You will see that the POST /predict endpoint has been created, which can be tested by expanding the section & clicking on "Try it out". You can upload any CSV file (with the same columns as the training data) & send it to the server for returning real-time predictions:

You can create a test dataset for uploading by sampling some rows from the material_superconductivity.csv file & saving it as a separate file.
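One possible way to create such a file is sketched below using only the standard library. The helper name sample_csv_rows, the output file name & the number of rows are assumptions for illustration; the header row is kept so that the columns match the training data.

```python
import csv
import random


def sample_csv_rows(src_path, dst_path, n_rows=20, seed=42):
    """Copy the header plus a random sample of rows from src_path to dst_path.

    Returns the number of data rows written (hypothetical helper).
    """
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # keep the column names
        rows = list(reader)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = rng.sample(rows, min(n_rows, len(rows)))

    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(sampled)
    return len(sampled)
```

For example, sample_csv_rows("material_superconductivity.csv", "test_sample.csv") would write 20 randomly chosen rows (or fewer, if the source file is smaller) to test_sample.csv.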

You can also use the curl command mentioned in the documentation above to programmatically query the http://127.0.0.1:8000/predict endpoint with your desired CSV file to obtain the predictions as a response.
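The same request can be sent from Python as well. The sketch below uses only the standard library & assumes the server from the previous section is running locally on port 8000; the helper names are hypothetical. The form field is named "file" because that is the name of the parameter of the /predict endpoint, & the predictions are read from the "Labels" key of the JSON response.

```python
import json
import os
import urllib.request
import uuid


def build_multipart_body(field, filename, content, content_type="text/csv"):
    """Encode a single file as multipart/form-data; returns (body, headers)."""
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return body, headers


def predict_csv(path, url="http://127.0.0.1:8000/predict"):
    """Upload a CSV file to the /predict endpoint & return the predictions."""
    with open(path, "rb") as f:
        body, headers = build_multipart_body("file", os.path.basename(path), f.read())
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)["Labels"]
```

With the server running, predict_csv("test_sample.csv") would return the list of predicted values for the rows in that (hypothetically named) file.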

With this, we can see how we have been easily able to deploy & serve our trained ML model for users to access it via a RESTful API.

Generic Template for Serving ML/DL Models

We have seen above how we can serve a PyCaret model deployed on AWS using FastAPI. This concept can be extended to serve any ML/DL model, deployed locally or on the cloud using FastAPI so that it can be available to the users via an API endpoint. The generic template for doing so is as follows:

1. Set up the FastAPI application
2. Load the model(s) into the application
3. Create required API endpoint(s) for users to submit data:
- These could be CSV file(s), image(s), JSON object(s), etc.
- Handle incoming data appropriately
4. Use the intended model to predict the result(s) on the data submitted
5. If successful, return the predictions, else raise an error
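The steps above can be sketched roughly as follows for a model that accepts JSON input. This is only an illustration under stated assumptions: the payload shape ({"rows": [...]}), the "predictions" key & the helper names are hypothetical, & model_predict stands for any function that wraps your own model.

```python
from typing import Any, Callable, Dict, List


def make_predict_handler(model_predict: Callable[[List[Any]], List[Any]]):
    """Steps 3-5: validate the submitted data, predict & return the result."""
    def handler(payload: Dict[str, Any]) -> Dict[str, Any]:
        rows = payload.get("rows")
        if not isinstance(rows, list) or not rows:
            raise ValueError("Expected a non-empty list under the key 'rows'.")
        return {"predictions": list(model_predict(rows))}
    return handler


def create_app(model_predict):
    """Steps 1-2: set up the FastAPI application around a loaded model."""
    from fastapi import Body, FastAPI, HTTPException  # needs fastapi installed

    app = FastAPI()
    handler = make_predict_handler(model_predict)

    @app.post("/predict")
    def predict(payload: Dict[str, Any] = Body(...)):
        try:
            return handler(payload)
        except ValueError as err:
            # Invalid input is reported as HTTP 400 (Bad Request)
            raise HTTPException(status_code=400, detail=str(err))

    return app
```

Keeping the validation & prediction logic in a plain function means it can be checked without starting a server, while create_app(...) only wires it to an endpoint.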

Using this generic template, one can deploy & serve models built using any framework (scikit-learn, PyTorch, TensorFlow, etc.) to the users (although other frameworks may also offer dedicated tools for serving models built using them).

Hosting the Model Server Online

In the above example, we started our FastAPI server locally & performed predictions. We can host this API server online as well using several options:

Interested folks are encouraged to try out some of these options.

Closing Remarks

Congratulations! You have made it to the end of this 4-blog series on Fundamentals of MLOps. Reflecting upon our learnings over these 4 posts, we got some hands-on experience with tools that can automate every stage of the ML workflow & make it more efficient:

Having completed the content for these 4 parts, we are now in a good position to revisit the MLOps Stack Template introduced in Part 1 & fill it up using the tools & frameworks that we have learned to use as a part of this 4-blog series. Now, our MLOps stack should look like the one shown below.

Of course, there are a plethora of other tools that can be used at each step (as mentioned in Part 1). However, the frameworks introduced in this series also form a complete set & can help you get started in your MLOps journey!

Hope you found this Fundamentals of MLOps series interesting & useful. Following are the other blogs that are a part of this series:

Thank you & Happy Coding!

About the Author

Hey folks! I’m Tezan Sahu, a Data & Applied Scientist at Microsoft. I completed my B.Tech from IIT Bombay with a Major degree in Mechanical Engineering and a Minor degree in Computer Science & Engineering. I have a keen interest in NLP, Deep Learning & Blockchain, & love to develop interesting products using cutting-edge technologies.

Website: Tezan Sahu | Microsoft
LinkedIn: Tezan Sahu | LinkedIn
Email ID: tezansahu@gmail.com


Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Tezan Sahu

Written by

Data & Applied Scientist at Microsoft | B. Tech in Mechanical Engineering (Minor in CS) from IIT Bombay | GSoC’20 with PEcAn Project
