Fundamentals of MLOps — Part 4 | Tracking with MLFlow & Deployment with FastAPI

Tezan Sahu
Published in Analytics Vidhya · 17 min read · Sep 5, 2021

Over the last 3 posts, we developed a firm understanding of how to version ML artifacts & build ML Pipelines with great ease & sophistication. By now, you should be reasonably familiar with tracking your data using DVC & experimenting with various ML models for regression & classification tasks by leveraging the numerous pre-processing, feature engineering & fine-tuning techniques in PyCaret.

In this final blog of the series, we will cover how to log experimentation results effectively, deploy ML models & finally serve them as an API endpoint so that users can actually interact with the model to get the desired predictions. So, let’s dive right in!

Contents

  • Experiment Tracking: Why does it Matter?
  • MLFlow for Tracking PyCaret Experiments
  • Model Deployment: What is it?
  • Deploying a PyCaret Model to the Cloud
  • Real-time Serving with FastAPI
  • Hosting the Model Server Online
  • Closing Remarks

Experiment Tracking: Why does it Matter?

We have seen that developing ML models for any system is a highly iterative process, involving a lot of experimentation. These experiments could typically differ in:

  • Preprocessing steps applied to the data
  • Usage of different model algorithms
  • Usage of different sets of hyperparameters
  • Training and/or evaluation sets

Each of these experiments needs to be evaluated as well with appropriate metrics to come up with certain conclusions, which assist us in either designing further experiments or making a definitive inference. Thus, it is of utmost importance that we maintain a record, or track our experiments carefully.

Image Source: Machine Learning Model Management

Experiment tracking is the part of MLOps that refers to the practice of saving all experiment-related information (i.e. the metadata that helps in drawing meaningful conclusions) for each experiment performed while designing an ML Pipeline.

Experiment tracking helps us in accomplishing the following:

  • Compare results across experiments: Logging the hyperparameters & performance metrics of models across experiments, along with tagging them with a few consistent tags can keep things extremely organized as one tries to do a comparative analysis of the experiments
  • Manage distributed training: Maintaining a record of the various runs across different machines while training a model on a distributed system can later help aggregate & visualize the results more effectively
  • Manage team collaboration through reports: With several members in a team working on different approaches to solving a problem, building reports periodically about the current approach, its performance & potential variations that can be tried next, can prove to be crucial for effective collaboration within the team
  • Maintain a record of models: With an increasing number of models being deployed into production, it becomes imperative to record every model that is developed (along with the metadata) so that any failures in downstream tasks can be easily traced back to the origin & rectified soon

MLFlow for Tracking PyCaret Experiments

There are several tools & platforms that help in ML experiment tracking. These include MLFlow, Weights & Biases, Neptune, TensorBoard, etc. We will try to scratch the surface of one of these — MLFlow Tracking (primarily because it offers great flexibility in tracking experiments & is also integrated with PyCaret).

MLFlow Fundamentals

MLFlow is a library-agnostic open-source tool that offers various solutions to manage end-to-end ML workflows:

  • MLFlow Tracking (to track experiments & compare their results)
  • MLFlow Projects (to package ML pipeline code into a reusable & reproducible form to be shared with others)
  • MLFlow Models (to deploy models from various ML libraries to various serving platforms)
  • MLFlow Model Registry (a central store to perform model versioning & manage model lifecycles)
Image Source: Databricks Open Sources MLflow to Simplify Machine Learning Lifecycle

In this post, we will look at MLFlow Tracking. You are free to explore & use the other components as deemed necessary.

The MLflow Tracking component provides an API and UI for logging parameters, code versions, metrics, and output files while executing ML code during experiments & viewing the results.
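
To make this concrete, here is a minimal, hand-rolled logging sketch using the MLFlow Python API directly (the experiment name, parameters, metric value & artifact below are purely illustrative):

import mlflow

# Group related runs under a named experiment (illustrative name)
mlflow.set_experiment("manual-logging-demo")

with mlflow.start_run():
    # Log hyperparameters & evaluation metrics (dummy values for illustration)
    mlflow.log_param("model_type", "ridge")
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 12.34)

    # Log any output file as an artifact of the run
    with open("notes.txt", "w") as f:
        f.write("Baseline ridge regression run")
    mlflow.log_artifact("notes.txt")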

Setting up MLFlow Tracking for PyCaret Experiments

The functionality to track experiments using MLFlow has been embedded into PyCaret 2.0 (the mlflow package is installed automatically while installing pycaret) & can be set up with a few simple steps:

Step 1: Set up the tracking server (the location — local or remote — where the logs & tracked data will be stored) by importing mlflow & calling the set_tracking_uri(...) function with the address of the tracking server as follows:

import mlflow

# To use a local folder to store the logs, prefix the full path with 'file:/' & use it
mlflow.set_tracking_uri('file:/<full-path-to-local-folder>')

# To use a remote storage location, provide the HTTP URI, for example:
mlflow.set_tracking_uri('https://my-tracking-server:5000')

If not set explicitly, a folder named mlruns will be created in the current directory & logs will be stored there.

Step 2: Start logging the experiments by using the following additional parameters while calling the setup(...) function:

  • log_experiment = True (allows logging of all parameters & metrics on the MLFlow server)
  • experiment_name = <name_of_experiment> (sets the name of the experiment to be logged; if not set, 'clf' is used by default)
  • log_plots = True (Optional: allows specific plots to be logged as .png files)
  • log_data = True (Optional: allows logging of training & test datasets used in the experiment)
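
Putting these together, a setup(...) call with MLFlow logging enabled might look roughly like the sketch below (assuming the superconductivity dataset from Part 3 with critical_temp as the target column — adjust the file & column names to your data):

import pandas as pd
from pycaret.regression import setup, compare_models

# Load the training data (file & column names assumed from Part 3)
df = pd.read_csv("material_superconductivity.csv")

exp = setup(
    data=df,
    target='critical_temp',
    log_experiment=True,               # log all parameters & metrics to MLFlow
    experiment_name='superconduct_1',  # illustrative experiment name
    log_plots=True,                    # optionally log evaluation plots as .png files
    log_data=True,                     # optionally log the train/test datasets
    session_id=42
)

# Every candidate model trained here gets recorded as a separate MLFlow run
best_model = compare_models()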

Now we can proceed as usual with PyCaret to select & train our desired models, tune the hyperparameters & evaluate them through the metrics & plots. All the metadata related to the experiment will automatically get logged to our tracking server, which can be viewed later using the MLFlow UI.

Navigating through the MLFlow UI

Having completed our experiment(s), we can proceed to view the logs in the MLFlow UI. This UI enables visualization, searching & comparison between experiments, along with the provision to download the various ML artifacts (models, data & metadata) for further analysis.

  • If the tracking server is a local folder, the UI can be started by going to that folder & running mlflow ui (from the terminal) or !mlflow ui (from a notebook). If no tracking server has been set explicitly, the logs are recorded in the mlruns/ folder within the current directory, so mlflow ui can simply be run from the current directory. The UI can then be viewed by going to http://localhost:5000.
  • If using a remote-tracking server, the UI can be viewed simply by going to http://ip-address-of-tracking-server:5000.

The following is a screenshot of the UI for a PyCaret experiment tracked using MLFlow. As you can see, the execution of each model in every step of the workflow has been recorded as a run, with the details of the metrics, run execution time & also the source function.

Apart from the final model being used (Light GBM here), you can also find a clickable link to all the other models that were created during the compare_models() function call. Clicking on the respective models takes you to the page that displays all the details about the model including parameters, metrics, tags & also the plots & other downloadable artifacts (if created during the experiment). It also contains code snippets that can be used to load the saved model using mlflow & make predictions. You can also "Register" your models to the MLFlow Model Registry if needed.

An interesting aspect of tracking with MLFlow is that it also lets you download & use the models that were discarded as the experiment progressed (like the ones not selected after running compare_models()), since they can be useful for other related analyses as well.
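
For instance, loading one of these logged models back through the MLFlow API typically boils down to something like the following sketch (the run ID is a placeholder to be copied from the UI, & the artifact path may differ from 'model' — use exactly what the code snippet on the model's page shows):

import mlflow.pyfunc
import pandas as pd

# Placeholder run ID — copy the actual one from the MLFlow UI
logged_model = "runs:/<run-id-from-ui>/model"

# Load the logged model as a generic Python function & score new data
loaded = mlflow.pyfunc.load_model(logged_model)
data_unseen = pd.read_csv("unseen_data.csv")  # hypothetical file with the training columns
predictions = loaded.predict(data_unseen)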

MLFlow (Beyond PyCaret)

As we have seen above, MLFlow Tracking is already embedded into PyCaret & can be set up in just a couple of simple steps. However, MLFlow offers much more flexibility & can be easily integrated with several other ML & DL frameworks, including Scikit-Learn, TensorFlow, Keras, PyTorch, FastAI, etc., for automatic logging. You can refer to the documentation to understand how it can be used with these frameworks.
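
As a quick illustration of what automatic logging looks like outside PyCaret, here is a minimal sketch with Scikit-Learn (the experiment name, synthetic data & model are arbitrary choices for the example):

import mlflow
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

mlflow.set_experiment("sklearn-autolog-demo")  # illustrative experiment name
mlflow.sklearn.autolog()                       # log params, metrics & the fitted model automatically

X, y = make_regression(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    print("Test R^2:", model.score(X_test, y_test))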

Model Deployment: What is it?

Having understood the various aspects of effectively experimenting with our ML models, we now come to the final stage of our pipeline — deploying our trained models to production & making inferences using them.

Model deployment can be considered as the process of exposing a trained ML model in a production environment to be used by the rest of the world for making inferences.

To get the most out of ML models, they must be smoothly deployed into production so that businesses can begin using them in practice.

There are several ways of deploying & serving ML models in production, with the advantages varying depending on the individual use case:

Offline Predictions

  • Usually done on a local machine when predictions are needed for a single event & are generated directly from Python code
  • Eg: Predictions on a test dataset during hackathons (or usual course assignments)

Batch Prediction

  • A set of predictions generated on a periodic basis, with a file/datastore as input
  • Eg: A model being used to predict the set of products to be recommended to users through weekly emails based on their activity patterns on an e-commerce website

Real-time Serving

  • Making on-demand predictions usually through HTTP calls to a model served in the cloud (low latency expected)
  • Eg: An object detection model served on a website where a user uploads an image & submits it to the server for detection of objects in the image. The results from the model are returned almost immediately on the website.

Streaming

  • Similar to batch or real-time, but with an additional queue to handle the high volume & variability of incoming prediction requests, to be served at the processing rate instead of the arrival rate
  • Eg: An online fraud detection model where transactions are queued up & processed asynchronously to verify whether the incident was fraudulent

Different tools cater to each of these approaches. In this post, we will look into how our ML models can be deployed to the cloud & served for making real-time predictions.

Deploying a PyCaret Model to the Cloud

In the previous blog, we used PyCaret to develop a complete ML pipeline. We became familiar with the save_model(...), load_model(...) & predict_model(...) functions. Now we will see the steps to finally deploy our trained & fine-tuned model to the cloud & use it later for making inferences.

For this exercise, I assume that you have completed the tutorial in Part 3 of this series & have your blended model for predicting the critical temperature of superconducting materials.

Model Finalization

While setting up an experiment in PyCaret, we split the entire training dataset into training & validation sets. During the pipeline, we train our model(s) on this training set & later evaluate it on the held-out validation set to verify our performance metrics. But once verified that our chosen model is performing as expected on the held-out validation set, we may want to train our model one final time on the whole dataset (training + validation). This can be accomplished using finalize_model(...).

To use this, you will have to run the setup(...) function from the Intermediate PyCaret section, load your saved blended model using blended_model = load_model("blended_expt2") & then train it on the entire training set using final_blended = finalize_model(blended_model), as sketched below.
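
In code, this step looks roughly as follows (a sketch assuming the same setup(...) configuration & the blended_expt2 model saved in Part 3; the file & target column names are taken from that post & may need adjusting):

import pandas as pd
from pycaret.regression import setup, load_model, finalize_model

# Re-create the experiment with the same configuration used in Part 3
df = pd.read_csv("material_superconductivity.csv")
exp = setup(data=df, target='critical_temp', session_id=42)

# Load the saved blended model & retrain it on the entire dataset (training + validation)
blended_model = load_model("blended_expt2")
final_blended = finalize_model(blended_model)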

With this step in place, we can now proceed to deploy our model to the cloud.

Model Deployment on AWS

Although a trained model can be deployed locally as a .pkl file using the save_model(...) function we saw in the previous post, we can alternatively deploy our models to the cloud. PyCaret allows users to deploy their models to a variety of cloud platforms like AWS, Azure & GCP. In this blog, we will see how to deploy our trained model to AWS since we have created our free-tier AWS account & also have set up our S3 bucket in Part 2 while learning to use DVC.

The necessary library, i.e. boto3, would have been installed automatically when you installed dvc[s3] in Part 2; if not, you can install it now using pip install boto3. The environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY & AWS_DEFAULT_REGION should be configured as well (ignore this if already done while following Part 2).

Now, we can simply deploy the model to our S3 bucket using:

# Deploy model
deploy_model(
    model = final_blended,
    model_name = 'lightgbm_deploy_1',
    platform = 'aws',
    authentication = {'bucket' : 'mlopsdvc170100035'}
)
# Enter your respective bucket name in place of 'mlopsdvc170100035'

Making Inferences using a Deployed Model

Now, just as we used load_model(...) & predict_model(...) to make inferences using locally saved models (seen in Part 3), we can use the combination to make inferences using our model deployed on AWS:

loaded_model = load_model(
    'lightgbm_deploy_1',
    platform = 'aws',
    authentication = { 'bucket' : 'mlopsdvc170100035' }
)

predictions = predict_model(loaded_model, data=data_unseen)
predictions.head()  # View some of the predictions

Now, we know that we can use the above snippet anywhere in our application to load the deployed model & make batch predictions using it. We will see this in action in the upcoming section when we serve our model for making inferences.

Real-time Serving with FastAPI

Once our model is deployed (locally or on the cloud), we can use it to make offline/batch predictions, as seen previously. However, we can also serve it on some platform to make real-time predictions through HTTP requests. For this, we need to expose our model to the world through APIs (Application Programming Interfaces). This is what we will address in the current section.

Serving Models via RESTful APIs

For those who are not familiar with APIs or RESTful API, you are highly encouraged to refer to the Introduction to RESTful APIs in the Additional Resources section to get a good understanding of it. However, just as a quick overview, a REST API transfers to the client the state of a requested resource. The requested resource in our scenario will be an inference from our ML model. As a result, our server will send predictions to a client, which may be anything — from a web app to a mobile device.


Some of the advantages of using RESTful APIs to serve ML models are:

  • Serve predictions on the fly to an increasing number of clients
  • Decouple the model environment & the client-facing layer so that teams can work independently of each other
  • Combine several models & expose them at different API endpoints
  • Easily scale the application by putting extra instances behind a load balancer

There are several tools & frameworks in Python (like FastAPI, Flask, Django, etc.) that can be used to create a backend server to load & serve our model for making predictions. We will proceed with FastAPI for this blog as it is extremely powerful, yet very simple to use & has pretty good documentation.

Brief Introduction to FastAPI

As mentioned on its website, “FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.6+ based on standard Python type hints.”

Image Source: FastAPI — GitHub

Despite its relatively recent inception, it has been widely appreciated & adopted (including by companies like Uber & Netflix). It offers several cool features, including the use of the OpenAPI specification, automatic & interactive API documentation, security & authentication via various schemes, editor support (in VSCode & PyCharm) & numerous ‘plug-ins’.

For more information, feel free to browse through the FastAPI Documentation.

Installation

FastAPI can be installed easily using pip install fastapi[all]. Along with fastapi, this installs the starlette & pydantic libraries, which form the backbone of fastapi. Moreover, it also automatically installs uvicorn, which will act as the server that runs the fastapi code.

uvicorn is a lightweight ASGI server, & can be installed standalone as well using pip install uvicorn[standard]. Covering ASGI servers is beyond the scope of this post, but you can read about them in the Additional References section.

Creating a FastAPI Server for our ML Model

With this background about FastAPI, we are set to write the code that will load our deployed model & serve it as an API endpoint using FastAPI. Basically, we will spin up a server & offer the API endpoint /predict for users to upload any CSV file containing the data to make predictions on. Following is a rough outline of the stuff that we will try to accomplish:

  1. Set up the FastAPI application
  2. Load the deployed model into the application
  3. Create a POST endpoint /predict to accept an incoming CSV file & convert it into a pandas DataFrame
  4. Use the model to make predictions on the DataFrame & return the results to the user

All this in less than 50 lines of code! (excluding the comments in the file)

Before diving into the code, we just need to install a couple of packages:

# To receive uploaded files (uploaded files are sent as "form data")
$ pip install python-multipart

# To load environment variables from .env file into the application
$ pip install python-dotenv

In the folder that you wish to have your server in, create a .env file as follows:

AWS_ACCESS_KEY_ID={your-access-key}
AWS_SECRET_ACCESS_KEY={your-secret-access-key}
AWS_DEFAULT_REGION=ap-south-1

Note: Never upload your .env file to GitHub. Always add it to .gitignore & create an alternative .env.example file containing only the variable names (not their values) so that people using the code know what environment variables are required by the application.

# Import Uvicorn & the necessary modules from FastAPI
import uvicorn
from fastapi import FastAPI, File, UploadFile, HTTPException

# Import the PyCaret Regression module
import pycaret.regression as pycr

# Import other necessary packages
from dotenv import load_dotenv
import pandas as pd
import os

# Load the environment variables from the .env file into the application
load_dotenv()

# Initialize the FastAPI application
app = FastAPI()

# Create a class to store the deployed model & use it for prediction
class Model:
    def __init__(self, modelname, bucketname):
        """
        To initialize the model
        modelname: Name of the model stored in the S3 bucket
        bucketname: Name of the S3 bucket
        """
        # Load the deployed model from Amazon S3
        self.model = pycr.load_model(
            modelname,
            platform='aws',
            authentication={'bucket': bucketname}
        )

    def predict(self, data):
        """
        To use the loaded model to make predictions on the data
        data: Pandas DataFrame to perform predictions
        """
        # Return the column containing the predictions (i.e. 'Label')
        # after converting it to a list
        predictions = pycr.predict_model(self.model, data=data).Label.to_list()
        return predictions

# Load the model that you had deployed earlier on S3.
# Enter your respective bucket name in place of 'mlopsdvc170100035'
model = Model("lightgbm_deploy_1", "mlopsdvc170100035")

# Create the POST endpoint with path '/predict'
@app.post("/predict")
async def create_upload_file(file: UploadFile = File(...)):
    # Handle the file only if it is a CSV
    if file.filename.endswith(".csv"):
        # Create a temporary file with the same name as the uploaded
        # CSV file to load the data into a pandas DataFrame
        with open(file.filename, "wb") as f:
            f.write(file.file.read())
        data = pd.read_csv(file.filename)
        os.remove(file.filename)
        # Return a JSON object containing the model predictions
        return {
            "Labels": model.predict(data)
        }
    else:
        # Raise an HTTP 400 Exception, indicating Bad Request
        # (you can learn more about HTTP response status codes here)
        raise HTTPException(status_code=400, detail="Invalid file format. Only CSV Files accepted.")

# Check if the environment variables for AWS access are available.
# If not, exit the program
if os.getenv("AWS_ACCESS_KEY_ID") is None or os.getenv("AWS_SECRET_ACCESS_KEY") is None:
    exit(1)

Now, we can simply run uvicorn main:app --host=0.0.0.0 --port=8000 & see that our server is up & running at http://0.0.0.0:8000 in no time at all. Binding to 0.0.0.0 means the server listens on all network interfaces, so it is accessible via the loopback address 127.0.0.1 as well as through the IP address of the machine.

To test our endpoint, we can go to the automatic interactive documentation available at http://127.0.0.1:8000/docs. You will see that the POST /predict endpoint has been created, which can be tested by expanding its section & clicking on "Try it out". You can upload any CSV file (with the same columns as the training data) & send it to the server to get back real-time predictions:

You can create a test dataset for uploading by sampling some rows from the material_superconductivity.csv file & saving them as a separate file, for example:
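
A quick sketch of doing this with pandas (assuming the CSV from Part 3 is in the working directory):

import pandas as pd

# Sample a handful of rows & save them as a separate CSV to upload via /predict
df = pd.read_csv("material_superconductivity.csv")
df.sample(n=10, random_state=42).to_csv("test_data.csv", index=False)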

You can also use the curl command mentioned in the documentation above to programmatically query the http://127.0.0.1:8000/predict endpoint with your desired CSV file to obtain the predictions as a response.
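
Alternatively, if you prefer Python over curl, a minimal sketch using the requests library (assuming the test_data.csv file created above & the server running locally) would be:

import requests

# Upload the CSV as multipart/form-data to the /predict endpoint
with open("test_data.csv", "rb") as f:
    response = requests.post(
        "http://127.0.0.1:8000/predict",
        files={"file": ("test_data.csv", f, "text/csv")},
    )

print(response.status_code)  # 200 on success, 400 for non-CSV uploads
print(response.json())       # {"Labels": [...]}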

With this, we can see how we have been easily able to deploy & serve our trained ML model for users to access it via a RESTful API.

Generic Template for Serving ML/DL Models

We have seen above how we can serve a PyCaret model deployed on AWS using FastAPI. This concept can be extended to serve any ML/DL model, deployed locally or on the cloud using FastAPI so that it can be available to the users via an API endpoint. The generic template for doing so is as follows:

1. Set up the FastAPI application
2. Load the model(s) into the application
3. Create required API endpoint(s) for users to submit data:
- These could be CSV file(s), image(s), JSON object(s), etc.
- Handle incoming data appropriately
4. Use the intended model to predict the result(s) on the submitted data
5. If successful, return the predictions, else raise an error

Using this generic template, one can deploy & serve models built using any framework (scikit-learn, PyTorch, TensorFlow, etc.) to the users (although other frameworks may also offer dedicated tools for serving models built using them).

Hosting the Model Server Online

In the above example, we started our FastAPI server locally & performed predictions. We can also host this API server online using several options.

Interested folks are encouraged to try out some of these options.

Closing Remarks

Congratulations! You have made it to the end of this 4-blog series on Fundamentals of MLOps. Reflecting on our learnings over these 4 posts, we gained hands-on experience with tools that can automate every stage of the ML workflow & make it more efficient:

  • We kicked off Part 1 by trying to understand the key principles & practices of MLOps
  • In Part 2, we started getting our hands dirty by first learning about versioning for ML projects & then used DVC to version & maintain ML artifacts in Amazon S3
  • In Part 3, we explored how to develop end-to-end ML Pipelines efficiently with a low-code ML framework called PyCaret
  • We completed this final Part 4 by learning about experiment tracking using MLFlow & also saw how to deploy our models to S3 & serve them using FastAPI

Having completed the content for these 4 parts, we are now in a good position to revisit the MLOps Stack Template introduced in Part 1 & fill it up using the tools & frameworks that we have learned to use as a part of this 4-blog series. Now, our MLOps stack should look like the one shown below.

Of course, there are a plethora of other tools that can be used at each step (as mentioned in Part 1). However, the frameworks introduced in this series also form a complete set & can help you get started in your MLOps journey!

Hope you found this Fundamentals of MLOps series interesting & useful. Following are the other blogs that are a part of this series:

Thank you & Happy Coding!

If you enjoyed this article, I’m certain that you’d love my brand-new FREE AI Products & Research newsletter, “The Vision, Debugged”.

Subscribe & join the bandwagon of enthusiastic readers across top companies like Microsoft, Google, Walmart, Deloitte & more to get cool AI products & research insights, cheat sheets & resources.

About the Author

Hey folks!

I’m Tezan Sahu, an Applied Scientist at Microsoft, an Amazon #1 Bestselling Author (for the book “Beyond Code: A Practical Guide for Data Scientists, Analysts & Engineers”), and co-author of “The Vision, Debugged” newsletter.

I am passionate about helping aspiring data scientists & software developers kickstart their careers, deliver consistent impact & become differentiated professionals in the field of AI & Data Science.

If you are interested in learning more about how you can leverage AI to stay ahead of the curve and boost your results, connect with me on LinkedIn & subscribe to my newsletter.
