[Tutorial]: Serve an ML Model in Production using FastAPI

Ashmi Banerjee
5 min read · Jun 27, 2022


A step-by-step tutorial to serve a (pre-trained) image classifier model from TensorFlow Hub using FastAPI.

Our tech stack for the tutorial

FastAPI is a popular, high-performance Python backend framework used in web development.

In this tutorial, I will show you how to serve an image classifier model using Python and FastAPI.

The high-level architecture of the app can be summarised as follows:

  1. We have a front end (out of our current scope) which takes in an image URL as input.
  2. The front end makes API calls to the FastAPI backend.
  3. The FastAPI backend does the computation and sends back the predicted image along with the probability.
  4. The front end displays the results from the backend.
The architecture of our image classifier app

Step: 0. Prerequisites

  1. Make sure you’re using Python 3.6+
  2. Create a virtual environment and activate it
    virtualenv venv
    source venv/bin/activate
  3. Install dependencies
    — Create a requirements.txt file with the following contents
    — Then install the requirements with pip3 install -r requirements.txt
fastapi~=0.75.0
uvicorn==0.17.6
numpy==1.22.4
pydantic==1.9.1
Pillow==9.1.1
tensorflow

4. Create the project structure as follows

fastapi-backend
├── src
│   ├── app
│   │   └── app.py
│   ├── pred
│   │   ├── models
│   │   │   └── tf_pred.py
│   │   └── image_classifier.py
│   ├── utils
│   │   └── utilities.py
│   └── main.py
└── requirements.txt

Step: 1. Create API endpoint(s)

In the app.py file, implement the /predict/tf/ endpoint.

First, after importing the required packages, we initialise the FastAPI app with the name of our API (here, the Image Classifier API, though you can use any name) as its title.

Since we are sending data to the backend, we implement the endpoint as a POST request.

Before implementing the endpoint, we define the data model as a class that inherits from the BaseModel we imported from Pydantic.
Our data model has just one attribute: img_url, a str storing the URL of the image to be classified.

The function predict_tf takes as input a request (of type Img, the data model defined above) and returns a JSON response containing the status code (HTTP 200 in case of a valid prediction), the predicted label, and the prediction probability.

To do so, it calls the run_classifier function, where all the image classification happens.

In case of a null prediction, it raises an HTTPException with status code 404, hinting that there is some problem with the image.

Step: 2. Implement prediction algorithm

Next, we need to implement our image classification algorithm, which should classify the image into the correct class.

However, since building the best possible model is not the focus of this tutorial, I have used a pre-trained MobileNet_V2 model from TensorFlow Hub here.

I will just highlight the outline of the steps here.

  1. Load the image
  2. Run predictor on the loaded image
  3. Return the results

A more detailed implementation can be accessed from the GitHub repository here.
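The three steps above could be sketched roughly as follows. Note this is an illustrative outline, not the repository's exact code: it assumes the tensorflow_hub package is installed, the hub URL points to one MobileNet_V2 variant, and the helper names are my own; run_classifier here returns the class index rather than a human-readable label.

```python
from io import BytesIO
from urllib.request import urlopen

import numpy as np
from PIL import Image

IMG_SIZE = 224  # MobileNet_V2 expects 224x224 RGB input


def load_image(img_url: str) -> np.ndarray:
    """Step 1: download the image and scale pixel values to [0, 1]."""
    img = Image.open(BytesIO(urlopen(img_url).read())).convert("RGB")
    img = img.resize((IMG_SIZE, IMG_SIZE))
    return np.asarray(img, dtype=np.float32) / 255.0


def run_classifier(img_url: str):
    """Steps 2-3: run the pre-trained model, return (class_index, probability)."""
    import tensorflow_hub as hub  # heavy import kept local to the function

    model = hub.KerasLayer(
        "https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification/4"
    )
    batch = load_image(img_url)[np.newaxis, ...]  # add a batch dimension
    logits = model(batch).numpy()[0]
    exp = np.exp(logits - logits.max())           # numerically stable softmax
    probs = exp / exp.sum()
    best = int(np.argmax(probs))
    return best, float(probs[best])
```
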

Step: 3. Create main.py

So far we have implemented our image classification algorithm and its respective endpoint.

The next step is to implement the main.py file so that we can run the server and interact with our endpoint directly from the browser.

For this, we will use the uvicorn server, an ASGI web server implementation for Python.

Step: 4. View on http://127.0.0.1:8000/docs/

Voila! If you’ve successfully reached here, you should have your image classifier API up and running on http://127.0.0.1:8000/docs/ and should have a similar-looking page!

Image Classifier API using FastAPI

Next Steps

Below are some popular tools that can be used for testing, containerisation, and deployment of our application.

Testing

Once you have built the APIs, the next step is to test your endpoints. Thoroughly testing the endpoints comes with the following benefits:

  1. Fewer bugs
  2. Smooth deployments
  3. Better code quality
  4. Support for test-driven development

A follow-up tutorial on load testing our endpoints using the open-source load testing tool Locust can be found here.

Containerisation

Applications running in containers can be deployed easily to multiple different operating systems and hardware platforms.

They offer the following advantages:

  1. Performance consistency
    DevOps teams know applications in containers will run the same, regardless of where they are deployed.
  2. Greater efficiency
    Containers allow applications to be more rapidly deployed, patched, or scaled.
  3. Less overhead
    Containers require fewer system resources than traditional or hardware virtual machine environments because they don’t include operating system images.

A detailed tutorial on containerising your FastAPI application using Docker has been published here.
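For reference, a minimal Dockerfile for this project might look like the following sketch (the base image tag is an assumption; the paths match the structure from Step 0):

```dockerfile
FROM python:3.9-slim

WORKDIR /code

# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY src ./src

# Serve the FastAPI app with uvicorn, listening on all interfaces
CMD ["uvicorn", "src.app.app:app", "--host", "0.0.0.0", "--port", "8000"]
```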

Deployment

Once the APIs have been thoroughly tested and containerised, the next step is to deploy them to some cloud service so that they can be publicly accessible.

A multitude of options is available for this purpose.

The source code on GitHub can be accessed here.
The references and further readings on this topic have been summarised here.

If you like the article, please subscribe to get my latest ones.
To get in touch, either reach out to me on
LinkedIn or via ashmibanerjee.com.
