Productizing an ML Model with FastAPI and Cloud Run (Part 1)

From building an API to serve the model to containerizing and deploying it as a service.

Miguel Ángel Cárdenas
Semantix
5 min read · Jun 23, 2021


In this post you will find out how to:

  • Configure a database communication with Orator
  • Create simple API routes using FastAPI
  • Wrap up an ML package using Docker
  • Deploy it as a service using Cloud Run

One of the neatest things about Machine Learning projects is being able to serve them as a service, whether on-premises or on a cloud platform. In this story, I’m going to walk through the steps to create the backend code for serving a simple computer vision model. For this purpose I’ll be using FastAPI for our model API, the GCP SDK for blob storage interaction, and the Orator ORM to work with a Postgres database.
As for part 2, I’ll cover how to create a simple frontend service using Streamlit for interacting with the computer vision application’s API.
To demonstrate this framework, I’m going to use an Xception network pre-trained on the ImageNet dataset with the Keras deep learning library. The following image shows what the final application will look like and how the prediction pipeline is executed.

The image below shows the backend and frontend architecture. From the backend perspective, we’ll need to create an API to manage RESTful requests from the frontend service to perform a few tasks:

  • Upload target image to cloud storage
  • Download target image to the backend service container
  • Run the prediction routine

Additionally, to keep track of the prediction requests, I’ll be inserting them into a database table as part of the monitoring process.

Project structure

Let us start by creating a project structure that comprises our API’s routes folder, our database migrations and models paths, and all the files related to the Docker containerization. For this app, I’ll be using Pipenv as the Python packaging tool, with dependencies managed in a requirements.txt file.
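The exact layout from the original repo isn’t shown here, but a structure along these lines fits the description (names are illustrative):

```
.
├── main.py            # FastAPI app and routes
├── routes/            # API route modules
├── db.py              # Orator database configuration
├── migrations/        # Orator migration files
├── models/
│   └── logger.py      # prediction-logging model
├── src/
│   └── classifier.py  # Xception classifier package
├── setup.py           # packages src/ as a library
├── Pipfile            # Pipenv environment
├── requirements.txt   # pinned dependencies
├── Dockerfile         # container definition
└── crun_deply.sh      # Cloud Run deploy script
```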

Database setup

First, let’s set up our database connection using an ORM (object-relational mapping), which lets us interact with the logging table in an object-oriented fashion instead of writing raw SQL. In this case, I’ve chosen Orator, which provides a simple active-record implementation for working with our Postgres database. Our logging table therefore has a corresponding model that’s used to interact with the prediction records.

Start by creating a database config file and make sure you update the values of the environment variables.

db.py
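The original gist isn’t reproduced here; below is a minimal sketch of such a config, following Orator’s standard setup. The host, database, user, and password values are placeholders read from your environment:

```python
# db.py -- Orator database configuration (values are placeholders)
import os

from orator import DatabaseManager, Model

DATABASES = {
    'postgres': {
        'driver': 'postgres',
        'host': os.getenv('DB_HOST', 'localhost'),
        'database': os.getenv('DB_NAME', 'predictions'),
        'user': os.getenv('DB_USER', 'postgres'),
        'password': os.getenv('DB_PASSWORD', ''),
        'prefix': '',
    }
}

# make the connection available to all Orator models
db = DatabaseManager(DATABASES)
Model.set_connection_resolver(db)
```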

Then, generate a template file for the logging model table and fill it out with the data fields that you want to record. For this little project I’ll be using file to store the image name, class for the top predicted class, and probability for its score. But first, create the migration file:

logger.py
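In place of the original gists, here is a hedged sketch of both pieces: a migration that creates a loggers table with the three fields above, and the matching Orator model in logger.py. The table and class names are assumptions:

```python
# migrations/<timestamp>_create_loggers_table.py
# (template generated with `orator make:migration create_loggers_table`,
#  then filled in by hand)
from orator.migrations import Migration


class CreateLoggersTable(Migration):

    def up(self):
        with self.schema.create('loggers') as table:
            table.increments('id')
            table.string('file')        # image name
            table.string('class')       # top predicted class
            table.float('probability')  # its probability score
            table.timestamps()

    def down(self):
        self.schema.drop('loggers')
```

```python
# models/logger.py -- the model used to interact with the table
from orator import Model


class Logger(Model):

    __table__ = 'loggers'
    __fillable__ = ['file', 'class', 'probability']
```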

And finally apply the migrations:
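With the connection details in db.py, this is a one-liner (assuming the Orator CLI is on your path):

```bash
# apply all pending migrations using the config in db.py
orator migrate -c db.py
```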

Model Setup

For the sake of the experiment, create a simple classifier backed by an Xception network pre-trained on the ImageNet dataset over 1,000 classes. For that purpose, I’ve created a couple of helper functions and a main predict function as the entry point. This module will be packaged with Python’s setuptools (setup.py) under the src namespace, acting as a stand-alone library.

classifier.py
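The original gist isn’t reproduced here; below is a minimal sketch of the classifier module, where the function names and the default input size are assumptions:

```python
# src/classifier.py -- Xception pre-trained on ImageNet
import numpy as np
from tensorflow.keras.applications.xception import (
    Xception, decode_predictions, preprocess_input)
from tensorflow.keras.preprocessing import image

# load the model once, at import time
model = Xception(weights='imagenet')


def load_image(path, size=299):
    """Load an image and preprocess it into Xception's input format."""
    img = image.load_img(path, target_size=(size, size))
    batch = np.expand_dims(image.img_to_array(img), axis=0)
    return preprocess_input(batch)


def predict(path, size=299):
    """Return the top ImageNet class and its probability score."""
    preds = model.predict(load_image(path, size))
    _, label, prob = decode_predictions(preds, top=1)[0][0]
    return {'class': label, 'probability': float(prob)}
```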

API creation

This main snippet drives our app’s API. The main.py script (depicted below) has 3 routes:

  • landingpage: the API’s homepage
  • predict: responsible for carrying out inference
  • files: responsible for uploading and downloading files to/from the storage bucket
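The original main.py gist isn’t reproduced here; the sketch below shows one way to wire those routes together, reusing the Logger model and classifier from the previous sections. The bucket name and local paths are placeholders:

```python
# main.py -- the app's API (bucket name and paths are placeholders)
from fastapi import FastAPI, File, UploadFile
from google.cloud import storage

from models.logger import Logger
from src.classifier import predict as classify

app = FastAPI()
bucket = storage.Client().bucket('my-prediction-images')


@app.get('/')
def landingpage():
    return {'message': 'Image classifier API is up'}


@app.post('/files')
async def upload_file(file: UploadFile = File(...)):
    # upload the target image to the storage bucket
    bucket.blob(file.filename).upload_from_file(file.file)
    return {'file': file.filename}


@app.post('/predict')
def predict(file: str, size: int = 299):
    # download the target image into the service container
    local_path = f'/tmp/{file}'
    bucket.blob(file).download_to_filename(local_path)
    # run inference, log the request, and return the result
    result = classify(local_path, size)
    Logger.create(**{'file': file, **result})
    return result
```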

Our predict module receives a request that contains both the size and the image name from the frontend counterpart; right after running the classifier, it writes a record to the logger table and serializes the output as the endpoint response.

In that sense, we could expect something like the following JSON schema:
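For instance, with illustrative values:

```json
{
  "file": "golden_retriever.jpg",
  "class": "golden_retriever",
  "probability": 0.93
}
```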

Containerizing

Now that we are all set, the next step is to deploy the containerized application as a service. To do that, let’s wrap it up with a Dockerfile: create an image from the latest version of Google App Engine’s Python base image, copy all the necessary files from our project structure into it, install the dependencies with the package manager, and point to the entry-point shell script that runs our API.
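A sketch of such a Dockerfile, assuming an entrypoint.sh that launches the API with something like uvicorn main:app --host 0.0.0.0 --port $PORT:

```dockerfile
# Dockerfile -- based on the App Engine Python runtime image
FROM gcr.io/google-appengine/python

COPY . /app
WORKDIR /app

# install the pinned dependencies
RUN pip install -r requirements.txt

# hand control to the entry-point script that starts the API
ENTRYPOINT ["/app/entrypoint.sh"]
```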

Deploying

When it comes to deploying a containerized application on a fully managed serverless platform, there is an affordable option (free up to 2M requests/month) called Cloud Run. It scales on demand almost transparently, which is good for those who, like me, are starting out on the software development path. Cloud Run is one of the compute options that Google Cloud Platform offers to productize your apps.
The usual workflow consists of two parts: first submitting (and versioning) your app’s container image, and then deploying it to the platform. The following shell script shows how to do it.

crun_deply.sh
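A sketch of the two-step flow; the project ID, image name, region, and service name are placeholders:

```bash
#!/bin/bash
PROJECT_ID=my-gcp-project
IMAGE=gcr.io/$PROJECT_ID/classifier-api:v1

# 1. submit (and version) the container image with Cloud Build
gcloud builds submit --tag "$IMAGE"

# 2. deploy the image as a Cloud Run service
gcloud run deploy classifier-api \
  --image "$IMAGE" \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated
```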

Finally, run the Cloud Run deploy script using credentials with the proper IAM permissions.

Once your service is up, navigate to the /docs endpoint to explore the Swagger UI documentation. This tool generates interactive API documentation that lets you try out the API calls directly in the web browser.

And that’s it for this part. In the next one, we’ll create a frontend interface for using the implemented services as an API.
Happy coding!
