[Tutorial]: Serve an ML Model as REST API using TensorFlow Serving

Ashmi Banerjee
Aug 2, 2022


A step-by-step tutorial to serve a (pre-trained) image classifier model from TensorFlow Hub using TensorFlow Serving and REST APIs.

Our tech stack for the tutorial

In my previous tutorials, I highlighted serving ML models using FastAPI.

In this tutorial, we will learn how to serve the same image classifier model through Python, Docker, and TensorFlow Serving.

To quote the official TensorFlow Serving source:

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments.

The following tutorial has been prepared using the TensorFlow Serving RESTful API.

TensorFlow Serving Architecture

TensorFlow Serving: REST Architecture

The TensorFlow Serving REST architecture comprises the following stages:

  • Raw data is first extracted, processed, and cleaned, and then used to train and build TensorFlow models.
  • These models are then saved as different versions in a model repository.
  • A specified version of the model is then served through Docker and TensorFlow Serving.
  • The client makes API calls to the model-serving infrastructure and receives the model prediction as output.

Since the focus of this tutorial is on serving a model, we skip the first step of model training and start directly with saving and serving the model.

Here, we will use a pre-trained ImageNet (ILSVRC-2012-CLS) classification model based on EfficientNet V2 from TensorFlow Hub for our image classification.

Step 0: Prerequisites

  1. Make sure you’re using Python 3.x
  2. Make sure you have Docker installed. If not, you can install it from here.
  3. Follow a similar project structure as shown in the code repository.
  4. Create a virtual environment and activate it
    virtualenv venv
    source venv/bin/activate
  5. Install dependencies
    — Create the requirements.txt file as follows
    — Then install the requirements with pip3 install -r requirements.txt
Pillow==9.1.1
tensorflow
tensorflow_hub
requests
numpy

6. Install TensorFlow Serving using Docker

The first step to getting started with TensorFlow Serving is to pull the latest Docker image from the command line as follows:

docker pull tensorflow/serving

To see if you have the right image, run docker images in the terminal; it should show the following:

List of docker images

Step 1: (Load and) Save (pre-trained) Model

In load_model(), we load the pre-trained TensorFlow ImageNet model from TensorFlow Hub.

In save_model(), we save the model under the /models/ folder so that our API can access it later.
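
The full implementations are in the repository linked at the end of the article; a rough sketch of what these two functions could look like is given below (the TF Hub handle, the model name efficientnet_v2, and the 224×224 input size are assumptions, not taken from the original code):

import time
import tensorflow as tf
import tensorflow_hub as hub

# Assumed TF Hub handle for an EfficientNet V2 ImageNet classifier;
# check tfhub.dev for the exact model URL you want to serve
MODEL_HANDLE = "https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet1k_b0/classification/2"

def load_model():
    # Wrap the pre-trained classifier from TensorFlow Hub in a Keras model
    return tf.keras.Sequential([
        hub.KerasLayer(MODEL_HANDLE, input_shape=(224, 224, 3))
    ])

def save_model(model):
    # Save under models/<model_name>/<version>/ so that tensorflow_model_server
    # can discover it later; a Unix timestamp is used as the version number
    version = int(time.time())
    export_path = f"models/efficientnet_v2/{version}"
    tf.saved_model.save(model, export_path)
    return export_path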

Step 2: Serve Model using TF Serving

Step 3: Run TF Serving through REST API

The next step is to run the container from the pulled image.

docker run --name myTFServing \
  -it \
  -v path/to/repository/img-classifier-tfx:/img-classifier-tfx \
  -p 9001:9001 \
  --entrypoint /bin/bash tensorflow/serving

Explanations:

  • docker run --name <myContainerName> : creates a container named myTFServing (you can use any name here). The name of the container is later used for identification purposes.
  • -it : runs the container in interactive mode
  • -v path/to/repository/img-classifier-tfx:/img-classifier-tfx binds a volume from the host to a directory inside the container.
  • -p <host_port>:<container_port> maps the port of the host to a port of the container.
  • --entrypoint /bin/bash : specifies that the /bin/bash executable should run when the container is started.
  • tensorflow/serving is the name of the image from which the container is derived.

Step 4: Serve the model (using REST endpoints)

Once we have the container up and running, our next goal is to serve the model as REST endpoints.

From inside the running docker container, we start the TensorFlow Serving as follows:

cd img-classifier-tfx/

tensorflow_model_server \
  --rest_api_port=9001 \
  --model_name=<model_name> \
  --model_base_path=/img-classifier-tfx/src/pred/path/to/models/

Note that we are using the same port 9001 that we had mapped earlier.
The parameters --model_name and --model_base_path denote the name under which the model has been saved and the model location respectively.
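
For reference, tensorflow_model_server expects the directory passed via --model_base_path to contain one numbered (or timestamped) sub-directory per model version, roughly like this:

path/to/models/
└── 1657646009/
    ├── saved_model.pb
    └── variables/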

Step 5: Check if the model is working

You can check whether your model endpoints are working using the following curl statements:

GET Request

curl -X GET http://localhost:9001/v1/models/<model_name>

Your browser at http://localhost:9001/v1/models/<model_name> should also return the following if your model can be successfully accessed through the API.

{"model_version_status": [
{
"version": "1657646009",
"state": "AVAILABLE",
"status": {
"error_code": "OK",
"error_message": ""
}}]}
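
Since requests is already listed in requirements.txt, the same health check can also be done from Python (the model name below is a placeholder for whatever was passed to --model_name):

import requests

MODEL_NAME = "efficientnet_v2"  # placeholder; use the value passed to --model_name

response = requests.get(f"http://localhost:9001/v1/models/{MODEL_NAME}")
print(response.json())  # should report the model version as AVAILABLE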

POST Request

curl -X POST http://localhost:9001/v1/models/<model_name>:predict \
  -H "Content-Type: application/json" \
  -d @data.json

To check whether the model prediction works as expected, we can send a POST request to our API. Here, we dump the data that needs to be sent as input to our endpoint into a JSON file called data.json and pass it along with our curl statement.
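
A rough sketch of how such a data.json could be produced is shown below (the image URL is a placeholder, and the 224×224 / [0, 1] preprocessing is an assumption that must match what the served model expects):

import json
from io import BytesIO

import numpy as np
import requests
from PIL import Image

IMAGE_URL = "https://example.com/cat.jpg"  # placeholder; use your own image URL

# Download and preprocess the image
img = Image.open(BytesIO(requests.get(IMAGE_URL).content)).convert("RGB")
img = np.asarray(img.resize((224, 224))) / 255.0

# TensorFlow Serving's REST predict API expects the batch under an "instances" key
with open("data.json", "w") as f:
    json.dump({"instances": [img.tolist()]}, f)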

As a result, we should get a list of predictions, which needs to be decoded to obtain the predicted class name and the prediction confidence.

Voila! If you’ve successfully reached here, you can already celebrate your first ML model serving using TF Serving! 😉

You can also define a main function that calls serve_rest() with an image URL as its parameter and prints your predicted results!
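
The actual serve_rest() lives in the linked repository; a minimal sketch, reusing the preprocessing above and decoding the top prediction with the standard ImageNet label file (the model name, image URL, and exact label handling are assumptions), could look like this:

from io import BytesIO

import numpy as np
import requests
import tensorflow as tf
from PIL import Image

MODEL_NAME = "efficientnet_v2"  # placeholder; must match --model_name
API_URL = f"http://localhost:9001/v1/models/{MODEL_NAME}:predict"

def serve_rest(image_url: str):
    # Preprocess the image as in the data.json example above
    img = Image.open(BytesIO(requests.get(image_url).content)).convert("RGB")
    img = np.asarray(img.resize((224, 224))) / 255.0

    # Call the TensorFlow Serving REST endpoint
    response = requests.post(API_URL, json={"instances": [img.tolist()]})
    scores = np.array(response.json()["predictions"][0])

    # Decode the winning class with the standard ImageNet label file
    labels_path = tf.keras.utils.get_file(
        "ImageNetLabels.txt",
        "https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt")
    labels = np.array(open(labels_path).read().splitlines())
    top = int(np.argmax(scores))
    return labels[top], float(tf.nn.softmax(scores)[top])

if __name__ == "__main__":
    print(serve_rest("https://example.com/cat.jpg"))  # placeholder image URL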

Our input/output

Handy Debugging Tips while using TF Serving with REST APIs

  • It is highly recommended to use TF Serving via Docker containers, as they integrate easily into existing workflows.
  • Even though TF Serving provides out-of-the-box integration with TensorFlow models, it is often not straightforward to extend it to non-TensorFlow models.
  • If you’re working with larger datasets during inference, using gRPC as your endpoint could be more efficient than its REST counterpart.
  • In case you run into path errors while loading models during the docker run stage, try replacing the paths with absolute paths instead of relative ones.
  • If the specified port (9001 in this case) is unavailable or in use by other system processes, it can easily be changed to another port when running the Docker image.
  • Sometimes, if you try re-running the Docker container, you could get a “container name already in use” error. In that case, stop the container, remove it with docker rm <containerName>, and re-run it.

Conclusion

Even though TensorFlow Serving is a great tool to serve your machine learning models via a REST API, the dependency on TensorFlow models makes it a bit less flexible.

According to the TensorFlow Serving website,

TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data.

However, if you’re looking for something robust and reliable, do give this tool a shot!

The source code on GitHub can be accessed here.
The references and further readings on this topic have been summarised here.

If you like the article, please subscribe to get my latest ones.
To get in touch, either reach out to me on LinkedIn or via ashmibanerjee.com.
