Serving Human Pose Estimation Model on Multi-Model-Server

Bedilbek Khamidov
Published in The Startup · Sep 2, 2020

Today, training and running ML models on a local or remote machine is not a big deal, as there are different frameworks and tools with plenty of tutorials that let you get decent results. However, when it comes to serving or deploying models at scale, many newcomers run into real problems once they start to think about the scalability, availability, and reliability of their models in production.

Multi-Model-Server (addressed as MMS hereafter), a tool created by AWS Labs, makes model serving easy and flexible for most ML/DL frameworks. It helps you create an HTTP service that handles your inference requests, which can be a good starting point for a microservice while building a bigger platform around it.

If you look at the examples directory of the MMS source repository, you will see several ready-made implementation examples of different CNN and RNN architectures for Computer Vision and NLP tasks. However, I could not find a usable example for Human Pose estimation, an important Computer Vision problem that I needed for my own project. From my experience, most Human Pose estimation models are based on multi-stage processing, with separate region-proposal and pose-estimation modules, so they require multi-step inference, which goes beyond the scope of the existing MMS examples. So I thought: why not create my own MMS implementation for that task?

In this post, we will walk through the steps of serving the Human Pose estimation model (AlphaPose) on MMS.

Prerequisite knowledge for better understanding

  1. Python
  2. MXNet
  3. GluonCV
  4. OpenCV
  5. Docker

To keep this post simple and make it really easy for beginners to take off, I am using ready-made and easier solutions. You can definitely try other tools and frameworks that could outweigh the current method. Just to keep you aware of other possible options, I will list several of them. For training and prediction of Human Pose models: OpenPose using Caffe, OpenPose using TensorFlow, AlphaPose using PyTorch, SimplePose using PyTorch, SimplePose using TensorFlow, and so on. For deployment and serving of models: NVIDIA Triton Server, PyTorch Serve, TensorFlow Serving, AWS SageMaker, Google Cloud AI Platform. You may be wondering why I am not recommending simpler, more custom web frameworks like Flask, FastAPI, or Tornado for wrapping models as an API service. The reason is that they may bring you more headaches in terms of scalability, maintainability, and availability in the long run, unless you are a senior architect who can design the whole system. But that is solely my opinion, and I may be wrong :)

So, let’s start the implementation.

Checking Demo

First, we will check whether the pretrained demo model example given in GluonCV can simply be run in Google Colab.

Checking Demo Code
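The demo itself is only a few lines. Here is a sketch that closely follows the official GluonCV AlphaPose tutorial (the exact model variants and the short=512 value are my choices, so the notebook in the repository may differ slightly):

from matplotlib import pyplot as plt
from gluoncv import model_zoo, data, utils
from gluoncv.data.transforms.pose import detector_to_alpha_pose, heatmap_to_coord_alpha_pose

# person detector + pose estimator, both pretrained from the GluonCV Model Zoo
detector = model_zoo.get_model('yolo3_mobilenet1.0_coco', pretrained=True)
pose_net = model_zoo.get_model('alpha_pose_resnet101_v1b_coco', pretrained=True)
# keep only the "person" class in the detector
detector.reset_class(["person"], reuse_weights=['person'])

# load and resize a test image into yolov3 input format
x, img = data.transforms.presets.yolo.load_test('example.jpg', short=512)

# stage 1: detect people; stage 2: estimate their joints
class_ids, scores, bounding_boxes = detector(x)
pose_input, upscale_bbox = detector_to_alpha_pose(img, class_ids, scores, bounding_boxes)
predicted_heatmap = pose_net(pose_input)
pred_coords, confidence = heatmap_to_coord_alpha_pose(predicted_heatmap, upscale_bbox)

# draw the detected joints on top of the image
utils.viz.plot_keypoints(img, pred_coords, confidence,
                         class_ids, bounding_boxes, scores,
                         box_thresh=0.5, keypoint_thresh=0.2)
plt.show()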
Final Result after execution in Google Colab

When we run it in Google Colab, we see that the demo runs successfully and the model is ready for serving.

You see how easy it was to test the pretrained model :)

Preparing MMS Service

Now we need to write a service compatible with MMS, based on the demo above. So we look at the Custom Service implementation from the MMS docs:
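Condensed from the MMS docs, the template looks roughly like this (the original custom service example in the MMS repository adds more error handling and docstrings):

class ModelHandler(object):
    # Base custom service template, condensed from the MMS docs

    def __init__(self):
        self._context = None
        self.initialized = False

    def initialize(self, context):
        # called once per worker: load weights, allocate resources
        self._context = context
        self.initialized = True

    def preprocess(self, batch):
        # turn the raw request batch into model-ready input
        return None

    def inference(self, model_input):
        # run the actual prediction
        return None

    def postprocess(self, inference_output):
        # wrap predictions into a JSON-serializable list, one entry per request
        return ["OK"]

    def handle(self, data, context):
        model_input = self.preprocess(data)
        model_out = self.inference(model_input)
        return self.postprocess(model_out)


_service = ModelHandler()


def handle(data, context):
    if not _service.initialized:
        _service.initialize(context)
    if data is None:
        return None
    return _service.handle(data, context)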

The above code template may seem scary to some, but it is actually pretty straightforward, as it is the base template for most ML inference use cases. Let me explain:

class ModelHandler

  1. initialize() — the setup stage, where you initialize and load your model weights.
  2. preprocess() — the first stage, where the request is received and prepared for inference with some preprocessing.
  3. inference() — the inference stage, where the real prediction happens on the preprocessed data (in our case it predicts human joint coordinates from a picture).
  4. postprocess() — the last stage, where the predicted output is wrapped into some human-readable format and returned as a response.
  5. handle() — the controller of the service, which controls the flow of inference from request to response generation.

As you may have guessed, we need to divide our GluonCV example code into these 5 steps. We start by copying the whole MMS Custom Service example as handler.py and modify it as shown below:

By the way, the ready repository of this post can be found here.

1. initialize() — step

handler.py

We put the loading and initialization of the models from the GluonCV example into our initialize() method. As I stated at the beginning about human-pose models, this implementation also uses multi-stage processing consisting of an object-detector model and a human-pose estimator model. For the object detector we use yolov3, and for the human-pose estimator we use AlphaPose. So we get the pretrained ones from the GluonCV Model Zoo and initialize them in this step. In the last line, we reset our object detector to detect only humans, since the main purpose of this service is to find human joint points.
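A minimal sketch of this method, assuming from gluoncv import model_zoo at the top of handler.py and the same pretrained variants as in the demo (the exact weights in the repository may differ):

def initialize(self, context):
    # load both pretrained models once per worker
    self.detector = model_zoo.get_model('yolo3_mobilenet1.0_coco', pretrained=True)
    self.pose_net = model_zoo.get_model('alpha_pose_resnet101_v1b_coco', pretrained=True)
    # keep only the "person" class, since we only care about human joints
    self.detector.reset_class(["person"], reuse_weights=['person'])
    self.initialized = True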

2. preprocess() — step

This is how it could look if we simply copy-pasted the code from the GluonCV example. But since our MMS receives a request with bytes representing a picture, this implementation is wrong and we have to slightly modify the code:

And this is how our final implementation of the preprocess() method should look. First, we decode the image bytes from the request (in our case the request is defined as a batch) into an mxnet.NDArray, the multi-dimensional array format supported by MXNet and GluonCV. Second, we transform our image into the format readable by yolov3. Basically, we do everything the wrong implementation did, except downloading and reading the image from disk.
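Sketched out, the final preprocess() could look roughly like this. The request key, the short=512 value, and the helper import are assumptions on my part; check the repository for the exact version:

import mxnet as mx
from gluoncv.data.transforms.presets.yolo import transform_test

def preprocess(self, batch):
    # the request body carries the raw image bytes (the key depends on how the client sends them)
    img_bytes = batch[0].get("body") or batch[0].get("data")
    # decode the bytes straight into an mxnet.NDArray instead of reading a file from disk
    raw_img = mx.image.imdecode(img_bytes)
    # resize and normalize into yolov3 input format, keeping the resized image for later stages
    x, img = transform_test(raw_img, short=512)
    return x, img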

3. inference() — step

We put the rest of our GluonCV example into inference() and return the prediction results as the output of the function.
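Roughly, that amounts to the two-stage prediction from the demo, now using the models loaded in initialize() (a sketch, assuming the same GluonCV helpers as above):

from gluoncv.data.transforms.pose import detector_to_alpha_pose, heatmap_to_coord_alpha_pose

def inference(self, model_input):
    x, img = model_input
    # stage 1: find people in the frame
    class_ids, scores, bounding_boxes = self.detector(x)
    # crop every detected person and prepare the crops for AlphaPose
    pose_input, upscale_bbox = detector_to_alpha_pose(img, class_ids, scores, bounding_boxes)
    # stage 2: predict joint heatmaps and convert them to image coordinates
    predicted_heatmap = self.pose_net(pose_input)
    pred_coords, confidence = heatmap_to_coord_alpha_pose(predicted_heatmap, upscale_bbox)
    return pred_coords, confidence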

4. postprocess() — step

Finally, this is how we postprocess our inference_output and return it as the response from our MMS.
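A minimal sketch of that step; the response keys "coordinates" and "confidence" are my naming, so the handler in the repository may structure the JSON differently:

def postprocess(self, inference_output):
    pred_coords, confidence = inference_output
    # MMS expects a JSON-serializable list with one entry per request in the batch
    return [{
        "coordinates": pred_coords.asnumpy().tolist(),
        "confidence": confidence.asnumpy().tolist(),
    }]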

We won’t modify the rest of handler.py, as it already does its job.

Preparing Infrastructure

Now that our service implementation is ready, it is time to prepare the infrastructure to deploy it to. We will use Docker for ease of configuration, so we do not waste precious time tinkering with and tweaking the infrastructure.

Let’s dive into the Dockerfile, where we define our infrastructure as code for MMS:

Dockerfile

We use awsdeeplearningteam/multi-model-server as our base image. Our example fully depends on GluonCV and some OpenCV functions, so we install GluonCV and OpenCV (upgrading two other libraries for the sake of package compatibility requirements).
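The gist of the Dockerfile is just the base image plus the extra Python dependencies. A minimal sketch, assuming a headless OpenCV wheel and no version pins (the actual Dockerfile in the repository pins exact versions and may use different pip flags):

FROM awsdeeplearningteam/multi-model-server:latest

# install the extra dependencies the handler needs: GluonCV (which pulls in MXNet) and OpenCV
RUN pip install --user gluoncv opencv-python-headless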

After finishing our Dockerfile, we move on to the MMS configuration file for several adjustments:

config.properties

This configuration file is downloaded from the MMS repository, and the changes made to it are as follows:

  • preload=true — helps when you want to spin up several workers and do not want each worker to load the model separately: the model is loaded once before the model workers are spun up and is then forked into each worker.
  • default_workers_per_model=1 — the number of workers to create for each model loaded at startup. We set it to 1 to make sure only one worker is created per model and no resources are wasted.

Other important configuration decisions can be made by looking at the Advanced configuration of MMS.

Now we will run several commands. Before you start, make sure that your project structure is the same as here, and run all of these commands from the project root directory.

We build our image for the MMS:

$ docker build -t human-pose-mms .

We enter the bash shell of the MMS container to run a script that prepares our deployment model:

$ docker run --rm -it -v $(pwd):/tmp human-pose-mms:latest bash

We run our script to prepare our deployment model:

$ model-archiver --model-name pose --model-path /tmp/service --handler handler:handle --runtime python3 --export-path /tmp

We exit from the container by pressing CTRL-D.

Then you will see a new pose.mar file created in your project root directory. We will use this file while starting our server.

This .mar file is the model format that MMS reads and understands in order to handle inference requests. To learn more about the .mar extension and model-archiver, read here.

Running/Testing Server

Finally, after endless, long, and boring explanations we are here to run and test our server. Hurrah!!!

Let’s start the server by running our ready container from the terminal:

$ docker run --rm -it -p 8080:8080 -p 8081:8081 -v $(pwd):/tmp human-pose-mms:latest multi-model-server --start --mms-config /tmp/config.properties --models posenet=pose.mar --model-store /tmp

This will start the MMS listening on 8080 and 8081 HTTP ports.

Let’s test it now.

By the way, when you run the container, you have to wait from several seconds to several minutes (depending on your internet speed and computer power) for MMS to initialize properly. This is needed because MMS downloads the two models we stated above and allocates memory for them so that the model workers can handle them. We could have created persistent storage for the models to avoid downloading them every time, but that is out of the scope of this post.

Then, we will open a terminal and send a request with the example image:

(Example image: photo by Joshua Earle on Unsplash)

$ curl -X POST http://localhost:8080/predictions/posenet/ -T example.jpg

We will get this kind of response:

Yes, we see that our inference server is now working and ready to make predictions. In the generated response, we see the 17 joint coordinates (based on AlphaPose) of the person in the picture and their corresponding confidence scores. If you want to learn more about the AlphaPose model, please visit the official AlphaPose implementation repository.

Before using these prediction results, keep in mind that our object detector, yolov3, expects frames whose height and width are multiples of 32. So the picture is resized before every inference, and the coordinates you see are not in the original dimensions of the picture. Look into the transform_test() method used in the preprocess() step to learn about its resizing behavior.

Bonus

For the sake of completeness, let’s make something tangible. We will implement a simple Python client that works with our human-pose MMS service.

The following steps are needed to implement a client:

  1. Rewriting GluonCV function transform_test() of yolov3 with OpenCV
  2. Rewriting GluonCV function plot_keypoints() with OpenCV
  3. Writing an HTTP client for request handling.

Let’s start it. By the way, you can check out the ready client here.

Firstly, if we look inside the function transform_test() of GluonCV, it resizes, tensorifies, and normalizes the image for yolov3 input. In our case we only need to resize the image, because, as I stated above, we may get wrong coordinates back from our MMS service if we do not resize the image in our client implementation.

This is simply a conversion of transform_test() where we keep only the resizing. You can look at the original implementation here.
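A rough OpenCV equivalent of that resizing step: scale the shorter side towards 512 (capped by a maximum size) and round both sides to multiples of 32 so yolov3 accepts the frame. The default values and the exact rounding are assumptions and may differ slightly from GluonCV and from the client in the repository:

import cv2

def transform_test(img, short=512, max_size=1024, mult_base=32):
    # keep only the resizing part of GluonCV's yolo transform_test
    h, w = img.shape[:2]
    scale = min(float(short) / min(h, w), float(max_size) / max(h, w))
    new_w = max(mult_base, int(round(w * scale / mult_base)) * mult_base)
    new_h = max(mult_base, int(round(h * scale / mult_base)) * mult_base)
    return cv2.resize(img, (new_w, new_h))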

Secondly, if we look inside the function plot_keypoints(), we see arguments like class_ids, bounding_boxes, and scores, which belong to the object detector and which we do not actually need. This function also uses matplotlib, but in our case we use OpenCV, so we remove the matplotlib usage as well.

Here is how we converted the GluonCV implementation into an OpenCV one. You can look at the original implementation here.
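A sketch of that conversion: draw a skeleton per detected person, skipping joints below a confidence threshold. The joint pairs are assumed to match GluonCV's COCO defaults, and the colors and thresholds are my choices:

import cv2

# COCO joint pairs used to draw the skeleton (assumed to match GluonCV's defaults)
JOINT_PAIRS = [[0, 1], [1, 3], [0, 2], [2, 4], [5, 6], [5, 7], [7, 9], [6, 8],
               [8, 10], [5, 11], [6, 12], [11, 12], [11, 13], [12, 14], [13, 15], [14, 16]]

def plot_keypoints(img, coords, confidence, keypoint_thresh=0.2):
    # draw one skeleton per detected person, skipping low-confidence joints
    for person, conf in zip(coords, confidence):
        for a, b in JOINT_PAIRS:
            if conf[a][0] > keypoint_thresh and conf[b][0] > keypoint_thresh:
                pt1 = (int(person[a][0]), int(person[a][1]))
                pt2 = (int(person[b][0]), int(person[b][1]))
                cv2.line(img, pt1, pt2, (0, 255, 0), 2)
                cv2.circle(img, pt1, 3, (0, 0, 255), -1)
                cv2.circle(img, pt2, 3, (0, 0, 255), -1)
    return img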

Lastly, we implement a client that handles requests and shows the result.

Besides OpenCV, we also use Typer for a quick CLI implementation.
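Pulling the two helpers above together, a minimal client could look like this. The endpoint URL and the response keys "coordinates" and "confidence" follow the postprocess() sketch from earlier and are assumptions; the actual client in the repository may differ:

import cv2
import numpy as np
import requests
import typer

def main(image: str = "example.jpg",
         url: str = "http://localhost:8080/predictions/posenet"):
    # resize the same way as the server-side preprocess so the returned coordinates match the frame
    frame = transform_test(cv2.imread(image))
    # send the encoded frame to the MMS endpoint
    ok, buf = cv2.imencode(".jpg", frame)
    response = requests.post(url, data=buf.tobytes()).json()
    # draw the returned joints and show the result
    coords = np.array(response["coordinates"])
    confidence = np.array(response["confidence"])
    cv2.imshow("pose", plot_keypoints(frame, coords, confidence))
    cv2.waitKey(0)

if __name__ == "__main__":
    typer.run(main)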

Basically, that’s all we need for our simple client, so let’s run it. Run the script below from the project root directory. By the way, do not forget to install the required Python packages with pip first:

$ python client/cli.py

This is what you get when you run the above command. Only the colors may differ in your case, but the basic joint structure should be similar.

# project repository
https://github.com/bedilbek/human-pose-mms

Note

My apologies if my codebase seems error-prone, because it is error-prone. The reason is that I wanted to keep the explanation simple, as this post is aimed at beginners entering the tiniest part of ML infrastructure. If I had written a more robust codebase, this would have been the longest post you have ever read on Medium (it is long enough as it is). That is why I highly recommend not using this example in your production environment!!! We are not taking into consideration batch configuration, strict input validation, error handling, context-based dynamic MMS setup, resource allocation, Docker configuration, model storage persistence, and other miscellaneous but important things that need to be done by an MLOps engineer, ML Infrastructure Engineer, or, in general, a Software Engineer.

Conclusion

What I really wanted to show here is that thinking about ML infrastructure should be one of your top priorities if you do not want to end up just using some frameworks/libraries in a Jupyter notebook and want to go beyond training/testing your model. This is one real, working way of converting your Jupyter notebook codebase into something servable and deployable with MMS.
