Pose Estimation with TensorFlow Lite in Android
This article gives you an overview of implementing pose estimation with TensorFlow Lite in Android.
The image below shows pose estimation in action:
The popular paper DeepPose: Human Pose Estimation via Deep Neural Networks defines pose estimation as follows:
…the problem of localization of human joints.
We are trying to locate the positions of a person’s joints in space. There is plenty of research in this area using older techniques, but those often have limitations. More recent research has turned to Deep Neural Networks (DNNs), which appear to achieve superior results. One such project, PoseEstimationForMobile, uses a convolutional neural network model. Its implementation is based on two papers:
- Stacked Hourglass Networks for Human Pose Estimation
- Multi-Context Attention for Human Pose Estimation
Pose estimation comes in 2D and 3D variants, but in this article I will focus only on 2D pose estimation.
TensorFlow Lite is defined as follows:
TensorFlow Lite is an open source deep learning framework for on-device inference.
TensorFlow offers multiple levels of abstraction so you can choose the right one for your needs. Build and train models by using the high-level Keras API, which makes getting started with TensorFlow and machine learning easy.
TensorFlow provides tf.keras, which is TensorFlow’s implementation of the Keras API specification.
So you can convert a tf.keras model to TensorFlow Lite easily.
If you want to test inference with an image model, you can do so with a simple script.
The process on Android consists of a few steps to run inference on a mobile device.
Let’s go into more detail.
The Steps of Pose Estimation Inference
The PoseEstimationForMobile sample tries to run inference on every frame, which can be a heavy burden on a mobile device. However, it comes down to just four simple steps:
- Get a Bitmap from TextureView
- Give the binary data to TensorFlow Lite Interpreter
- Infer and get the output
- Render output data into TextureView
Get a Bitmap from TextureView
This project uses TextureView, which internally holds a bitmap, so you can easily get the frame data and convert it to bytes.
You need to capture frames with the camera. On Android, Camera2 is one option, and PoseEstimationForMobile simply builds on the official Camera2 sample.
More recently, CameraX has become another option. (Its minimum supported API level is 21.)
You also need to request the camera permission.
This implementation runs inference periodically on a separate thread to improve performance.
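Such periodic execution can be sketched like this. This is a plain-Java sketch using a ScheduledExecutorService; the actual sample uses Android threading APIs, and the interval here is an arbitrary placeholder:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicInference {
    // Schedule work on a single background thread so the UI thread stays responsive.
    static ScheduledExecutorService start(long intervalMs, Runnable onFrame) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        // In the real app, onFrame would grab a bitmap from the TextureView and run the model.
        executor.scheduleAtFixedRate(onFrame, 0, intervalMs, TimeUnit.MILLISECONDS);
        return executor;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch threeFrames = new CountDownLatch(3);
        ScheduledExecutorService executor = start(10, threeFrames::countDown);
        threeFrames.await(); // block until the "inference" callback has run three times
        executor.shutdown();
        System.out.println("ran 3 times");
    }
}
```

The point of the design is simply that inference never blocks the UI thread; dropping frames when inference is slow is acceptable for this use case.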
You need to decide what bitmap size to give to the TensorFlow Lite model (tflite model).
It depends on your model, but generally a bigger input makes the result more accurate, at the expense of more memory.
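To make the memory cost concrete, here is a back-of-the-envelope sketch, assuming a batch-of-1 float RGB input (the sizes are examples, not values mandated by the sample):

```java
public class InputBufferSize {
    static final int BYTES_PER_FLOAT = 4;
    static final int CHANNELS = 3; // RGB

    // Bytes needed for a batch-of-1 float input of the given square size.
    static int bytes(int side) {
        return side * side * CHANNELS * BYTES_PER_FLOAT;
    }

    public static void main(String[] args) {
        System.out.println(bytes(192)); // 442368 bytes, about 432 KB
        System.out.println(bytes(256)); // 786432 bytes, about 768 KB
    }
}
```

The buffer grows quadratically with the side length, which is why input size is the first knob to turn when inference is too slow or memory-hungry.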
Give the binary data to TensorFlow Lite Interpreter
After getting the bitmap data from TextureView, you need to convert it to byte data yourself, because the current TensorFlow Lite SDK does not support (de)serialization of platform classes like Bitmap.
Before doing the conversion, let’s instantiate the image classifier.
- imageSizeX, imageSizeY
These are the input image dimensions; you can adjust them to fit your model.
- outputW, outputH
These are the output data dimensions.
As you already know from step 1, TextureView’s bitmap is in ARGB_8888 format, which means each pixel occupies 4 bytes (one byte per channel).
The relative path to the tflite model from the assets directory.
You need enough memory for the image data converted from the Bitmap, so allocate it first.
In Android, the tflite model is referenced by its path under the assets directory.
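The input allocation can be sketched as follows. A 192×192 RGB float input is assumed here, and loading the model itself from assets via the Android APIs is omitted:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ImgBuffer {
    static final int DIM_BATCH = 1, SIZE_X = 192, SIZE_Y = 192, CHANNELS = 3;

    // A direct buffer in native byte order, which is what the TensorFlow Lite
    // interpreter expects for input tensors.
    static ByteBuffer allocate() {
        ByteBuffer buffer = ByteBuffer.allocateDirect(
                DIM_BATCH * SIZE_X * SIZE_Y * CHANNELS * 4); // 4 bytes per float
        buffer.order(ByteOrder.nativeOrder());
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer imgData = allocate();
        System.out.println(imgData.capacity()); // 442368
        System.out.println(imgData.isDirect()); // true
    }
}
```

Allocating once up front and reusing the buffer for every frame avoids garbage-collection pressure during periodic inference.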
Bitmap data is not in a format that TensorFlow Lite can interpret, so we need to extract the color information from the bitmap (#getPixels).
After getting the color information from the Bitmap, you need to convert those int values into floating-point values.
Each of these expressions extracts a single color channel (red, green, or blue):
- Blue: just mask with 0xFF
- Green: shift right by 8 bits, then mask with 0xFF
- Red: shift right by 16 bits, then mask with 0xFF
These are plain floating-point values, so if you want to normalize them, a simple scale and shift maps them into the range -1.0 to 1.0.
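Putting the channel extraction and normalization together, a minimal sketch (the centering constant 127.5f is an assumption; pick whatever matches your model's training preprocessing):

```java
public class PixelConverter {
    // Assumption: maps 0..255 channel values to exactly -1.0 .. 1.0.
    static final float MEAN = 127.5f, STD = 127.5f;

    // Extract R, G, B from a packed ARGB_8888 pixel int.
    static int red(int pixel)   { return (pixel >> 16) & 0xFF; }
    static int green(int pixel) { return (pixel >> 8) & 0xFF; }
    static int blue(int pixel)  { return pixel & 0xFF; }

    // Normalize a single channel value into the range -1.0 .. 1.0.
    static float normalize(int channel) {
        return (channel - MEAN) / STD;
    }

    public static void main(String[] args) {
        int pixel = 0xFF336699;             // opaque pixel: R=0x33, G=0x66, B=0x99
        System.out.println(red(pixel));     // 51
        System.out.println(green(pixel));   // 102
        System.out.println(blue(pixel));    // 153
        System.out.println(normalize(0));   // -1.0
        System.out.println(normalize(255)); // 1.0
    }
}
```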
Now you’re finally ready to feed the TensorFlow Lite interpreter.
Infer and get the output
Inference itself is really simple, because all the heavy lifting is done inside the SDK. We just call:
tflite?.run(imgData!!, heatMapArray) .
As you can see, imgData is the color data from the bitmap, and heatMapArray is the output (result) of the inference.
What kind of output you get depends on the tflite model you use.
In PoseEstimationForMobile, the post-processing goes as follows:
- allocate memory for the output
- filter the image data with OpenCV (using a Gaussian blur)
- return the positions where the key points (joints, eyes) are most likely to be
allocate memory for the output
In this case, there are 14 key points (knee, shoulder, and so on), and each has an x and a y coordinate.
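Assuming the 96×96, 14-channel heatmap output described in this article, the allocation can be sketched as:

```java
public class OutputAlloc {
    static final int OUT_W = 96, OUT_H = 96, KEY_POINTS = 14;

    // One heatmap per key point: shape [batch][height][width][keypoints].
    static float[][][][] allocateHeatmaps() {
        return new float[1][OUT_H][OUT_W][KEY_POINTS];
    }

    public static void main(String[] args) {
        float[][][][] heatMapArray = allocateHeatmaps();
        System.out.println(heatMapArray[0].length);       // 96
        System.out.println(heatMapArray[0][0][0].length); // 14
    }
}
```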
filter the image data with OpenCV (using a Gaussian blur) & return the positions where the key points (joints, eyes) are most likely to be
The image filtering just uses OpenCV, so I won’t explain it here.
This model has 14 key points for the human body, and after filtering the image data you get a coordinate pair for each point. (mPrintPointArray)
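Extracting a key point from its heatmap boils down to an argmax over that channel. A sketch (the sample does this with OpenCV structures, so the names here are illustrative):

```java
public class KeyPointExtractor {
    // For one key point's 2D heatmap, return {x, y} of the maximum value.
    static int[] argmax(float[][] heatmap) {
        int bestX = 0, bestY = 0;
        float best = Float.NEGATIVE_INFINITY;
        for (int y = 0; y < heatmap.length; y++) {
            for (int x = 0; x < heatmap[y].length; x++) {
                if (heatmap[y][x] > best) {
                    best = heatmap[y][x];
                    bestX = x;
                    bestY = y;
                }
            }
        }
        return new int[] { bestX, bestY };
    }

    public static void main(String[] args) {
        float[][] heatmap = new float[96][96];
        heatmap[40][25] = 0.9f;                // pretend one key point peaked here
        int[] p = argmax(heatmap);
        System.out.println(p[0] + "," + p[1]); // 25,40
    }
}
```

The Gaussian blur applied before this step smooths out spurious single-pixel peaks, so the argmax lands on the true maximum of the probability blob.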
Render output data into TextureView
The argument ratio is the ratio between the input image and the output image.
It’s really easy, because you already know this ratio from when you instantiated the image classifier.
The input image size was (192, 192) and the output image size was (96, 96), so you can just scale the coordinates accordingly.
mRatioX is the scaling ratio applied along the x-axis.
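The coordinate mapping can be sketched like this (variable names and view sizes are illustrative, not the sample's):

```java
public class CoordinateMapper {
    // Scale a point from heatmap space (e.g. 96x96) to view space (e.g. 480x768).
    static float[] toView(int hx, int hy, int heatmapSize, int viewW, int viewH) {
        float ratioX = (float) viewW / heatmapSize;
        float ratioY = (float) viewH / heatmapSize;
        return new float[] { hx * ratioX, hy * ratioY };
    }

    public static void main(String[] args) {
        // A key point at (48, 24) in a 96x96 heatmap, drawn on a 480x768 view.
        float[] p = toView(48, 24, 96, 480, 768);
        System.out.println(p[0]); // 240.0
        System.out.println(p[1]); // 192.0
    }
}
```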
Finally, you get the coordinates of all the key points!
- The basic mechanism of pose estimation inference
- How to use TensorFlow Lite with DNN models on Android
- Alexander Toshev and Christian Szegedy, “DeepPose: Human Pose Estimation via Deep Neural Networks,” IEEE Conference on Computer Vision and Pattern Recognition, 2014.
- Alejandro Newell, Kaiyu Yang, and Jia Deng, “Stacked Hourglass Networks for Human Pose Estimation,” 2016, https://arxiv.org/abs/1603.06937.
- Xiao Chu, Wei Yang, Wanli Ouyang, Cheng Ma, Alan L. Yuille, and Xiaogang Wang, “Multi-Context Attention for Human Pose Estimation,” 2017, https://arxiv.org/abs/1702.07432.