We are excited to release a TensorFlow Lite sample application for human pose estimation on Android using the PoseNet model. PoseNet is a vision model that estimates the pose of a person in an image or video by detecting the positions of key body parts. As an example, the model can estimate the position of a person’s elbow or knee in an image. The pose estimation model does not identify who is in an image; it detects only the positions of key body parts.
TensorFlow Lite is sharing an Android sample application that uses the device’s camera to detect and display key body parts of a single person in real time. Check out the source code!
Why is this exciting?
There are many possibilities with pose estimation. To name a few, developers can augment reality based on images of the body, animate computer graphic characters, and analyze the gait of athletes in sports. At Google I/O’19, TensorFlow Lite showcased Dance Like, an app that helps users learn how to dance using the PoseNet model.
This sample application will make it easier for app developers and machine learning experts to explore the possibilities of a lightweight mobile model.
The PoseNet sample application
In contrast with the existing Android examples that are written in Java, the PoseNet sample app was developed in Kotlin. The goal of developing the app was to make it easy for anyone to use the PoseNet model with minimal overhead. The sample app includes a PoseNet library that abstracts away the complexities of the model. The diagram below shows the workflow between the application, PoseNet library, and TensorFlow Lite library.
The PoseNet library
The PoseNet library provides an interface that takes a processed camera image and returns information about where the person’s key body parts are. This functionality is provided by estimateSinglePose(), a method that runs the TensorFlow Lite interpreter on a processed RGB bitmap and returns a Person object. This page explains how to interpret PoseNet’s inputs and outputs.
The Person class contains the locations of the key body parts with their associated confidence scores. The confidence score of each key point indicates the probability that the key point exists at that position, and the confidence score of a person is the average of the confidence scores of its key points.
KeyPoint holds information on the Position of a certain BodyPart and the confidence score of that key point. A list of all the defined key points can be accessed here.
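The relationship between these types can be sketched in a few lines of Kotlin. This is a simplified illustration rather than the library's source: the field names, the shortened BodyPart list, and the computed score property are all assumptions made for the example.

```kotlin
// Hypothetical, simplified mirror of the types described above.
// Names and fields are assumptions for illustration, not the library's source.
enum class BodyPart { NOSE, LEFT_ELBOW, RIGHT_ELBOW, LEFT_KNEE, RIGHT_KNEE }

data class Position(val x: Int, val y: Int)

data class KeyPoint(val bodyPart: BodyPart, val position: Position, val score: Float)

data class Person(val keyPoints: List<KeyPoint>) {
    // Person-level confidence: the mean of the key point scores (0 if there are none).
    val score: Float
        get() = if (keyPoints.isEmpty()) 0f else keyPoints.map { it.score }.average().toFloat()
}

fun main() {
    val person = Person(
        listOf(
            KeyPoint(BodyPart.NOSE, Position(128, 40), 0.9f),
            KeyPoint(BodyPart.LEFT_ELBOW, Position(80, 150), 0.6f),
            KeyPoint(BodyPart.RIGHT_KNEE, Position(150, 220), 0.3f)
        )
    )
    println("Person score: ${person.score}")
    person.keyPoints.forEach { println("${it.bodyPart} at ${it.position} (score ${it.score})") }
}
```

Deriving the person score from the key points keeps the two values consistent in this sketch; the actual library may store the score separately.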
The PoseNet sample app
The PoseNet sample app is an on-device camera app that captures frames from the camera and overlays the key points on the images in real time.
The application performs the following steps for each incoming camera image:
- Capture the image data from the camera preview and convert it to RGB format.
- Create a Bitmap object to hold the pixels from the RGB format frame data. Crop and scale the Bitmap to the model input size so that it can be passed to the model.
- Call the estimateSinglePose() function from the PoseNet library to get the Person object.
- Scale the Bitmap back to the screen size. Draw the new Bitmap on a Canvas object.
- Use the position of key points obtained from the Person object to draw a skeleton on the canvas. Display the key points with a confidence score above a certain threshold, which by default is 0.5.
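The geometry behind the crop, scale, and threshold steps above can be sketched without any Android classes. The helpers below are hypothetical and are not taken from the sample app: the function names, the 257-pixel square model input, and the frame and display sizes are assumptions chosen for the example.

```kotlin
// Illustrative helpers only; names, sizes and signatures are assumptions,
// not the sample app's actual code.
data class Point(val x: Float, val y: Float)
data class ScoredPoint(val point: Point, val score: Float)
data class CropRect(val left: Int, val top: Int, val width: Int, val height: Int)

// Largest centered rectangle inside the frame that has the model's aspect ratio.
fun centerCrop(frameWidth: Int, frameHeight: Int, modelWidth: Int, modelHeight: Int): CropRect {
    val targetRatio = modelWidth.toFloat() / modelHeight
    val frameRatio = frameWidth.toFloat() / frameHeight
    return if (frameRatio > targetRatio) {
        // Frame is too wide: trim the left and right edges.
        val w = (frameHeight * targetRatio).toInt()
        CropRect((frameWidth - w) / 2, 0, w, frameHeight)
    } else {
        // Frame is too tall: trim the top and bottom edges.
        val h = (frameWidth / targetRatio).toInt()
        CropRect(0, (frameHeight - h) / 2, frameWidth, h)
    }
}

// Map a point from model-input coordinates to display coordinates.
fun toDisplay(p: Point, modelWidth: Int, modelHeight: Int, displayWidth: Int, displayHeight: Int): Point =
    Point(p.x * displayWidth / modelWidth, p.y * displayHeight / modelHeight)

// Keep only the key points whose confidence is above the threshold.
fun drawable(points: List<ScoredPoint>, minConfidence: Float = 0.5f): List<ScoredPoint> =
    points.filter { it.score > minConfidence }

fun main() {
    val modelSize = 257 // assumed square model input size
    println(centerCrop(640, 480, modelSize, modelSize))
    val detected = listOf(
        ScoredPoint(Point(128.5f, 64.25f), 0.92f),
        ScoredPoint(Point(30f, 200f), 0.31f)
    )
    for (kp in drawable(detected)) {
        println(toDisplay(kp.point, modelSize, modelSize, 1028, 1028))
    }
}
```

In an Android app, the crop rectangle and scale factors would typically be applied with Bitmap and Canvas operations; keeping the arithmetic in plain functions like these makes it easy to unit test off-device.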
To synchronize pose rendering with the camera frame, a single SurfaceView was used for the output display instead of separate View instances for the pose and the camera. SurfaceView takes care of placing the surface on the screen without a delay by getting, locking, and painting on the canvas.
On the roadmap
In the future, we hope to explore more features for this sample app, including:
- Multi-pose estimation
- GPU acceleration with the GPU delegate
- NNAPI acceleration with the NNAPI delegate
- Post-training quantization of the model to decrease latency
- Additional model options, such as the ResNet PoseNet model
It was a pleasure developing the PoseNet sample app this summer! We hope this app makes on-device machine learning more accessible. If you use the app, please share it with us using #TFLite, #TensorFlow, and #PoweredByTF.
Special thanks to Nupur Garg and Pulkit Bhuwalka, our hosts and TensorFlow Lite software engineers; Tyler Zhu, creator of the PoseNet model; Pavel Senchanka, fellow intern; Clément Julliard, Pixel Camera software engineer; and the TensorFlow Lite team.