Track human poses in real-time on Android with TensorFlow Lite

Posted by Eileen Mao and Tanjin Prity, Engineering Practicum Interns at Google, Summer 2019

We are excited to release a TensorFlow Lite sample application for human pose estimation on Android using the PoseNet model. PoseNet is a vision model that estimates the pose of a person in an image or video by detecting the positions of key body parts. For example, the model can estimate the position of a person’s elbow or knee in an image. The pose estimation model does not identify who is in an image; it only estimates the positions of key body parts.

TensorFlow Lite is sharing an Android sample application that uses the device’s camera to detect and display the key body parts of a single person in real time. Check out the source code!

Why is this exciting?

This sample application will make it easier for app developers and machine learning experts to explore the possibilities of a lightweight mobile model.

The PoseNet sample application

PoseNet App workflow

The PoseNet library

The Person class contains the locations of the key body parts with their associated confidence scores. The confidence score of a key point indicates the probability that the key point exists at that position. The confidence score of a person is the average of the confidence scores of its key points.

Each KeyPoint holds information on the Position of a certain BodyPart and the confidence score of that key point. A list of all the defined key points can be accessed here.
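As a rough illustration of how these classes relate, here is a minimal Java sketch of the data model. It is not the library's source code: the field names, the constructors, the wrapper class PoseTypes, the helper averageScore(), and the shortened BodyPart list are all assumptions made for this example.

```java
import java.util.List;

// Hypothetical container class for this sketch; not part of the PoseNet library.
final class PoseTypes {
    // Illustrative subset only; the library defines its own full list of body parts.
    enum BodyPart { NOSE, LEFT_ELBOW, RIGHT_ELBOW, LEFT_KNEE, RIGHT_KNEE }

    // Pixel coordinates of a key point.
    static final class Position {
        final int x;
        final int y;
        Position(int x, int y) { this.x = x; this.y = y; }
    }

    // One detected body part, where it is, and how confident the model is.
    static final class KeyPoint {
        final BodyPart bodyPart;
        final Position position;
        final float score;
        KeyPoint(BodyPart bodyPart, Position position, float score) {
            this.bodyPart = bodyPart;
            this.position = position;
            this.score = score;
        }
    }

    // A person is a list of key points plus an overall confidence score.
    static final class Person {
        final List<KeyPoint> keyPoints;
        final float score;
        Person(List<KeyPoint> keyPoints) {
            this.keyPoints = keyPoints;
            this.score = averageScore(keyPoints);
        }
    }

    // Person score = arithmetic mean of the key point scores (0 for an empty list).
    static float averageScore(List<PoseTypes.KeyPoint> keyPoints) {
        if (keyPoints.isEmpty()) {
            return 0f;
        }
        float total = 0f;
        for (KeyPoint keyPoint : keyPoints) {
            total += keyPoint.score;
        }
        return total / keyPoints.size();
    }
}
```

With two key points scored 0.9 and 0.5, for instance, the person score in this sketch comes out at roughly 0.7.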

The PoseNet sample app

The application performs the following steps for each incoming camera image:

  1. Capture the image data from the camera preview and convert it from YUV_420_888 to ARGB_8888 format.
  2. Create a Bitmap object to hold the pixels from the ARGB frame data. Crop and scale the Bitmap to the model input size so that it can be passed to the model.
  3. Call the estimateSinglePose() function from the PoseNet library to get the Person object.
  4. Scale the Bitmap back to the screen size. Draw the new Bitmap on a Canvas object.
  5. Use the positions of the key points obtained from the Person object to draw a skeleton on the canvas. Display only the key points whose confidence score is above a threshold, which is 0.5 by default.
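Two of the steps above come down to simple arithmetic, sketched below in plain Java without any Android dependencies. This is an illustration under stated assumptions, not the sample app's code. The class FrameSteps and its method names are hypothetical. The color conversion assumes full-range BT.601 (JFIF) coefficients, and the app's own conversion routine may use different constants. The coordinate mapping assumes the frame was scaled to the model input size without a crop offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for this sketch; not part of the sample app.
final class FrameSteps {
    // Step 1, per pixel: convert one YUV sample (each 0..255) to a packed
    // 32-bit ARGB value. Assumes full-range BT.601 (JFIF) coefficients.
    static int yuvToArgb(int y, int u, int v) {
        float cb = u - 128f;
        float cr = v - 128f;
        int r = clamp(Math.round(y + 1.402f * cr));
        int g = clamp(Math.round(y - 0.344136f * cb - 0.714136f * cr));
        int b = clamp(Math.round(y + 1.772f * cb));
        // Opaque alpha in the top byte, then red, green, blue.
        return 0xFF000000 | (r << 16) | (g << 8) | b;
    }

    // Keep a color channel inside the 0..255 range.
    static int clamp(int channel) {
        return Math.max(0, Math.min(255, channel));
    }

    // Step 5: keep key points whose score is above the threshold and map them
    // from model-input coordinates to screen coordinates.
    // Each input row is {x, y, score}; each output row is {screenX, screenY}.
    static List<float[]> drawablePoints(float[][] keyPoints,
                                        int modelWidth, int modelHeight,
                                        int screenWidth, int screenHeight,
                                        float minConfidence) {
        float scaleX = (float) screenWidth / modelWidth;
        float scaleY = (float) screenHeight / modelHeight;
        List<float[]> points = new ArrayList<>();
        for (float[] keyPoint : keyPoints) {
            if (keyPoint[2] > minConfidence) {
                points.add(new float[] { keyPoint[0] * scaleX, keyPoint[1] * scaleY });
            }
        }
        return points;
    }
}
```

Passing 0.5f as minConfidence mirrors the default threshold mentioned in step 5. The model and screen dimensions are left as parameters because they depend on the model variant and the device.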

To synchronize pose rendering with the camera frame, the app uses a single SurfaceView for the output display instead of separate View instances for the pose and the camera. SurfaceView takes care of placing the surface on the screen without a delay by getting, locking, and painting on the View canvas.

Running on-device

On the roadmap

  1. Multi-pose estimation
  2. GPU acceleration with the GPU delegate
  3. NNAPI acceleration with the NNAPI delegate
  4. Post-training quantization of the model to decrease latency
  5. Additional model options, such as the ResNet PoseNet model

It was a pleasure developing the PoseNet sample app this summer! We hope this app makes on-device machine learning more accessible. If you use the app, please share it with us using #TFLite, #TensorFlow, and #PoweredByTF.