PoseNet for iOS, Android and Flutter using TensorFlow Lite

Sha Qian · Flutter Community · Jun 5, 2019

PoseNet is a well-known pose estimation model in TensorFlow.js. There’s a tflite version of the model (multi_person_mobilenet_v1_075_float.tflite) provided on TensorFlow’s website, but no official tutorial yet on how to use it.

I ported the PoseNet code for TensorFlow.js to Android and iOS in the Flutter tflite plugin. In this post I will share the native code used to run the model, and the Flutter code that uses the plugin.

Here’s a short demo of how PoseNet performs in a Flutter App on iOS.

Understanding the Outputs of Multi-Pose Estimation

Shape of the Outputs

I started with the official PoseNet blog post, then explored the PoseNet tfjs source code. As I understand it, the outputs of multi-pose estimation should be four 4-dimensional arrays, like below:

  • Scores : [1] [height] [width] [Number of keypoints]
  • Offsets: [1] [height] [width] [Number of keypoints * 2]
  • Displacements(Forward): [1] [height] [width] [Number of edges * 2]
  • Displacements(Backward): [1] [height] [width] [Number of edges * 2]

The last dimension of the offsets and displacements arrays is the number of keypoints/edges multiplied by 2. This is because one half of the array holds the x components of the vectors and the other half the y components.
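For example, with 17 keypoints the offsets array has 34 channels. Here is a minimal sketch of reading one offset vector, assuming the channel layout used by the tfjs implementation (first half y components, second half x) — worth verifying against your converted model:

// Sketch: read the offset vector for keypoint k at heatmap cell (heatY, heatX).
// Assumes the tfjs channel layout: channels [0, 17) are the y components and
// channels [17, 34) the x components.
static float[] getOffsetVector(float[][][][] offsets, int heatY, int heatX, int k) {
  final int numKeypoints = 17;
  float offsetY = offsets[0][heatY][heatX][k];
  float offsetX = offsets[0][heatY][heatX][k + numKeypoints];
  return new float[] {offsetY, offsetX};
}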

The scores array is the keypoint heatmap, and the offsets array holds the offset vectors; the blog post explains both in detail.

The displacement arrays (displacement vectors) are used when traversing the part-based graph (the edges) to locate a target keypoint from a known source keypoint. We start by finding a root keypoint, the keypoint with the highest score within a local window on the heatmap; that root then becomes our first known source keypoint.
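A sketch of that local-window test, assuming plain 4-D Java arrays for the outputs (the tfjs code uses a local maximum radius of 1 for this check):

// Sketch: true if keypoint k's score at (heatY, heatX) is the maximum within a
// square window of the given radius — the test used to pick root keypoints.
static boolean isLocalMaximum(float[][][][] scores, int heatY, int heatX,
                              int k, int radius) {
  float score = scores[0][heatY][heatX][k];
  int height = scores[0].length, width = scores[0][0].length;
  for (int y = Math.max(0, heatY - radius); y <= Math.min(height - 1, heatY + radius); y++)
    for (int x = Math.max(0, heatX - radius); x <= Math.min(width - 1, heatX + radius); x++)
      if (scores[0][y][x][k] > score) return false;
  return true;
}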

Part-based Graph

PoseNet outputs 17 body parts and the parts are chained in a graph. When the source keypoint is a child node, we use the backward displacements array to locate the parent. When the source keypoint is a parent node, we use the forward displacements to locate the child.

With 17 parts, there should be 16 edges.
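For reference, the 16 edges as defined by the poseChain in the tfjs PoseNet source, each written parent → child:

// The 16 parent→child edges of the part-based graph (poseChain in tfjs).
static final String[][] POSE_CHAIN = {
  {"nose", "leftEye"},          {"leftEye", "leftEar"},
  {"nose", "rightEye"},         {"rightEye", "rightEar"},
  {"nose", "leftShoulder"},     {"leftShoulder", "leftElbow"},
  {"leftElbow", "leftWrist"},   {"leftShoulder", "leftHip"},
  {"leftHip", "leftKnee"},      {"leftKnee", "leftAnkle"},
  {"nose", "rightShoulder"},    {"rightShoulder", "rightElbow"},
  {"rightElbow", "rightWrist"}, {"rightShoulder", "rightHip"},
  {"rightHip", "rightKnee"},    {"rightKnee", "rightAnkle"}
};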

Calculation

  • To locate a target keypoint on the heatmap from a source keypoint:
targetKeypointHeatmapPositions = sourceKeypointHeatmapPositions + displacementVectors
  • To find the actual position of a keypoint on an image:
keypointPositions = heatmapPositions * outputStride + offsetVectors
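In code, the two calculations look roughly like this. I assume an output stride of 16 (the [1][23][17][…] shapes are consistent with a 353×257 input at stride 16, since (353−1)/16+1 = 23 and (257−1)/16+1 = 17). One subtlety from the tfjs implementation: the displacement vector is looked up at the source's heatmap cell but added to the source's position in image coordinates.

// Sketch: follow a displacement vector (edge e) from a source keypoint.
// Assumes the same half-y/half-x channel layout as the offsets.
static float[] getDisplacedPoint(float srcImageY, float srcImageX,
                                 float[][][][] displacements, int e,
                                 int outputStride) {
  final int numEdges = 16;
  int height = displacements[0].length, width = displacements[0][0].length;
  int y = Math.max(0, Math.min(height - 1, Math.round(srcImageY / outputStride)));
  int x = Math.max(0, Math.min(width - 1, Math.round(srcImageX / outputStride)));
  return new float[] {srcImageY + displacements[0][y][x][e],
                      srcImageX + displacements[0][y][x][e + numEdges]};
}

// Sketch: map a heatmap cell plus its offset vector to image coordinates
// (keypointPosition = heatmapPosition * outputStride + offsetVector).
static float[] toImagePosition(int heatY, int heatX, float offsetY,
                               float offsetX, int outputStride) {
  return new float[] {heatY * outputStride + offsetY,
                      heatX * outputStride + offsetX};
}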

Problem with multi_person_mobilenet_v1_075_float.tflite

The outputs of multi_person_mobilenet_v1_075_float.tflite from TensorFlow’s webpage are:

  • Scores: [1][23][17][17]
  • Offsets: [1][23][17][34]
  • Displacements(Forward): [1][23][17][64]
  • Displacements(Backward): [1][23][17][1]

The shapes of the two displacement arrays are different. That doesn’t seem right: with 16 edges, the forward and backward displacement arrays should both have 32 values in the last dimension, the x and y of 16 vectors. When I first experimented with the model, I got index-out-of-range exceptions when traversing backward from the root keypoint.

I found a Stack Overflow thread discussing this model file, and others are having issues with it too. Big thanks to the answerer; his version of the converted tflite file generates the expected output.

Code for Running PoseNet and Estimating Poses

The code is extracted from my Flutter project so you may need to make some adjustments before running it in a native Android/iOS project.

Loading Model in TensorFlow Lite:

  • Android:
  • iOS:
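On Android, loading the model boils down to memory-mapping the file and constructing an Interpreter. A minimal sketch, assuming the standard TFLite Java API and a model bundled in the app's assets (helper and file names are illustrative):

import android.content.Context;
import android.content.res.AssetFileDescriptor;
import java.io.FileInputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import org.tensorflow.lite.Interpreter;

// Sketch: memory-map a tflite model bundled in assets and build an interpreter.
static Interpreter loadModelFromAssets(Context context, String modelFile)
    throws Exception {
  AssetFileDescriptor fd = context.getAssets().openFd(modelFile);
  FileInputStream stream = new FileInputStream(fd.getFileDescriptor());
  MappedByteBuffer buffer = stream.getChannel().map(
      FileChannel.MapMode.READ_ONLY, fd.getStartOffset(), fd.getDeclaredLength());
  return new Interpreter(buffer);
}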

Feeding an Image to Input Tensor:

  • Android
  • iOS
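On the Android side, feeding the image means converting a Bitmap into a normalized float32 buffer. A minimal sketch, assuming the bitmap is already scaled to the model's input size and using the plugin's default imageMean/imageStd of 125.0:

import android.graphics.Bitmap;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: convert a Bitmap (pre-scaled to the model's input size) into a
// normalized float32 NHWC buffer. With mean/std of 125.0, pixel values land
// roughly in [-1, 1].
static ByteBuffer bitmapToInputBuffer(Bitmap bitmap, float imageMean, float imageStd) {
  int width = bitmap.getWidth(), height = bitmap.getHeight();
  int[] pixels = new int[width * height];
  bitmap.getPixels(pixels, 0, width, 0, 0, width, height);

  ByteBuffer input = ByteBuffer.allocateDirect(height * width * 3 * 4);
  input.order(ByteOrder.nativeOrder());
  for (int pixel : pixels) {
    input.putFloat(((pixel >> 16 & 0xFF) - imageMean) / imageStd); // R
    input.putFloat(((pixel >> 8 & 0xFF) - imageMean) / imageStd);  // G
    input.putFloat(((pixel & 0xFF) - imageMean) / imageStd);       // B
  }
  return input;
}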

Running Inference:

  • Android
  • iOS
TfLiteStatus status = interpreter->Invoke();
if (status != kTfLiteOk) {
  NSLog(@"Failed to invoke!");
  return result(empty);
}
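The Android counterpart, sketched with the TFLite Java API. The output shapes match the fixed model; the tensor order (scores, offsets, forward then backward displacements) is an assumption to verify against your converted model:

import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import org.tensorflow.lite.Interpreter;

// Sketch: run inference and collect the four output arrays.
static Map<Integer, Object> runPoseNet(Interpreter interpreter, ByteBuffer input) {
  Map<Integer, Object> outputs = new HashMap<>();
  outputs.put(0, new float[1][23][17][17]); // scores
  outputs.put(1, new float[1][23][17][34]); // offsets
  outputs.put(2, new float[1][23][17][32]); // displacements forward
  outputs.put(3, new float[1][23][17][32]); // displacements backward
  interpreter.runForMultipleInputsOutputs(new Object[] {input}, outputs);
  return outputs;
}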

Decoding Poses:

  • Android
  • iOS
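Condensed into one sketch, reusing getOffsetVector, isLocalMaximum, and getDisplacedPoint from above (the real tfjs algorithm also refines each traversal step and applies per-keypoint NMS when scoring poses, so treat this as an outline rather than a faithful port):

import java.util.ArrayList;
import java.util.List;

// Condensed sketch of multi-pose decoding. Each pose is a [17][3] array of
// {yImage, xImage, keypointScore}; part ids follow the usual PoseNet order
// (0 = nose, 1 = leftEye, ..., 16 = rightAnkle).
static List<float[][]> decodePoses(float[][][][] scores, float[][][][] offsets,
                                   float[][][][] dispFwd, float[][][][] dispBwd,
                                   int outputStride, int maxPoses,
                                   float threshold, float nmsRadius) {
  // POSE_CHAIN expressed as part ids: parentIds[e] -> childIds[e].
  int[] parentIds = {0, 1, 0, 2, 0, 5, 7, 5, 11, 13, 0, 6, 8, 6, 12, 14};
  int[] childIds  = {1, 3, 2, 4, 5, 7, 9, 11, 13, 15, 6, 8, 10, 12, 14, 16};
  int height = scores[0].length, width = scores[0][0].length, numParts = 17;

  // Candidate roots: cells above the threshold that are local maxima.
  List<int[]> candidates = new ArrayList<>(); // {heatY, heatX, partId}
  for (int y = 0; y < height; y++)
    for (int x = 0; x < width; x++)
      for (int k = 0; k < numParts; k++)
        if (scores[0][y][x][k] >= threshold && isLocalMaximum(scores, y, x, k, 1))
          candidates.add(new int[] {y, x, k});
  candidates.sort((a, b) -> Float.compare(scores[0][b[0]][b[1]][b[2]],
                                          scores[0][a[0]][a[1]][a[2]]));

  List<float[][]> poses = new ArrayList<>();
  for (int[] root : candidates) {
    if (poses.size() >= maxPoses) break;
    float[] off = getOffsetVector(offsets, root[0], root[1], root[2]);
    float rootY = root[0] * outputStride + off[0];
    float rootX = root[1] * outputStride + off[1];
    // Suppress roots within nmsRadius of the same keypoint in an earlier pose.
    boolean suppressed = false;
    for (float[][] p : poses) {
      float dy = p[root[2]][0] - rootY, dx = p[root[2]][1] - rootX;
      if (dy * dy + dx * dx <= nmsRadius * nmsRadius) { suppressed = true; break; }
    }
    if (suppressed) continue;

    float[][] pose = new float[numParts][3];
    boolean[] decoded = new boolean[numParts];
    pose[root[2]] = new float[] {rootY, rootX, scores[0][root[0]][root[1]][root[2]]};
    decoded[root[2]] = true;
    // Backward pass (child -> parent), then forward pass (parent -> child).
    for (int e = parentIds.length - 1; e >= 0; e--)
      if (decoded[childIds[e]] && !decoded[parentIds[e]])
        traverse(pose, decoded, childIds[e], parentIds[e], e, dispBwd,
                 scores, offsets, outputStride);
    for (int e = 0; e < parentIds.length; e++)
      if (decoded[parentIds[e]] && !decoded[childIds[e]])
        traverse(pose, decoded, parentIds[e], childIds[e], e, dispFwd,
                 scores, offsets, outputStride);
    poses.add(pose);
  }
  return poses;
}

// Follow one displacement edge from a decoded source keypoint to its target.
static void traverse(float[][] pose, boolean[] decoded, int sourceId, int targetId,
                     int edge, float[][][][] disp, float[][][][] scores,
                     float[][][][] offsets, int outputStride) {
  int height = scores[0].length, width = scores[0][0].length;
  float[] displaced = getDisplacedPoint(pose[sourceId][0], pose[sourceId][1],
                                        disp, edge, outputStride);
  // Snap to the nearest heatmap cell, then refine with that cell's offset.
  int ty = Math.max(0, Math.min(height - 1, Math.round(displaced[0] / outputStride)));
  int tx = Math.max(0, Math.min(width - 1, Math.round(displaced[1] / outputStride)));
  float[] off = getOffsetVector(offsets, ty, tx, targetId);
  pose[targetId] = new float[] {ty * outputStride + off[0],
                                tx * outputStride + off[1],
                                scores[0][ty][tx][targetId]};
  decoded[targetId] = true;
}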

Output format:

x, y are between [0, 1]. You can scale x by the width and y by the height of the image.

[ // array of poses/persons
  { // pose #1
    score: 0.6324902,
    keypoints: {
      0: {
        x: 0.250,
        y: 0.125,
        part: nose,
        score: 0.9971070
      },
      1: {
        x: 0.230,
        y: 0.105,
        part: leftEye,
        score: 0.9978438
      }
      ......
    }
  },
  { // pose #2
    score: 0.32534285,
    keypoints: {
      0: {
        x: 0.402,
        y: 0.538,
        part: nose,
        score: 0.8798978
      },
      1: {
        x: 0.380,
        y: 0.513,
        part: leftEye,
        score: 0.7090239
      }
      ......
    }
  },
  ......
]

Using PoseNet in Flutter Tflite Plugin

Refer to the installation instructions if you are new to this plugin.

Loading the Model:

await Tflite.loadModel(model: "assets/posenet_mv1_075_float_from_checkpoints.tflite");

Running Inference:

var result = await Tflite.runPoseNetOnImage(
  path: filepath,   // required
  imageMean: 125.0, // defaults to 125.0
  imageStd: 125.0,  // defaults to 125.0
  numResults: 2,    // defaults to 5
  threshold: 0.7,   // defaults to 0.5
  nmsRadius: 10,    // defaults to 20
  asynch: true      // defaults to true
);

Rendering the Keypoints:

Source Code

Flutter Tflite Plugin

The Android and iOS native code above is used in this plugin:
https://github.com/shaqian/flutter_tflite

Real-time Pose Estimation

The app in the video demo is available here: https://github.com/shaqian/flutter_realtime_detection
