Facenet on Mobile — Part 2
Converting Facenet (.pb) to Facenet (.tflite)
If you have not read my story about the FaceNet architecture, I would recommend going through part-1. In the upcoming part-3, I will compare the .pb and .tflite models.
When state-of-the-art accuracy is required for face recognition / authentication, FaceNet is the obvious choice on both Android and iOS. But running FaceNet on mobile devices needs some special treatment; this article describes the problem and a potential solution.
On Android, every application has a memory limit enforced by the Dalvik VM. The Android NDK is not bound by this limit, but that does not mean native code can consume as much memory as it wants. The same logic applies to AI models: they SHOULD be lean and light.
AI models are large data files structured in a specific file format so that the inference engine can quickly parse and load them. Reading such data files can cause various problems: parsing overhead, degraded performance due to size, long load times, and potentially memory throttling on low-end devices.
TensorFlow's .tflite file format is essentially made to solve such problems on resource-constrained devices. .tflite transforms the full-blown model data into a quantized, lightly structured format (i.e. flatbuffers).
The non-quantized FaceNet model is around 95MB, and moreover it is stored in protocol-buffers (another file format). Protocol-buffers are astonishingly slow compared to flatbuffers; the following graph shows the comparison:
If you would like to know which operations .tflite supports, you can check the latest breaking changes directly in the codebase.
Anyway, let us start playing with the code …
Step 1: Clone Facenet repository
$ git clone https://github.com/davidsandberg/facenet.git
We will quantize the pre-trained FaceNet model with 512-dimensional embeddings, but you can choose the model with 128-dimensional embeddings instead. This model is about 95MB in size before quantization.
wget the model to the ../facenet/models/ directory:
$ ls -l models/
total 461248
-rw-rw-r--@ 1 milinddeore staff 95745767 Apr 9 2018 20180402-114759.pb
Install TensorFlow on your machine:
$ pip3 install tensorflow
Make sure you have Python ≥ 3.4 installed.
Step 2: Stripping the training branch
Here we strip out the 'phase_train' input branch; this reduces the total count of operations and makes it an inference-only graph.
If you try to convert the FaceNet model from .pb to .tflite you will get a BatchNorm error:
ERROR: 'mean, multiplier and offset needs to be constant'.
Let us understand this in detail:
What is BatchNormalization?
BatchNormalization (a.k.a. BatchNorm): preprocessing the input data is a MUST before feeding it into the neural network, and this is usually done on the input dataset.
Similarly, we also normalize the data between DNN layers; this is called BatchNorm. It transforms the data onto the same scale and avoids instability. It has two parts:
- Normalization: Typically we scale the data down to the range 0 to 1. Let us assume I have data_values = [1, 200, 100000, 500]. That's quite a wide range of values; we need to bring them down to the scale of 0 to 1, with lots of floating points in between. This is important, else the network will become unstable with such a wide range of data points, training will take a very long time, and the network is not guaranteed to converge properly.
- Standardization: This calculates the Z-score and makes the dataset have a mean of 'zero' and a standard deviation of 'one'.
Z = (x - m) / S
Where:
x = data point
m = mean of the dataset
S = standard deviation
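Both steps can be sketched with plain NumPy on the sample values above (this is only an illustration of the two concepts, not the exact preprocessing FaceNet applies):

```python
import numpy as np

data_values = np.array([1, 200, 100000, 500], dtype=np.float64)

# Normalization (min-max): rescale values onto the 0..1 range.
normalized = (data_values - data_values.min()) / (data_values.max() - data_values.min())

# Standardization (Z-score): subtract the mean m, divide by the standard deviation S.
standardized = (data_values - data_values.mean()) / data_values.std()

print(normalized)                               # all values between 0 and 1
print(standardized.mean(), standardized.std())  # ~0.0 and ~1.0
```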
Now we need to understand how BatchNorm is implemented in TensorFlow. It works differently for TRAINING vs. INFERENCE.
Training:
- Normalize layer activations based on mini-batch statistics, where the mini-batch statistics are the mini-batch mean and mini-batch standard deviation.
- Update the population statistics approximation via a moving average of the mini-batch statistics, where the population statistics are the population mean and population standard deviation.
Inference:
- Use the estimated population statistics to normalize layer activations.
- Do NOT update the population statistics according to the mini-batch statistics, because this is test data.
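The training/inference difference can be sketched in NumPy (a simplified BatchNorm without the learnable scale/offset parameters; the function and argument names are illustrative, not TensorFlow's API):

```python
import numpy as np

def batchnorm(x, pop_mean, pop_var, momentum=0.99, training=False, eps=1e-3):
    """Simplified BatchNorm: normalize with mini-batch statistics while
    training, with the fixed population statistics at inference."""
    if training:
        batch_mean, batch_var = x.mean(axis=0), x.var(axis=0)
        # Update the population statistics via a moving average.
        pop_mean = momentum * pop_mean + (1 - momentum) * batch_mean
        pop_var = momentum * pop_var + (1 - momentum) * batch_var
        out = (x - batch_mean) / np.sqrt(batch_var + eps)
    else:
        # Inference: population statistics are constants, which is why the
        # converter expects 'mean, multiplier and offset' to be constant.
        out = (x - pop_mean) / np.sqrt(pop_var + eps)
    return out, pop_mean, pop_var
```

Once `training=False`, the mean and variance are fixed constants that the converter can fold into the graph.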
In the code we need to set is_training = False so that BatchNorm works in inference mode as described above. The snippet below sets arg_scope so that this configuration applies to all BatchNorm layers in the graph.

inception_resnet_v1.inference(data_input, keep_probability=0.8, phase_train=False, bottleneck_layer_size=512)

Here: is_training = phase_train = False
To create an inference-only model, paste the code below into a file inference_graph.py under the directory ../facenet/
import tensorflow as tf
from src.models import inception_resnet_v1
import sys

def main():
    training_checkpoint = "models/model-20180402-114759.ckpt-275"
    eval_checkpoint = "model_inference/imagenet_facenet.ckpt"

    # Inference-only graph: fixed input placeholder, phase_train=False.
    data_input = tf.placeholder(name='input', dtype=tf.float32, shape=[None, 160, 160, 3])
    output, _ = inception_resnet_v1.inference(data_input, keep_probability=0.8, phase_train=False, bottleneck_layer_size=512)
    label_batch = tf.identity(output, name='label_batch')
    embeddings = tf.identity(output, name='embeddings')

    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        saver = tf.train.Saver()
        saver.restore(sess, training_checkpoint)
        save_path = saver.save(sess, eval_checkpoint)
        print("Model saved in file: %s" % save_path)

if __name__ == "__main__":
    main()
Run the script to create the inference-only model; it is saved under ../facenet/model_inference/
$ python3 inference_graph.py models/ model_inference/
Facenet provides a freeze_graph.py file, which we will use to freeze the inference model.
$ cd ./facenet
$ python3 src/freeze_graph.py model_inference/ facenet_frozen.pb
Once the frozen model is generated, it is time to convert it to .tflite:
$ tflite_convert --output_file model_mobile/my_facenet.tflite --graph_def_file facenet_frozen.pb --input_arrays "input" --input_shapes "1,160,160,3" --output_arrays "embeddings" --output_format TFLITE --mean_values 128 --std_dev_values 128 --default_ranges_min 0 --default_ranges_max 6 --inference_type QUANTIZED_UINT8 --inference_input_type QUANTIZED_UINT8
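With --mean_values 128 and --std_dev_values 128, the converter maps a uint8 input value q to the float value (q - 128) / 128, i.e. a scale of 1/128 = 0.0078125 and a zero point of 128, the same pair the interpreter prints for the input later in this article. A quick sanity check of that mapping (a sketch, not converter code):

```python
# uint8 -> float mapping implied by mean_values=128, std_dev_values=128:
# real = (q - mean) / std_dev
mean, std_dev = 128.0, 128.0

def dequantize_input(q):
    return (q - mean) / std_dev

print(dequantize_input(0))    # -1.0       (darkest pixel)
print(dequantize_input(128))  #  0.0       (the zero point)
print(dequantize_input(255))  #  0.9921875 (brightest pixel)
```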
We have quantized float32 to quint8, which brings the size down to roughly a quarter. Let us check the quantized model size:
$ ls -l model_mobile/
total 47232
-rw-r--r--@ 1 milinddeore staff 23667888 Feb 25 13:39 my_facenet.tflite
To verify the .tflite model we need interpreter code:
import numpy as np
import tensorflow as tf

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="/Users/milinddeore/facenet/model_mobile/my_facenet.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.uint8)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])

print('INPUTS: ')
print(input_details)
print('OUTPUTS: ')
print(output_details)
Interpreter output:
$ python inout.py
INPUTS:
[{'index': 451, 'shape': array([ 1, 160, 160, 3], dtype=int32), 'quantization': (0.0078125, 128L), 'name': 'input', 'dtype': <type 'numpy.uint8'>}]
OUTPUTS:
[{'index': 450, 'shape': array([ 1, 512], dtype=int32), 'quantization': (0.0235294122248888, 0L), 'name': 'embeddings', 'dtype': <type 'numpy.uint8'>}]
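Note that the embeddings come back as uint8, so before computing distances between faces they should be mapped back to floats with the output's quantization parameters, real = scale * (q - zero_point). A minimal sketch, reusing the scale and zero point printed above (the helper name and toy values are illustrative):

```python
import numpy as np

# Quantization parameters reported by the interpreter for 'embeddings'.
scale, zero_point = 0.0235294122248888, 0

def dequantize_embeddings(q_emb):
    """Map quantized uint8 embeddings back to float32: scale * (q - zero_point)."""
    return (scale * (q_emb.astype(np.float32) - zero_point)).astype(np.float32)

q_emb = np.array([0, 85, 255], dtype=np.uint8)  # toy values, not real model output
print(dequantize_embeddings(q_emb))
```

Distances (e.g. Euclidean) between two dequantized embedding vectors can then be thresholded exactly as with the float model.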
The output is as we expected, now time to run it on target device and see the performance.
In the next part-3, I will go a little deeper and compare the .pb and .tflite models without quantization. Theoretically they should spit out the same embeddings.
See you in part-3 …