Facenet on Mobile — Part 2

Converting Facenet (.pb) to Facenet (.tflite)

Milind Deore
Analytics Vidhya
5 min read · Feb 28, 2019



If you have not read my story about the FaceNet architecture, I would recommend going through part-1. In the upcoming part-3, I will compare the .pb and .tflite models.

When state-of-the-art accuracy is required for face recognition / authentication, Facenet is the obvious choice for both the Android and iOS platforms. But running Facenet on mobile devices needs some special treatment; this article describes the problem and a potential solution.

On Android, every application has a memory limit enforced by the Dalvik VM. The Android NDK is not bound by this limit, but that does not mean native code can consume as much memory as it wants. The same logic applies to AI models: they SHOULD be lean and light.

AI models are large data files structured in a specific file format so that the inference engine can quickly parse and load them. Reading such data files can cause various problems: parsing overhead, degraded performance due to size, long load times, and potentially memory throttling on low-end devices.

TensorFlow's .tflite file format is essentially made to solve such problems on resource-constrained devices. It transforms the full-blown model data into a quantized, lightweight format serialized as FlatBuffers.

The non-quantized Facenet model is around 95MB and, moreover, it is stored as protocol buffers (another file format). Protocol buffers are astonishingly slow to parse compared to FlatBuffers.

If you would like to know which operations .tflite supports, see the TensorFlow Lite ops compatibility documentation; you can also check the latest breaking changes directly in the codebase.

Anyway, let us start playing with the code …

Step 1: Clone Facenet repository
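
Assuming the repository in question is davidsandberg/facenet (the commonly used TensorFlow implementation that ships the pre-trained models referenced below):

    git clone https://github.com/davidsandberg/facenet.git
    cd facenet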

We will quantize the pre-trained Facenet model with 512-dimensional embeddings, but you can choose the variant with 128-dimensional embeddings instead. This model is about 95MB in size before quantization.

wget the model into the ../facenet/models/ directory:
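
A sketch of the download step; the actual link is not reproduced here, so MODEL_URL below is a placeholder that you should replace with the download link for the 512-d pre-trained checkpoint from the Facenet repository's README.

    # MODEL_URL is a placeholder; use the link for the 512-d pre-trained
    # checkpoint given in the Facenet repository's README.
    mkdir -p models
    wget -O models/facenet-512.zip "$MODEL_URL"
    unzip models/facenet-512.zip -d models/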

Install TensorFlow on your machine:
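
For example, with a TensorFlow 1.x release (the freeze and conversion steps below assume the 1.x graph and converter APIs; the exact version is an assumption):

    pip install tensorflow==1.13.1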

Make sure you have Python ≥ 3.4 installed.

Step 2: Stripping the training branch

Here we strip out the ‘phase_train’ input branch; this reduces the total operation count and makes it an inference-only graph.

If you try to convert the Facenet model from .pb to .tflite directly, you will get a BatchNorm error:

ERROR: ‘mean, multiplier and offset needs to be constant’.

Let us understand this in detail:

What is BatchNormalization?

BatchNormalization (a.k.a. BatchNorm): Preprocessing the input data before feeding it into the neural network is a MUST; this is usually done on the input dataset.

Similarly, we also normalize the data between DNN layers; this is called BatchNorm. It brings the data onto the same scale and avoids instability. It has two parts:

  1. Normalization: Typically we scale the data down to the range 0 to 1. Assume data_values = [1, 200, 100000, 500]. That is quite a wide range of values, and we need to bring them down onto a 0-to-1 scale with lots of floating-point values in between. This is important; otherwise the network becomes unstable with such a wide range of data points, training takes a very long time, and the network is not guaranteed to converge properly.
  2. Standardization: This computes the Z-score, giving the dataset a mean of ‘zero’ and a standard deviation of ‘one’ (both steps are sketched in code right after this list).
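
A small NumPy illustration of the two steps, using the toy data_values from above:

    import numpy as np

    data_values = np.array([1, 200, 100000, 500], dtype=np.float32)

    # Normalization: min-max scaling onto the 0..1 range
    normalized = (data_values - data_values.min()) / (data_values.max() - data_values.min())

    # Standardization: Z-score, i.e. zero mean and unit standard deviation
    standardized = (data_values - data_values.mean()) / data_values.std()

    print(normalized)     # values between 0 and 1
    print(standardized)   # mean ~0, std ~1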

Now we need to understand how BatchNorm is implemented in TensorFlow. It works differently for ‘TRAINING’ vs ‘INFERENCE’.

Training:

  1. Normalize layer activations based on mini-batch statistics, where the mini-batch statistics are the mini-batch mean and mini-batch standard deviation.
  2. Update the approximation of the population statistics via a moving average of the mini-batch statistics, where the population statistics are the population mean and population standard deviation.

Inference:

  1. Use the estimated population statistics to normalize layer activations.
  2. Do NOT update the population statistics from the mini-batch statistics, because this is test data (both modes are sketched right after this list).
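
A toy NumPy sketch of the difference (hypothetical helper functions, not the TensorFlow implementation): in training mode the layer normalizes with mini-batch statistics and keeps a moving average, while in inference mode the stored population statistics are constants, which is exactly what the .tflite converter expects (‘mean, multiplier and offset’ must be constant).

    import numpy as np

    momentum = 0.995              # moving-average decay (illustrative value)
    pop_mean, pop_var = 0.0, 1.0  # population statistics stored by the layer

    def batch_norm_train(x):
        """Normalize with mini-batch statistics and update the population stats."""
        global pop_mean, pop_var
        m, v = x.mean(), x.var()
        pop_mean = momentum * pop_mean + (1 - momentum) * m
        pop_var = momentum * pop_var + (1 - momentum) * v
        return (x - m) / np.sqrt(v + 1e-3)

    def batch_norm_infer(x):
        """Normalize with the stored population statistics; nothing is updated."""
        return (x - pop_mean) / np.sqrt(pop_var + 1e-3)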

In the code, we need to set is_training = False so that BatchNorm works as described above for inference mode. The snippet below sets arg_scope so that this configuration applies to every BatchNorm in the graph.
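
A minimal sketch of that idea, assuming the TF 1.x slim layers that the Facenet model is built on; in the actual Facenet code the same effect is achieved by passing phase_train=False when the model is built:

    import tensorflow.contrib.slim as slim

    # Force every slim.batch_norm in the graph into inference mode, so it
    # uses its stored (constant) population mean/variance.
    with slim.arg_scope([slim.batch_norm], is_training=False):
        pass  # build the inference graph inside this scope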

To create an inference-only model, paste the code below into a file inference_graph.py under the ../facenet/ directory.
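
A sketch of such a script, assuming TF 1.x, the 512-d model, and the inception_resnet_v1 module from the Facenet repository; the checkpoint file names are placeholders, so point them at the checkpoint you actually downloaded and adjust the import path to your clone's layout.

    # inference_graph.py -- rebuild Facenet as an inference-only graph and
    # re-save the pre-trained weights into it.
    import tensorflow as tf
    from src.models import inception_resnet_v1  # adjust to your repo layout

    def main():
        # Placeholder names: point these at the checkpoint you downloaded.
        training_checkpoint = 'models/model-20180402-114759.ckpt-275'
        inference_checkpoint = 'model_inference/facenet_inference.ckpt'

        # Plain input placeholder instead of the training-time input pipeline.
        images = tf.placeholder(tf.float32, shape=[None, 160, 160, 3], name='input')

        # phase_train=False drops the training branch, so BatchNorm runs in
        # inference mode with constant mean/variance.
        prelogits, _ = inception_resnet_v1.inference(
            images, keep_probability=1.0, phase_train=False,
            bottleneck_layer_size=512)
        embeddings = tf.identity(prelogits, name='embeddings')

        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            saver = tf.train.Saver()
            saver.restore(sess, training_checkpoint)
            saver.save(sess, inference_checkpoint)

    if __name__ == '__main__':
        main()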

Run the script to create the inference-only model and save it under ../facenet/model_inference/:
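
For example, with the paths assumed in the sketch above:

    mkdir -p model_inference
    python inference_graph.py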

Facenet provides a freeze_graph.py file, which we will use to freeze the inference model.
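
Something along these lines, assuming the output file name facenet_frozen.pb (the script takes the checkpoint directory and the output .pb path):

    python src/freeze_graph.py model_inference/ facenet_frozen.pb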

Once the frozen model is generated, it is time to convert it to .tflite:
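
A sketch of the conversion using the TensorFlow 1.x tflite_convert CLI; the input/output tensor names match the inference script above, and the mean/std and default range values are illustrative quantization parameters rather than tuned ones:

    tflite_convert \
      --graph_def_file facenet_frozen.pb \
      --output_file facenet.tflite \
      --output_format TFLITE \
      --input_arrays input \
      --input_shapes 1,160,160,3 \
      --output_arrays embeddings \
      --inference_type QUANTIZED_UINT8 \
      --inference_input_type QUANTIZED_UINT8 \
      --mean_values 128 \
      --std_dev_values 128 \
      --default_ranges_min 0 \
      --default_ranges_max 6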

We have quantized float32 to quint8, and that brings the size down roughly threefold. Let us check the quantized model size:
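
For example (file names as used in the commands above):

    ls -lh facenet_frozen.pb facenet.tflite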

To verify the .tflite model we need interpreter code such as the sketch below.
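
A minimal verification sketch using the TensorFlow Lite Python interpreter (tf.lite.Interpreter, available in TF 1.13+); it feeds a random uint8 tensor shaped like a 160x160 face crop and prints the embedding shape:

    import numpy as np
    import tensorflow as tf

    # Load the converted model and allocate its tensors.
    interpreter = tf.lite.Interpreter(model_path='facenet.tflite')
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Feed a random uint8 "face crop" just to exercise the graph.
    dummy = np.random.randint(0, 256,
                              size=tuple(input_details[0]['shape']),
                              dtype=np.uint8)
    interpreter.set_tensor(input_details[0]['index'], dummy)
    interpreter.invoke()

    embeddings = interpreter.get_tensor(output_details[0]['index'])
    print(embeddings.shape)   # expect (1, 512) for the 512-d model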

Interpreter output:

The output is as we expected; now it is time to run it on the target device and see the performance.

In the next part-3, I will go a little deeper and compare the .pb and .tflite models without quantization. Theoretically they should produce the same embeddings.

See you in part-3.

You can connect me on | LinkedIn | Website | Github |

