Google Edge TPU Coral with a custom Keras model (All you need to know, a real deployment case)

Santi Iglesias
5 min read · Jun 12, 2020

This is my first blog post ever, so be gentle with my mistakes here. I only want to help people like me who bought a Google Coral with all the hope in their heart, and when it arrived saw that paradise doesn’t look as cool as in the Google tweets or tutorials.

The documentation for Coral is not extensive, not really straight to the point, and worst of all, 99% of it is focused on ready-made models like PoseNet, ImageNet, MobileNet, etc. What if you need to use Coral with YOUR custom-made model? That’s what I will try to solve here.

Just to be clear, Coral is AWESOME: it can run huge models on a little Raspberry Pi with a 3–7x speedup, depending on the optimization you do, but you need to finish a long-distance race in order to fit your model inside that tiny piece of IT magic hardware.

So… let’s go. This is a long process; I will try to avoid boilerplate and go straight to the point.

At this point, you have created your model (without focusing on using it on a Coral device) and it works fine on your 2080 Ti or in a Google Colab notebook, but when it comes to inference on a Raspberry Pi, it looks like it takes a lifetime to finish.

Before buying a Coral to solve that, you should know a cascade of things that may change your mind:

  • You are trying to load a model (prepared for inference on a super GPU) on a finger-sized device, so… don’t expect black magic here. It will improve, but it will not run at Google Colab speed.
  • Not all models and operations can run on an Edge TPU.
  • Why not? Because the model has to be compiled to run on an Edge TPU device.
  • In order to compile it, the model has to be ultra-quantized.
  • Floats are forbidden here.

So it is good for you to try to quantize your model and test whether it fits on a Coral before buying one. (You do not need to fully train it, only check the architecture.)

So… floats are forbidden here. Does that mean I should recreate my model in order to load it on a Coral, substituting every node with integers? Not at all. Google created a process called post-training quantization to change your nodes and operations so they work with INT8. Does that mean I am going to lose precision? YES, definitely, but it is negligible on the majority of occasions, don’t worry.

Coral is only allowed to work with INT8, even at the input, and quantizing a model is the process of translating all FLOAT operations into INT ones. This is how you can quantize your model once it is already done, instead of working directly with integers:

https://colab.research.google.com/github/google-coral/tutorials/blob/master/retrain_classification_ptq_tf1.ipynb#scrollTo=ZmiHICezwXZq
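To make that concrete, here is a minimal sketch of the post-training quantization step using the TF2-style tf.lite.TFLiteConverter API (the linked notebook does the same with the TF1 equivalents). The file names and the calibration_samples array are placeholders for your own model and data:

```python
import numpy as np
import tensorflow as tf

# Load your trained Keras model (path is a placeholder).
model = tf.keras.models.load_model('my_model.h5')

# A small slice of REAL data is needed so the converter can
# calibrate the float -> integer ranges of every tensor.
calibration_samples = np.load('calibration_samples.npy')  # placeholder

def representative_dataset():
    for sample in calibration_samples[:100]:
        # One float32 array per model input, batch dimension included.
        yield [sample[np.newaxis, ...].astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer quantization: the Edge TPU accepts nothing else.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8   # newer TF versions may want tf.int8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
open('my_model_quant.tflite', 'wb').write(tflite_model)
```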

Just to be fair, not all operations are allowed yet. I had to change a LEAKY_RELU for a simple RELU because Leaky is not on the allowed list yet; thank god it didn’t change the model’s behaviour. Also, my model has two inputs instead of one, and you need to provide a piece of a real dataset to the process so the algorithm knows how to transform your floats into integers, calculating a mean and a deviation.
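For a two-input model like mine, the representative dataset just yields one array per input, in the model’s input order. A minimal sketch that plugs into the converter above, assuming hypothetical sample_images and sample_masks calibration arrays:

```python
def representative_dataset():
    # Yield one float32 array per model input, in the model's input order.
    for img, mask in zip(sample_images[:100], sample_masks[:100]):
        yield [img[np.newaxis, ...].astype(np.float32),
               mask[np.newaxis, ...].astype(np.float32)]
```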

OK, I got my quantized model, can I use it now on my Google Coral device? Not yet. Your Google Coral device loads the model with a special parameter of tf.lite.Interpreter, which looks like this:

https://coral.ai/docs/edgetpu/tflite-python/#load-tensorflow-lite-and-run-an-inference
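Following that page, loading the compiled model with the Edge TPU delegate looks roughly like this (the model path is a placeholder):

```python
import numpy as np
import tflite_runtime.interpreter as tflite

# load_delegate('libedgetpu.so.1') is what routes the supported
# operations to the Edge TPU instead of the CPU.
interpreter = tflite.Interpreter(
    model_path='my_model_quant_edgetpu.tflite',  # placeholder path
    experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed an input with the quantized dtype and shape the model expects.
dummy = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()
result = interpreter.get_tensor(output_details[0]['index'])
```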

That’s it… no more, no less. Once this is done it is really simple, but we are not there yet :(

OK, we got a quantized TFLite model that can run in TensorFlow Lite far more quickly than your original model (basically because it works only with integers), but if you run it in a TensorFlow Lite interpreter like the one above, with a Coral TPU connected, your performance will be EXACTLY the same. That was the first thing I tried when I got my Coral, and it blew my mind. Why is that? Because you need to compile your model in order to use the Coral operations and device; otherwise, all those operations will keep running on the CPU by default. We will do it this way:

https://coral.ai/docs/edgetpu/compiler/#usage
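On the command line this boils down to a single call (the input file name is a placeholder):

```bash
# Compiles my_model_quant.tflite for the Edge TPU; produces
# my_model_quant_edgetpu.tflite plus a .log report of which
# operations were mapped to the TPU and which fell back to the CPU.
edgetpu_compiler my_model_quant.tflite
```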

Here, if everything goes right, you will have your “*_edgetpu.tflite” compiled model, which is now ready to load on your Coral with the Interpreter code I showed you. You will also have a LOG file where you can check what went right and what went wrong, and you can try to optimize your model so it compiles even better. The more operations run inside the Coral and the fewer on the outside CPU, the quicker the model will infer.

Once you have your quantized model (compiled for the Edge TPU or not), you should know that your model is not your original model anymore. It changed a lot; your float inputs are not floats anymore. But the good point is, in my case, I did not need to worry about this: the quantized model stores enough information to let the interpreter make the input and output changes without a single line of code. So for me this part was not necessary:

https://colab.research.google.com/github/google-coral/tutorials/blob/master/retrain_classification_ptq_tf1.ipynb#scrollTo=ZmiHICezwXZq

I repeat, this was not necessary for me, but I saw this code in lots of places, so maybe it is a good script for your case.
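In case your interpreter does not handle it for you, the usual manual scaling looks roughly like this, reading the scale and zero point that the converter stored in the model (float_input and raw_output are placeholder arrays):

```python
import numpy as np

# Continuing from the interpreter sketch above:
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Quantize a float input by hand with the parameters stored in the model.
scale, zero_point = input_details[0]['quantization']
quantized = (float_input / scale + zero_point).astype(input_details[0]['dtype'])

# ... run inference ...

# Dequantize the integer output back to floats the same way.
scale, zero_point = output_details[0]['quantization']
float_output = scale * (raw_output.astype(np.float32) - zero_point)
```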

Then you are ready to follow the official guidelines to install the Edge TPU runtime:

https://coral.ai/docs/accelerator/get-started/#on-linux
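At the time of writing, the Linux setup from that page boiled down to adding Coral’s apt repository and installing the runtime (check the page for the current commands):

```bash
echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" \
  | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install libedgetpu1-std   # or libedgetpu1-max (see below)
```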

Just one clarification here: Google says that “libedgetpu1-max” and “libedgetpu1-std” are the same but with more or less power. In my use case, “libedgetpu1-max” gives me an error and “libedgetpu1-std” works perfectly:

transfer on tag 2 failed. Abort. Not found: USB transfer error 5 [LibUsbDataOutCallback]

If you get this error, just use the standard library.

And then… hooray! Everything is done! Just use it!!

Hint here: the Coral’s white light will blink when it is inferencing.
Hint 2: TFLite and tflite_runtime currently only work if you install them through the wheel link on this page:
https://www.tensorflow.org/lite/guide/python

My Google Colab Quantizing Tests: https://colab.research.google.com/drive/18_f5w8x2AJLWDWnDoj8PVdqcI-4C2bkp

Google Retrain Source: https://colab.research.google.com/github/google-coral/tutorials/blob/master/retrain_classification_ptq_tf1.ipynb#scrollTo=ZmiHICezwXZq

All this work was done to make this app work:
https://github.com/ianholing/faceEditor_flutter
With this server:
https://github.com/ianholing/faceEditor_server

I will explain how this App works in future posts.
