How vFlat used the TFLite GPU delegate for real-time inference to scan books
A guest post by Kunwoo Park, Moogung Kim, Eunsung Han
Although many mobile scanning apps are available for download, most focus on digitizing flat documents and struggle with the curved pages of a book. To scan a curved page of text, a person must either do a destructive scan by removing the binding of the book or publication, or simply accept the input as is and read the curved images captured by the camera.
That’s why at VoyagerX we developed vFlat, an Android app that uses deep learning to solve this issue. The vFlat app aims to let users easily scan their books without having to worry about curved pages. It also tries to reduce manual input from users by automatically determining the boundaries of a book page.
This comes in handy when a user tries to extract text from a photo of a book page via OCR (Optical Character Recognition). When extracting text from the ‘left’ image above, OCR fails to recognize some words and lines correctly because they are too curved. Running the same OCR on the flattened ‘right’ image, however, produces a much higher success rate and extracts the text with few errors.
How we built the vFlat App
We developed a deep learning model that flattens curved book pages, and decided to run it on the device to provide the best end-user experience. The vFlat app has a ‘live preview’ feature where users can see a book page flattened in real time (above 20 FPS). Because users see the flattened page before taking the picture, they can adjust the angle and framing first.
To achieve real-time inference in a mobile app, we optimized our trained model and took advantage of hardware acceleration. Our initial plan was to write the inference module ourselves with OpenGL, implementing the model’s layers in GLSL (OpenGL Shading Language).
Fortunately, we came across TensorFlow Lite’s GPU support and decided to give it a try (at the time of writing, the ‘tensorflow-lite-gpu’ package was at version ‘org.tensorflow:tensorflow-lite-gpu:0.0.0-nightly’). We built a lightweight version of the model by reducing the number of weights and complex operations, and used the TFLite GPU delegate for hardware acceleration.
Deep neural networks are a great fit for GPUs, because GPUs have more compute power than CPUs and excel at massively parallel workloads. But using the mobile GPU is not trivial; this is where the TFLite GPU delegate comes in.
The TFLite GPU delegate optimizes the graph of a neural network for mobile GPUs, and generates and compiles compute shaders which are executed asynchronously. Thanks to the TFLite GPU delegate, we saved months of development time by not having to implement our own hardware-accelerated inference module.
Although the TFLite GPU delegate saved us time and energy, we ran into issues converting our own model to a TFLite model and integrating it with the delegate. The experimental version of the GPU delegate supported only the operations used mainly in MobileNet, and did not support some of the operations in our original model.
To use the GPU delegate without sacrificing the model’s performance, we had to substitute some of the operations while keeping the overall network structure the same. We hit errors during conversion, and since the source code was not yet open at that point, it was hard to pinpoint their cause. (The TFLite GPU delegate code is now publicly available on GitHub.)
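The substitution step can be thought of as an audit over the model’s list of ops. The sketch below is hypothetical: the op names follow TFLite’s builtin-op naming style, the supported set is an illustrative subset rather than the delegate’s real list, and `plan_substitutions` and `CUSTOM_WARP` are made-up names. It flags which ops can be swapped for supported equivalents and which still need rethinking.

```python
from typing import Dict, List, Set, Tuple

# Illustrative subset only: which ops the GPU delegate supports depends on its
# version, so consult the official documentation for the release you use.
GPU_SUPPORTED_OPS: Set[str] = {
    "CONV_2D", "DEPTHWISE_CONV_2D", "ADD", "MUL", "CONCATENATION", "PRELU",
}

# Structure-preserving swaps, e.g. LeakyReLU emulated with a fixed-alpha PReLU
# (the substitution this post describes for the delegate version of the time).
SUBSTITUTIONS: Dict[str, str] = {"LEAKY_RELU": "PRELU"}


def plan_substitutions(ops: List[str]) -> Tuple[List[str], List[str]]:
    """Return (op list after swaps, original ops that remain unsupported)."""
    rewritten: List[str] = []
    unresolved: List[str] = []
    for op in ops:
        # Keep supported ops; otherwise try a known swap, else keep the op as is.
        new_op = op if op in GPU_SUPPORTED_OPS else SUBSTITUTIONS.get(op, op)
        rewritten.append(new_op)
        if new_op not in GPU_SUPPORTED_OPS:
            unresolved.append(op)
    return rewritten, unresolved


# "CUSTOM_WARP" is a hypothetical op standing in for anything without a swap.
model_ops = ["CONV_2D", "LEAKY_RELU", "CONV_2D", "LEAKY_RELU", "CUSTOM_WARP"]
rewritten, unresolved = plan_substitutions(model_ops)
print(rewritten)   # ['CONV_2D', 'PRELU', 'CONV_2D', 'PRELU', 'CUSTOM_WARP']
print(unresolved)  # ['CUSTOM_WARP']
```

Anything left in `unresolved` is where the network itself has to change rather than just swap one op for another.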
For example, since the LeakyReLU operation was not supported by the TFLite GPU delegate, we used the supported PReLU operation instead, with alpha fixed at 0.3 and frozen so that it behaves like LeakyReLU:
> import tensorflow as tf
> tf.keras.layers.PReLU(alpha_initializer=tf.keras.initializers.Constant(0.3), shared_axes=[1, 2], trainable=False)
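The substitution works because PReLU computes max(x, 0) + alpha * min(x, 0); with alpha frozen at 0.3 (also the default slope of `tf.keras.layers.LeakyReLU`) it gives the same result as LeakyReLU. A minimal NumPy check of that equivalence, assuming an NHWC feature map; `leaky_relu` and `prelu` are hypothetical helper names, not TFLite or Keras APIs:

```python
import numpy as np


def leaky_relu(x: np.ndarray, alpha: float = 0.3) -> np.ndarray:
    # LeakyReLU: x where x >= 0, alpha * x elsewhere.
    return np.where(x >= 0, x, alpha * x)


def prelu(x: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    # PReLU: same formula, but alpha is a tensor broadcast against x.
    return np.maximum(x, 0) + alpha * np.minimum(x, 0)


# Hypothetical NHWC activations: batch=2, height=4, width=4, channels=8.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 4, 8)).astype(np.float32)

# shared_axes=[1, 2] shares alpha over height and width, leaving one value
# per channel: a per-sample parameter shape of (1, 1, C), all frozen at 0.3.
alpha = np.full((1, 1, x.shape[-1]), 0.3, dtype=np.float32)

print(np.allclose(leaky_relu(x), prelu(x, alpha)))  # True
```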
However, we hit unexpected behavior when we tried to reduce the number of parameters in the PReLU operation to one by sharing all axes (shared_axes=[1, 2, 3]). The code worked fine in CPU mode, but the GPU delegate failed with the error ‘Linear alpha shape does not match the number of input channels’. That’s why we ended up sharing parameters only along axes 1 and 2, which leaves one alpha per channel.
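The error makes sense once you look at the parameter shapes. Following the `shared_axes` semantics described in the `tf.keras.layers.PReLU` documentation, each shared axis collapses to size 1, so for an NHWC input `shared_axes=[1, 2]` leaves one alpha per channel while `[1, 2, 3]` leaves a single scalar. The sketch below mirrors that shape rule in plain Python; `matches_channel_count` is a hypothetical check modeled on the delegate’s error message, not the delegate’s actual code, and the feature-map shape is made up.

```python
from typing import Sequence, Tuple


def prelu_alpha_shape(input_shape: Sequence[int], shared_axes: Sequence[int]) -> Tuple[int, ...]:
    """Per-sample alpha shape: the batch axis is dropped and every shared
    axis collapses to size 1 (axis numbers count the batch axis as 0)."""
    param_shape = list(input_shape[1:])
    for axis in shared_axes:
        param_shape[axis - 1] = 1
    return tuple(param_shape)


def matches_channel_count(alpha_shape: Tuple[int, ...], channels: int) -> bool:
    # Hypothetical check modeled on "Linear alpha shape does not match the
    # number of input channels": expect one alpha value per channel.
    return alpha_shape[-1] == channels


nhwc = (1, 256, 256, 32)  # hypothetical feature-map shape
for axes in ([1, 2], [1, 2, 3]):
    shape = prelu_alpha_shape(nhwc, axes)
    print(axes, shape, matches_channel_count(shape, nhwc[-1]))
# [1, 2] (1, 1, 32) True
# [1, 2, 3] (1, 1, 1) False
```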
Another problem arose when we tried to normalize input data to the range [-1, 1] with a Lambda layer inside the network:
> import tensorflow as tf
> tf.keras.layers.Lambda(lambda x: x / 127.5 - 1.0)
This appears to work with the GPU delegate, but at run time it silently falls back to the CPU. Normally when this happens, TFLite prints a warning such as “Failed to apply delegate. Only the first M ops will run on the GPU, and the remaining N on the CPU”. So be careful when using Lambda layers, and always measure the actual inference time before moving on.
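The post does not say how the Lambda layer was ultimately replaced. One option, assumed here, is to leave the Lambda layer out of the graph and apply the same scaling to the pixels before they are handed to the interpreter, so that this op cannot trigger a CPU fallback inside the model. A minimal NumPy sketch of the formula and its inverse; the function names are hypothetical:

```python
import numpy as np


def to_model_range(image_u8: np.ndarray) -> np.ndarray:
    """Map uint8 pixels in [0, 255] to float32 in [-1, 1]: x / 127.5 - 1."""
    return image_u8.astype(np.float32) / 127.5 - 1.0


def to_pixel_range(x: np.ndarray) -> np.ndarray:
    """Inverse mapping back to uint8, e.g. to display a tensor in [-1, 1]."""
    return np.clip(np.rint((x + 1.0) * 127.5), 0, 255).astype(np.uint8)


pixels = np.array([0, 127, 128, 255], dtype=np.uint8)
normalized = to_model_range(pixels)
print(normalized.min(), normalized.max())  # -1.0 1.0
print(to_pixel_range(normalized))          # [  0 127 128 255]
```

The scaling itself is a cheap elementwise pass, so doing it outside the model should cost little compared with splitting the graph between GPU and CPU, but that trade-off is worth confirming with a timing measurement.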
Although there were several hurdles along the way, we reduced the inference time of our model by more than half by using the TFLite GPU delegate. We were finally able to provide users with a ‘live preview’ feature where the flattened pages are shown in real time.
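To check a speedup like this on your own model, time both configurations the same way. The sketch below is a generic Python harness, not vFlat’s code: `run` is any zero-argument callable, such as a closure around `invoke()` on a `tf.lite.Interpreter`, and the 50 ms budget is simply the per-frame time implied by the 20 FPS live-preview target. It times inference only; camera capture and pre- and post-processing also eat into the frame budget.

```python
import statistics
import time
from typing import Callable

FRAME_BUDGET_MS = 1000.0 / 20  # 20 FPS live preview leaves 50 ms per frame


def median_latency_ms(run: Callable[[], object], warmup: int = 5, iters: int = 50) -> float:
    """Median wall-clock latency of run() in milliseconds."""
    for _ in range(warmup):  # discard runs that may pay one-time setup costs
        run()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms: float, accelerated_ms: float) -> float:
    """How many times faster the accelerated path is (2.0 means half the time)."""
    return baseline_ms / accelerated_ms


def fits_live_preview(latency_ms: float) -> bool:
    return latency_ms <= FRAME_BUDGET_MS


# Usage with a stand-in workload; replace it with your interpreter call.
latency = median_latency_ms(lambda: sum(range(10_000)))
print(f"{latency:.3f} ms, fits 20 FPS budget: {fits_live_preview(latency)}")
```

The median is used rather than the mean so that a few outliers (for example, a background task stealing the CPU) do not skew the comparison.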
We can confidently say that using the TFLite GPU delegate was a great choice, and we highly recommend it to anyone who wants to deploy a trained model on a mobile device.
To learn more and try it yourself, read the TensorFlow Lite GPU delegate documentation.