How vFlat used the TFLite GPU delegate for real time inference to scan books

Aug 13 · 5 min read

A guest post by Kunwoo Park, Moogung Kim, Eunsung Han

A flattened scan of a curved book page using the vFlat app

Although there are many mobile scanning apps available for download, most focus on digitizing flat documents and struggle when it comes to scanning the curved pages of a book. When scanning a curved surface with text, a person may either do a destructive scan by removing the bindings of the book or publication, or simply accept the input as is and deal with reading the curved images captured by the camera.

That’s why at VoyagerX we developed vFlat, an Android app that uses deep learning to solve this issue. The vFlat app aims to let users easily scan their books without having to worry about curved pages. It also tries to reduce manual input from users by automatically determining the boundaries of a book page.

Left: Curved book page image captured by a normal mobile camera. Right: Scanned version of the same image using vFlat

This comes in handy when a user tries to extract text from a photo of a book page via OCR (Optical Character Recognition). When extracting text from the ‘left’ image above, OCR fails to correctly recognize some words and text lines because they are too curved. Applying the same OCR to the ‘right’ image, however, succeeds far more often and extracts the text with little error.

OCR result for image B inside vFlat app

We developed a deep learning model that flattens curved book pages, and decided to run it on the mobile device itself to provide the best end user experience. Inside the vFlat app, there is a ‘live preview’ feature where users can see a book page flattened in real time (above 20 FPS). Because users can preview the flattened page in real time within the app, they can adjust the angle and frame before taking the picture.

‘Live preview’ feature of vFlat

To achieve real time inference in a mobile app, we optimized our trained model and leveraged the benefits of hardware acceleration. Our initial thought was to implement the inference module with OpenGL ourselves, so we were preparing to implement the model’s layers with GLSL (OpenGL Shading Language).

Fortunately, we came across TensorFlow Lite’s GPU support and decided to give it a try (at the time of writing, the ‘tensorflow-lite-gpu’ package version was updated to ‘org.tensorflow:tensorflow-lite-gpu:0.0.0-nightly’). We reduced the number of weights and complex operations to come up with a lightweight version of the model, and utilized the TFLite GPU delegate for hardware acceleration.

Deep neural networks are a great fit for GPUs, which offer more compute power than CPUs and excel at processing massively parallel workloads. But employing the mobile GPU is not a trivial effort; this is where the TFLite GPU delegate comes in.

The TFLite GPU delegate optimizes the graph of a neural network for mobile GPUs, and generates and compiles compute shaders which are executed asynchronously. Thanks to the TFLite GPU delegate, we saved months of development time by not having to implement our own hardware-accelerated inference module.

Although we saved time and energy using the TFLite GPU delegate, we ran into issues with converting our own model to a TFLite model and integrating it with the TFLite GPU delegate. The experimental version of the GPU delegate only supported operations that are used mainly in MobileNet and did not support some of the operations in our original model.

To utilize the GPU delegate without having to sacrifice the model’s performance, we had to substitute some of the operations while still keeping the overall network structure the same. We experienced issues during the process of conversion, and since the source code was not yet open at that point, it was hard to pinpoint the cause of the errors we were seeing. (The code for the TFLite GPU delegate is now publicly available on GitHub.)

For example, since the LeakyReLU operation is not supported by the TFLite GPU delegate, we had to utilize the supported PReLU operation instead in the following way:

By changing

> tf.keras.layers.LeakyReLU(alpha=0.3)

to

> tf.keras.layers.PReLU(alpha_initializer=tf.keras.initializers.Constant(0.3), shared_axes=[1, 2], trainable=False)
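As a quick sanity check of why this substitution preserves the model's behavior, here is a minimal sketch in plain Python (not the Keras layers themselves). It assumes the standard definitions of the two activations; the function names and sample values are illustrative only.

```python
def leaky_relu(x, alpha=0.3):
    """LeakyReLU: pass positives through, scale negatives by a fixed alpha."""
    return x if x >= 0 else alpha * x


def prelu(x, alpha):
    """PReLU: the same formula, except alpha is a layer weight."""
    return max(0.0, x) + alpha * min(0.0, x)


# With a constant initializer and trainable=False, alpha never changes,
# so PReLU should reproduce LeakyReLU on every input.
frozen_alpha = 0.3
samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
mismatches = [
    x for x in samples
    if abs(leaky_relu(x) - prelu(x, frozen_alpha)) > 1e-12
]
print("inputs where the two activations differ:", mismatches)
```

Because alpha is frozen at the value LeakyReLU used, the swap changes which op appears in the converted graph but not the function the layer computes.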

However, we came across unexpected behavior when we tried to reduce the number of parameters in the PReLU operation to 1 by sharing all axes (shared_axes=[1,2,3]). While this code worked fine in CPU mode, the GPU delegate failed with the error ‘Linear alpha shape does not match the number of input channels’. That’s why we ended up sharing parameters only along axes 1 and 2.
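To see how shared_axes changes the size of the alpha weight, the sketch below mimics the way Keras sizes it: take the per-sample input shape and collapse each shared axis to 1. The helper names and the 64x64x32 feature map are hypothetical, chosen only for illustration. The error message suggests the delegate expects one alpha per input channel, which would explain why a single shared value is rejected, though that reading is an inference from the message rather than something stated in the delegate's documentation.

```python
def prelu_alpha_shape(input_shape, shared_axes=None):
    """Shape of the PReLU alpha weight for a batched input shape.

    Starts from the per-sample shape (batch axis dropped) and sets
    every shared axis to size 1, so one value is reused along it.
    """
    shape = list(input_shape[1:])
    for axis in shared_axes or []:
        shape[axis - 1] = 1  # axes are 1-based because axis 0 is the batch
    return tuple(shape)


def num_params(shape):
    """Number of scalar weights in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


# Hypothetical NHWC activation: (batch, height, width, channels).
feature_map = (None, 64, 64, 32)

for axes in (None, [1, 2], [1, 2, 3]):
    shape = prelu_alpha_shape(feature_map, axes)
    print(f"shared_axes={axes}: alpha shape {shape}, {num_params(shape)} params")
```

With shared_axes=[1, 2] the weight keeps one value per channel, while [1, 2, 3] collapses it to a single scalar that no longer lines up with the channel count.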

Another problem we ran into was when we tried to normalize input data between -1 and 1 with a Lambda layer within a network.

> tf.keras.layers.Lambda(lambda x: (x / 127.5) - 1.0)

Visualization of the Lambda layer converted by TFLite converter

The converted layer appears to work with the GPU delegate, but at run time it silently falls back to the CPU. Normally, when a fallback happens, TFLite warns us with a message such as “Failed to apply delegate. Only the first M ops will run on the GPU, and the remaining N on the CPU”. In this case no such warning appeared, so be careful when using Lambda layers and always measure the actual inference time before proceeding.
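Measuring inference time can be as simple as the harness sketched below. It is not the code used for vFlat: `benchmark` and `fake_inference` are hypothetical names, and the placeholder workload stands in for whatever call actually runs the model (on Android that timing would live in the app's own language around the interpreter call). The idea is only to show warm-up runs, repeated timing, and a median, so that a silent CPU fallback shows up as an unexpectedly high latency.

```python
import statistics
import time


def benchmark(run_inference, warmup=5, runs=30):
    """Return the median latency of run_inference() in milliseconds."""
    # Warm-up runs absorb one-time costs such as shader compilation.
    for _ in range(warmup):
        run_inference()

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(latencies_ms)


def fake_inference():
    # Placeholder workload; swap in the real model invocation.
    sum(i * i for i in range(20_000))


median_ms = benchmark(fake_inference)
fps = 1000.0 / median_ms
print(f"median latency: {median_ms:.2f} ms (~{fps:.0f} FPS)")
```

Running the same harness once with the delegate enabled and once without gives a direct comparison; if the two medians are nearly identical, the model is probably not running on the GPU.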

Average inference time on GPU compared to baseline CPU inference time on our model across various Android devices

Although there were several hurdles along the way, we reduced the inference time of our model by more than half by using the TFLite GPU delegate. We were finally able to provide users with a ‘live preview’ feature where the flattened pages are shown in real time.

We can confidently say that using the TFLite GPU delegate was a great choice, and highly recommend trying it out for those who want to deploy their trained model on a mobile device.

To learn more, and try it yourself, read TensorFlow Lite GPU delegate.

A bird’s eye view of vFlat

