Mobile Image Blur Detection with Machine Learning

From Python Prototype to JavaScript Web App

--

As Chief Data Officer at snapADDY, I’m responsible for various tasks related to Data Science and Machine Learning. Mostly, this means working on our pipeline for parsing contact information from unstructured text. But from time to time, some interesting side projects come up as well. This is the story of one of them.

Let’s start with a short problem description. Among other things, snapADDY builds products that scan and parse business cards. The first step, of course, is to take a photo of the business card in question. This is usually done out in the field, using our apps on a smartphone or tablet.

However, people sometimes take blurry photos (e.g. when they're in a hurry). Our OCR engine then cannot properly recognize the text on the card, which in turn screws up the whole subsequent recognition pipeline. To avoid this problem, we decided to add blur detection to our apps' camera view and immediately warn users when they take a blurry photo.

Goal: Provide instant feedback if a user takes a blurry photo.

In this project, we followed a two-step development approach:

  • first, build a prototype in Python (since it has nice libraries for image processing and machine learning readily available)
  • second, port what’s necessary for production to JavaScript (since our apps are built with Ionic)

Image Processing

Step one, before even thinking about algorithms and implementations, was to get our hands on test data. In my experience, this is a best practice that should be observed in any research & development task:

Try to obtain realistic test data as early as possible.

In our case, we collected a test set of 25 business cards and took two photos of each of them: a blurry one and a sharp one. We did this using a DSLR camera and manually setting the focus. Let’s call this test set synthetic data, since it was artificially produced under something like “laboratory conditions.”

Taking test photos of business cards with blurry focus and difficult light settings. Lenovo product placement unintentional.

In addition, one of our customers kindly provided 66 photos of business cards that were taken out in the field.

With the test data available, we started experimenting with a couple of standard algorithms from computer vision. There is a myriad of different algorithms for blur (or edge) detection in the literature; we decided to keep it simple and focus on the well-known Laplace and Sobel filters.

The basic approach is this:

  1. use Laplace (or Sobel) filter to find edges in the input image
  2. compute the variance and the maximum over the pixel values of the filtered image
  3. high variance (and a high maximum) suggests clearly distinguished edges, i.e. a sharp image; low variance suggests a blurred image

Implementations of both filters are available in Python via the scikit-image package. With image loading, resizing and grayscaling, we arrive at the following short script:
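
(A minimal sketch; function names and the target resolution are illustrative, not necessarily our exact production values.)

    # Minimal sketch: compute blur features for one image with scikit-image.
    # The target resolution and return values are illustrative.
    import numpy as np
    from skimage import io, transform
    from skimage.color import rgb2gray
    from skimage.filters import laplace

    def blur_features(path, size=(400, 600)):
        """Return (variance, maximum) of the Laplace-filtered image."""
        image = io.imread(path)
        # Downscale to a fixed resolution for comparability and speed
        image = transform.resize(image, size, anti_aliasing=True)
        gray = rgb2gray(image)
        # The Laplace filter highlights edges; sharp images respond strongly
        filtered = laplace(gray)
        return filtered.var(), np.abs(filtered).max()

    print(blur_features("business_card.jpg"))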

Edge detection via the Sobel filter can be done similarly (by using skimage.filters.sobel instead of laplace). We chose to downscale the image to a fixed resolution both for comparability between images and for lower running times in production.

However, a crucial question remains:

What is the right threshold to distinguish sharp from blurry images (based on the computed values)?

This is where machine learning comes into play.

Machine Learning

With the code above, we are able to compute the variance and maximum values based on the Laplace and Sobel filters for any given image. Below, you see two photos of a business card from our synthetic data set, together with the computed values.

Two photos of the same business card from our test set, one being sharp and the other blurry.

There is a clear difference between the sharp and the blurred image in all four measures. This is a good sign: it seems like these measures can be used as features for discriminating between the two classes of images (blurry and non-blurry). Let's check this for all images in the synthetic data set:

As you can see in the plots, the features based on the Laplace filter are (much) better at discriminating between the sharp and blurry images in our data set. Based on this observation, we discard the Sobel-based features and continue with the Laplace filter only. Note that all of these observations are based on test data; luckily, we obtained this data early on!

Regarding the plot of the Laplace-based features, we observe that our two classes (blurry and non-blurry) are linearly separable. That means that we can find a line in (feature) space such that all data points from one class lie on one side and all points from the other class lie on the other side. In fact, there is not only one such line, but an infinite number of them. This leads to the final question:

Which separating line would be best?

We answer this question using a standard technique from machine learning: by training a support vector machine.

Heavy (support vector) machinery. (Photo by Isis França on Unsplash)

A support vector machine is an algorithm that computes a “best” separating line for us. The line is optimal in the sense that the margin between the two classes along the line is maximal. The plot below shows the optimal line for our data:

SVM trained on our “synthetic” data set. Blue data points correspond to sharp photos, orange points to blurry photos. The decision boundary is optimal in the sense that it realizes a maximum margin between the classes. Note that the features have been scaled in this plot.

Using scikit-learn, it becomes ridiculously easy to train a support vector machine on our data set (and thus calculate the parameters for the line shown above):
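
(A minimal sketch with made-up feature values; in practice, X and y come from the feature extraction above.)

    # Minimal sketch: train a linear SVM on the two Laplace features.
    # The feature values below are made up for illustration.
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # One (variance, maximum) pair per image; label 1 = sharp, 0 = blurry
    X = np.array([[0.0012, 0.18], [0.0015, 0.22], [0.0001, 0.04], [0.0002, 0.05]])
    y = np.array([1, 1, 0, 0])

    # Standardize the features, then fit a linear SVM
    scaler = StandardScaler().fit(X)
    clf = SVC(kernel="linear").fit(scaler.transform(X), y)

    # Predictions on sample data
    print(clf.predict(scaler.transform([[0.0013, 0.20]])))  # e.g. [1] -> sharp
    print(clf.predict(scaler.transform([[0.0001, 0.03]])))  # e.g. [0] -> blurry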

The two lines at the bottom show how to make predictions with our freshly trained classifier on sample data. At this point, we tried our classifier on the 66 photos from our customer and achieved satisfying results: the blurred images were identified correctly.

This could be the end of the story, but since our actual product is built in JavaScript/TypeScript (and not Python), we still need to transfer the prediction function to production use.

Implementation for Production

Currently, the development team at snapADDY builds two (mobile) apps: a classic business card scanner and a tool for capturing contacts/leads at trade fairs. Both apps support scanning business cards, so they should both profit from a camera view with blur detection.

The apps are built using Ionic, a framework for developing cross-platform mobile apps using web technologies (JavaScript/Angular). In fact, using JavaScript wherever possible (frontend, backend, mobile apps) is one of snapADDY’s technical philosophies. Since this very reasonable philosophy is being firmly ignored by the Data Science team, which stubbornly sticks with Python, we now address the issue of transforming research-style Python code into production-level JavaScript code.

Working with canvas. (Photo by Yael Edery on Unsplash)

Since the inputs to our classifier are features based on Laplace-filtered images, we had to implement a Laplace filter in JavaScript first. This is easily possible using HTML5 canvas; I recommend this interesting blog post for details and sample code. Implementing a naive sliding-window approach in this way already yields sufficient performance for our use case. For further speed-ups, one could try WebGL shaders, but we have not had a reason to do that yet.
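
Here is a minimal sketch of the naive approach (the grayscale weights and the 3×3 kernel are common defaults, not necessarily our exact production code):

    // Minimal sketch: Laplace features from a canvas (naive sliding window).
    // Assumes a same-origin canvas that already contains the downscaled photo.
    function laplaceFeatures(canvas: HTMLCanvasElement): { variance: number; maximum: number } {
      const ctx = canvas.getContext("2d")!;
      const { width: w, height: h } = canvas;
      const rgba = ctx.getImageData(0, 0, w, h).data;

      // Grayscale conversion using the usual luma weights
      const gray = new Float32Array(w * h);
      for (let i = 0; i < w * h; i++) {
        gray[i] = 0.299 * rgba[4 * i] + 0.587 * rgba[4 * i + 1] + 0.114 * rgba[4 * i + 2];
      }

      // Convolve with a 3x3 Laplace kernel (center 4, cross neighbors -1)
      let sum = 0, sumSq = 0, maximum = 0;
      const n = (w - 2) * (h - 2);
      for (let y = 1; y < h - 1; y++) {
        for (let x = 1; x < w - 1; x++) {
          const v = 4 * gray[y * w + x]
            - gray[(y - 1) * w + x] - gray[(y + 1) * w + x]
            - gray[y * w + x - 1] - gray[y * w + x + 1];
          sum += v;
          sumSq += v * v;
          maximum = Math.max(maximum, Math.abs(v));
        }
      }

      // Variance via E[v^2] - E[v]^2
      const mean = sum / n;
      return { variance: sumSq / n - mean * mean, maximum };
    }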

This leaves us with the classifier. Since the prediction function of a linear SVM is pretty simple, we did not want to add a full-scale Machine Learning library as a dependency to our production code. Instead, we implemented the prediction function in plain TypeScript:
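
(A minimal sketch; every numeric constant below is a placeholder.)

    // Minimal sketch: linear SVM prediction in plain TypeScript.
    // All numeric constants are placeholders for values from your own training run.
    type BlurClass = "sharp" | "likely blurred" | "blurred";

    const FEATURE_MEAN = [0.0008, 0.12];   // placeholder: StandardScaler.mean_
    const FEATURE_SCALE = [0.0006, 0.08];  // placeholder: StandardScaler.scale_
    const WEIGHTS = [1.3, 0.9];            // placeholder: SVC.coef_
    const INTERCEPT = -0.2;                // placeholder: SVC.intercept_
    const THRESHOLD = 0.5;                 // manually picked margin (see below)

    function classifyBlur(variance: number, maximum: number): BlurClass {
      // Standardize the features exactly as the scaler did during training
      const x0 = (variance - FEATURE_MEAN[0]) / FEATURE_SCALE[0];
      const x1 = (maximum - FEATURE_MEAN[1]) / FEATURE_SCALE[1];
      // Signed distance to the decision boundary (positive = sharp side)
      const score = WEIGHTS[0] * x0 + WEIGHTS[1] * x1 + INTERCEPT;
      if (score <= 0) return "blurred";
      if (score < THRESHOLD) return "likely blurred";
      return "sharp";
    }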

Note that the code is full of magic numbers that are unique to your application (i.e. the data set you trained with). They can be calculated by running the Python code above; the scaling parameters can be obtained from scikit-learn's preprocessing.StandardScaler class (its mean_ and scale_ attributes).

Also, you might have noticed that we have added a third output class: “likely blurred.” This class captures data points that are near the decision boundary, but not quite on the blurry side yet. The definition of “near” is given by the threshold parameter, which was picked manually (instead of being trained in some proper way). However, it works surprisingly well in practice and is a very quick way to further improve the user experience.

After adding some UI elements, we arrived at the end result shown below. The classification result is shown as a hint below the picture. Note that we do not show a statement like "image quality is good" for the image on the left even though the prediction was positive: the picture could still be bad in some other way that our classifier does not assess (e.g. exposure, perspective, etc.).

End result: a camera view with integrated blur detection as part of a Web App, implemented in JavaScript/TypeScript.

Conclusions

Summing up the lessons learned in the project, I arrive at the following conclusions:

  • Combining standard textbook methods can yield nice results in practice, and you arrive at easily understandable and tweakable models and code. There is not always a need for rocket science.
  • Classic Machine Learning concepts can often be brought into production easily, without messing with large libraries/dependencies (and performance issues). Had we used, for example, a deep learning approach in the same setting, bringing the classifier into production would have been much harder (although it is sometimes certainly worth it).
  • Think of how to collect (or generate) a sufficient amount of training and test data early on. It is key not only for training models, but also for a proper evaluation of your work!

I hope you enjoyed reading this article and following along with the development of this project. Please do leave comments and share your thoughts with us!

snapADDY is a technology start-up based in Würzburg, Germany, developing software that helps sales and marketing teams to keep their CRM systems clean and up-to-date.

The company offers two main products: snapADDY Grabber (supporting in-house sales teams in CRM data maintenance) and snapADDY VisitReport (designed to digitize lead capturing in the field and at trade fairs). In addition, there is a scanner app, developed for capturing contact data from business cards, which provides a direct CRM connection.

The core of all three software products is an AI-powered contact and address recognition system, which is able to recognize and extract contact information from unstructured text in a wide variety of formats.

Want to know more? Visit snapaddy.com
Looking for a job? We're hiring! Check out our job listings

--

Benedikt Brief
snapADDY Tech Blog — Tales from the Devs

Chief Data Officer at www.snapaddy.com ❧ PhD in Computer Science ❧ enjoys Data Science, Algorithms, UX, Typography, Classic Cars, and Mountaineering