Driver’s license (CNH) OCR with Python, OpenCV, and Tesseract

Juliano Lazzarotto · Published in stackchain · 6 min read · Jul 26, 2019

The pain

Have you ever gotten stuck trying to get into a building that requires you to identify yourself? You have to show your company credentials along with your ID, then wait for the check until you are allowed (or not) to enter. If you haven’t, I envy you. Otherwise, I bet you don’t like waiting, so I’d like to reduce the hassle of this process. The same idea could be applied in many other situations, for example to simplify a KYC process.

The problem

When we talk about reading a paper document, we have to consider many different factors that can affect the process: different versions of the same document, the state of preservation, lighting, rotation, and many more.

The experiment

What we’ll be doing today is taking the Brazilian driver’s license, a.k.a. CNH, and trying to extract all the information on it, including the face photo, turning everything we are able to “read” into something we can use later.

To achieve this, we will be applying some image processing techniques, including color conversions, thresholding, bitwise operations, morphological operations, and contour extraction.

There will be many of these operations. To make it clear what is going on, I’ll be adding screenshots for every step, so don’t worry if you are not familiar with them; the screenshots will help you understand how we can chain image processing techniques into a computer vision solution.

Let’s get started.

First, we import the packages. Some of these packages don’t come with Python, so you will need to install them manually: OpenCV, imutils, scikit-image, NumPy, and Tesseract (with its Python wrapper).

Note: I’m using the latest versions (as of the date of this post) of these packages. If you are afraid they will break your environment, consider using a Python virtual environment, and don’t forget to run the workon command before installing the packages.
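The original post embeds the code as gists; a minimal sketch of the imports might look like this (assuming pytesseract as the Tesseract wrapper):

```python
# Sketch of the imports; the exact list in the original gists may differ.
from skimage.filters import threshold_local  # adaptive thresholding
import pytesseract                           # Python wrapper for Tesseract OCR
import numpy as np
import argparse
import imutils
import base64
import json
import cv2
import re
```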

After importing the required packages, we are going to build the argument parser and load the image.

We define some constants; one of them is the regex we use to drop undesired characters.
We also define an argument parser, add one argument, parse it, and store the result in the variable args.

There is only one required command-line argument:
--image: the path to the image file that will be processed.

After that, we load the image and resize it; then we are ready to apply some computer vision techniques.
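A minimal sketch of this setup is below. The regex and the target width of 800 px are illustrative assumptions; the original constants may differ.

```python
import argparse
import re

import cv2
import imutils

# Illustrative regex used to drop undesired characters from the OCR output.
UNDESIRED_CHARS = re.compile(r"[^A-Za-z0-9/.,\- ]")

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
                help="path to the image file that will be processed")
args = vars(ap.parse_args())

# Load the document photo and resize it to a known width before processing.
image = cv2.imread(args["image"])
image = imutils.resize(image, width=800)  # assumed working width
```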

The first stage

It is time to clean up the image, removing useless noise while trying to preserve as much as possible of the text characters and the borders, which we’ll be using to detect our ROIs (regions of interest). The first function is called “cleanImage.”

This function receives two parameters: the first is the image array, and the second is the cleaning stage. We will reuse this function on the ROIs that we crop later.

First, we convert the image to grayscale, followed by two morphological operations, TopHat and BlackHat, using a 3x3 kernel. BlackHat reveals dark regions on a light background, while TopHat reveals light regions on a dark background. Combining both, we can enhance contrast.

Using blur here is not a good idea, because it gets too noisy near the straight lines that we are going to use to find our ROIs. In most of my attempts, it ended up connecting components that we want to keep separate.

The result is a binary representation of the image that attempts to keep the lines connected. The most crucial operation here is threshold_local.

threshold_local is an adaptive thresholding function from scikit-image: instead of using a magic number, it computes the threshold with a method, in this case gaussian, that considers the neighborhood of each pixel. You will find that when lighting conditions are non-uniform across the image, a single global threshold value T does not perform well.
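A minimal sketch of what cleanImage might look like, assuming a 3x3 kernel and assumed block sizes for threshold_local (the original values and the use of the stage parameter may differ):

```python
import cv2
from skimage.filters import threshold_local


def cleanImage(image, stage=1):
    """Return a binary version of `image` that preserves characters and borders."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # 3x3 rectangular kernel for the morphological operations.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

    # TopHat reveals light regions on a dark background,
    # BlackHat reveals dark regions on a light background.
    tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, kernel)
    blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)

    # Combine both to enhance the contrast of text and lines.
    enhanced = cv2.subtract(cv2.add(gray, tophat), blackhat)

    # Adaptive threshold: T varies across the image, computed from a
    # gaussian-weighted neighborhood (block sizes here are assumptions).
    block_size = 29 if stage == 1 else 15
    T = threshold_local(enhanced, block_size, method="gaussian", offset=10)
    return ((enhanced <= T).astype("uint8")) * 255  # foreground in white
```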

The second stage

Now that we have a binary representation of the image, we will try to figure out the ROIs and extract them.

So we find the contours, then apply some constraints: each contour must have at least a minimum area, height, and width; otherwise we drop it.

Here we use cv2.minAreaRect to fit a rotated rectangle around each remaining contour.
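A rough sketch of this stage, with assumed minimum sizes for a candidate ROI (the original thresholds may differ):

```python
import cv2
import imutils

# Assumed minimum sizes for a candidate ROI.
MIN_AREA, MIN_W, MIN_H = 500, 40, 10


def findCandidateRects(binary):
    """Find contours in the binary image and keep only rotated rectangles
    that are big enough to be a field of the document."""
    cnts = cv2.findContours(binary.copy(), cv2.RETR_EXTERNAL,
                            cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)  # handles OpenCV 3 vs 4 return values

    rects = []
    for c in cnts:
        (x, y, w, h) = cv2.boundingRect(c)
        if cv2.contourArea(c) < MIN_AREA or w < MIN_W or h < MIN_H:
            continue  # too small to be a field, drop it
        rects.append(cv2.minAreaRect(c))  # (center, (width, height), angle)
    return rects
```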

The third stage

Now that we have our rectangles, it is time to start selecting which of them has a high probability of being what we want.

However, there are a few issues: we don’t know the orientation of the document, nor of the rectangles from the previous step.
So what we do here is detect the top-left point and use it as a base to rotate and crop.
Another important thing going on here is that we drop everything that does not match our expected aspect ratio, as sketched below.
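The original gist isn’t reproduced here; one way to do this is sketched below: order the corners of each cv2.minAreaRect so the top-left comes first, warp the region into an axis-aligned crop, and drop anything outside an assumed aspect-ratio range. The MIN_RATIO/MAX_RATIO bounds are illustrative.

```python
import cv2
import numpy as np

# Assumed aspect-ratio bounds for a text field.
MIN_RATIO, MAX_RATIO = 2.0, 12.0


def cropRect(image, rect):
    """Rotate and crop a cv2.minAreaRect region, using its top-left corner
    as the reference point, and drop it if the aspect ratio looks wrong."""
    box = cv2.boxPoints(rect).astype("float32")  # 4 corners of the rotated rect

    # Order the corners: top-left, top-right, bottom-right, bottom-left.
    s = box.sum(axis=1)
    d = np.diff(box, axis=1).ravel()
    tl, br = box[np.argmin(s)], box[np.argmax(s)]
    tr, bl = box[np.argmin(d)], box[np.argmax(d)]

    w = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
    h = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))
    if h == 0 or not (MIN_RATIO <= w / float(h) <= MAX_RATIO):
        return None  # not the shape of a field we expect

    # Warp the rotated rectangle into an axis-aligned crop.
    src = np.array([tl, tr, br, bl], dtype="float32")
    dst = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]],
                   dtype="float32")
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, M, (w, h))
```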

These rectangles are going to be our input to the next stage.

The fourth stage

At this moment, we have ended up with an object that contains all the information that we need to start transforming the image into text.

Due to some quirks of Tesseract, we do something tricky here: we try to read the text from both the original image and the gray image, and then select whichever one gives the best result by the number of characters detected.

We classify what type of information the text is, based on the rectangle’s aspect ratio and some simple logic (not brilliant, but for a PoC it works fine).
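A sketch of this stage is below. The lang="por" option assumes Portuguese trained data is installed, the cleanup regex is illustrative, and the CPF/name rules stand in for the original classification heuristics.

```python
import re

import pytesseract

UNDESIRED_CHARS = re.compile(r"[^A-Za-z0-9/.,\- ]")  # illustrative cleanup regex


def readText(roi_color, roi_gray):
    """Run Tesseract on both the color and the gray crop and keep whichever
    yields more characters after cleanup."""
    candidates = []
    for img in (roi_color, roi_gray):
        text = pytesseract.image_to_string(img, lang="por")
        candidates.append(UNDESIRED_CHARS.sub("", text).strip())
    return max(candidates, key=len)


def classifyField(text, rect_ratio):
    """Very rough classification based on the rectangle aspect ratio
    and the text content (assumed heuristics, not the original rules)."""
    digits = text.replace(".", "").replace("-", "")
    if re.fullmatch(r"\d{11}", digits):
        return "cpf"
    if rect_ratio > 8:
        return "name"
    return "other"
```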

The fifth stage

We got our data, so it’s time to add the avatar to our JSON data. To achieve that, we are going to use a pre-trained model for face detection.

To select the “right” face, we are using some magic numbers.

After picking the face, we encode it as base64 to be able to embed it into a JSON object.

Then we return our JSON object.
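A sketch of this stage using OpenCV’s bundled Haar cascade (the original may use a different pre-trained model); the padding values stand in for the “magic numbers,” and the structure of the data dictionary is assumed.

```python
import base64
import json

import cv2

# Haar cascade that ships with OpenCV; the original may use another model.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def extractAvatar(image):
    """Detect the largest face region and return it as a base64 JPEG string."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None

    # "Magic numbers": keep the biggest detection and pad it a little.
    (x, y, w, h) = max(faces, key=lambda f: f[2] * f[3])
    crop = image[max(0, y - 10):y + h + 10, max(0, x - 10):x + w + 10]

    ok, buf = cv2.imencode(".jpg", crop)
    return base64.b64encode(buf.tobytes()).decode("ascii") if ok else None


# `data` holds the fields read in the previous stages (assumed structure).
# data["avatar"] = extractAvatar(image)
# print(json.dumps(data, ensure_ascii=False))
```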

It was able to read my driver’s license (CNH) completely, with a 100% match. I tried it with four more documents, getting about 95% success and a 90% detection rate; sometimes it misses one part or another. On average, I’m pretty satisfied with this result.

You can find the source code on GitHub. Feel free to try it yourself. A preview follows below.
