Reading shipping labels from boxes using cv2 and deep learning (part 1)

Akash Thomas · Published in Beautiful ML · Feb 21, 2020

Part 1: Preprocessing the image

Text recognition from images is a fascinating and very useful area of computer vision and machine learning. In this article, let's try to read the text from stickers and labels on boxes using OpenCV and some very simple machine learning modules.

This article consists of two parts:

  1. Preprocessing the image using the HED edge detector and some perspective transformations
  2. Recognizing the text areas in the image using the EAST text detector and reading them with pytesseract

Passing an image directly through Tesseract will give some results, but preprocessing the image is where the magic happens: this step can greatly improve the results.
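For reference, the no-preprocessing baseline is just a couple of lines (box.jpg is a placeholder for your own photo):

```python
import cv2
import pytesseract

# Baseline: hand the raw photo straight to Tesseract, no preprocessing
image = cv2.imread("box.jpg")
print(pytesseract.image_to_string(image))
```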

First, let's identify the sticker in the image. For that, we are going to use the HED (Holistically-Nested Edge Detection) edge detector.

HED is a deep learning algorithm for detecting edges in an image. If you want to dive deep, refer to the original paper, or you can have a quick read here.

Even though HED is a machine learning model, we don't need to worry about the implementation: OpenCV's DNN module can load and run the pretrained HED network. We just need to configure it and use it.

Before passing the image through HED, let's first remove the unwanted noise in the image and resize it to a smaller size. We will also store the resize ratio of the original image so we can restore the bounding boxes later.
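A rough sketch of this step (the variable names, the 500-pixel working height, and the denoising parameters are my own choices, not fixed values):

```python
import cv2

orig = cv2.imread("box.jpg")  # placeholder path

# h / hColor control the filter strength; the window sizes trade
# quality for speed
denoised = cv2.fastNlMeansDenoisingColored(orig, None, 10, 10, 7, 21)

# Shrink the image for edge detection, remembering the ratio so we
# can scale the detected box back to the original size later
HEIGHT = 500
ratio = denoised.shape[0] / HEIGHT
small = cv2.resize(denoised, (int(denoised.shape[1] / ratio), HEIGHT))
```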

The fastNlMeansDenoisingColored function removes small noise from the image and makes it cleaner for edge detection.

1. Original image; 2. Noise-reduced image

After the noise removal, we compute the HED of the image.

This network uses a Crop layer that is not implemented in OpenCV by default, so we need to provide our own implementation of this layer.
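A minimal implementation, along the lines of the edge-detection sample that ships with OpenCV, looks something like this:

```python
class CropLayer(object):
    def __init__(self, params, blobs):
        # Crop coordinates; computed in getMemoryShapes
        self.startX = self.startY = self.endX = self.endY = 0

    def getMemoryShapes(self, inputs):
        # The layer gets two inputs: the blob to crop and a blob whose
        # spatial shape the output must match (centre crop)
        (inputShape, targetShape) = (inputs[0], inputs[1])
        (batchSize, numChannels) = (inputShape[0], inputShape[1])
        (H, W) = (targetShape[2], targetShape[3])
        self.startX = int((inputShape[3] - targetShape[3]) / 2)
        self.startY = int((inputShape[2] - targetShape[2]) / 2)
        self.endX = self.startX + W
        self.endY = self.startY + H
        return [[batchSize, numChannels, H, W]]

    def forward(self, inputs):
        # Return the centre-cropped region of the first input
        return [inputs[0][:, :, self.startY:self.endY,
                          self.startX:self.endX]]
```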

After implementing the Crop layer, we register it and compute the HED of the noise-reduced image. A detailed walkthrough of the HED-generation code is beyond the scope of this article; if you feel like diving deep, feel free to refer to this link.

And don't forget to set the paths to the HED prototxt and model accordingly.
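Putting it together, registering the layer and running the network could look like the sketch below (the file names are placeholders for wherever you saved the prototxt and caffemodel, and small is the resized noise-reduced image from earlier):

```python
cv2.dnn_registerLayer("Crop", CropLayer)
net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "hed_pretrained_bsds.caffemodel")

(H, W) = small.shape[:2]
# The mean values are the ones the Caffe model was trained with
blob = cv2.dnn.blobFromImage(small, scalefactor=1.0, size=(W, H),
                             mean=(104.00698793, 116.66876762, 122.67891434),
                             swapRB=False, crop=False)
net.setInput(blob)
hed = net.forward()

# The output is a single-channel edge map in [0, 1]; make it 8-bit
hed = cv2.resize(hed[0, 0], (W, H))
hed = (255 * hed).astype("uint8")
```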

You can download the HED model and configuration from this link.

Now we find all the contours in the HED output. Since we removed the noise, we should get good results (clear, large contours and only a small number of them).
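Something like this should do (assuming OpenCV 4's findContours signature; the binarisation threshold of 50 is just a starting point):

```python
# Binarise the soft HED map, then pull out the external contours
_, thresh = cv2.threshold(hed, 50, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)

# Largest first: the sticker should be near the top of this list
contours = sorted(contours, key=cv2.contourArea, reverse=True)
```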

1. HED of the image; 2. All contours found from the HED; 3. Biggest contour with four sides

Now we need to find the big rectangular contour among all the contours we found. So we approximate each contour and look for the biggest one with four corners.

Here we need to set the parameter epsilon carefully. You have to tune it to the environment and the size of the sticker you are trying to read, but with a bit of tweaking you can land on a good value easily.
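Here is a sketch of that approximation loop, using 0.02 of the perimeter as a starting epsilon:

```python
target = None
for c in contours:
    peri = cv2.arcLength(c, True)
    # epsilon is a fraction of the perimeter; tune the 0.02 factor
    # to your environment and sticker size
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    if len(approx) == 4:
        # First four-cornered contour in the size-sorted list
        target = approx
        break
```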

Now we need to scale the rectangular contour back to the original size of the image. So we create a new NumPy array and fill it with the scaled values.
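Using the ratio we stored earlier, that scaling step might look like:

```python
import numpy as np

# Scale the four corners back up to the original image size
rect = np.zeros((4, 2), dtype="float32")
rect[:] = target.reshape(4, 2) * ratio
```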

From this point on, we are going to work with the rectangular shape we got from the previous step. To make things easier we will use a Python package called boundbox, which we can install from pip:

pip install boundbox

We can create a new BoundBox object using the NumPy array rect.

The box object contains the corner coordinates of the box we found in the image.

Now we need to cut out only the sticker part from the image and do a perspective warp. Note that this is different from simple cropping: the perspective warp lets us correct the alignment of the sticker and the text. We can use the perspective_wrap method from the BoundBox class directly for this.

We pass the noise-reduced image to the perspective_wrap function of the box object we created earlier. This should give us an image containing only the sticker.
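If you are curious what happens under the hood, here is a plain-OpenCV sketch of the standard four-point transform (not the package's actual code; it assumes rect holds the corners in top-left, top-right, bottom-right, bottom-left order):

```python
import numpy as np
import cv2

def four_point_warp(image, rect):
    (tl, tr, br, bl) = rect
    # Output size comes from the longer of each pair of opposite edges
    width = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
    height = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))
    dst = np.array([[0, 0], [width - 1, 0],
                    [width - 1, height - 1], [0, height - 1]],
                   dtype="float32")
    # Map the sticker's corners onto an axis-aligned rectangle
    M = cv2.getPerspectiveTransform(rect, dst)
    return cv2.warpPerspective(image, M, (width, height))

sticker = four_point_warp(denoised, rect)
```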

1. Cropped image; 2. Perspective-warped image. Note the difference between the two: cropping might give us a misaligned image, while the warped image is roughly aligned, which makes text detection and recognition easier.

Now that we have finished preprocessing the image, we can jump into reading the text from the transformed image.

We will see that in the next part of this article.

Please leave a comment in case of any doubts or suggestions. I would really appreciate any response from you.

