Shipping label reading from boxes using cv2 and deep learning (part 2)

Akash Thomas
Published in Beautiful ML
5 min read · Feb 21, 2020

Part 2: Finding text from the image

In part 1 of this article we saw how to preprocess the image and cut out the sticker in an aligned manner. Now we will try to read the text from the cut-out image.

This part of the article mainly focuses on finding the text areas and running OCR on them. If you have a better OCR than the one I have used, you can skip most of these steps.

Note that, depending on the image you choose, the preprocessing steps may in the worst case even decrease the quality of the results. But you can include the methods I have used in your project as you see fit.

The final image that we got in our last post

We need to find the text areas inside the image. For that, we are going to use the EAST text detector model.

Similar to what we did for the HED edge detection, we will create a blob of the image and pass it through the EAST text detection model. Explaining the model is beyond the scope of this article, but if you want to know what is happening you can refer to the original research paper.

First, we need to download the frozen model. You can download the model and see a simple explanation of its implementation from this link.

We will use the code below to resize the image to a suitable size.
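A minimal sketch, assuming the aligned sticker image from part 1 is loaded in the variable image. EAST only accepts input widths and heights that are multiples of 32, and we save the scaling ratios so the detected boxes can be mapped back to the original image later (the names new_w, ratio_w, and so on are my own):

```python
import cv2

# EAST only accepts input dimensions that are multiples of 32
new_w, new_h = 320, 320

orig = image.copy()
orig_h, orig_w = image.shape[:2]

# keep the scaling ratios so the detected boxes can be mapped
# back onto the original image later
ratio_w = orig_w / float(new_w)
ratio_h = orig_h / float(new_h)

resized = cv2.resize(image, (new_w, new_h))
```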

Now we need to set up the model and pass the resized image through it.

We will set two output layers and pass the blob of the image through the frozen model. If you need a better understanding of this step, you can refer to this article.
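Something like the following, where the filename is the usual one from the download, and the two layer names and channel means are the standard ones for the frozen EAST model:

```python
# load the frozen EAST model
net = cv2.dnn.readNet("frozen_east_text_detection.pb")

# two output layers: text/no-text scores and the box geometry
output_layers = [
    "feature_fusion/Conv_7/Sigmoid",  # probability that a region contains text
    "feature_fusion/concat_3",        # geometry of the rotated bounding boxes
]

# EAST was trained with these channel means, so blobFromImage subtracts them
blob = cv2.dnn.blobFromImage(resized, 1.0, (new_w, new_h),
                             (123.68, 116.78, 103.94),
                             swapRB=True, crop=False)
net.setInput(blob)
scores, geometry = net.forward(output_layers)
```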

Now we have the scores and geometry of the bounding boxes from the image.

We need to decode the output of the EAST detector to recover the boxes properly. Here we will use a decode function. The decoding logic is slightly complicated, so we will borrow it from the OpenCV sample. Further explanation of the decode function can be found in this StackOverflow thread.
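Here is the decode function, adapted from OpenCV's text_detection.py sample. It walks the score map and, for every cell above the threshold, reconstructs the rotated rectangle from the four edge distances and the angle:

```python
import numpy as np

def decode(scores, geometry, score_thresh):
    detections = []
    confidences = []
    # the score and geometry maps are 4x smaller than the network input
    height, width = scores.shape[2], scores.shape[3]
    for y in range(height):
        scores_data = scores[0][0][y]
        x0 = geometry[0][0][y]   # distance to the top edge of the box
        x1 = geometry[0][1][y]   # distance to the right edge
        x2 = geometry[0][2][y]   # distance to the bottom edge
        x3 = geometry[0][3][y]   # distance to the left edge
        angles = geometry[0][4][y]
        for x in range(width):
            score = scores_data[x]
            if score < score_thresh:
                continue
            # map the grid cell back to input-image coordinates
            offset_x, offset_y = x * 4.0, y * 4.0
            angle = angles[x]
            cos_a, sin_a = np.cos(angle), np.sin(angle)
            h = x0[x] + x2[x]
            w = x1[x] + x3[x]
            # rebuild the rotated rectangle's corners and centre
            offset = (offset_x + cos_a * x1[x] + sin_a * x2[x],
                      offset_y - sin_a * x1[x] + cos_a * x2[x])
            p1 = (-sin_a * h + offset[0], -cos_a * h + offset[1])
            p3 = (-cos_a * w + offset[0], sin_a * w + offset[1])
            center = (0.5 * (p1[0] + p3[0]), 0.5 * (p1[1] + p3[1]))
            detections.append((center, (w, h), -angle * 180.0 / np.pi))
            confidences.append(float(score))
    return [detections, confidences]
```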

Now we will decode our result using the decode function.
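With an assumed confidence threshold of 0.5, which you may want to tune for your own images:

```python
conf_threshold = 0.5  # an assumption; tune this for your own images
boxes, confidences = decode(scores, geometry, conf_threshold)
```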

If you look at the boxes and confidences, you can see that there is a large number of boxes. This is because the model finds more than one box for the same text. We can solve this issue using Non-Maximum Suppression; you can see more on Non-max Suppression here. We are going to use the Non-Maximum Suppression provided by OpenCV.
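OpenCV's rotated-rectangle variant matches the ((centre), (size), angle) format our decode function returns; the NMS threshold of 0.4 is my own choice:

```python
nms_threshold = 0.4  # my choice; lower values suppress more overlapping boxes
indices = cv2.dnn.NMSBoxesRotated(boxes, confidences,
                                  conf_threshold, nms_threshold)
```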

In the indices variable, we have the index of every box from the boxes list that survived suppression. Now we convert the data from those boxes into BoundBox objects for further operations.
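A minimal sketch; the BoundBox class here is a stand-in with only the fields the rest of the article needs, not the original implementation:

```python
class BoundBox:
    # a stand-in for the article's BoundBox class (an assumption)
    def __init__(self, x_min, y_min, x_max, y_max):
        self.x_min, self.y_min = x_min, y_min
        self.x_max, self.y_max = x_max, y_max

bound_boxes = []
for i in np.array(indices).flatten():
    # centre, size, and angle of a rotated rectangle kept by NMS
    (cx, cy), (w, h), angle = boxes[int(i)]
    # the four corner points of the rotated rectangle
    vertices = cv2.boxPoints(((cx, cy), (w, h), angle))
    # scale back to the original image using the ratios saved before resizing
    vertices[:, 0] *= ratio_w
    vertices[:, 1] *= ratio_h
    # wrap the corners in a BoundBox and keep it
    x_min, y_min = vertices.min(axis=0)
    x_max, y_max = vertices.max(axis=0)
    bound_boxes.append(BoundBox(int(x_min), int(y_min),
                                int(x_max), int(y_max)))
```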

In the sketch above, we first read back the centre, width, height, and angle of each box kept by Non-max Suppression and take the four corner points of the rotated rectangle. We then scale those corners to match the size of the original image, making use of the ratios we saved before resizing. Finally, we wrap each set of corners in a BoundBox object and append it to a list of all bounding boxes.

Bounding boxes found by the EAST text detector

As you can see in the image, we have a lot of bound boxes. To make them meaningful, we need to merge the ones that are on the same line and belong to the same sentence.

For this, we have a function inside the BoundBox class which will do the work for us. We need to set dx, which is the ratio of the distance between two words to the height of a word; we can set it to 1.2.
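The original merging function isn't reproduced here, so the helper below is a hypothetical re-implementation of the same idea: two boxes are joined when they overlap vertically and the horizontal gap between them is at most dx times the word height:

```python
def merge_boxes(boxes, dx=1.2):
    # hypothetical re-implementation of the BoundBox merging logic
    boxes = sorted(boxes, key=lambda b: (b.y_min, b.x_min))
    merged = []
    for box in boxes:
        for m in merged:
            height = min(m.y_max - m.y_min, box.y_max - box.y_min)
            on_same_line = box.y_min < m.y_max and m.y_min < box.y_max
            gap = box.x_min - m.x_max
            if on_same_line and gap <= dx * height:
                # grow the existing box to cover the new word
                m.x_min = min(m.x_min, box.x_min)
                m.y_min = min(m.y_min, box.y_min)
                m.x_max = max(m.x_max, box.x_max)
                m.y_max = max(m.y_max, box.y_max)
                break
        else:
            merged.append(box)
    return merged

merged_box = merge_boxes(bound_boxes, dx=1.2)
```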

The variable merged_box now contains the bounding boxes after nearby words have been merged.

Now we can read the text inside the boxes using OCR.

For OCR we have different options. If you are ready to pay for the service, you can use Google Vision, Amazon Rekognition, or any other good OCR service available. Or you could use pytesseract, which is free. In the end, we are going to get the coordinates of a bunch of boxes and the words inside them.

In this article, we are going to use pytesseract for OCR.

Please refer to this link for the pytesseract installation guide.
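A sketch of the OCR loop, assuming the BoundBox fields from earlier; the 5% padding is my own choice, and I crop with plain NumPy slicing where the article uses its crop image helper:

```python
import pytesseract

text_fields = []
for box in merged_box:
    # make the box slightly bigger than the text to absorb
    # small size differences and help the OCR
    pad = max(2, int(0.05 * (box.y_max - box.y_min)))
    x_min, y_min = max(box.x_min - pad, 0), max(box.y_min - pad, 0)
    x_max, y_max = box.x_max + pad, box.y_max + pad

    # crop the part of the original image that the bound box covers,
    # converting BGR to RGB since pytesseract expects RGB ordering
    cropped = cv2.cvtColor(orig[y_min:y_max, x_min:x_max],
                           cv2.COLOR_BGR2RGB)

    # run pytesseract on the cropped region
    text_fields.append(pytesseract.image_to_string(cropped).strip())
```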

In the sketch above, we first make each box slightly bigger than the text; this corrects small size differences in the boxes and increases OCR accuracy. We then crop out the part of the image that the bound box covers (the article's crop image function takes an image and returns the crop for the box's dimensions; plain NumPy slicing does the same job here). Finally, we run pytesseract on the cropped image.

Now in the list text_fields, we have all the strings we identified from the image.

Hope you all enjoyed the article. Please leave a comment in case of any doubts or suggestions. I would really appreciate any response from you.
