Reading shipping labels from boxes using cv2 and deep learning (Part 1)
Part 1: preprocessing the image
Text recognition from images is a fascinating and very useful area of computer vision and machine learning. Here, let's try to read the text from stickers and labels on boxes using OpenCV and some simple machine learning modules.
This article consists of 2 parts:
- Preprocessing the image using the HED edge detector and some perspective transformations
- Recognizing the text areas in the image using the EAST text detector and reading them using pytesseract
Passing an image directly through Tesseract will give some results, but preprocessing is where the magic happens: this step can improve the results greatly.
First, let's identify the sticker in the image. For that, we are going to use the HED (Holistically-Nested Edge Detection) edge detector.
HED is a deep learning algorithm used to detect edges in an image. If you want to dive deep, refer to the original paper, or have a quick read here.
Even though HED is a machine learning model, we don't need to worry about the implementation: the DNN module in OpenCV can load and run the pretrained HED model. We just need to configure it and use it.
Before passing the image through HED, let's first remove the unwanted noise and resize the image to a smaller size. We will also store the ratio of the original image so we can restore the bounding boxes later.
The fastNlMeansDenoisingColored function removes small noise from the image and makes it cleaner for edge detection.
After the noise removal, we run HED on the image.
This network uses a Crop layer that is not implemented in OpenCV by default, so we need to provide our own implementation of this layer.
After implementing the Crop layer, we register it and run HED on the noise-reduced image. A detailed walkthrough of the HED-generation code is beyond the scope of this article; if you feel like diving deep, feel free to refer to this link.
And don't forget to set the paths to the HED prototxt and model accordingly.
You can download the HED model and configuration from this link.
Now we find all the contours in the HED output. Since we removed the noise, we should get good results: a small number of clear, large contours.
Now we need to find a big rectangular contour among all the contours we found. So we approximate each contour and look for the biggest one with four edges.
Here we need to set the parameter epsilon carefully. You have to tune it to the environment and the size of the sticker you are trying to read. A little tweaking should get you to a workable value.
Now we need to scale the rectangular contour to match the original size of the image, so we create a new NumPy array and fill it with the scaled values.
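Scaling the contour back up with the stored ratio can be sketched as (`scale_contour` is a hypothetical helper name):

```python
import numpy as np

def scale_contour(rect, ratio):
    """Map the 4-point contour found on the resized image back onto
    the original image by multiplying with the stored ratio."""
    scaled = np.zeros(rect.shape, dtype=np.int64)
    for i, point in enumerate(rect):
        # Float result is truncated into the integer array.
        scaled[i] = point * ratio
    return scaled
```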
From this point on, we are going to work with the rectangular shape we got from the previous step. To make things easier, we will use the Python package boundbox, which we can install from pip:
pip install boundbox
We can create a new BoundBox object using the NumPy array rect
The box object contains the coordinates of the box we found from the image
Now we need to cut out only the sticker part of the image and do a perspective warp. Note that this is different from simple cropping: the perspective warp lets us correct the alignment of the sticker and its text. We can use the perspective_wrap method from the BoundBox class directly for this.
We pass the noise-reduced image to the perspective_wrap method of the box object we created earlier. This should give us an image containing only the sticker.
Now we have finished preprocessing the image, and we can jump into reading the text from the transformed image.
We will see that in the next part of this article.
Please leave a comment in case of any doubts or suggestions. I would really appreciate any response from you.