By Anthony Liang, Ex-Intern at Invisible AI
Anthony recently graduated from the University of Michigan, Ann Arbor, majoring in Computer Science and minoring in Mathematics. He will be continuing as a Robotics Master's student with a focus on Reinforcement Learning. Anthony’s goal is to develop intelligent agents that can learn through interaction with the world.
This summer, I had the opportunity to work alongside a very talented group of individuals at Invisible AI. In this blog post, I will be discussing my project focused on barcode detection using deep learning. This is the first of a series of technical posts scheduled for this month so stay tuned :)
Barcodes are used in various commercial applications, including manufacturing, healthcare, and advertising, to embed useful information such as product identification, expiration date, batch number, and more. A 1D barcode appears as a row of long black stripes, while a QR code is a 2D pattern of black rectangles. Given the ubiquity of barcodes in our modern world, several robust and efficient techniques are used to read them, including laser scanners and camera readers.
However, a few limitations prevent these techniques from being used for large-scale applications. Laser scanners, for example, can only read 1D barcodes. They are also unable to read barcodes from screens since they rely on the reflection of light. These drawbacks motivate the need to explore methods of barcode detection that are more reliable and versatile.
Early barcode detection methods relied primarily on traditional signal processing techniques such as corner detection, gradient methods, and morphological operations. These methods were typically assessed against two standard barcode datasets, namely the Muenster Barcode Dataset (WWU) and the Artelab 1D Medium barcode database. Both datasets provide annotated barcode images.
More recently, with the growing success of artificial neural networks (ANNs) and deep learning, researchers have been quick to apply such methods to the domain of barcode detection. Hansen et al. first used the YOLO (You Only Look Once) detector to detect bounding boxes of barcodes and then fed the detections as inputs to another neural network to predict the orientation. After correcting the orientation of the detected barcode, they fed the result into an open-source barcode reader to read the contents. Using ANN approaches, researchers were able to achieve state-of-the-art performance and establish new baselines on the WWU and Artelab datasets.
Single Shot Detector
For my research internship, I explored the application of several deep learning methods on open-source datasets and a small custom dataset of Amazon boxes. I started by implementing the Single Shot Detector (SSD) model (Liu et al., 2015). As the name implies, SSD does not rely on region proposal networks, a technique used in other state-of-the-art methods like Faster R-CNN, to propose bounding boxes.
Instead, SSD predicts bounding boxes and classes directly from feature maps in a single pass.
Skipping the region proposal stage tends to cost some accuracy. To compensate, SSD adopts strategies like using small convolution filters to predict object classes and predicting offsets relative to predefined (default) bounding boxes. Because it needs only a single pass, SSD can run inference in real time while still achieving satisfactory accuracy.
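To make the "offsets to predefined bounding boxes" idea concrete, here is a minimal sketch of how one predicted offset vector could be decoded against one default box. It follows the center/size parameterization described in the SSD paper; the function names are illustrative, and many implementations additionally scale the offsets by fixed "variance" factors, which are left out here.

```python
import math

def decode_box(default_box, offsets):
    """Decode predicted offsets relative to one default box.

    Boxes are (cx, cy, w, h); offsets are (t_cx, t_cy, t_w, t_h).
    Hypothetical helper for illustration, not the author's code.
    """
    d_cx, d_cy, d_w, d_h = default_box
    t_cx, t_cy, t_w, t_h = offsets
    cx = d_cx + t_cx * d_w       # center shifts by a fraction of the default width
    cy = d_cy + t_cy * d_h       # center shifts by a fraction of the default height
    w = d_w * math.exp(t_w)      # width and height are scaled multiplicatively
    h = d_h * math.exp(t_h)
    return (cx, cy, w, h)

def to_corners(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

default_box = (0.5, 0.5, 0.2, 0.1)
# All-zero offsets reproduce the default box unchanged.
unchanged = decode_box(default_box, (0.0, 0.0, 0.0, 0.0))
# Shift the center right by half a box width and double the width.
shifted = decode_box(default_box, (0.5, 0.0, math.log(2.0), 0.0))
```

The network only has to learn small corrections to boxes that already tile the image, which is part of what makes a single pass sufficient.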
At Invisible AI, we are developing AI-enabled cameras for real-time monitoring applications, which necessitates the use of efficient algorithms. Real-time object detection is important since we want to deploy these models onto our production-ready cameras.
While implementing SSD, I ran into several common issues such as size mismatches, NaN values, and a validation loss that would not decrease. After spending many hours debugging a series of these problems, I successfully trained a working model on the PASCAL VOC 2012 dataset (Everingham et al., 2010).
The PASCAL VOC dataset is a collection of images from 20 different classes ranging from people to animals to household objects. Getting the SSD model to work on the PASCAL dataset was a good indication that my model was functional. I used the stochastic gradient descent (SGD) optimizer with an initial learning rate of 1e-3, a momentum of 0.9, a weight decay of 5e-4, and a batch size of 64.
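For readers unfamiliar with these hyperparameters, the following sketch shows one common form of the SGD update with momentum and weight decay, written in plain Python on a toy parameter list. The post does not say which framework or exact update convention was used, so the update rule and the `sgd_step` helper are assumptions for illustration only.

```python
LR = 1e-3            # initial learning rate from the post
MOMENTUM = 0.9       # momentum from the post
WEIGHT_DECAY = 5e-4  # weight decay from the post

def sgd_step(weights, grads, velocity,
             lr=LR, momentum=MOMENTUM, weight_decay=WEIGHT_DECAY):
    """One SGD update; returns new weights and new velocity buffers."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + weight_decay * w   # weight decay acts as an L2 penalty gradient
        v = momentum * v + g       # accumulate a running direction
        new_velocity.append(v)
        new_weights.append(w - lr * v)
    return new_weights, new_velocity

weights = [1.0, -2.0]
velocity = [0.0, 0.0]
grads = [0.5, 0.5]   # made-up gradients for the example
weights, velocity = sgd_step(weights, grads, velocity)
```

In a real training loop the gradients would come from backpropagation over a batch of 64 images, and the learning rate would typically be lowered over time.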
Trained on a Tesla P100, my model achieved 70.3% mAP on the VOC 2012 dataset, which is lower than the 74.3% reported in the original paper. I believe this gap can be attributed to several factors, such as hyperparameter tuning, the number of training epochs, and further tweaks in the original model that were not implemented here.
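As a reminder of what the mAP number measures, here is a simplified sketch of average precision (AP) for a single class on a single image, using an IoU threshold of 0.5 and the area under the interpolated precision-recall curve. The helper names are hypothetical, and a real evaluation aggregates detections over the whole test set before averaging AP across classes to get mAP.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, ground_truth, iou_threshold=0.5):
    """detections: list of (score, box); ground_truth: list of boxes."""
    if not ground_truth:
        return 0.0
    matched, hits = set(), []
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        # A ground-truth box can be matched by at most one detection.
        hit = best_iou >= iou_threshold and best_idx not in matched
        if hit:
            matched.add(best_idx)
        hits.append(hit)
    precisions, recalls, true_positives = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        true_positives += hit
        precisions.append(true_positives / rank)
        recalls.append(true_positives / len(ground_truth))
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, previous_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - previous_recall) * p
        previous_recall = r
    return ap

ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(0.9, (0, 0, 10, 10)),      # correct
              (0.8, (50, 50, 60, 60)),    # false positive
              (0.7, (20, 20, 30, 30))]    # correct
ap = average_precision(detections, ground_truth)
```

In this toy example the false positive ranked second pulls the AP below 1.0 even though both barcodes are eventually found.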
To evaluate the performance of the model, I computed metrics such as mAP on the testing dataset and also manually inspected the bounding box detections. My next step after successfully applying the model to PASCAL VOC was to train the model on the baseline barcode datasets: WWU and Artelab. The datasets contain 595 and 465 images, respectively, at a resolution of 640 x 480. All images in the Artelab dataset contain exactly one EAN13 barcode, while a WWU image may contain one or more barcodes.
Since these were relatively small datasets, I applied data augmentation to artificially generate more training data. With a given probability, I transformed the original image through a 45, 90, 180, or 270-degree rotation, random cropping, color distortion, or a combination thereof. The rotation augmentation is important because it pushes the trained model toward rotation invariance, meaning the model should detect a barcode regardless of its orientation. Random cropping helps the model generalize to scenarios where the barcode appears in different parts of the image.
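When an image is rotated for augmentation, its bounding box annotations have to be rotated with it. The sketch below handles only the 90, 180, and 270-degree cases on an image stored as a nested list, since those need no interpolation; the 45-degree rotation, cropping, and color distortion are omitted. The function names and the `(x1, y1, x2, y2)` box format are assumptions for illustration.

```python
import random

def rotate90_cw(image, box):
    """Rotate an image (list of rows) and its box 90 degrees clockwise."""
    height = len(image)
    # Reversing the rows and transposing is a clockwise quarter turn.
    rotated = [list(row) for row in zip(*image[::-1])]
    x1, y1, x2, y2 = box
    # A point (x, y) moves to (height - y, x) under the same turn.
    return rotated, (height - y2, x1, height - y1, x2)

def random_quarter_turns(image, box, probability, rng):
    """With the given probability, apply 1, 2, or 3 clockwise quarter turns."""
    if rng.random() >= probability:
        return image, box
    for _ in range(rng.choice([1, 2, 3])):
        image, box = rotate90_cw(image, box)
    return image, box

# A 4x3 toy image whose "barcode" pixels are marked with 1.
image = [[0, 1, 1, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
box = (1, 0, 3, 1)  # covers the two marked pixels

rotated_image, rotated_box = rotate90_cw(image, box)
augmented_image, augmented_box = random_quarter_turns(
    image, box, probability=0.5, rng=random.Random(0))
```

After one quarter turn the marked pixels sit in the right-hand column of the rotated image, and the transformed box still covers exactly those pixels.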
To further increase the number of training samples, I also collected my own data by recording a video of myself handling Amazon boxes. In the video, I tried to collect diverse samples of the barcodes under various lighting conditions, angles, and depths. I split the video into frames and cropped them to the same size as the images in the Artelab dataset.
For the training procedure, I augmented and preprocessed about 3000 images from the baseline datasets and combined them with 1000 frames from my own dataset for a total of about 4000 training samples. I resized the input data to the dimensions expected by the SSD 512x512 model. I trained the model for 100 epochs with a batch size of 16, otherwise using the same hyperparameters as for PASCAL VOC training.
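Two small pieces of arithmetic are implied by this setup: the box annotations must be rescaled along with the images, and the epoch count translates into a number of optimizer steps. The sketch below assumes a plain stretch from 640 x 480 to 512 x 512 without preserving aspect ratio, which the post does not confirm, and uses the rounded figure of 4000 samples.

```python
import math

def resize_box(box, src_size, dst_size=(512, 512)):
    """Rescale an (x1, y1, x2, y2) box when the image is stretched to dst_size."""
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    x1, y1, x2, y2 = box
    return (x1 * dst_w / src_w, y1 * dst_h / src_h,
            x2 * dst_w / src_w, y2 * dst_h / src_h)

def training_iterations(num_samples, batch_size, epochs):
    """Total optimizer steps, counting a final partial batch as one step."""
    return math.ceil(num_samples / batch_size) * epochs

# A box in the middle of a 640 x 480 image, mapped to the 512 x 512 input.
resized = resize_box((160, 120, 480, 360), src_size=(640, 480))
total_steps = training_iterations(num_samples=4000, batch_size=16, epochs=100)
```

With these numbers, each epoch is 250 iterations, for 25,000 iterations in total. Note that the stretch changes the aspect ratio of the barcodes, since the horizontal and vertical scale factors differ.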
By analyzing the bounding box detections of SSD, I realized that there are some fundamental limitations that make SSD suboptimal for the barcode detection problem. In particular, SSD has difficulty locating multi-angle objects and effectively separating them from the background (see figure). Correctly localizing the bounding box for oriented objects is especially important for barcode detection.
This is because the barcode decoding step requires the barcode to be properly rotated and separated from the background. Additionally, it is often the case that multiple barcodes are located next to each other, as seen in the image above. SSD incorrectly classifies all of them as one barcode and outputs a bounding box in which a majority of the pixels are from the background. These limitations make the model less suitable for real-world applications.
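A quick geometric calculation shows why an axis-aligned box is a poor fit for a tilted barcode. The sketch below computes what fraction of the smallest enclosing axis-aligned box is actually covered by a rotated rectangle; the 200 x 50 barcode size is a made-up example, not a measurement from the datasets.

```python
import math

def fill_ratio(width, height, angle_deg):
    """Fraction of the enclosing axis-aligned box covered by a rotated rectangle."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = abs(math.cos(theta)), abs(math.sin(theta))
    # Extent of the rotated rectangle along the image axes.
    box_w = width * cos_t + height * sin_t
    box_h = width * sin_t + height * cos_t
    return (width * height) / (box_w * box_h)

upright = fill_ratio(200, 50, 0)    # barcode aligned with the image axes
tilted = fill_ratio(200, 50, 45)    # same barcode rotated by 45 degrees
```

For the tilted case the barcode covers only about a third of its bounding box, so roughly two thirds of the pixels handed to the decoder would be background.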
A more robust solution proposed very recently is DRBox (Liu et al., 2017), an extension of the SSD model that accounts for the orientation of objects. Instead of predicting a bounding box parameterized by center, width, and height (as in SSD), DRBox predicts a rotated bounding box (RBox), which has an additional angle parameter.
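To illustrate the extra angle parameter, the following sketch converts a rotated box given as center, width, height, and angle into its four corner points. The angle convention used here (degrees, rotating the x-axis toward the y-axis in image coordinates) is an assumption and may differ from the one used in the DRBox paper.

```python
import math

def rbox_corners(cx, cy, w, h, angle_deg):
    """Corner points of a rotated box, starting from the unrotated top-left."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate each corner offset about the center, then translate.
        corners.append((cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t))
    return corners

axis_aligned = rbox_corners(100, 100, 40, 20, 0)   # reduces to an ordinary box
quarter_turn = rbox_corners(100, 100, 40, 20, 90)  # width and height swap roles
```

With an angle of zero the RBox reduces to the ordinary SSD box, so the rotated parameterization strictly generalizes the axis-aligned one.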
Compared to the standard BBox, an RBox is able to localize objects more tightly and distinguish small objects in cluttered environments (see figure). DRBox thus overcomes several of the limitations of the original SSD model, which makes it better suited to the barcode detection task. A promising future direction for this project is to implement and adapt the DRBox model and apply it to the barcode dataset.
E. Ohbuchi, H. Hanaizumi, and L. A. Hock, “Barcode readers using the camera device in mobile phones”. In International Conference on Cyberworlds, Nov 2004, pp. 260–265
O. Gallo and R. Manduchi, “Reading 1d barcodes with mobile phones using deformable templates”. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, pp. 1834–1843, 2011
M. Katona and L. G. Nyul, “Efficient 1d and 2d barcode detection using mathematical morphology”. In ISMM, 2013
A. Zamberletti, I. Gallo, and S. Albertini, “Robust angle invariant 1d barcode detection”. In 2013 2nd IAPR Asian Conference on Pattern Recognition, Nov 2013, pp. 160–164
D. Kold Hansen, K. Nasrollahi, C. B. Rasmusen, and T. Moeslund, “Real-time barcode detection and classification using deep learning”. 01 2017, pp. 321–327
W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. E. Reed, “SSD: single shot multibox detector”. CoRR, abs/1512.02325, 2015
M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes (voc) challenge”. International Journal of Computer Vision, 88(2):303–338, 2010
L. Liu, Z. Pan, and B. Lei, “Learning a rotation invariant detector with rotatable bounding box”. arXiv preprint arXiv:1711.09405, 2017