Published in MLearning.ai

Training a custom object detection model in TensorFlow 1.5

I’m sorry, TensorFlow 2

In the last few articles, we explored the differences between several object detection models, namely Faster R-CNN, YOLO and SSD. Now that we understand how they differ from one another, let’s get our hands dirty with some code. Today, let’s see how we can build a custom object detection model in TensorFlow 1.5. This article is based on the tutorial from Roboflow, but if you want a quick working version, I’ve linked my project here on GitHub.

This came just in time, as I had an interview the other day that involved an object detection task.

Before we look at the code, let’s identify our task today, specifically my task during my interview: I was required to build a custom object detection pipeline for detecting license plates and extracting the plate numbers to a log file. Of course, with limited time and resources, I was allowed to use any pre-trained model. Here are my choices:

Model selected: MobileNetSSDv2

In my previous article, we saw that for deployment to mobile devices we could go with either YOLO or SSD, since Faster R-CNN is off the table because of its slow speed. I chose SSD here on the assumption that it has decent accuracy and detects smaller objects better, which is what we need for license plates. We will try YOLO in the next article and see how it fares.

MobileNetSSDv2 is an object detection model with 267 layers and 15 million parameters. Once trained, the model is small (~63MB) and can be downloaded to run inference on mobile devices.

TensorFlow 1 is selected here because, with TF2, there was a problem installing the Object Detection API from the TensorFlow GitHub repository. I will give TF2 another try in a future article.

Dataset

The license plate number dataset is a tiny dataset from Roboflow which can be downloaded here. Here’s how tiny it is:

  • 245 training images
  • 70 validation images
  • 35 test images

All images are annotated with two classes:

  • vehicle
  • license_plate

The problem with this dataset, as you can tell, is the small number of images on which the model can train. If you are able to get a bigger dataset, you only have to generate the TFRecords and a label map from your dataset.
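To give a rough idea of what the label map looks like for this dataset, here is a minimal sketch. The Object Detection API reads class names from a text file of item entries, with IDs starting at 1 because 0 is reserved for the background class. The helper name make_label_map is hypothetical and the ID order is an assumption; Roboflow generates the real file for you.

```python
def make_label_map(class_names):
    """Return label map text in the pbtxt layout read by the Object Detection API.

    IDs start at 1 because 0 is reserved for the background class.
    """
    entries = []
    for class_id, name in enumerate(class_names, start=1):
        entries.append(f"item {{\n  id: {class_id}\n  name: '{name}'\n}}\n")
    return "\n".join(entries)


# The two classes annotated in this dataset. The ID order here is assumed;
# the generated file from Roboflow decides the real order.
label_map_text = make_label_map(["vehicle", "license_plate"])
print(label_map_text)
```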

With that settled, let’s jump into the code!

Code

The first 9 steps are exactly the same as in the Roboflow tutorial, with only minor modifications at the dataset download stage. Step 10 is where we run inference on videos to detect vehicles and license plates and recognize the plate numbers.

1. Initializing variables

Here, following Roboflow’s tutorial, we are just initializing a few variables.

  • Line #2: The GitHub repo link that we will be cloning the project from.
  • Line #6: Number of training steps to train the model for. This number can be increased if a larger dataset is available; otherwise it is pointless to train on a small dataset for a long time. Due to the limitations in Google Colab, 30000 seemed reasonable.
  • Line #9: Number of steps the model is evaluated for.
  • Lines #11–27: We are given 3 model choices, but we are going with MobileNet SSD v2, as selected in Line #30.
  • Lines #33–39: After selecting our model, we get its pipeline file and batch size.
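The cell described above might look roughly like the sketch below. Treat it as an approximation: the line numbers will not match the original notebook, the repo URL is a placeholder, the evaluation step count is an assumed value, and the model names, config file names and batch sizes are written from memory of the TF1 model zoo and the Roboflow notebook, so verify them before use.

```python
# Placeholder only: substitute the repository you actually want to clone.
repo_url = "https://github.com/<user>/<repo>"

# Number of training steps used in this article.
num_steps = 30000

# Number of evaluation steps (assumed value, not stated in the article).
num_eval_steps = 50

# Candidate models. Names and config files are assumptions recalled from
# the TF1 detection model zoo; batch sizes are assumed defaults.
MODELS_CONFIG = {
    "ssd_mobilenet_v2": {
        "model_name": "ssd_mobilenet_v2_coco_2018_03_29",
        "pipeline_file": "ssd_mobilenet_v2_coco.config",
        "batch_size": 12,
    },
    "faster_rcnn_inception_v2": {
        "model_name": "faster_rcnn_inception_v2_coco_2018_01_28",
        "pipeline_file": "faster_rcnn_inception_v2_pets.config",
        "batch_size": 12,
    },
    "rfcn_resnet101": {
        "model_name": "rfcn_resnet101_coco_2018_01_28",
        "pipeline_file": "rfcn_resnet101_pets.config",
        "batch_size": 8,
    },
}

# The model we train in this article.
selected_model = "ssd_mobilenet_v2"

# Look up the settings that the later steps need.
MODEL = MODELS_CONFIG[selected_model]["model_name"]
pipeline_file = MODELS_CONFIG[selected_model]["pipeline_file"]
batch_size = MODELS_CONFIG[selected_model]["batch_size"]

print(MODEL, pipeline_file, batch_size)
```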

2. TensorFlow version

Make sure to use TensorFlow version 1.x.
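In Colab, the version was typically selected with the %tensorflow_version 1.x magic, assuming your runtime still offers it. A small guard like the one below can catch a wrong version early. The helper name require_tf1 is hypothetical, and the sketch takes a version string so it runs without TensorFlow installed.

```python
def require_tf1(version):
    """Raise if the given TensorFlow version string is not a 1.x release."""
    major = int(version.split(".")[0])
    if major != 1:
        raise RuntimeError(f"TensorFlow 1.x is required, found {version}")


# In the notebook you would call require_tf1(tf.__version__)
# after `import tensorflow as tf`. A literal is used here as a stand-in.
require_tf1("1.15.2")  # passes silently
```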

3. Clone the object detection repo

4. Install the required packages

5. Prepare data

To download our data, Roboflow has a handy API that we can call to download the data to our Colab file system; the TFRecords and a label map file are generated automatically. Just remember to get your own key for the dataset!

If you are starting a clean project, be sure to go through the entire process of checking your image dataset before generating the TFRecords file and label map.

6. Download base model

7. Configure the training pipeline

We are going to set a few things straight for the training pipeline to work. We also have a helper function to retrieve the number of classes from the label map we downloaded earlier.
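Here is a minimal sketch of both ideas, under a few assumptions. The tutorial reads the label map through the Object Detection API utilities; this version counts the item entries with a regular expression instead, so it runs on plain text without TensorFlow. The function names, the sample config fields and the paths are all illustrative.

```python
import re


def count_classes(label_map_text):
    """Count the `item { ... }` entries in label map text."""
    return len(re.findall(r"\bitem\s*\{", label_map_text))


def patch_pipeline_config(config_text, num_classes, batch_size, num_steps,
                          fine_tune_checkpoint, label_map_path):
    """Return config_text with the training-specific fields overwritten.

    Every occurrence of a field is replaced, so label_map_path is updated
    in both the train and eval sections if both are present.
    """
    replacements = [
        (r"num_classes: \d+", f"num_classes: {num_classes}"),
        (r"batch_size: \d+", f"batch_size: {batch_size}"),
        (r"num_steps: \d+", f"num_steps: {num_steps}"),
        (r'fine_tune_checkpoint: ".*?"',
         f'fine_tune_checkpoint: "{fine_tune_checkpoint}"'),
        (r'label_map_path: ".*?"', f'label_map_path: "{label_map_path}"'),
    ]
    for pattern, new_value in replacements:
        # A function replacement keeps backslashes in paths from being
        # interpreted as regex escapes.
        config_text = re.sub(pattern, lambda _m, v=new_value: v, config_text)
    return config_text


# Tiny stand-ins for the real files (contents assumed for illustration).
label_map_text = (
    "item {\n  id: 1\n  name: 'vehicle'\n}\n"
    "item {\n  id: 2\n  name: 'license_plate'\n}\n"
)
sample_config = (
    "num_classes: 90\n"
    "batch_size: 24\n"
    "num_steps: 200000\n"
    'fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"\n'
    'label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"\n'
)

patched_config = patch_pipeline_config(
    sample_config,
    num_classes=count_classes(label_map_text),
    batch_size=12,
    num_steps=30000,
    fine_tune_checkpoint="/content/pretrained_model/model.ckpt",  # hypothetical path
    label_map_path="/content/label_map.pbtxt",                    # hypothetical path
)
print(patched_config)
```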

8. Train the model

Running model_main.py from the Object Detection API starts the training on our custom dataset directly. At this stage, we are free to do our laundry and get our coffee before we check back in!

Of course, the output of this cell in your Jupyter notebook or Colab will show you the performance metrics of the model. It is not the best model in the world and it definitely needs some work, but it gives us a baseline to improve on. Here’s what the metrics look like after 30000 training steps:

  • Average precision: 0.323
  • Average recall: 0.345
  • Mean average precision (mAP): 0.3227
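As a sketch, the training call can be assembled like this. I am assuming the TensorFlow models repository was cloned into a folder named models, and the config path and output directory are placeholders. The snippet only builds and prints the command; it does not start training.

```python
import shlex


def build_train_command(pipeline_config_path, model_dir, num_train_steps):
    """Build the argument list for the TF1 Object Detection API training script."""
    return [
        "python",
        "models/research/object_detection/model_main.py",  # assumed clone location
        f"--pipeline_config_path={pipeline_config_path}",
        f"--model_dir={model_dir}",
        f"--num_train_steps={num_train_steps}",
        "--alsologtostderr",
    ]


# Placeholder paths; point these at your own config and output directory.
train_command = build_train_command("pipeline.config", "training/", 30000)
print(" ".join(shlex.quote(part) for part in train_command))

# To actually launch training you could pass the list to
# subprocess.run(train_command, check=True).
```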

9. Exporting and downloading the trained inference graph

Once the training is done, we will need to export the graph for use later on in our inference task.
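A rough sketch of the export step: pick the checkpoint with the highest step number from the training directory, then hand it to export_inference_graph.py. The helper names, the file listing and the directory names below are assumptions for illustration, and the command is only built, not executed.

```python
import re


def latest_checkpoint_prefix(filenames):
    """Return 'model.ckpt-<step>' for the highest step among *.meta files, or None."""
    steps = []
    for name in filenames:
        match = re.fullmatch(r"model\.ckpt-(\d+)\.meta", name)
        if match:
            steps.append(int(match.group(1)))
    if not steps:
        return None
    return f"model.ckpt-{max(steps)}"


def build_export_command(pipeline_config_path, checkpoint_prefix, output_directory):
    """Build the argument list for the TF1 export script."""
    return [
        "python",
        "models/research/object_detection/export_inference_graph.py",  # assumed clone location
        "--input_type=image_tensor",
        f"--pipeline_config_path={pipeline_config_path}",
        f"--trained_checkpoint_prefix={checkpoint_prefix}",
        f"--output_directory={output_directory}",
    ]


# Example listing of a training directory (made up for illustration).
training_files = [
    "checkpoint",
    "graph.pbtxt",
    "model.ckpt-0.meta",
    "model.ckpt-30000.index",
    "model.ckpt-30000.meta",
]
prefix = latest_checkpoint_prefix(training_files)
export_command = build_export_command(
    "pipeline.config", f"training/{prefix}", "fine_tuned_model"
)
print(" ".join(export_command))
```

If the export succeeds, the output directory should contain the frozen graph file that the inference step loads later.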

I prefer to have a local copy of the graph and the label map, so I download them and then upload them to my Google Drive before pointing to that path later on.

10. Inference on video

Finally, we are ready to run inference on our test video after training the model for a long time.

First, we have to mount our Google Drive so that we can utilize our downloaded model file and label map.

If you have previously trained your model and want to run only inference, remember to select TensorFlow 1.x.

Because we want to extract the text within the detected license plates, we need an Optical Character Recognition (OCR) engine. Let’s install pytesseract!

In order to make it work in Google Colab, make sure to install tesseract-ocr!

Importing all necessary packages again…

Here we run 3 helper functions, two of which were already run before we started training. If you are starting the notebook only for inference, be sure to run these cells!

Because we want to store our detected plate numbers in a text file, we initialize an empty text file.
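A minimal sketch of the logging side might look like this. The file name and both helper names are hypothetical, and stripping everything except letters and digits from the OCR output is my own assumption about what a clean plate number should look like.

```python
import os
import re
import tempfile


def clean_plate_text(raw_text):
    """Keep only letters and digits from OCR output and upper-case them."""
    return re.sub(r"[^A-Za-z0-9]", "", raw_text).upper()


def log_plate(log_path, raw_text):
    """Append a cleaned plate number to the log and return it; skip empty results."""
    plate = clean_plate_text(raw_text)
    if plate:
        with open(log_path, "a", encoding="utf-8") as log_file:
            log_file.write(plate + "\n")
    return plate


# Hypothetical log location; a temporary directory keeps the sketch self-contained.
log_path = os.path.join(tempfile.mkdtemp(), "plates.txt")
open(log_path, "w", encoding="utf-8").close()  # start from an empty file

log_plate(log_path, " abc-1234\n")  # OCR output with stray characters
log_plate(log_path, "  \n")         # nothing recognised, so nothing is written

with open(log_path, encoding="utf-8") as log_file:
    logged_plates = log_file.read().splitlines()
print(logged_plates)
```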

Without further ado, let’s start inferencing!
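The full loop needs TensorFlow, OpenCV and pytesseract, so here is only the piece that is easy to get wrong: turning the detector output into pixel crops. The Object Detection API returns boxes as normalised [ymin, xmin, ymax, xmax] values, which have to be scaled to the frame size before cropping. The function name, the class ID of 2 for license plates and the 0.5 score threshold are assumptions for this sketch.

```python
def plate_boxes_in_pixels(boxes, scores, classes, frame_width, frame_height,
                          plate_class_id, min_score=0.5):
    """Convert normalised [ymin, xmin, ymax, xmax] boxes to pixel
    (left, top, right, bottom) tuples, keeping only confident plate detections."""
    results = []
    for box, score, class_id in zip(boxes, scores, classes):
        if int(class_id) != plate_class_id or score < min_score:
            continue
        ymin, xmin, ymax, xmax = box
        # Scale to pixels and clamp to the frame borders.
        left = max(0, int(xmin * frame_width))
        top = max(0, int(ymin * frame_height))
        right = min(frame_width, int(xmax * frame_width))
        bottom = min(frame_height, int(ymax * frame_height))
        if right > left and bottom > top:
            results.append((left, top, right, bottom))
    return results


# Made-up detections for one 640x480 frame.
boxes = [[0.50, 0.25, 0.75, 0.50], [0.10, 0.10, 0.90, 0.90], [0.0, 0.0, 0.1, 0.1]]
scores = [0.9, 0.8, 0.2]
classes = [2, 1, 2]  # assumed IDs: 1 = vehicle, 2 = license_plate

plate_boxes = plate_boxes_in_pixels(boxes, scores, classes, 640, 480, plate_class_id=2)
print(plate_boxes)

# In the notebook, each crop would then be frame[top:bottom, left:right],
# which can be passed to pytesseract.image_to_string to read the plate.
```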

Here’s what the short clip of the output video looks like:

There you go! In only 10 steps, we managed to perform a custom object detection task, detecting both vehicles and license plates as well as extracting the plate numbers.

To improve upon what we have done here, I have listed a few things we could work on:

Future work

  1. The number of training steps could be increased to improve precision, recall and mAP.
  2. The number of images in the dataset could be increased:
     • by finding bigger and better datasets
     • by performing augmentation (though augmentation only goes so far before the model overfits anyway)
  3. Other detection algorithms such as EfficientDet could be explored.

I’ve always wanted to write an article like this, a step-by-step guide on how we can accomplish computer vision tasks like this one. In this article, I’ve shown how we can use TensorFlow 1.5 to train a MobileNet SSD v2 model on a custom dataset from Roboflow to detect both vehicles and license plates, and then extract the plate numbers using PyTesseract.

From the previous articles on explaining image processing techniques, image classification and object detection models to performing the task itself, I’m proud of the progress I’ve made here. Looking forward to writing the next article!

Reference

Roboflow Tutorial

Ray

Embedded Software Engineer and Indie Game Developer