Train license plates detection model using Detectron2

How to train your object detection model using a custom dataset.

Jarosław Gilewski
Jan 22 · 9 min read

Object detection is one of the key aspects of computer vision. There are a lot of pre-trained models able to detect a wide variety of objects. What if you need to detect your custom objects (not present in the pre-trained model)? This story will tell you how to do it using the Detectron2 platform.

Table of contents:

  1. Object detection overview

Object detection overview

These days state-of-the-art object detection models are powered by deep learning and involve two tasks:

  • Image classification predicting the type or class of an object in an image
Two dogs with a cat sitting on the floor selected with bounding boxes
Source: Stanford University CS231n, Lecture 11: Detection and Segmentation

We can distinguish two main deep learning approaches for object detection:

  • region-based object detectors including R-CNN, Fast R-CNN, Faster R-CNN, R-FCN

Region-based object detectors are two-stage detectors: first, we generate regions of interest (in Faster R-CNN this is done by a Region Proposal Network, RPN), and then we send the region proposals down the pipeline for object classification and bounding-box regression. They are usually more accurate, but at the expense of computational cost.

Single-shot object detectors are one-stage detectors where we apply our classifier and bounding box regressor over a dense, regularly sampled set of possible object locations. They tend to be significantly faster, simpler, and more intuitive, but may not be as accurate.

The exception is the RetinaNet model, which was proposed by Lin et al. in the 2017 paper Focal Loss for Dense Object Detection. They introduce a new loss function called Focal Loss, a reshaped version of the standard cross-entropy loss that addresses the foreground-background class imbalance problem in single-stage detectors.

Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors. — arXiv:1708.02002
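
To make the idea of a reshaped cross-entropy concrete, here is a minimal pure-Python sketch of the binary focal loss for a single prediction, following the formula FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) from the paper. The function names are made up for this illustration, the defaults gamma = 2 and alpha = 0.25 are the values the paper reports as working best, and this is not the code Detectron2 uses internally.

```python
import math

def focal_loss(p, target, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction.

    p      -- predicted probability of the foreground class, in (0, 1)
    target -- 1 for foreground, 0 for background
    """
    # p_t is the probability the model assigns to the true class
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def cross_entropy(p, target):
    """Standard binary cross-entropy for one prediction, for comparison."""
    p_t = p if target == 1 else 1.0 - p
    return -math.log(p_t)

easy = focal_loss(0.95, 1)  # confident and correct: loss is almost zero
hard = focal_loss(0.10, 1)  # confident and wrong: loss stays large
```

With gamma set to 0 and alpha set to 1 the function reduces to the plain cross-entropy, which is a quick way to check the sketch.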

Object Detection is one of the most valuable computer vision tasks. The use cases are endless, be it object tracking, pedestrian detection, video surveillance, activity recognition, face detection, recognition and identification, self-driving cars, and so on.

Problem definition

Let’s assume we are building an Automatic Number/License Plate Recognition (ANPR) system. Generally, such a system is used to automatically detect and recognize license plates in images or video streams. Then, it is possible to look up information on the owner of the car and send them a traffic ticket for exceeding the speed limit.

Another case could be collecting data for our self-driving car while staying compliant with privacy regulations (like the EU’s General Data Protection Regulation). They imply that an individual (the car owner) can demand that a company gathering data remove all personal data it holds about them. This requirement can be satisfied by anonymizing personal information.

We can observe the anonymization results by looking at Google Street View.

In both cases, we need to create a license plate detection model to be used in our image/video processing pipeline. Let’s create one!

Project setup

I have prepared the detectron2-licenseplates project with all the necessary code and data to go through this story.

If you know nothing about Detectron2 and how to use it in your computer vision pipeline, look at my previous story:

First clone the project repository:

$ git clone git://
$ cd detectron2-licenseplates
$ git checkout edd03e4b31ec52487a506f2ed711ce9faf0b94f6

The commit edd03e4b31ec52487a506f2ed711ce9faf0b94f6 points to the version of the source code that matches the content of this story.

For the project environment setup, I’m using Conda, which is also included in Anaconda, a data science and machine learning platform. If you are curious about the platform and why to use it, read: Get your computer ready for machine learning: How, what and why you should use Anaconda, Miniconda and Conda

Let’s create the project environment:

$ conda env create -f environment.yml
$ conda activate detectron2-licenseplates

The created environment includes all the requirements we need to train and test our model on the Detectron2 platform.


To train our model, we will use images from the MediaLab LPR dataset. This dataset doesn’t contain annotations, but I created them for you in PASCAL VOC format using the CVAT tool (there are also other interesting tools for data labelling, like labelimg and labelme).

Here is the structure of our license plates dataset:

└── licenseplates
    ├── annotations
    │   ├── 04ow1.xml
    │   ├── ...
    │   ├── zb35o.xml
    │   └── zhr5k.xml
    ├── images
    │   ├── 04ow1.jpg
    │   ├── ...
    │   ├── zb35o.jpg
    │   └── zhr5k.jpg
    ├── test.txt
    └── train.txt

The annotations folder contains Pascal VOC annotation XML files, one file per image. Each file stores metadata about an image, such as the folder where the image is stored, its filename, its size, and each bounding box. There is only one class: licenseplate.

Pascal VOC annotation example
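
As a rough illustration of what such a file holds, the sketch below parses a small Pascal VOC style annotation with the standard library. The XML content is made up for this example (the image size and box coordinates are not taken from the dataset), and parse_voc is a hypothetical helper, not the loader used in the project.

```python
import xml.etree.ElementTree as ET

# A made-up annotation in the Pascal VOC layout (values are illustrative).
SAMPLE_XML = """
<annotation>
  <folder>images</folder>
  <filename>04ow1.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>licenseplate</name>
    <bndbox>
      <xmin>210</xmin><ymin>300</ymin><xmax>330</xmax><ymax>340</ymax>
    </bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return filename, image size and labelled boxes from one annotation."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    objects = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        objects.append({
            "label": obj.findtext("name"),
            # Box corners in pixels: [xmin, ymin, xmax, ymax]
            "bbox": [int(bndbox.findtext(k))
                     for k in ("xmin", "ymin", "xmax", "ymax")],
        })
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": objects,
    }

record = parse_voc(SAMPLE_XML)
```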

Next, we have the images folder with the following content:

dataset/licenseplates/images folder content with car images

The train.txt and test.txt files define the dataset split used to train and test the model.
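
A split file can be turned into image and annotation paths with a few lines of Python. The sketch below assumes each line of the split file holds one image ID without an extension (the usual Pascal VOC convention, not confirmed for this project), and read_split is a hypothetical helper. A temporary directory stands in for the real dataset folder.

```python
import os
import tempfile

def read_split(dataset_dir, split):
    """Return (image_path, annotation_path) pairs for a split file."""
    with open(os.path.join(dataset_dir, split + ".txt")) as f:
        # Assumption: one image ID per line, blank lines ignored
        ids = [line.strip() for line in f if line.strip()]
    return [
        (os.path.join(dataset_dir, "images", image_id + ".jpg"),
         os.path.join(dataset_dir, "annotations", image_id + ".xml"))
        for image_id in ids
    ]

# Demo with a throwaway directory standing in for the real dataset.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "train.txt"), "w") as f:
        f.write("04ow1\nzb35o\n")
    pairs = read_split(root, "train")
    image_names = [os.path.basename(image) for image, _ in pairs]
    annotation_names = [os.path.basename(xml) for _, xml in pairs]
```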

This dataset cannot be used to build a production-ready model. It is too small. After some cleaning, there are 137 images with one license plate in each. But that’s all we need to play around.

Register the Dataset

For Detectron2 to know how to obtain the dataset, we need to register it and, optionally, register metadata for it.

The process is described in detail in the Detectron2 documentation.

In general, Detectron2 uses its own format for data representation, which is similar to COCO’s JSON annotations. It is a matter of implementing a function that returns the items in your custom dataset and registering it:

from detectron2.data import DatasetCatalog

def get_dicts():
    dicts = []  # fill with one record per image, in the Detectron2 format
    return dicts

DatasetCatalog.register("my_dataset", get_dicts)
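
For orientation, here is a sketch of what one record returned by such a function might look like for a detection dataset. The field names (file_name, image_id, width, height, annotations, bbox, bbox_mode, category_id) follow the Detectron2 dataset format; the helper to_detectron2_record, the file path, and all numeric values are invented for this example. To keep the snippet free of dependencies, bbox_mode is a plain string here, whereas real code would use BoxMode.XYXY_ABS from detectron2.structures.

```python
def to_detectron2_record(image_id, file_name, width, height, boxes):
    """Build one dataset record; boxes are [xmin, ymin, xmax, ymax] lists."""
    return {
        "file_name": file_name,
        "image_id": image_id,
        "width": width,
        "height": height,
        "annotations": [
            {
                "bbox": box,
                # Placeholder string: real code would pass
                # detectron2.structures.BoxMode.XYXY_ABS here.
                "bbox_mode": "XYXY_ABS",
                "category_id": 0,  # the only class: licenseplate
            }
            for box in boxes
        ],
    }

def get_dicts():
    # Illustrative values only; a real loader would read them from the
    # annotation files.
    return [
        to_detectron2_record(0, "images/04ow1.jpg", 640, 480,
                             [[210, 300, 330, 340]]),
    ]

dicts = get_dicts()
```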

For a dataset that is already in the COCO format, Detectron2 provides the register_coco_instances function, which will register load_coco_json for you and add metadata about your dataset.

Metadata is a key-value mapping that provides information about the dataset, like names of classes, colors of classes, root of files, etc., which is accessible through MetadataCatalog.get(dataset_name).some_metadata.

In our case, we have the dataset in Pascal VOC format and there is no general-purpose loader for that format. Fortunately, Detectron2 has an implementation for registering Pascal VOC datasets (see detectron2/detectron2/data/datasets/ and the register_all_pascal_voc function in detectron2/detectron2/data/datasets/), which could be an inspiration for us.

In our project, there is a register_licenseplates_voc function in the licenseplates/ file, which will load our data and register it together with metadata.

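def register_licenseplates_voc(name, dirname, split):
    # Assumed continuation: register the loader, then attach the metadata
    DatasetCatalog.register(name, lambda: load_voc_instances(dirname, split))
    MetadataCatalog.get(name).set(thing_classes=["licenseplate"], dirname=dirname, split=split)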

To see if it works, there is a quick test in the if __name__ == '__main__': block of the code that displays images with annotations using our loader and the Detectron2 Visualizer.

The if __name__ == '__main__': block runs only when the file is executed as the main module, so we can safely import our module later.

$ python licenseplates/

Running this command displays 10 random images with annotations from the train dataset. You can switch to the test dataset with the option --split test.

A car with a license plate bounding box
Image with annotation

We are ready to train our model.

Model training and evaluation

Our approach uses transfer learning, where the weights of an existing network architecture are tuned to predict classes that the original network was not trained on.

In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest. — Andrej Karpathy, Transfer Learning

We can choose the model config and weights from the Detectron2 Model Zoo and create a DefaultTrainer, a trainer with default training logic, which is:

  1. Create model, optimizer, scheduler, dataloader from the given config.

as the Detectron2 documentation states.

The Trainer class is as simple as:

from detectron2.engine import DefaultTrainer

class Trainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name):
        return VOCDetectionEvaluator(dataset_name)

We could use DefaultTrainer directly, but in our case we want to add some custom detection evaluation. As the metric for measuring the accuracy of the object detector, we use Average Precision (AP, AP50, AP75). The evaluation procedure of the detection task for PASCAL VOC is described here. Also be sure to read Jonathan Hui’s excellent article.
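
The 50 and 75 in AP50 and AP75 refer to the Intersection over Union (IoU) threshold: a detection counts as correct only if its box overlaps a ground-truth box with an IoU of at least 0.5 or 0.75, respectively. The sketch below computes the IoU of two boxes in plain Python. The iou function and the box coordinates are illustrative and are not taken from the project's evaluator.

```python
def iou(box_a, box_b):
    """Intersection over Union of two [xmin, ymin, xmax, ymax] boxes."""
    # Corners of the overlapping rectangle (if there is one)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

ground_truth = [100, 100, 200, 150]
prediction = [110, 100, 210, 150]  # shifted 10 pixels to the right

score = iou(prediction, ground_truth)
is_hit_at_50 = score >= 0.5    # counts as correct for AP50
is_hit_at_75 = score >= 0.75   # counts as correct for AP75
```

Here the two boxes share 4500 of 5500 pixels of combined area, so the prediction passes both thresholds; a larger shift would first fail at 0.75 and then at 0.5.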

The training process goes in four steps:

  1. Register the license plates dataset

We already know how to register the dataset, so let’s focus a little bit on the model configuration, which is stored in the configs folder:

├── Base-RCNN-FPN.yaml
├── Base-RetinaNet.yaml
├── lp_faster_rcnn_R_50_FPN_3x.yaml
└── lp_retinanet_R_50_FPN_3x.yaml

Detectron2 provides a lot of different models, which can be accessed through the detectron2.model_zoo package, but we need to modify them for our case (we have only one class to detect) and keep the config under version control in our repository.

I included two COCO object detection baselines from the Detectron2 Model Zoo:

  • Faster R-CNN: region-based object detector

and adjusted them to our needs.

The model config is set up through the setup_cfg function from the licenseplates/ script.

Let’s train the Faster R-CNN model:

$ python --config-file configs/lp_faster_rcnn_R_50_FPN_3x.yaml

It takes a few minutes to train on this toy dataset (300 iterations) on an RTX 2080 Ti, with results like these:

[01/22 14:08:38]: eta: 0:00:12  iter: 239  total_loss: 0.139  loss_cls: 0.026  loss_box_reg: 0.115  loss_rpn_cls: 0.000  loss_rpn_loc: 0.004  time: 0.2075  data_time: 0.0048  lr: 0.004795  max_mem: 2357M
[01/22 14:08:42]: eta: 0:00:08 iter: 259 total_loss: 0.128 loss_cls: 0.023 loss_box_reg: 0.097 loss_rpn_cls: 0.000 loss_rpn_loc: 0.003 time: 0.2074 data_time: 0.0046 lr: 0.005195 max_mem: 2357M
[01/22 14:08:46]: eta: 0:00:04 iter: 279 total_loss: 0.125 loss_cls: 0.024 loss_box_reg: 0.096 loss_rpn_cls: 0.000 loss_rpn_loc: 0.003 time: 0.2072 data_time: 0.0045 lr: 0.005594 max_mem: 2357M
[01/22 14:08:51 fvcore.common.checkpoint]: Saving checkpoint to ./output/model_final.pth
[01/22 14:08:54 d2.engine.defaults]: Evaluation results for licenseplates_test in csv format:
[01/22 14:08:54 d2.evaluation.testing]: copypaste: Task: bbox
[01/22 14:08:54 d2.evaluation.testing]: copypaste: AP,AP50,AP75
[01/22 14:08:54 d2.evaluation.testing]: copypaste: 81.8429,100.0000,100.0000
[01/22 14:08:54]: eta: 0:00:00 iter: 299 total_loss: 0.132 loss_cls: 0.025 loss_box_reg: 0.105 loss_rpn_cls: 0.000 loss_rpn_loc: 0.003 time: 0.2083 data_time: 0.0044 lr: 0.005994 max_mem: 2357M
[01/22 14:08:54 d2.engine.hooks]: Overall training speed: 297 iterations in 0:01:02 (0.2090 s / it)
[01/22 14:08:54 d2.engine.hooks]: Total training time: 0:01:05 (0:00:03 on hooks)
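
The lr column in the log above rises by the same amount every 20 iterations, which is what a linear learning-rate warm-up looks like. The sketch below is not taken from the project. It assumes a base learning rate of 0.02, 1000 warm-up iterations, and a warm-up factor of 0.001, values chosen here because they reproduce the logged numbers; warmup_lr is a hypothetical helper.

```python
def warmup_lr(iteration, base_lr=0.02, warmup_iters=1000, warmup_factor=0.001):
    """Learning rate under a linear warm-up (all defaults are assumptions)."""
    if iteration >= warmup_iters:
        return base_lr
    alpha = iteration / warmup_iters
    # Interpolate the multiplier linearly from warmup_factor up to 1.0
    return base_lr * (warmup_factor * (1 - alpha) + alpha)

# Iterations that appear in the training log above
logged_iterations = (239, 259, 279, 299)
rates = {it: round(warmup_lr(it), 6) for it in logged_iterations}
```

Under these assumptions the computed rates are 0.004795, 0.005195, 0.005594, and 0.005994, matching the lr values in the log. It also means that the 300-iteration run ends well inside the warm-up phase.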

To train the RetinaNet model on our dataset, you can run the same script with a different model configuration (it will overwrite the results of the previously trained model):

$ python --config-file configs/lp_retinanet_R_50_FPN_3x.yaml

You can observe all the metrics in TensorBoard by running:

$ tensorboard --logdir output
TensorBoard content
Training curves in TensorBoard


The trained model is saved to the output/model_final.pth file, and we can use it to run predictions on images from the test dataset:

$ python --config-file configs/lp_faster_rcnn_R_50_FPN_3x.yaml MODEL.WEIGHTS output/model_final.pth

The script will randomly display 10 samples (see --samples option) from the test dataset.
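
The trailing MODEL.WEIGHTS output/model_final.pth part of the command is a KEY VALUE override of the configuration loaded from the YAML file. The sketch below shows the general idea on a plain nested dictionary. It is only an illustration of the mechanism, not the code Detectron2 uses, and the toy configuration contains just the keys needed for the demo.

```python
def merge_from_list(cfg, opts):
    """Apply ["A.B", value, ...] overrides to a nested dict, in place."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, value in zip(opts[0::2], opts[1::2]):
        node = cfg
        *parents, leaf = key.split(".")
        for name in parents:
            node = node[name]  # KeyError here means an unknown section
        if leaf not in node:
            raise KeyError(key)  # refuse to create keys that do not exist
        node[leaf] = value
    return cfg

# Toy configuration standing in for the one loaded from the YAML file
cfg = {"MODEL": {"WEIGHTS": "", "ROI_HEADS": {"NUM_CLASSES": 1}}}
merge_from_list(cfg, ["MODEL.WEIGHTS", "output/model_final.pth"])
```

Rejecting unknown keys is a deliberate choice in this sketch: a typo in an override then fails loudly instead of being silently ignored.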

A car with a predicted license plate and false positive one
Prediction results.

Did you spot the false positive? You can get rid of it by increasing the confidence threshold with the option --confidence-threshold 0.75.
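
Conceptually, the threshold simply drops every detection whose score is below it. A minimal sketch, with made-up boxes and scores and a hypothetical filter_by_confidence helper (the project script may implement this differently):

```python
def filter_by_confidence(detections, threshold):
    """Keep detections whose score is at or above the threshold."""
    return [d for d in detections if d["score"] >= threshold]

# Made-up predictions for one image
detections = [
    {"bbox": [210, 300, 330, 340], "score": 0.98},  # the real plate
    {"bbox": [40, 60, 90, 80], "score": 0.62},      # a false positive
]

kept = filter_by_confidence(detections, 0.75)
```

Raising the threshold trades recall for precision: weak false positives disappear, but a real plate detected with a low score would be dropped as well.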


Detectron2 is the object detection and segmentation platform released by Facebook AI Research (FAIR) as an open-source project. Beyond state-of-the-art object detection algorithms, it includes numerous other models covering instance segmentation, panoptic segmentation, pose estimation, DensePose, and TridentNet. Thanks to its modular design, it is easy to reuse them in your research or create your own custom model.

I hope that this story will help you train your own model. Happy coding!


