A Guide to License Plate Recognition in 2022

Anton Maltsev
May 18, 2022


License plate recognition was one of the first tasks in Computer Vision. But LPR systems of 25 years ago and of today are fundamentally different.
In this article, I will describe and summarize what this area looks like today: where it came from and where to look if you want to build a good recognition system.

Image by the Author

A little about the history of plate recognition and my experience in it:

  • I totally missed the dark ages, when number plate recognition was done without any learning-based parts like Haar or HOG. I used such methods only in other tasks.
  • My friend and I created our first license plate recognition system in 2012. There were no neural networks (Haar for detection, then correlation). Such approaches began to appear in 2008–2010 and stayed relevant until 2014–2015.
  • In 2014–2015, we created a system for recognizing numbers on trains: Haar-like features for detection + a neural network for character recognition. These were the years of transition from the old algorithms to the new ones.
  • In 2016, when neural networks were already everywhere, we created a recognition system based on neural networks and used it for car plates and container numbers. Looking at it now, it seems very simple and primitive to me. We used Caffe!
  • From 2016 to 2019, many solutions started to migrate to the Edge: Jetsons, Intel-based devices, RPi, Rockchip, etc.
  • In 2020–2022, approaches changed. Solutions became more product-oriented. Predictability and stability became more important. Not only the final accuracy and the inference device matter now. Today we need to know how the system will adapt and improve with new data.
Image by the Author

If you want to create your own number plate recognition system, please take into account the specifics of the legislation of the country you work in. It is very important not only to take a responsible approach to the legislation itself but also to respect its spirit. Neither you nor anyone else wants to be tracked.

ML Architecture

Let’s start with which networks are generally used for the license plate recognition problem. It doesn’t matter which plate we are talking about: license plates for cars, numbers for trains, codes for containers. Let’s list all the tasks that may be useful for recognition:

  • Vehicle area search
  • Number area search
  • Plate orientation recognition
  • Character Position Estimation
  • Text recognition (3–4 algorithm options)
  • Number parameters recognition: quality, country, overlap, etc.

Of course, not all of these tasks are required for a working system.
Is it possible to solve all of them with one network? Yes.
But...

Let me show you an example. Let’s train one network that gives us the entire possible set of parameters at once (like RetinaFace): the positions of the cars, the plates that belong to them, the text.
Then suddenly you need to add a new country or a new plate format. You have to retrain the entire network. How do you make sure the data is more or less balanced during training? What if we have 100 countries? And how fast will such a neural network be?
Training such a network is pretty hard. So pipelines usually look like this:

Image by the Author. With the Author’s car. And a stray cat.

But each system should be designed based on its tasks. Here are some example tasks:

  • License plate recognition on a phone. People use this for filling out fine protocols, checking cars in a parking lot, etc.
  • Recognition at the entrances to parking lots/shopping centers, recognition from vehicles, etc.
  • Recognition on the road
  • Recognition as an API, working with arbitrary frames

Let me show you the algorithms that are needed for each of the tasks.

License plate number recognition on the phone

Image by the Author

For this application, it is usually not necessary to detect cars: the license plates are large. But orientation can be a problem. The numbers can be rotated by 30–40 degrees (pitch, roll, yaw). The algorithm usually looks like this:

  • Number area detection
  • Angle recognition
  • Text recognition

Midsize recognition

Image by the Author

The logic is similar to the phone case. Cars are usually close to the cameras, but there can be several cars in the frame, and sometimes you need to know the position of the car (entrances). As a result, it is sometimes necessary to additionally detect the area of the car:

  • Car area detection
  • Number area detection
  • Angle recognition
  • Text recognition

Recognition on the road

Image by the Author

In speed control systems, it is usually necessary to know the maximum set of parameters. But you need to remember:

  1. The installation place is known. It is not necessary to have a model trained for 100 types of plates.
  2. Processing speed is very important. Sometimes one computing unit serves several cameras at once.

As a result, the entire model should be optimized for speed. First, cars are searched for at a low resolution, then the license plate area at a slightly higher resolution, and only then is the license plate itself processed (see the sketch after the list below). So:

  • Vehicle area search
  • Number area search
  • Plate orientation recognition
  • Text recognition
  • Number parameters recognition: quality, country, overlap, etc.
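
A minimal sketch of this coarse-to-fine idea. The detect_cars / detect_plate / read_plate helpers are hypothetical placeholders for whatever models you actually use; the point is only that each stage looks at a small image:

```python
import cv2

def recognize_frame(frame, detect_cars, detect_plate, read_plate):
    """Coarse-to-fine processing: cheap detection on a downscaled frame,
    then work on full-resolution crops only where needed."""
    h, w = frame.shape[:2]
    scale = 640.0 / max(h, w)                      # search for cars on a small image
    small = cv2.resize(frame, (int(w * scale), int(h * scale)))

    results = []
    for x1, y1, x2, y2 in detect_cars(small):      # boxes in small-image coordinates
        # map the car box back to full resolution and crop
        car = frame[int(y1 / scale):int(y2 / scale), int(x1 / scale):int(x2 / scale)]
        plate_box = detect_plate(car)              # plate search only inside the car crop
        if plate_box is None:
            continue
        px1, py1, px2, py2 = plate_box
        plate = car[py1:py2, px1:px2]              # full-resolution plate crop
        results.append(read_plate(plate))          # OCR only on the small plate image
    return results
```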

API recognition

Such systems (examples: 1, 2) require good quality for any type of input. This leads to several conclusions:

  1. Usually, you should use the maximum number of models
  2. The models must be powerful
  3. Sometimes separate models are used to estimate the data domain (frame size, country, etc.)

Algorithms

Having dealt with the architecture of the problem, let’s talk about the algorithms in more detail.

Detection and rotation

In 2022 it is difficult to say anything fundamentally new about the detector, especially for such a simple task as plate detection. This problem was solved quite well even by Haar cascades.
But it seems to me that it is worth paying attention to several things:

The tasks of detecting license plates and detecting a car are easier to solve at a low resolution. This improves performance.

The complexity of the detector, of course, should be determined by the conditions of use. For the API, we took fairly powerful detectors. But for the remaining applications, MobileNet-SSD was often enough for us. At the moment, of course, yolov5s is pretty good among the fastest detectors.
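
For example, a quick way to try yolov5s as the vehicle detector is through torch.hub. This is only a sanity-check sketch, not a production setup, and the image name is a placeholder:

```python
import torch

# COCO-pretrained yolov5s; classes 2/5/7 are car/bus/truck
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model.classes = [2, 5, 7]          # keep only vehicle classes
model.conf = 0.4                   # confidence threshold

# run at a small input size - plenty for finding vehicles, and much faster
results = model('street.jpg', size=320)
boxes = results.xyxy[0]            # tensor rows: [x1, y1, x2, y2, conf, class]
print(boxes)
```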

It is possible to use detectors with additional anchors for rotation. For example, this one. But this is not very good, since the plate can be tilted in a different plane. An extension of this approach is to add 4 corner points to the detector, similar to RetinaFace:

Image by the Author

We prefer a detector + rotation afterward. Sometimes we combine rotation with symbol detection. This makes it possible to better control the networks and the result at different stages of training, and when a new, partially labeled dataset appears, we do not have to retrain everything.
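
If the detector also returns the four plate corners (the RetinaFace-style extension mentioned above), the rotation step before OCR is essentially a perspective warp. A minimal OpenCV sketch, assuming the corners come ordered top-left, top-right, bottom-right, bottom-left:

```python
import cv2
import numpy as np

def rectify_plate(image, corners, out_w=160, out_h=40):
    """Warp a tilted plate into an axis-aligned crop for the OCR stage.
    `corners` - four (x, y) points: top-left, top-right, bottom-right, bottom-left."""
    src = np.float32(corners)
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, M, (out_w, out_h))
```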

Text recognition

For text recognition, there are several different approaches and their combinations. We have used all of them. Let’s look at them in more detail.

Approach 1. Feature map extraction with a backbone, slicing it along the width as a temporal dimension, and linking the temporal sequence to the target phrase. Seven years ago this was done with CTC loss. Then LSTMs and similar encoders began to be used. Now the most common option is transformers. You can find most of these methods in MMOCR or PaddlePaddle OCR. This is the most classic approach.

Image by the Author
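
To make the shapes concrete, here is a toy PyTorch sketch of this classic pipeline (CNN backbone → per-column features → BiLSTM → CTC). It is only an illustration of the idea; real, tuned implementations live in MMOCR / PaddlePaddle OCR:

```python
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):
    """Toy CRNN: conv backbone -> per-column features -> BiLSTM -> CTC logits."""
    def __init__(self, num_classes):                 # num_classes includes the CTC blank
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.rnn = nn.LSTM(128 * 8, 128, bidirectional=True, batch_first=True)
        self.head = nn.Linear(256, num_classes)

    def forward(self, x):                            # x: (N, 3, 32, W)
        f = self.backbone(x)                         # (N, 128, 8, W/2)
        f = f.permute(0, 3, 1, 2).flatten(2)         # (N, T=W/2, 128*8) - one step per column
        seq, _ = self.rnn(f)                         # (N, T, 256)
        return self.head(seq).log_softmax(-1)        # (N, T, num_classes)

model = TinyCRNN(num_classes=37)                     # e.g. 36 symbols + CTC blank
logits = model(torch.randn(2, 3, 32, 128))           # (2, 64, 37)
ctc = nn.CTCLoss(blank=0)
targets = torch.randint(1, 37, (2, 7))               # two 7-character plates
loss = ctc(logits.permute(1, 0, 2),                  # CTC expects (T, N, C)
           targets,
           input_lengths=torch.full((2,), logits.shape[1], dtype=torch.long),
           target_lengths=torch.full((2,), 7, dtype=torch.long))
```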

Minuses:

  1. The solution starts statistically copying the dataset. If you train on texts with the “AB*****” format, but in practice you meet texts like “BC*****”, it will not work well.
  2. Problems with two-line and one-and-a-half-line texts. For them, it is necessary to detect the lines separately.
  3. LSTMs and Transformers are not very good for the Edge.

Approach 2. The second class of methods works well for complex plates (2–3 lines). These are end-to-end recognition methods: the neural net predicts the position of the next symbol and the value of the current one.
Cons: you need a large dataset for training. For some of the methods, it is necessary to mark up the spatial positions of the letters. Some of these methods are iterative and take quite long to compute. Not all of them are good for the Edge. Here you can find working code, and here are a few articles — 1, 2.

Image by the Author

Approach 3. Recognition through attention. We used it often 3–4 years ago, but now we have almost given it up. We classify the presence/absence of characters in the text as a sum of attention. It looks like this:

Image by the Author

I have not seen similar articles. The quality is slightly worse than with other approaches, but:

  1. No need to mark the positions of the characters; they are obtained automatically during training
  2. There is almost no explicit statistical link to the sequence
  3. It transfers well to the Edge
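
There is no reference implementation for this approach, so the sketch below is only my speculative reading of “presence of characters as the sum of attention”: a per-class spatial attention map whose spatial sum is trained as a character presence/count score. The module names and the training target are assumptions, not the author’s code:

```python
import torch
import torch.nn as nn

class AttentionPresence(nn.Module):
    """Speculative sketch: for every character class, predict a spatial
    attention map; its spatial sum acts as a presence/count score."""
    def __init__(self, num_chars=36):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.attn = nn.Conv2d(128, num_chars, 1)          # one attention map per character

    def forward(self, x):                                  # x: (N, 3, H, W)
        maps = torch.sigmoid(self.attn(self.backbone(x)))  # (N, num_chars, H/4, W/4)
        counts = maps.sum(dim=(2, 3))                       # per-character "amount of attention"
        return maps, counts

# `counts` could be supervised with an L1/L2 loss against how many times each
# character occurs in the plate string - no per-character positions needed.
```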

Approach 4. Explicit detection/segmentation of all characters. We simply mark the positions of all symbols and train a segmentation/detection network; the post-processing is trivial (see the sketch below).
It takes a long time to prepare a dataset for this method. But:

  1. It works very well with an incomplete task (when new types of plates are constantly added).
  2. It is the best method in terms of portability to the Edge
  3. It is very good for debugging and visualizing problems

Image by the Author
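
With this approach, the step after the character detector is just post-processing: group detections into text lines and read each line left to right. A small sketch, assuming each detection is a (center_x, center_y, box_height, char, confidence) tuple (a hypothetical format, adapt it to your detector’s output):

```python
def detections_to_text(dets, min_conf=0.5):
    """Assemble a plate string from per-character detections.
    dets: list of (cx, cy, box_h, char, conf)."""
    dets = sorted((d for d in dets if d[4] >= min_conf), key=lambda d: d[1])
    if not dets:
        return ""
    mean_h = sum(d[2] for d in dets) / len(dets)
    # group into text lines: a new line starts when the y-centre jumps by ~a character height
    lines, current = [], [dets[0]]
    for d in dets[1:]:
        if d[1] - current[-1][1] > 0.7 * mean_h:
            lines.append(current)
            current = [d]
        else:
            current.append(d)
    lines.append(current)
    # within each line, read the characters left to right
    return " ".join("".join(d[3] for d in sorted(line, key=lambda d: d[0])) for line in lines)
```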

As I said, we have used all of these approaches. For large and finished tasks, the first and second approaches are best. For a developing product, the fourth. For small datasets, the third.

But to choose a method, you need to look at both the platform and the working criteria, and at how well the network trains on your dataset.

Stop the theory, show me the way

There are easy ways to start with plate recognition. Of course, you should start with ready-made examples. Here are a few:

  1. Nvidia Tao. If you live in California and work with a Jetson and DeepStream — you are lucky. You can just use the example. It’s very simple and classic: car detection — plate detection — text recognition (the CTC way). You can even retrain it. But, in my opinion, the quality is far from perfect. And Tao is not the best framework for this.
  2. OpenVINO. If you are from China — you are lucky. Intel will help you. But they don’t provide a training script for this example. The pipeline is the same: car detection — plate detection — text recognition. And for text recognition, they don’t even use any LSTM-like NN.
  3. Open source. There are a lot of samples on GitHub (1, 2, 3, …). But they are pretty bad IMHO: no support, usually only Chinese plates, small datasets. The only exception is this one. Not ideal, but they have a lot of CIS and EU plates.
  4. Take car detection from yolov5 or another pretrained net that you like. Use MMOCR | PaddlePaddle OCR | EasyOCR to recognize all the text. Use templates to pick out what looks like a plate number. This approach is easy to implement, but the quality is not excellent (a rough sketch of this route is shown below).
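
For illustration, a rough sketch of route 4. The plate regex here is only an example template (not any particular country’s format) and the file name is a placeholder:

```python
import re
import cv2
import torch
import easyocr

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model.classes = [2, 5, 7]                                # COCO: car, bus, truck
reader = easyocr.Reader(['en'])                          # generic OCR, not plate-specific

PLATE_RE = re.compile(r'^[A-Z]{2}\d{3,4}[A-Z]{0,2}$')    # example template only

def plates_from_image(path):
    img = cv2.imread(path)
    found = []
    for x1, y1, x2, y2, *_ in model(img[..., ::-1]).xyxy[0].tolist():
        car = img[int(y1):int(y2), int(x1):int(x2)]
        # OCR everything inside the vehicle crop, keep strings matching the template
        for _, text, ocr_conf in reader.readtext(car):
            text = re.sub(r'[^A-Z0-9]', '', text.upper())
            if PLATE_RE.match(text):
                found.append((text, ocr_conf))
    return found

print(plates_from_image('street.jpg'))
```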

It seems to me that the article has already turned out quite long. Would you like me to tell you more about this topic? I can cover speed recognition, how to use plate recognition on specific platforms, how to port the models, problems with dataset collection and markup, or the specifics of installing cameras for different tasks (trains, containers, cars) to maximize the quality of the final algorithm.

Don’t forget to subscribe! My YouTube channel, my LinkedIn, and my Telegram blog. And if you have any questions, you can write to me — zlodeibaal@gmail.com.

And, if you are interested — we provide Computer Vision consulting https://cvml.rembrain.ai/
