Nvidia LPDNet vs Meta EgoBlur — blurring license plates with deep learning

Tamas Foldi
HCLTech-Starschema Blog
7 min read · Nov 13, 2023

Collecting data from cameras and sensors is all fun and games until legal and compliance matters enter the picture. But that’s why we have superheroes who can protect us by masking or blurring out personally identifiable information (PII) like license plates or people’s faces.

NVIDIA LPDNet and its new rival, Meta’s EgoBlur

Much like in various other aspects of life, you can build something from scratch or use pre-trained models available for free. Since most of us consider anonymizing datasets a non-core task, instead of spending days building the next generation of license plate detectors, we use what’s available on the market.

The Use Case

Let’s imagine that we store camera feeds from a fleet of cars to use the video files for offline analysis, training or simulations. For privacy reasons, we need to blur all faces and license plates to comply with PII requirements. Looks like an easy CV task: some basic OpenCV code from the internet and we’re done.

Well, not really.

The issue is that external cameras can be dirty (one look at my car will reveal as much), visibility can be low and it could be raining or snowing outside, so our algorithm should be rock-solid in all circumstances.

But what are our options?

Haar Cascades

Perhaps the most trivial solution would be to use Haar Cascades, one of the most well-known approaches to real-time object detection. These algorithms rely on simple features and can detect objects quickly on resource-constrained devices. However, they capture complex patterns in data far less well than deep learning models, and you cannot easily train or retrain them on your own datasets.

Let’s try the default license plate model from OpenCV. While it was trained on Russian license plates, it can also detect ones from other countries:

import cv2

# Load OpenCV's pre-trained Haar Cascade for (Russian) license plates
plate_cascade = cv2.CascadeClassifier('haarcascade_russian_plate_number.xml')

img = cv2.imread("input.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect candidate plates; scaleFactor and minNeighbors usually need tuning
plates = plate_cascade.detectMultiScale(gray, scaleFactor=1.05,
                                        minNeighbors=5, minSize=(25, 25))
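
Since the end goal is anonymization, the blur itself is only a few more lines. A minimal sketch with plain OpenCV (the kernel size here is arbitrary):

# Blur each detected region in place
for (x, y, w, h) in plates:
    roi = img[y:y+h, x:x+w]
    img[y:y+h, x:x+w] = cv2.GaussianBlur(roi, (51, 51), 0)

cv2.imwrite("output.jpg", img)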

Applying it to my car’s image produces promising output:

Well, it found the license plate—two, actually. While Haar Cascades are fast, they often need to be retrained on larger datasets and require manual tuning.

Not great, not terrible: it found the plate, but even for a crisp image like this, we got a spurious second bounding box. Now, let’s try it out in a more realistic scenario using the left repeater cam’s image:

While we humans (assuming you’re also human) can find the license plate, the same Haar Cascade model failed this time.

No luck, it didn’t find the car’s plate.

If the image is noisy, or the license plate is small or not frontal, we need a more heavyweight solution.

Deep Learning FTW

Convolutional neural networks (CNNs) are the standard when it comes to object detection. Recent advancements in deep learning, together with the availability of large training datasets and fast GPUs, have enabled CNNs to outperform traditional algorithms in both accuracy and speed on image recognition tasks. The most widely used CNN-based models and frameworks for object detection are:

  • YOLO (You Only Look Once): Known for its speed and efficiency in real-time detection.
  • SSD (Single Shot Multibox Detector): Balances speed and accuracy by using a single neural network for detection tasks.
  • Faster R-CNN: A more accurate model that introduces Region Proposal Networks for generating object proposals.
  • RetinaNet: Uses a focal loss function to handle the class imbalance problem, which is helpful for detecting objects in varied scales.
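
To get a feel for how this family of detectors is consumed, here is a short sketch using torchvision’s COCO-pretrained Faster R-CNN (the same architecture family EgoBlur builds on). It won’t find license plates out of the box, but the detection API is representative:

import torch
import torchvision

# COCO-pretrained Faster R-CNN; illustrates the detection API of this family
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = torchvision.io.read_image("input.jpg").float() / 255.0  # CHW in [0, 1]
with torch.no_grad():
    pred = model([img])[0]  # dict with 'boxes', 'labels' and 'scores'

print(pred["boxes"][pred["scores"] > 0.8])  # confident detections only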

You can find license plate and face detection models built on any of these networks. Not surprisingly, the two contenders here, LPDNet (YOLOv4-tiny) and EgoBlur (Faster R-CNN), are also built on these architectures.

But are they good? And if so, which one is better for our use case?

NVIDIA LPDNet

NVIDIA offers two pre-trained license plate detection (LPD) models, based on DetectNet_v2 and YOLOv4-tiny. Both come in versions trained on NVIDIA’s US license plate dataset and on the CCPD dataset. DetectNet_v2 uses a ResNet18 feature extractor with GridBox detection, which requires post-processing such as DBSCAN clustering or NMS to produce final bounding boxes. YOLOv4-tiny employs cspdarknet_tiny as its feature extractor. The US models are trained on over 45,000 images, while the CCPD models use about 172,000 images from a Chinese city’s streets, as detailed in an ECCV 2018 paper. Training focused on minimizing localization and confidence loss, making these models a good fit for a use case like ours. LPDNet is also optimized for fast inference, which makes it feasible on edge devices (I was able to make it work on my old Jetson Nano).
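
Since DetectNet_v2’s raw GridBox output needs clustering, it’s worth knowing what NMS actually does: it collapses overlapping candidate boxes into a single detection per object. A minimal NumPy sketch of greedy NMS (DeepStream’s plugins handle this for you):

import numpy as np

def box_area(b):
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])

def nms(boxes, scores, iou_threshold=0.5):
    # boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences
    order = np.argsort(scores)[::-1]  # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # intersection of the kept box with every remaining box
        lt = np.maximum(boxes[i, :2], boxes[rest, :2])
        rb = np.minimum(boxes[i, 2:], boxes[rest, 2:])
        inter = np.clip(rb - lt, 0, None).prod(axis=1)
        iou = inter / (box_area(boxes[i]) + box_area(boxes[rest]) - inter)
        order = rest[iou <= iou_threshold]  # drop heavy overlaps
    return keep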

Since YOLOv4-tiny models are newer and offer higher accuracy, I decided to start with that.

LPDNet is available from the NVIDIA GPU Cloud (NGC) Catalog as part of the TAO Toolkit. From the model's webpage, you can directly download the model files in encrypted TLT (Transfer Learning Toolkit) format, prepared exclusively for NVIDIA GPUs. Under its license, the model is free to use in commercial applications, except for a few edge cases (competing with NVIDIA, etc.).

The NVIDIA ecosystem is largely a mess, so you might need to spend some time getting things to work if you’re using a non-standard configuration. Maybe the easiest way to work with the model is through the DeepStream SDK, a set of plugins for inference on top of the GStreamer framework. I used the DeepStream SDK 6.0 container on my Jetson Nano and the 6.3 container on a cloud dGPU instance, and, after installing a few missing packages, things started to work.

Where Haar Cascades failed, LPDNet v2.1 gave us 99% confidence. Overall, the model was accurate.

I occasionally ran into false positives, even at 80%+ confidence, but the accuracy was still in the expected range.

The results were great: LPDNet was able to detect license plates in low-res, low-quality feeds (scroll down to the comparison video).

But what about EgoBlur? Can Meta build something better?

Meta EgoBlur

The EgoBlur model was recently released by Meta (under its facebookresearch GitHub organization), and it promises exactly what we need: detecting and blurring license plates and faces in image and video streams.

The concepts are explained in the EgoBlur: Responsible Innovation in Aria paper, but in short, EgoBlur is based on the Faster R-CNN model with a ResNeXt backbone. The two models (face and license plate) are trained using Meta’s publicly available Detectron2 and D2Go libraries; each is approximately 400 MB with ~104 million parameters. Both are licensed under Apache-2.0, so they’re completely free to use.

EgoBlur is part of Project Aria, Meta’s AR glasses and research SDK. EgoBlur is supposed to ensure that videos and images recorded by egocentric devices (like glasses) are free of PII.

The Meta/Aria team aims for performance comparable to (or better than) other publicly available methods for face and license plate detection, especially on AR and VR devices. According to my tests, the accuracy is indeed quite convincing, and slightly better than LPDNet’s.

In addition to license plates, EgoBlur can also blur faces. All plates were identified even with a 90% confidence threshold.

Installing the model is also a pleasant experience: the GitHub repo comes with a conda environment file, making the project simple to deploy. EgoBlur can run on both GPUs and CPUs, and GPU support is not limited to NVIDIA chips. However, due to some issues in PyTorch v2.1, I was unable to use it on Apple M2 silicon with Metal Performance Shaders (the MPS device).
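
Here’s a minimal sketch of what inference with the released TorchScript checkpoints can look like. The checkpoint filename, the input layout and the four-tuple output unpacking are my assumptions based on the demo script (script.py) in the repo, so check it for the exact contract:

import cv2
import torch

# hypothetical checkpoint name; download the real one from the EgoBlur page
detector = torch.jit.load("ego_blur_lp.jit", map_location="cpu")
detector.eval()

bgr = cv2.imread("input.jpg")
# assumption: the model expects a CHW float tensor in BGR channel order
image_tensor = torch.from_numpy(bgr).permute(2, 0, 1).float()

with torch.no_grad():
    # assumption: the model returns (boxes, labels, scores, dims)
    boxes, _, scores, _ = detector(image_tensor)

for (x1, y1, x2, y2), score in zip(boxes.tolist(), scores.tolist()):
    if score < 0.9:  # the 90% confidence threshold mentioned above
        continue
    x1, y1, x2, y2 = map(int, (x1, y1, x2, y2))
    roi = bgr[y1:y2, x1:x2]
    bgr[y1:y2, x1:x2] = cv2.GaussianBlur(roi, (51, 51), 0)

cv2.imwrite("output.jpg", bgr)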

Okay, so which model is the best?

Side-by-Side Comparison

Seeing is believing, so here’s a sample output from a video recording. I downloaded the video file straight from the car’s camera system, with all the noise and distortion you can expect from an on-board cam.

While LPDNet has a few glitches, EgoBlur is close to perfect — on an image-to-image basis, without additional temporal tracking.

Both models provide accurate output, but EgoBlur slightly outperforms LPDNet in terms of false positives. Both output videos were recorded without object tracking across frames; adding a tracker would further improve quality (license plates rarely disappear from one frame to the next). DeepStream comes with an out-of-the-box tracker (gst-nvtracker), while with EgoBlur this is something we would need to add to the pipeline ourselves.
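
As a crude illustration of why tracking helps (this is not a real tracker like gst-nvtracker, which also matches boxes between frames and follows their motion), you could simply keep blurring each detected region for a few frames after its last detection:

def persist_detections(per_frame_boxes, persist=5):
    # per_frame_boxes: list of per-frame lists of (x1, y1, x2, y2) boxes
    # Bridges short detection gaps by replaying each box for `persist`
    # extra frames; duplicate boxes are harmless when you only blur.
    active = []  # (box, remaining_frames) pairs
    out = []
    for boxes in per_frame_boxes:
        active = [(b, n - 1) for (b, n) in active if n > 0]
        active += [(b, persist) for b in boxes]
        out.append([b for (b, _) in active])
    return out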

From an accuracy point of view, the winner is EgoBlur according to the jury’s (my) subjective decision.

Inference Performance

Accuracy is great, but the price per minute of anonymization also matters a lot. LPDNet had blazingly fast inference, even on edge devices:

LPDNet runs at more than 1,200 FPS on a T4 and at 40 FPS on a Jetson Nano (image from https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/lpdnet).

On the other hand, EgoBlur is slow. On their website, the FAQ answers a question on inference speed:

EgoBlur claims 2 FPS for full HD image blurring. (Image from https://www.projectaria.com/tools/egoblur/)

Not super fast. I also ran tests on T4 and V100 dGPUs, as well as on my Jetson Nano, and got similar results:

Quick and dirty performance test results on my environments with 1280x720 images.

Here, LPDNet wins by orders of magnitude.
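
If you want to sanity-check numbers like these on your own hardware, a rough timing loop is enough. A sketch, again assuming the model takes a single CHW tensor (and a hypothetical checkpoint name):

import time
import torch

detector = torch.jit.load("ego_blur_lp.jit").to("cuda").eval()
frame = torch.rand(3, 720, 1280, device="cuda")  # dummy 1280x720 frame

with torch.no_grad():
    for _ in range(5):  # warm-up: exclude one-time CUDA/JIT startup costs
        detector(frame)
    torch.cuda.synchronize()
    start = time.perf_counter()
    n = 50
    for _ in range(n):
        detector(frame)
    torch.cuda.synchronize()

print(f"{n / (time.perf_counter() - start):.1f} FPS")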

Summary

The verdict is straightforward: if you need real-time or otherwise fast inference with acceptable accuracy, and have NVIDIA silicon at hand, go with LPDNet. You won’t be disappointed. If quality and accuracy matter more than speed, or your startup recently bought a big block of A100s, then EgoBlur is a decent option; just don’t expect it to be fast.
