Nvidia LPDNet vs Meta EgoBlur — blurring license plates with deep learning
Collecting data from cameras and sensors is all fun and games until legal and compliance matters enter the picture. But that’s why we have superheroes who can protect us by masking or blurring out personally identifiable information (PII) like license plates or people’s faces.
Much like in various other aspects of life, you can build something from scratch or use pre-trained models available for free. Since most of us consider anonymizing datasets as a non-core task, instead of spending days building the next generation of license plate detectors, we use what’s available on the market.
The Use Case
Let’s imagine that we store camera feeds from a fleet of cars to use the video files for offline analysis, training or simulations. Due to privacy reasons, we need to blur any faces and license plates to comply with PII requirements. Looks like an easy CV task: some basic OpenCV code from the internet and we’re done.
Well, not really.
The issue is that external cameras can be dirty (one look at my car will reveal as much), visibility can be low and it could be raining or snowing outside, so our algorithm should be rock-solid in all circumstances.
But what are our options?
Haar Cascades
Perhaps the most trivial solution would be to use Haar Cascades, one of the most well-known solutions for real-time object detection. These algorithms rely on simple features and can detect objects quickly on resource-constrained devices. However, they are less sophisticated for capturing complex patterns in data than deep learning models, and you cannot easily train/retrain them with your own data sets.
Let’s try the default license plate model from OpenCV. While it was trained on Russian license plates, it can also detect ones from other countries:
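import cv2
# The XML file ships with OpenCV (data/haarcascades); copy it next to the script or pass its full path
plateCascade = cv2.CascadeClassifier('haarcascade_russian_plate_number.xml')
img = cv2.imread("input.jpg")
# Haar cascades operate on single-channel (grayscale) images
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Returns one (x, y, w, h) rectangle per candidate plate
plates = plateCascade.detectMultiScale(gray, scaleFactor=1.05,
                                       minNeighbors=5, minSize=(25, 25))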
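Detection only gives us rectangles; to anonymize the image, we still have to overwrite the pixels inside them. Here is a minimal sketch of that step, assuming the image is a NumPy array (which is what cv2.imread returns) and the boxes are (x, y, w, h) tuples like the ones detectMultiScale produces. The function name blur_regions and the block parameter are made up for illustration, and pixelation stands in for whatever blur you prefer.

```python
import numpy as np

def blur_regions(image, boxes, block=8):
    """Return a copy of `image` with each (x, y, w, h) box pixelated."""
    out = image.copy()
    img_h, img_w = out.shape[:2]
    for x, y, w, h in boxes:
        # Clip the box to the image so partially visible plates still work.
        x0, y0 = max(0, x), max(0, y)
        x1, y1 = min(img_w, x + w), min(img_h, y + h)
        for by in range(y0, y1, block):
            for bx in range(x0, x1, block):
                tile = out[by:min(by + block, y1), bx:min(bx + block, x1)]
                # Replace every pixel in the tile with the tile's mean colour.
                tile[...] = tile.mean(axis=(0, 1)).astype(out.dtype)
    return out
```

A larger block value destroys more detail. In an OpenCV pipeline, running cv2.GaussianBlur on the same region with a large kernel is a common alternative.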
Applying it to my car’s image produces promising output:
Not great, not terrible. It’s okay. It found the plate. But even for a crisp image like this, we also got a second, false-positive bounding box. Now, let’s try it out in a more realistic scenario using the left repeater cam’s image:
No luck, it didn’t find the car’s plate.
If the image is noisy, or the license plate is small or not facing the camera, we need a more heavyweight solution.
Deep Learning FTW
Convolutional Neural Networks (CNNs) are the standard when it comes to object detection. Recent advances in deep learning, together with the availability of large training datasets and fast GPUs, have enabled CNNs to outperform traditional algorithms in accuracy and speed for image recognition tasks. The most widely used models/frameworks for object detection that employ CNNs are:
- YOLO (You Only Look Once): Known for its speed and efficiency in real-time detection.
- SSD (Single Shot Multibox Detector): Balances speed and accuracy by using a single neural network for detection tasks.
- Faster R-CNN: A more accurate model that introduces Region Proposal Networks for generating object proposals.
- RetinaNet: Uses a focal loss function to handle the class imbalance between objects and background, and a feature pyramid to detect objects at varied scales.
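To make the focal loss idea from the RetinaNet entry a bit more concrete, here is a minimal sketch for a single binary prediction. The function name is made up for illustration. The alpha and gamma defaults follow the values reported in the RetinaNet paper, and a real implementation would work on tensors, not single floats.

```python
import math

def binary_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for one prediction: p = predicted P(object), y = 1 or 0."""
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    # p_t is the probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of easy, well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy background patch contributes far less than a hard one.
easy = binary_focal_loss(0.01, 0)
hard = binary_focal_loss(0.90, 0)
```

With gamma set to 0 the expression reduces to a weighted cross-entropy, so gamma is the knob that keeps the huge number of easy background patches from dominating training.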
You can find license plate and face detection algorithms with any of these networks. Not surprisingly, the two most common models, LPDNet (YOLOv4-tiny) and EgoBlur (Faster R-CNN), are each built on one of these frameworks.
But are they good? And if so, which one is better for our use case?
NVIDIA LPDNet
NVIDIA offers two pre-trained license plate detector (LPD) models based on DetectNet_v2 and YOLOv4-tiny. Both have versions trained on NVIDIA’s US license plate and CCPD datasets. DetectNet_v2 uses a ResNet18 feature extractor with GridBox detection, requiring post-processing like DBSCAN or NMS for final outputs. YOLOv4-tiny employs cspdarknet_tiny as its feature extractor. The US models are trained on over 45,000 images, while the CCPD models use about 172,000 images from a Chinese city’s streets, as detailed in an ECCV 2018 paper. The training focused on minimizing localization and confidence loss, making it an ideal choice for a use case like ours. LPDNet is generally optimized for fast inference, making it a feasible choice for edge devices (I was able to make it work on my old Jetson Nano).
Since the YOLOv4-tiny models are newer and offer higher accuracy, I decided to start with those.
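The NMS (non-maximum suppression) post-processing mentioned for DetectNet_v2 is easy to picture in code. Below is a minimal greedy sketch in plain Python, assuming detections arrive as (x1, y1, x2, y2, score) tuples. The function names and the 0.5 threshold are illustrative assumptions, not what NVIDIA’s pipeline uses internally.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (x1, y1, x2, y2, score) tuples; returns kept detections."""
    # Visit candidates from the most to the least confident.
    ordered = sorted(detections, key=lambda d: d[4], reverse=True)
    kept = []
    for det in ordered:
        # Keep a box only if it does not heavily overlap a better one.
        if all(iou(det[:4], k[:4]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept

raw = [(10, 10, 110, 60, 0.9), (12, 12, 112, 62, 0.8), (300, 200, 380, 240, 0.7)]
final = nms(raw)  # the 0.8 box overlaps the 0.9 box and is dropped
```

For blurring, a slightly lower threshold errs on the side of keeping overlapping boxes, which is usually the safer failure mode when the goal is to hide plates.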
LPDNet is available from the NVIDIA GPU Cloud (NGC) Catalog, as part of the TAO Toolkit. From the model's webpage, you can directly access the model files in TensorRT (TLT) format, prepared exclusively for NVIDIA GPUs. Under its license terms, the model is free to use in commercial applications, except for a few edge cases (competing with NVIDIA, etc.).
The NVIDIA ecosystem is largely a mess, so you might need to spend some time getting it to work if you’re using a non-standard configuration. Maybe the easiest way to work with the model is through DeepStream SDK, which is a set of inference plugins built on top of the GStreamer framework. I used the DeepStream SDK 6.0 container on my Jetson Nano and the 6.3 container on a cloud dGPU instance, and, after installing a few missing packages, things started to work.
The results were great: LPDNet was able to detect license plates in low-res, low-quality feeds (scroll down to the comparison video).
But what about EgoBlur? Can Meta build something better?
Meta EgoBlur
The EgoBlur model was recently released by Facebook Research, and it promises to do exactly what we need: removing license plates and faces from image and video streams.
The concepts are explained in the EgoBlur: Responsible Innovation in Aria paper, but in short, EgoBlur is based on the Faster R-CNN model with a ResNeXt backbone. The two models (Face and LP) were trained using Meta’s publicly available Detectron2 and Detectron2Go libraries; each is approximately 400 MB and has ~104 million parameters. It’s licensed under Apache-2.0, so it’s completely free.
The Meta/Aria team aims to achieve performance comparable to (or better than) other publicly available methods for face and license plate detection on cameras, especially on AR and VR devices. According to my tests, the accuracy is indeed quite convincing, slightly better than LPDNet’s.
Installing the model is also a pleasant experience: the GitHub repo comes with a conda environment file, making the project simple to deploy. EgoBlur can run on both GPUs and CPUs, and the GPU is not limited to NVIDIA chips only. However, due to some issues within PyTorch v2.1, I was unable to use it on Apple M2 silicon with Metal Performance Shaders (the MPS device).
Okay, so which model is the best?
Side-by-Side Comparison
Seeing is believing, so here’s a sample output from a video recording. I downloaded the video file straight from the car’s camera system, with all the noise and distortion you can expect from an on-board cam.
Both models provide accurate output, but EgoBlur slightly outperforms LPDNet in terms of false positives. Both output videos were recorded without object tracking across frames, and adding that will further improve the quality (license plates rarely disappear from one frame to another). DeepStream comes with an out-of-the-box tracker (gst-nvtracker), while in the case of EgoBlur, this is something we need to add to the pipeline ourselves.
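Since plates rarely vanish between two consecutive frames, even a very crude form of temporal smoothing helps: keep blurring a box for a few frames after the detector loses it. The sketch below illustrates the idea in plain Python. It is a stand-in for a real tracker, not a reimplementation of gst-nvtracker, and the class name, max_age and iou_threshold are assumptions made up for this example.

```python
def _iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

class BoxHold:
    """Keeps a box alive for a few frames after the detector loses it."""

    def __init__(self, max_age=5, iou_threshold=0.3):
        self.max_age = max_age
        self.iou_threshold = iou_threshold
        self.tracks = []  # each entry: [box, frames_since_last_seen]

    def update(self, detections):
        """Feed one frame's (x1, y1, x2, y2) boxes; returns the boxes to blur."""
        # Age every known box, then refresh the ones detected again.
        for track in self.tracks:
            track[1] += 1
        for det in detections:
            match = max(self.tracks, key=lambda t: _iou(t[0], det), default=None)
            if match is not None and _iou(match[0], det) >= self.iou_threshold:
                match[0], match[1] = det, 0
            else:
                self.tracks.append([det, 0])
        # Forget boxes that have been missing for too long.
        self.tracks = [t for t in self.tracks if t[1] <= self.max_age]
        return [t[0] for t in self.tracks]
```

Calling update once per frame with the detector output returns the boxes to blur, including recently lost ones. The held box does not move, so a proper tracker with motion prediction is still the better choice for fast-moving scenes.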
From an accuracy point of view, the winner is EgoBlur according to the jury’s (my) subjective decision.
Inference Performance
Accuracy is great, but the price per minute of anonymization also matters a lot. LPDNet had blazingly fast inference, even on edge devices:
On the other hand, EgoBlur is slow. On their website, the FAQ answers a question on inference speed:
Not super fast. I also ran tests on T4 and V100 dGPUs, as well as on my Jetson Nano, and got similar results:
Here, LPDNet wins by orders of magnitude.
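To turn throughput into the price per minute of anonymization mentioned earlier, a back-of-the-envelope calculation is enough. The sketch below assumes you know the model’s sustained inference rate and the hourly price of the machine. The function name is made up, and the numbers in the usage lines are placeholders, not measurements from this article.

```python
def cost_per_video_minute(inference_fps, video_fps, instance_price_per_hour):
    """Cost of anonymizing one minute of video, in the currency of the hourly price."""
    frames = video_fps * 60                      # frames in one minute of footage
    processing_seconds = frames / inference_fps  # time the machine needs for them
    return processing_seconds / 3600 * instance_price_per_hour

# Placeholder inputs: a 30 fps recording on a machine billed at 1.0 per hour.
fast_model = cost_per_video_minute(inference_fps=300, video_fps=30, instance_price_per_hour=1.0)
slow_model = cost_per_video_minute(inference_fps=3, video_fps=30, instance_price_per_hour=1.0)
ratio = slow_model / fast_model  # cost scales inversely with throughput
```

Because cost is inversely proportional to throughput, a model that is two orders of magnitude slower is also two orders of magnitude more expensive per minute on the same hardware.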
Summary
The verdict is straightforward: if you need real-time or fast inference with acceptable accuracy, and have NVIDIA silicon at hand, then go with LPDNet. You won’t be disappointed. If quality and accuracy are more important than speed, or your startup recently bought a big block of A100s, then EgoBlur is a decent option, just don’t expect it to be fast.