How I failed to replicate an $86 million project in 1 line of code

When an experiment with existing open source technology just cherry-picks results to make it look good

Ryan Baumann
Sep 2, 2017 · 4 min read

The Medium article “How I replicated an $86 million project in 57 lines of code” has been doing the rounds the last few days, describing how an automated license plate recognition (ALPR) system being developed for the Australian Victoria Police could just use the open-source ALPR system OpenALPR instead. This is basically the article-length version of the ubiquitous outraged “Why does this need $X? I could code that up in a weekend!” comments made on any sufficiently-mundane (or complicated!) tech rollout.

However, since OpenALPR is free* and open-source, we can test just how plausible this claim is.

Ignoring the boring stuff like getting OpenALPR working on your local computer, let’s jump straight to trying to automatically pull license plates out of a dashcam video. For my test video I picked “Drive around Bendigo”, a thrilling 27-minute YouTube video of someone “Driving around Bendigo, Victoria, Australia,” which I felt would be a somewhat representative test, as it’s clear 1080p car footage from around the area where the system will be deployed. After downloading it with youtube-dl, I fed it to OpenALPR with time alpr --clock -n 1 'Drive around Bendigo-hrD75ebjCms.mp4' > bendigo.txt and let it churn for…

Hm. Well, that’s a problem. Processing my 27-minute video took just over 3.5 hours on my 3.5GHz Core i7. Not exactly real-time. Pencil in a few quid for “optimization” and a few more for “ultra-beefy computer hardware in every patrol vehicle” I guess.

OpenALPR processing time. Yikes.
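
To put a number on “not exactly real-time”, here is a quick back-of-the-envelope calculation using only the figures above. It treats “just over 3.5 hours” as 210 minutes, so the result is a lower bound; slowdown_factor is just a throwaway helper name.

```shell
# Ratio of processing time to video length, both in minutes.
slowdown_factor() {
  awk -v p="$1" -v v="$2" 'BEGIN { printf "%.1f\n", p / v }'
}

# 3.5 hours = 210 minutes of processing for a 27-minute video.
slowdown_factor 210 27   # prints 7.8
```

So on this machine OpenALPR would need to run roughly eight times faster just to keep up with a single camera.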

Anyway, on to the results! Gotta spend CPU cycles to make…catching thieves easier, as the saying goes. Let’s filter the results down to just the potential license plates with fgrep confidence bendigo.txt. Pipe that into wc -l and it looks like we’ve got 6,137 potential plates (or 1,653 if we filter those down to unique plate numbers). Not bad! Wait, that seems like a lot. Let’s take a closer look:

fgrep confidence bendigo.txt| cut -d' ' -f 6 | sort -u | shuf | head

Some of these seem…bad. Ok, no big deal, let’s deploy some of the “very straight forward code-first fixes” proposed in the article like adopting “a threshold […] that only accepts a confidence of greater than 90% before going on to validate the registration number.”

Running fgrep 'confidence: 9' bendigo.txt | cut -d' ' -f 6 | sort -u to cut it down to just the 90%+ confidence plate numbers and filter them to only the unique ones, what do we get?

0G700     HERE      M5ER      TUG700    WKX2D2
1IR9IT    JG700     R1LV      TUG7Q0    XS036
1ZZ735    KEEP      SLV522    TZ2735    XSP036
G700      LANE      T0G70U    VKX212    YLJ64D
GR1L      LJ641     T2Z735    WKX212    YLJ64I
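
One caveat about the prefix match: fgrep 'confidence: 9' is a string test, so it would presumably also accept a single-digit confidence such as 9.5 and skip an exact 100. Below is a minimal sketch of a numeric threshold instead. It assumes each result line looks like "    - PLATE<TAB> confidence: 93.1234" (the layout the cut -d' ' -f 6 step implies), and plates_at_least is a hypothetical helper name. I have not checked whether it changes the list above.

```shell
# Print the unique plate strings whose confidence is numerically >= MIN.
# Usage: plates_at_least FILE [MIN]   (MIN defaults to 90)
# Assumed line layout: "    - ABC123<TAB> confidence: 91.2345"
plates_at_least() {
  file=$1
  min=${2:-90}
  # awk splits on spaces and tabs, so the fields are:
  #   $1 = "-", $2 = plate, $3 = "confidence:", $4 = score
  awk -v min="$min" \
    '$1 == "-" && $3 == "confidence:" && $4 + 0 >= min + 0 { print $2 }' \
    "$file" | sort -u
}
```

Running plates_at_least bendigo.txt 90 would then stand in for the fgrep, cut and sort -u pipeline.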

OK, so we still have some apparent duplicates and recognition errors, and presumably the registration validation will sort these out. Checking these with the VicRoads site, we wind up with a grand total of seven automatically-recognized “valid” plates for 27 minutes of video.

One of the handful of plates OpenALPR correctly recognized in the wild. Success!

Is that TUG700 or TDG700 in the middle? OpenALPR can’t decide. Both are valid plate numbers. “YLJ641” next to it apparently isn’t in the VicRoads database.

I’m not being intentionally disingenuous here: filtering the plate matches from OpenALPR down to just the “good” ones is a tricky problem. I encourage anyone to try, and post their methods & results. But even beyond filtering the data down to good matches, the basic problem is that OpenALPR outright misses a huge number of clearly-legible plates in every video I’ve thrown at it, and takes forever to do so.
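
For anyone taking up that invitation, here is one possible starting point, offered as an untested sketch rather than anything the original article or OpenALPR prescribes. It keeps only plate strings that recur across several result lines, on the assumption that a real plate stays in view for multiple frames while a one-off misread rarely repeats identically. It assumes the same "    - PLATE<TAB> confidence: NN.NN" line layout as the commands above; plates_seen_at_least and its default of 5 hits are arbitrary, hypothetical choices.

```shell
# Print plate strings that appear on at least MIN_HITS result lines.
# Usage: plates_seen_at_least FILE [MIN_HITS]   (MIN_HITS defaults to 5)
plates_seen_at_least() {
  file=$1
  min_hits=${2:-5}
  # Extract the plate column, count identical strings, keep the frequent ones.
  awk '$1 == "-" && $3 == "confidence:" { print $2 }' "$file" \
    | sort | uniq -c \
    | awk -v n="$min_hits" '$1 + 0 >= n + 0 { print $2 }'
}
```

This does nothing about consistent misreads or road markings that repeat frame after frame, and it cannot recover the plates OpenALPR never detects in the first place.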

This plate is pretty legible in the full-resolution video. OpenALPR recognizes a plate in this one frame, but incorrectly, as “10DID”.

I love open source! I’d love for there to be a free, open source, robust, fast, and accurate ALPR system! It would be great if this project released whatever they do use as open source! But OpenALPR isn’t there yet, and pretending there’s already an open source solution for every problem, when maybe 25% of a solution exists, is never going to improve open source’s reputation for quality.

Could this project be done for less than $86M? Maybe. Could they use OpenALPR as a starting point? Also maybe. Would it actually reduce the cost? Who knows: it’s a complex project with complex requirements.
