Cloud Vision API: How We Became 80% More Efficient

Jitesh Dugar
Published in Drivezy
3 min read · Jun 5, 2017

Let me talk about our problem statement first. I work at a technology company where it is really important for us to verify a user’s identity before we offer them our services. From our inception in July 2015 until last month, we validated user-uploaded identity cards (driving licences) manually.

Each manual validation of a licence took roughly 2 minutes. Doesn’t sound like much? The time became significant once we grew to about 1.2k daily users: our executives validate roughly 300 driving licences every day.

So, how much time did this take us?

300 licences per day

2 min per licence validation

Total time spent daily: 300 × 2 = 600 minutes = 10 hours

There had to be a more efficient way. After all, we are in the 21st century, where technology dominates humans. Hire a robot? Or how about building the intelligence ourselves?

Google Cloud Vision API

Google has a revolutionary product in the image recognition field that could help us save a lot of time on the licence validation job.

Optical Character Recognition (OCR) was the most important aspect of the Cloud Vision API for us: it detects and extracts text within an image, supports a broad range of languages, and identifies the language automatically.
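To make this concrete, here is a minimal sketch of what a request for OCR text plus labels could look like when sent to the API's REST endpoint. It only builds the JSON body and sends nothing. The helper name `build_annotate_request` and the `max_labels` default are assumptions made for illustration, not our production code.

```python
import base64

# Public REST endpoint for image annotation; the body below is POSTed to it
# together with an API key or OAuth token (not shown here).
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_bytes, max_labels=10):
    """Build the JSON body asking for OCR text and labels for one image.

    Hypothetical helper: the name and the max_labels default are illustrative.
    """
    return {
        "requests": [
            {
                # Raw image bytes are sent as base64-encoded text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": "TEXT_DETECTION"},
                    {"type": "LABEL_DETECTION", "maxResults": max_labels},
                ],
            }
        ]
    }
```

Calling `build_annotate_request(open("licence.jpg", "rb").read())` would give a dictionary that can be serialised with `json.dumps` and posted to `ANNOTATE_URL`; the file name here is just a placeholder.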

We also loved the Label Detection feature, which tells you what exactly is in the image. Let me show you what made this so fascinating for us:

Cloud Vision API Label Detection Result with an actual licence

The API could successfully recognise that the provided image was a driver’s licence. Cool?
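A label check like this can be sketched in a few lines. The function below takes one entry from the API's `responses` list and looks through its `labelAnnotations` for a confident label that mentions a licence. The keyword list, the 0.6 score threshold and the sample labels are assumptions for illustration; the exact labels returned for a given image may differ.

```python
def looks_like_licence(response, keywords=("licen", "identity document"), min_score=0.6):
    """Return True if a sufficiently confident label mentions one of the keywords.

    'licen' matches both the 'licence' and 'license' spellings.
    Keywords and threshold are illustrative assumptions, not tuned values.
    """
    for label in response.get("labelAnnotations", []):
        description = label.get("description", "").lower()
        score = label.get("score", 0.0)
        if score >= min_score and any(k in description for k in keywords):
            return True
    return False


# Hypothetical response fragment, shaped like one entry of "responses".
sample_response = {
    "labelAnnotations": [
        {"description": "Text", "score": 0.93},
        {"description": "Driver's license", "score": 0.82},
    ]
}

is_licence = looks_like_licence(sample_response)
```

Matching on a substring rather than an exact label keeps the check tolerant of spelling variants, at the cost of a few possible false positives.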

Was this enough to serve our purpose? Not yet!

The API could even read text such as the licence holder’s name, licence number and date of birth. This was important for our business, where the licence number is extremely critical.
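As a rough sketch of how such fields could be pulled out of the OCR result: the first entry of `textAnnotations` carries the full detected text, and simple regular expressions can pick out the licence number and date of birth. Both patterns below are assumptions about how the text is laid out (a two-letter state code followed by 13 digits, and a dd/mm/yyyy date), so treat them as a starting point rather than a validated parser. Names are harder to match reliably and are left out.

```python
import re

# Assumed licence number layout: 2 letters, 2 digits, then 11 digits,
# with optional spaces or hyphens between the groups.
LICENCE_NO = re.compile(r"\b([A-Z]{2}[- ]?\d{2}[- ]?\d{11})\b")
# Assumed date layout: dd/mm/yyyy or dd-mm-yyyy.
DOB = re.compile(r"\b(\d{2}[/-]\d{2}[/-]\d{4})\b")


def extract_fields(response):
    """Extract licence number and DOB from one annotate response, if present."""
    annotations = response.get("textAnnotations", [])
    # The first text annotation holds the whole detected text block.
    full_text = annotations[0].get("description", "") if annotations else ""
    licence = LICENCE_NO.search(full_text)
    dob = DOB.search(full_text)
    return {
        "licence_no": licence.group(1) if licence else None,
        "dob": dob.group(1) if dob else None,
    }


# Hypothetical OCR output, made up for illustration.
sample_ocr = {
    "textAnnotations": [
        {"description": "Name: A KUMAR\nDL No: KA01 20110012345\nDOB: 14/08/1990"}
    ]
}

fields = extract_fields(sample_ocr)
```

A missing field comes back as `None`, which is a convenient signal that the image should go to a human for review.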

Post-Implementation: Did it solve our efficiency problem?

We’ve hit the road now, and the problem is solved to quite an extent. Using the Cloud Vision API, we added a first-level check on all user-uploaded photos that lets through only driver’s licences clear enough for their text to be read. The daily time spent validating licences has dropped from about 10 hours to around 2 hours, since the majority of the work is now handled by the robot... err, the Cloud Vision API ;)
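For anyone who wants to check the 80% in the title, the arithmetic works out from the numbers in this post: 300 licences a day at 2 minutes each before, and around 2 hours a day after. The function name below is just for illustration.

```python
def time_saved_fraction(before_hours, after_hours):
    """Fraction of daily validation time saved after the change."""
    return (before_hours - after_hours) / before_hours


licences_per_day = 300
minutes_per_licence = 2

manual_minutes = licences_per_day * minutes_per_licence  # 600 minutes
manual_hours = manual_minutes / 60                       # 10 hours
hours_after = 2                                          # approximate, from this post

saving = time_saved_fraction(manual_hours, hours_after)
print(f"Time saved: {saving:.0%}")
```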

We also managed to weed out an interesting licence image that one of our users had uploaded. See it for yourself :)

Mock licence intentionally uploaded by a Drivezy user

The Google Cloud Vision API has a lot of other powerful features too, including face detection with emotion attributes, text detection and extraction, and detection of landmarks, labels, logos and image properties.

Do give it a try and share your experience!

Thank you for lending your ears. If you enjoyed reading this, do hit the ❤ icon below so it can reach others!

A big thank-you to our coding rockstars, @Charu & @Dheeraj, for implementing this for us, and to Hemant Sah for being a wonderful mentor.
