Image Recognition on the Road

When Google announced that it had open-sourced its internal object detection system, the TensorFlow Object Detection API, I knew I had to try it. Detecting objects in images is a long-standing challenge that has seen huge advances in recent years, thanks to a flurry of research in deep learning. It’s the sort of task that lends itself well to deep learning: easy to describe and intuitive to grasp, but hard to solve formally.

Google’s API offers five state-of-the-art TensorFlow models that identify and localize multiple objects in an image. The models run the gamut from lightweight ones, meant to work in real time on cellphones, to heavy-duty ones, which are more computationally expensive but more accurate. Google also pre-trained each model on the COCO dataset, so they detect 90 common object categories right off the shelf. The API includes a Jupyter notebook that demonstrates how to use the pre-trained models.

I managed to get some images of a traffic intersection in South Bay. There are several promising applications for detecting the motor vehicles, cyclists, and pedestrians flowing through an intersection: smarter traffic light signal timing based on the observed count of cars, accident prevention by alerting drivers to oncoming vehicles, and identifying areas where traffic jams commonly happen.
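As a rough illustration of the first application, the sketch below tallies the detections in one frame by type of road user. The label strings follow COCO category names, but the grouping table, the function name count_road_users, and the sample labels are assumptions made up for this example; none of it comes from the API itself.

```python
from collections import Counter

# Hypothetical grouping of COCO category names into road-user types.
ROAD_USER_GROUPS = {
    "car": "motor vehicle",
    "truck": "motor vehicle",
    "bus": "motor vehicle",
    "motorcycle": "motor vehicle",
    "bicycle": "cyclist",
    "person": "pedestrian",
}


def count_road_users(labels):
    """Count detections per road-user group, ignoring unrelated labels."""
    return Counter(
        ROAD_USER_GROUPS[label] for label in labels if label in ROAD_USER_GROUPS
    )


# Made-up labels for a single frame, for illustration only.
frame_labels = ["car", "car", "truck", "person", "vase", "traffic light"]
counts = count_road_users(frame_labels)
print(dict(counts))
```

This is deliberately crude: someone riding a bicycle or motorcycle would typically show up as two detections (the rider and the vehicle), so a real counter would need to merge overlapping boxes before feeding anything into signal timing.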

Since the objects to be detected in an intersection fall into the common categories of COCO, I was able to use a pre-trained model. This meant I didn’t have to endure the tedium of collecting a labeled dataset and burning GPU cycles on training. Below is the (rather grainy) original image.

And here is the output after running the most accurate model, faster_rcnn_inception_resnet_v2_atrous_coco.
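For each image, the detector returns parallel lists of boxes, scores, and class IDs, with each box given as normalized [ymin, xmin, ymax, xmax] coordinates. The sketch below shows one way to turn that raw output into labeled pixel boxes. The function name parse_detections, the 0.5 score cutoff, and the sample values are assumptions for illustration; only the handful of COCO label IDs relevant to this scene are listed.

```python
# Subset of the COCO label map (class ID -> name) relevant to this scene.
COCO_NAMES = {
    1: "person",
    3: "car",
    4: "motorcycle",
    8: "truck",
    10: "traffic light",
    86: "vase",
}


def parse_detections(boxes, scores, classes, width, height, min_score=0.5):
    """Keep detections scoring at least min_score and convert boxes to pixels.

    boxes holds normalized [ymin, xmin, ymax, xmax] values in [0, 1].
    Returned boxes are (left, top, right, bottom) in pixel coordinates.
    """
    results = []
    for box, score, class_id in zip(boxes, scores, classes):
        if score < min_score:
            continue
        ymin, xmin, ymax, xmax = box
        results.append({
            "label": COCO_NAMES.get(int(class_id), "id %d" % int(class_id)),
            "score": score,
            "box_px": (
                round(xmin * width),
                round(ymin * height),
                round(xmax * width),
                round(ymax * height),
            ),
        })
    return results


# Made-up output for a 640x480 frame; not the values from the run above.
sample_boxes = [[0.5, 0.25, 0.75, 0.5], [0.1, 0.1, 0.2, 0.2]]
sample_scores = [0.9, 0.3]
sample_classes = [3, 1]
parsed = parse_detections(sample_boxes, sample_scores, sample_classes, 640, 480)
print(parsed)
```

Raising min_score trades missed objects for fewer false positives, which matters on grainy footage like this.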

Box colors and their classifications: {Light green: car, Purple: truck, Neon green: person, Pink: vase, White: motorcycle, Dark green: traffic light}

I was very impressed with the results! Despite several obstacles (low resolution, ghosting, occlusion, and lack of color), the model identified every car, a motorcycle, and the person riding it. It also found a pedestrian on the sidewalk. It did, however, mistake a tree trunk for a vase (pink box).
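One simple way to suppress an out-of-context label like the vase is to keep only the categories you would plausibly expect at an intersection. This was not part of the run above; it is a minimal sketch, and the allow-list, the function name drop_unexpected, and the sample scores are all hypothetical.

```python
# Hypothetical allow-list of COCO category names expected at an intersection.
EXPECTED_LABELS = {
    "person", "bicycle", "car", "motorcycle", "bus", "truck", "traffic light",
}


def drop_unexpected(detections, allowed=EXPECTED_LABELS):
    """Keep only (label, score) pairs whose label is on the allow-list."""
    return [(label, score) for label, score in detections if label in allowed]


# Made-up detections for illustration; the scores are not from the run above.
detections = [("car", 0.98), ("person", 0.91), ("vase", 0.62)]
kept = drop_unexpected(detections)
print(kept)
```

The filter only hides the mistake: the tree trunk is still misread, it just no longer appears in the output.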

All in all, the TensorFlow Object Detection API is a neat system for object detection and localization. In this test it proved robust to poor image quality, and it is simple to use.