My Introduction to Image Detection

Mark Moon
Published in Pandera Labs
Dec 27, 2017

Before I dig into the details of this post, a brief video detailing the trials and tribulations of getting three kids ready for school and onto the bus most mornings at my house:

The problem is the bus arrives at wildly different times in the morning. The beginning of the school year is by far the worst, as bus drivers learn new routes and parents snap first-day-of-school pictures (guilty as charged on the latter). There are, however, reasonable explanations for the tardiness: modified school days, construction, and weather all factor into the bus’s timeliness. Furthermore, my house is one of the last stops, so when the bus is slow due to snowy roads, the delay can really add up over the duration of the route.

At this point, it’s worth mentioning that in a perfect world, my kids would get out of bed with ample time to have breakfast, get dressed, and prepare for the bus all by themselves. Did I mention my kids are eight, six, and six? So yeah, not gonna happen. If any of you have achieved this level of parental bliss, please leave a comment below, in full detail, explaining your process. There may or may not be some financial compensation heading your way.

Back here in the real world, around the time the bus typically arrives, my wife or I end up watching for the bus while the other frantically ushers the kids through the remainder of the morning routine. By the time we’re yelling “BUS!” it’s not uncommon to have one kid missing a shoe, another missing their coat… it’s definitely a hot mess some mornings. It was in this moment of organized chaos that I started thinking there has to be a better way. Wouldn’t it be great if something could watch for the bus while both parents helped with the kids?

With this in mind, I broke the problem down into the following four categories:

  • Image capture
  • Image analysis
  • Send notifications
  • Scheduling

I envisioned an IoT-type device I could stick on a window or affix to the house, dutifully watching for the bus. It would send text messages upon detection of the bus, along with a daily affirmation of how great a job we’re doing as parents.

Image Capture

Video camera, bam! Step one, done! Perhaps an oversimplification, but I had a few cameras around the house I could use to grab some sample video and move on to the fun part: image analysis. It goes without saying, the final product wouldn’t look like this:

But it was good enough to start with.

Image Analysis

Image analysis was uncharted territory for me. Luckily, some cursory Googling yielded a number of cloud solutions providing just what I needed:

Using the following image, I leveraged online demos to determine which provider, if any, could detect the school bus:

AWS Rekognition

Amazon’s Rekognition was first up and I was pretty impressed. The first two labels (Rekognition lingo for what is detected in the image) were Bus and School Bus, with a confidence score of nearly 98%, no less. Clearly the gauntlet had been thrown down.

Google Vision

I had high hopes for Google Vision because, well, it is Google after all. However, when the results were in, the list of labels did not include bus. Disappointed, I moved on.

Azure Computer Vision

Azure was next and the result was a tad confusing. The image description contained bus, so I was hopeful, but when looking through the tags, bus was nowhere to be found. Perhaps the online demo only returns a subset of tags, but as it stood, the last tag in the response was residential, with only 35% confidence. If Azure can detect a bus well enough to include it in the description, why not include it in the tags?

Clarifai

Clarifai was last on my list and offered a slight twist. In addition to image object detection, the service also supports video object detection. Unfortunately, neither was able to detect a bus within the image or video. I did, however, find much humor in discovering that my neighbors’ backyards resembled a calamity. I’ll be sure to pass that tidbit along.

We Have a Winner

The results were in: Amazon Rekognition proved best suited for my bus lookout contraption. I will say though, as a whole I am impressed with how much the services are able to detect without any manual training. Given how new some of these offerings are, I can only imagine how they’ll continue to improve.

Sending Notifications

Once the bus is detected, the next logical step is to notify us of its impending arrival. There wasn’t much research necessary here. After all, my phone is full of all sorts of apps designed specifically to pull me away from whatever it is I should be doing on a consistent basis. For my initial needs, I planned to leverage AWS SNS to send me a text message whenever the bus was detected.

Scheduling

The bus doesn’t come every day all day, so I need a way to set schedules for my bus lookout. Icing on the cake would include integration with the school’s calendar. (Unfortunately, I didn’t get to this feature for this post, so stay tuned for part two of this series.)
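To make the scheduling idea a bit more concrete, here is a minimal sketch of what a schedule check could look like. Everything in it is an assumption for illustration: the names `WatchWindow` and `LookoutSchedule` are hypothetical, the times are made up, and school calendar integration is left out entirely.

```kotlin
import java.time.DayOfWeek
import java.time.LocalDateTime
import java.time.LocalTime
import java.util.EnumSet

/** A daily window during which the lookout should be watching (hypothetical type). */
data class WatchWindow(val start: LocalTime, val end: LocalTime)

/** Hypothetical schedule: a set of school days plus one or more daily watch windows. */
class LookoutSchedule(
    private val schoolDays: Set<DayOfWeek>,
    private val windows: List<WatchWindow>
) {
    /** True when `now` falls on a school day and inside one of the windows (end exclusive). */
    fun isActive(now: LocalDateTime): Boolean {
        if (now.dayOfWeek !in schoolDays) return false
        val time = now.toLocalTime()
        return windows.any { time >= it.start && time < it.end }
    }
}

fun main() {
    val schedule = LookoutSchedule(
        schoolDays = EnumSet.range(DayOfWeek.MONDAY, DayOfWeek.FRIDAY),
        // Made-up pickup window; a real one would come from configuration.
        windows = listOf(WatchWindow(LocalTime.of(7, 30), LocalTime.of(8, 15)))
    )
    println(schedule.isActive(LocalDateTime.of(2017, 12, 27, 7, 45)))  // a Wednesday morning
    println(schedule.isActive(LocalDateTime.of(2017, 12, 30, 7, 45)))  // a Saturday morning
}
```

Passing the current time in as a parameter, rather than reading the clock inside `isActive`, keeps the check easy to test with fixed dates.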

Putting it all Together

My background is primarily in server-side Java and JVM languages, so for this experiment I decided to create a Spring Boot application written in Kotlin. I’ve been using Kotlin for the last few months and find it refreshing and fun to work with; it’s definitely worth checking out if you haven’t had an opportunity yet. My goal was to put something together quickly but also leave room for swapping out object detection and notification implementations. The result resembled something like this:
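In code, the "swappable parts" idea can be sketched roughly as below. This is an illustration under assumptions, not the project's actual classes: `DetectedLabel`, `ObjectDetector`, `Notifier`, `BusLookout`, and the 90% confidence threshold are all hypothetical, and the detector and notifier in `main` are in-memory stand-ins for a cloud service and a text message.

```kotlin
/** A label returned by some detection service, with a 0-100 confidence score. */
data class DetectedLabel(val name: String, val confidence: Float)

/** Anything that can look at an encoded image and report what it sees. */
interface ObjectDetector {
    fun detect(image: ByteArray): List<DetectedLabel>
}

/** Anything that can tell a human the bus is coming. */
interface Notifier {
    fun send(message: String)
}

/** Glue code that depends only on the interfaces, so implementations can be swapped. */
class BusLookout(
    private val detector: ObjectDetector,
    private val notifier: Notifier,
    private val minConfidence: Float = 90f  // assumed threshold, tune to taste
) {
    /** Returns true (and notifies) when the frame has a confident "Bus" label. */
    fun inspect(frame: ByteArray): Boolean {
        val busSeen = detector.detect(frame)
            .any { it.name.equals("Bus", ignoreCase = true) && it.confidence >= minConfidence }
        if (busSeen) notifier.send("BUS!")
        return busSeen
    }
}

fun main() {
    // Stand-in for a cloud-backed detector: always reports a bus.
    val fakeDetector = object : ObjectDetector {
        override fun detect(image: ByteArray) = listOf(DetectedLabel("Bus", 97.9f))
    }
    // Stand-in for an SMS notifier: collects messages in a list.
    val sent = mutableListOf<String>()
    val listNotifier = object : Notifier {
        override fun send(message: String) { sent.add(message) }
    }
    val lookout = BusLookout(fakeDetector, listNotifier)
    println(lookout.inspect(ByteArray(0)))
    println(sent)
}
```

Because `BusLookout` only knows about the two interfaces, trying a different vision provider or notification channel means writing one new small class rather than touching the detection loop.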

VideoInput.kt vs Webcam.kt

Not wanting to wait for those two times a day the bus went by to test the software, I wrote a quick VideoInput implementation capable of grabbing frames from an existing video. The details are pretty straightforward:

  1. Open the video
  2. For every nth frame, call Amazon’s Rekognition asynchronously
  3. If the response contained a ‘Bus’ label, save the image and send a notification
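The three steps above can be sketched as follows. This is a simplified illustration, not the real `VideoInput.kt`: frames are plain byte arrays instead of decoded video, `detectLabels` is a hypothetical stand-in for the asynchronous Rekognition call, and `onBus` stands in for "save the image and send a notification".

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical helper: inspects every nth frame asynchronously and reacts to a 'Bus' label.
fun scanFrames(
    frames: Sequence<ByteArray>,
    everyNth: Int,
    detectLabels: (ByteArray) -> List<String>,
    onBus: (frameIndex: Int, frame: ByteArray) -> Unit
): List<CompletableFuture<Void>> {
    require(everyNth > 0) { "everyNth must be positive" }
    return frames
        .withIndex()
        .filter { it.index % everyNth == 0 }          // step 2: only every nth frame
        .map { (index, frame) ->
            CompletableFuture
                .supplyAsync { detectLabels(frame) }  // step 2: detection runs off the calling thread
                .thenAccept { labels ->
                    if ("Bus" in labels) onBus(index, frame)  // step 3: react to a 'Bus' label
                }
        }
        .toList()
}

fun main() {
    // Ten fake "frames"; pretend the bus is visible only in frame 6.
    val frames = (0 until 10).asSequence().map { byteArrayOf(it.toByte()) }
    val hits = ConcurrentLinkedQueue<Int>()

    val pending = scanFrames(
        frames = frames,
        everyNth = 3,
        detectLabels = { frame -> if (frame[0].toInt() == 6) listOf("Bus", "Road") else listOf("Road") },
        onBus = { index, _ -> hits.add(index) }
    )
    CompletableFuture.allOf(*pending.toTypedArray()).join()  // wait for all async calls to finish
    println("Frames inspected: ${pending.size}, bus seen in frames: ${hits.toList()}")
}
```

Sampling every nth frame keeps the number of cloud calls (and the bill) down, at the cost of possibly reacting a fraction of a second later.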

The Webcam implementation is similar to the VideoInput but it grabs frames from the camera in place of a pre-recorded video.

Thankfully, the webcam-capture library made working with the camera super simple. With these implementations in place, running the application yielded the following output:

Console output
Text notifications

It goes without saying that the multiple text messages are annoying and should be debounced, but this was a prototype and it served its purpose well.
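For the curious, one simple way to quiet the duplicate messages is sketched below. It is not part of the prototype, and strictly speaking it is a cooldown (send the first message, then suppress repeats for a while) rather than a classic trailing debounce. The class name `DebouncedNotifier`, the `trySend` method, and the ten-minute quiet period are all assumptions.

```kotlin
import java.time.Duration
import java.time.Instant

/**
 * Wraps any "send" function and drops messages that arrive within `quietPeriod`
 * of the last message that was actually sent. Hypothetical helper for illustration.
 */
class DebouncedNotifier(
    private val quietPeriod: Duration = Duration.ofMinutes(10),  // assumed default
    private val send: (String) -> Unit
) {
    private var lastSent: Instant? = null

    /** Returns true if the message was sent, false if it was suppressed. */
    @Synchronized
    fun trySend(message: String, now: Instant = Instant.now()): Boolean {
        val previous = lastSent
        if (previous != null && Duration.between(previous, now) < quietPeriod) return false
        lastSent = now
        send(message)
        return true
    }
}

fun main() {
    val notifier = DebouncedNotifier(Duration.ofMinutes(10)) { println("SMS: $it") }
    val firstSighting = Instant.parse("2017-12-27T13:40:00Z")

    // Three detections of the same bus a few seconds apart, then a much later one.
    notifier.trySend("Bus detected", firstSighting)
    notifier.trySend("Bus detected", firstSighting.plusSeconds(5))
    notifier.trySend("Bus detected", firstSighting.plusSeconds(10))
    notifier.trySend("Bus detected", firstSighting.plus(Duration.ofHours(8)))
}
```

In this run only the first and the last call produce a message; the two in between fall inside the quiet period and are dropped.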

Final Thoughts and Next Steps

All told, I was pleasantly surprised with how much I was able to accomplish with very little coding required. The cloud services’ (specifically Amazon’s) ability to detect a bus with no prior training was the catalyst to prototyping a solution in very little time.

As my lookout contraption evolves, I’m eager to test out some of the open source computer vision libraries to determine if they can hold their own against the cloud services. This will require upfront training, but if successful, it will cut down on my Amazon spend significantly over time. I’m also anxious to tinker with other notification channels. The current text approach assumes my phone is on or near me during the morning rush, which isn’t always the case. It would be fun to integrate with a home automation system to blink lights or sound a chime when the bus is detected.
