The Markable Update: Multi-Object Detection
Happy Chinese New Year! We reserve these Friday entries for really cool updates and developments we want to share with our readers.
Today’s topic: Using machine learning to perform multi-object detection in photos (and videos).
What is Multi-Object Detection?
Multi-object detection is a core problem in computer vision: given an image, identify both the types of objects present and where each one is located.
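To make that concrete, a detector's output for a single image can be thought of as a list of (label, box, confidence) records. The sketch below is a hypothetical illustration of that idea, not Markable's actual data format:

```python
from dataclasses import dataclass

# Hypothetical illustration: one detection pairs a class label with a
# bounding box (pixel coordinates) and a confidence score.
@dataclass
class Detection:
    label: str         # e.g. "jacket", "handbag"
    box: tuple         # (x_min, y_min, x_max, y_max) in pixels
    confidence: float  # between 0 and 1

# Example output for one image (made-up values)
detections = [
    Detection("jacket", (40, 60, 220, 340), 0.97),
    Detection("handbag", (250, 300, 330, 420), 0.88),
]

for d in detections:
    print(f"{d.label}: {d.confidence:.2f} at {d.box}")
```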
Machine learning comes in two main flavors: (1) “Shallow learning,” which until recently was the only kind of machine learning and utilizes techniques such as Support Vector Machines (SVMs); and (2) the more recently developed “deep learning.”
Deep learning has come into vogue recently with the development of increasingly high-performance computer graphics cards (GPUs), which are capable of performing trillions of calculations per second. Deep learning takes advantage of this huge processing capacity by building “deep” neural networks that function much like the mesh of neurons that makes up our brains. What’s exciting about deep learning is that it allows for much more intricate and complicated models that are both more accurate and faster than their shallow learning cousins.
If you’d like to learn more about machine learning and its subfields, we highly recommend checking out this series of videos from the Numberphile channel on YouTube.
Computer vision applies the processing power of a GPU to “look” at image files, along with labels marking the key features in each image, and build a model of what those features look like. That model can then recognize the same features whenever they occur in future images shown to the network.
In layman’s terms, these networks are learning to see images and identify objects in them much the way we humans do. This approach has seen an explosion of new uses: tracking objects in video, enabling self-driving cars to see pedestrians and road hazards (and actually drive themselves), recognizing the faces of your friends in photos (and suggesting that you tag them), and, just this week, learning to accurately distinguish the normal moles on your skin from those that are cancerous.
Here at Markable, we’re using it to identify and retrieve clothing products in photos and videos.
A Small Test
Take a look at the following photo, and without looking at the one after it, we’d like you to count and identify all of the clothing and accessories that you see, as quickly as you can.
Got it? Good; now compare how you did with the following image. The numbers next to the boxes are scores between 0 and 1 indicating how confident we are that the object in each box was correctly identified.
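In practice, detections below a chosen confidence threshold are usually discarded before being shown to a user. A minimal sketch of that filtering step (the labels, scores, and threshold here are made up for illustration):

```python
# Made-up (label, confidence) pairs from a hypothetical detection pass
detections = [
    ("coat", 0.96),
    ("glasses", 0.54),
    ("scarf", 0.91),
]

# Keep only detections at or above the confidence threshold
threshold = 0.6
confident = [(label, score) for label, score in detections if score >= threshold]

print(confident)  # the low-confidence "glasses" detection is dropped
```

Lowering the threshold surfaces more objects at the cost of more false positives; raising it does the opposite.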
How’d you do? Did you get them all? (Most of us missed the glasses.) Want to try again? Here.
And the answer:
Believe it or not, other than uploading the image to our servers, not a single human input was involved in identifying the products in the above images. The process was entirely done by a computer vision neural network that our engineers trained last week. Here are some more results from an earlier network (try to identify the products!):
Even if you managed to identify a few products in each of the photos in that video, we have some bad news: Each of those photos took our network less than one second to process and identify.
Oh, did we mention that in addition to identifying the products in an image, we’re also returning the best matches for them from our database?
Look Ma, No Hands!
Our previous model, like most of those available from other companies in our field, requires at a minimum that the user draw a “bounding box” around the product they want the network to search for. To us, that half defeats the purpose. An outfit is not a collection of standalone items; its pieces move as one, so we made our search work that way too.
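For the curious: the standard way to measure how well a predicted bounding box matches a reference box is intersection-over-union (IoU), the ratio of the boxes’ overlap to their combined area. A minimal sketch, not Markable’s evaluation code:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Corners of the overlap rectangle
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    # Overlap area is zero if the boxes don't intersect
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

An IoU of 1.0 means a perfect match; 0.0 means no overlap at all.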
What’s more, further down the road we’re going to do this for everything: clothing, furniture, people, toys, bicycles, planes, trains (maybe), and automobiles.
The Future is Now
We’re making our tech available for public testing in about a month and would love to see what you want to do with it (we already have applications from fields we hadn’t even considered). If you’re interested and want to get in on the quickly closing first Beta Group, sign up here.
Otherwise, we’d love your feedback and ideas, so feel free to email and tweet us. And check back next Tuesday for a post from our Chief Science Officer, Suren Kumar, for some tips and tricks he’s developed for visualizing error in your own computer vision training.