ImageNet and why it’s exciting

Imagine being a librarian where all the books are in the wrong place or badly categorized, and you have to categorize them. It’s a tough job.

Now imagine this for images: in order to train a model, you had to gather an image set and then label it yourself.

Imagine these are everyday, regular pictures. How would you label them when the number of categories is large? And how can we get enough images to represent each category?

Enter ImageNet and WordNet.

First, a primer on WordNet.

A big part of WordNet is the synset: a grouping of words that share a meaning.

I believe what WordNet strives to do is form a hierarchical clustering of words.

A Porsche is a sports car. A sports car is a type of vehicle. A vehicle is a type of transportation.

So you can traverse down a path, but you can also traverse laterally: a sports car and an SUV sit side by side, since both are types of vehicle.
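
Here’s a quick sketch of that traversal using NLTK’s WordNet interface. I’m assuming the synset name 'sports_car.n.01' resolves in your WordNet install; exact names and outputs can vary by version.

# walk up the hypernym chain from a specific synset
from nltk.corpus import wordnet

sports_car = wordnet.synset('sports_car.n.01')
print(sports_car.hypernyms())       # e.g. [Synset('car.n.01')], the parent category
print(sports_car.hypernym_paths())  # the full path(s) from the root ('entity') down to sports_car

# the lateral move: siblings are the other hyponyms of the same parent
parent = sports_car.hypernyms()[0]
print(parent.hyponyms()[:5])        # other kinds of car sitting next to sports_car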

Here is a sample representation in code. The snippet below prints the synonyms of a car.

WordNet also handles homonyms: words that sound the same but have different meanings, each getting its own synset.

And hypernyms: I think of these as supersets, broad topics/themes that other objects fall under.

Code was adapted from

# sample code to get set up (run nltk.download('wordnet') once beforehand)
from nltk.corpus import wordnet

car = wordnet.synset('car.n.01')  # first noun sense of "car"
print(car.lemma_names())          # ['car', 'auto', 'automobile', 'machine', 'motorcar']
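
To see the homonym and hypernym ideas concretely, here is a small follow-on sketch. The exact senses and definitions returned depend on your WordNet version.

# one word, several synsets: each sense of "car" gets its own entry
from nltk.corpus import wordnet

for sense in wordnet.synsets('car'):
    print(sense.name(), '-', sense.definition())  # e.g. car.n.01 (automobile), car.n.02 (railcar), ...

# hypernyms: the broader category a synset falls under
print(wordnet.synset('car.n.01').hypernyms())     # e.g. [Synset('motor_vehicle.n.01')]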

So where does ImageNet come into play?

So WordNet gives us synsets (groupings of similar words). ImageNet then gives us a set of images for each meaningful synset. This is extremely exciting for training a model.
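
The bridge between the two is the WordNet ID (WNID): as far as I can tell, each ImageNet category ID is just the part-of-speech letter plus the zero-padded WordNet 3.0 offset of its synset. A minimal sketch:

# map a WordNet synset to its ImageNet-style WNID
from nltk.corpus import wordnet

car = wordnet.synset('car.n.01')
wnid = 'n{:08d}'.format(car.offset())  # 'n' for noun + 8-digit WordNet offset
print(wnid)                            # e.g. 'n02958343', the ImageNet category for cars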

We now have images across tens of thousands of categories (the full ImageNet spans over 20,000 synsets). This is exciting for object detection.

One thing I’ve been learning with deep learning is that your model is really dependent on what you train it with, e.g. training your model on the MNIST dataset does not mean it can classify street numbers.
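
As a rough illustration of that point (using PyTorch and torchvision, which are my choice here, not part of the original post): the sketch below trains a tiny classifier on MNIST and then evaluates it on SVHN street-number digits. The transfer accuracy usually drops sharply compared with MNIST’s own test set.

# a minimal sketch: train a tiny classifier on MNIST, evaluate it on SVHN street numbers
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# MNIST is 28x28 grayscale; SVHN is 32x32 RGB, so squash SVHN into MNIST's format
mnist_tf = transforms.ToTensor()
svhn_tf = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((28, 28)),
    transforms.ToTensor(),
])
train_set = datasets.MNIST("data", train=True, download=True, transform=mnist_tf)
test_set = datasets.SVHN("data", split="test", download=True, transform=svhn_tf)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# one quick training epoch on MNIST
for x, y in DataLoader(train_set, batch_size=256, shuffle=True):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

# evaluate on SVHN: accuracy typically collapses compared to MNIST's own test set
correct = total = 0
with torch.no_grad():
    for x, y in DataLoader(test_set, batch_size=256):
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
print("SVHN accuracy of an MNIST-trained model:", correct / total)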

This paper describes the methodology.

A few questions:

a) Where does ImageNet source its images from?

It sources them from image search queries: type in a word and the search engine returns many candidate images.

b) How did it confirm the labels?

The query term tells us the candidate label, but the internet is a place where anything gets posted. Using Amazon Mechanical Turk, the authors had multiple people vote on whether each candidate image really contained the synset’s concept, and kept only the images with enough agreement.
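
Here is a toy sketch of that kind of consensus filter. The real pipeline described in the paper uses a dynamically determined agreement threshold per synset, so the fixed threshold and the image names below are purely illustrative.

# toy consensus filter over Mechanical Turk style votes (simplified, not the paper's exact algorithm)
def keep_image(votes, min_agreement=0.75, min_votes=3):
    """votes: one boolean per worker, True if the image matches the synset."""
    if len(votes) < min_votes:
        return False
    return sum(votes) / len(votes) >= min_agreement

candidates = {
    'img_001.jpg': [True, True, True],          # clear agreement -> kept
    'img_002.jpg': [True, False, False, True],  # too much disagreement -> dropped
}
print([name for name, votes in candidates.items() if keep_image(votes)])  # ['img_001.jpg']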

This is amazing because now we can train models to classify images across a much broader set of categories.

We could do text-to-image search, and image-to-image search as well, through the combination of ImageNet and WordNet.
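
For the text-to-image half, here is a rough sketch of the idea; wnid_to_images is a hypothetical stand-in for however you index the ImageNet images on disk.

# text -> WordNet synsets -> ImageNet WNIDs -> images
from nltk.corpus import wordnet

def text_to_wnids(query):
    """Map a free-text query to candidate ImageNet WNIDs via WordNet noun synsets."""
    return ['n{:08d}'.format(s.offset()) for s in wordnet.synsets(query, pos=wordnet.NOUN)]

def search_images(query, wnid_to_images):
    """wnid_to_images: hypothetical dict mapping a WNID to the image paths you have indexed."""
    results = []
    for wnid in text_to_wnids(query):
        results.extend(wnid_to_images.get(wnid, []))
    return results

print(text_to_wnids('car'))  # one candidate WNID per noun sense of "car"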

See this paper here.