How to create AI-ready data for object detection

Anthony Chaudhary
Published in Diffgram
6 min read · Sep 25, 2018

First, we need two kinds of data:

  • raw data, i.e. images, PDFs, video
  • encoded meaning, i.e. boxes, polygons, tags

Defining good data

Quantity

For prototype systems, aim for at least 100 images per label. For production systems, aim for at least 500–1,000+ images per label.
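If your labeling tool reports how many images you have per label, a few lines of Python are enough to sanity-check these rules of thumb. This is only a sketch; the counts below are made-up numbers standing in for your own totals.

```python
# Sketch: flag labels that fall below rough per-label quantity thresholds.
PROTOTYPE_MIN = 100
PRODUCTION_MIN = 500

image_counts = {"dog": 620, "cat": 140, "rabbit": 35}  # hypothetical totals

for label, count in image_counts.items():
    if count < PROTOTYPE_MIN:
        print(f"{label}: {count} images - below the prototype minimum of {PROTOTYPE_MIN}")
    elif count < PRODUCTION_MIN:
        print(f"{label}: {count} images - fine for a prototype, thin for production")
    else:
        print(f"{label}: {count} images - enough for a production baseline")
```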

Variety

Images should have good variety while still being in the same general scope. For example, here’s some good variety for dogs:

  • the background is different,
  • the dog is different,
  • the angle is slightly different.

What’s consistent is:

  • the dog is a similar distance from the camera
  • these are all real photographs, i.e. not a mix of computer-generated images or artistic impressions.

Avoid

  • Similar images. For example, if all our training data shows dogs playing on green grass, the model is unlikely to work as well on pavement or water.
  • Images of the same subject. For example, rotating a camera around the same dog may produce lots of different images, but the data will be much too similar to be useful in a production system.

If images are too similar they don’t count toward quantity: 30 images of essentially the same thing are worth about 1 image on the quantity scale above.
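One rough way to catch near-duplicates before they inflate your counts is a perceptual hash comparison. The sketch below assumes the Pillow and imagehash packages and a hypothetical flat folder of JPEGs; a small hash distance means two frames are probably too similar to count as separate examples.

```python
# Sketch: flag image pairs that are probably "the same dog, same scene" shots.
# Requires: pip install pillow imagehash
from itertools import combinations
from pathlib import Path

from PIL import Image
import imagehash

THRESHOLD = 8  # Hamming distance; lower means more similar (tune for your data)

hashes = {
    path: imagehash.average_hash(Image.open(path))
    for path in Path("dog_images").glob("*.jpg")  # hypothetical folder
}

for (path_a, hash_a), (path_b, hash_b) in combinations(hashes.items(), 2):
    if hash_a - hash_b <= THRESHOLD:
        print(f"Near-duplicate pair: {path_a.name} / {path_b.name}")
```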

Aim for as much variety as possible, while keeping it relevant to the use case.

Relevancy

Is my data relevant to my planned use?

For example, if we wanted to create a dog detector to use in a park, training data like the image on the left would be much more relevant than an image like the one on the right, where the dog fills the full frame.

If we instead wanted to create a detector for, say, social media photos, where it is quite possible for the dog to fill the whole frame, images like the one on the right may be better.

There’s a technical reason for this. When we train a network, it learns by sampling from the examples provided. So while we may intuitively know that both of these pictures contain a dog, the network’s knowledge is limited to the information it is given, and to the network these are very different images!

Balance

There should be a similar number of examples for each class; a quick way to check this is sketched after the examples below.

For example:

  • 250 dogs, 250 cats
  • 250 Golden Retrievers, 250 Border Collies, 250 Poodles

Avoid

  • 10 cats and 250 dogs
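One way to catch this kind of imbalance is to compare the largest class count to the smallest. This is only a sketch with made-up counts; the 2:1 cutoff is a rough rule of thumb, not a hard rule.

```python
# Sketch: warn when one class dwarfs another.
class_counts = {"dog": 250, "cat": 10}  # hypothetical counts

largest = max(class_counts.values())
smallest = min(class_counts.values())
ratio = largest / smallest

if ratio > 2:
    print(f"Classes are imbalanced ({ratio:.0f}:1). Collect more of the small class,"
          f" or subsample the large one before training.")
```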

Images should also be balanced in terms of perspective, for example:

  • in all images, the dog fills a roughly equal amount of the frame, or
  • 250 images where the dog fills most of the frame and 250 images where it fills only 10% of the frame.

If the dog looks substantially different in the image, we must treat it almost as a separate class, resetting the quantity requirement.
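One way to check this is to measure how much of each frame the labeled box actually fills. The sketch below assumes each annotation is a simple dict with pixel box coordinates plus the image size; the records are made up for illustration.

```python
# Sketch: bucket annotations by how much of the frame the object fills.
annotations = [  # hypothetical records exported from a labeling tool
    {"x_min": 40, "y_min": 60, "x_max": 600, "y_max": 420, "img_w": 640, "img_h": 480},
    {"x_min": 300, "y_min": 220, "x_max": 360, "y_max": 280, "img_w": 640, "img_h": 480},
]

for ann in annotations:
    box_area = (ann["x_max"] - ann["x_min"]) * (ann["y_max"] - ann["y_min"])
    frame_area = ann["img_w"] * ann["img_h"]
    fill = box_area / frame_area
    bucket = "large" if fill > 0.5 else "medium" if fill > 0.1 else "small"
    print(f"object fills {fill:.0%} of the frame -> {bucket}")
```

If one bucket dominates, the dataset is effectively lopsided in perspective even when the class counts look balanced.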

A good rule of thumb is:

If you can reasonably see it with the human eye, the algorithm can learn to detect it.

And of course if you can’t easily see it yourself, the algorithm probably will have trouble too!

Multiple networks

One option when the availability of data differs between classes is to create two networks.

Let’s say we are building something to detect gardening objects, like a trowel, watering nozzle, shears, etc., and also whether a person is holding the object.

Imagine we photograph people holding each object, one at a time. If there were, say, 10 garden objects we wanted to detect, we would end up with roughly a 10:1 ratio of hand examples to examples of each garden object (assuming the person’s other hand was always hidden).

Instead, you could train two separate networks (combined at inference time as sketched after this list):

  • A network to detect hands.
  • A network to detect garden objects.
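The sketch below is purely hypothetical: the two detect_* functions stand in for whatever detection framework you actually train with, and the boxes are dummy values. The point is simply that both networks run on the same image and their outputs are merged.

```python
# Hypothetical sketch: run two specialized detectors over the same image
# and merge their results into one list of detections.

def detect_hands(image):
    # placeholder for the hand network's inference call
    return [{"label": "hand", "box": (120, 80, 260, 240), "score": 0.91}]

def detect_garden_objects(image):
    # placeholder for the garden-object network's inference call
    return [{"label": "trowel", "box": (200, 150, 330, 400), "score": 0.87}]

def detect_all(image):
    """Combine the two networks' detections into one result list."""
    return detect_hands(image) + detect_garden_objects(image)

print(detect_all(image=None))  # with real models, pass the decoded image/frame
```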

Limit scope

  • If the network is only used during the daytime, you don’t need nighttime data.
  • Fixed camera placement? Make sure training data comes from that camera’s position.

What happens if I don’t have enough data?

One way to think about it is like teaching someone something new. If we give no examples, it would be hard for them to learn. The more examples we give, the easier it becomes.

Getting started with Diffgram

Diffgram empowers you to access and create computer vision intelligences, including the critical step of creating training data.

Finding an existing project

Diffgram has a growing library of existing projects. In this example we are going to find a project related to apples, and add pears to the data.

Clicking on the project brings us to the project homepage.

Forking existing data

Clicking fork creates my own copy of the project.

A fork is a copy of a project. Forking allows you to freely make changes without affecting the original project. Common uses:

  • Proposing a change to an existing project
  • A starting point for your own idea

Importing new data

Click upload. Then drag and drop your data.

Creating a label

In order to map meaning onto your data, you need to create a label. A label represents the meaning behind what is in the data.

For example, we may be used to common labels such as “cat”, “dog”, or “airplane”. However, labels can also carry more complex meaning, even arbitrary concepts like “unfolded laundry” or “folded laundry”.
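In code, a label is usually nothing more exotic than a name mapped to an integer class id the network can predict. A minimal sketch (the label names here are just examples):

```python
# Sketch: labels as a simple name -> class id map used at training time.
labels = ["apple", "pear"]  # any concept works: "folded laundry", "unfolded laundry", ...
label_to_id = {name: i for i, name in enumerate(labels)}

print(label_to_id)  # {'apple': 0, 'pear': 1}
```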

Your first annotation

Now that we have a pear label, let’s start drawing! Pears are a good candidate for a “box”.

Here’s some pears!
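Under the hood, a box annotation is just a label plus four pixel coordinates on a specific image. Here is a hedged sketch of what one such record might look like; the field names are illustrative, not Diffgram’s exact schema.

```python
# Sketch: one "box" annotation tying a label to a region of an image.
annotation = {
    "image": "pears_001.jpg",    # hypothetical file name
    "label": "pear",
    "x_min": 210, "y_min": 145,  # top-left corner, in pixels
    "x_max": 340, "y_max": 290,  # bottom-right corner, in pixels
}

width = annotation["x_max"] - annotation["x_min"]
height = annotation["y_max"] - annotation["y_min"]
print(f"{annotation['label']} box is {width} x {height} px")
```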

For more details on getting started see the help center articles here.

Getting teammates to help

It would take quite a while without teammates! Let’s invite some people!

Diffgram tracks versions, so you can see who is doing the most work.

Since John forked this project, I can see both the work John has done himself and Anthony’s original version.

This can even be used to resolve conflicts; for example, John could submit his pear additions back to Anthony’s project. Diffgram tracks changes over time in a similar fashion to git. It also makes it easier to create multiple copies of your projects to test different ideas and concepts.

Using time-saving techniques

Here are some techniques to consider for production-scale projects:

Exporting

Now that we have created some annotations, let’s export them to use in training our model! Diffgram offers a number of export options.

Example of export process
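Whichever format you pick, the end goal is the same: turn the exported file into (image, boxes, labels) records your training code can consume. As an illustration only, here is a sketch that reads a COCO-style JSON export, a common interchange format; your actual export schema may differ.

```python
# Sketch: load a COCO-style JSON export into per-image lists of boxes + labels.
# Field names follow the COCO convention; adjust to match your actual export.
import json
from collections import defaultdict

with open("export.json") as f:  # hypothetical export file
    data = json.load(f)

categories = {c["id"]: c["name"] for c in data["categories"]}
boxes_by_image = defaultdict(list)

for ann in data["annotations"]:
    x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
    label = categories[ann["category_id"]]
    boxes_by_image[ann["image_id"]].append((label, x, y, x + w, y + h))

print(f"{len(boxes_by_image)} images with annotations loaded")
```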

That’s the basics of using Diffgram to create annotations!

If you are interested in participating in the beta, sign up here.

Thanks for reading!
