Colten Jackson
Feb 21, 2018

Hi, thanks for the article!

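From the Detectron docs, the models are trained on the COCO dataset (“Common Objects in Context”), which has 80 categories of objects. The list below also starts with BG, which I take to be the background class rather than a real category: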

BG, person, bicycle, car, motorcycle, airplane,
bus, train, truck, boat, traffic light,
fire hydrant, stop sign, parking meter, bench, bird,
cat, dog, horse, sheep, cow, elephant, bear,
zebra, giraffe, backpack, umbrella, handbag, tie,
suitcase, frisbee, skis, snowboard, sports ball,
kite, baseball bat, baseball glove, skateboard,
surfboard, tennis racket, bottle, wine glass, cup,
fork, knife, spoon, bowl, banana, apple,
sandwich, orange, broccoli, carrot, hot dog, pizza,
donut, cake, chair, couch, potted plant, bed,
dining table, toilet, tv, laptop, mouse, remote,
keyboard, cell phone, microwave, oven, toaster,
sink, refrigerator, book, clock, vase, scissors,
teddy bear, hair drier, toothbrush
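
In case it helps anyone else, here is a minimal Python sketch of how I'd turn a predicted class id back into one of the names above. It assumes the model numbers its classes in the order listed, with BG (background) at index 0; that ordering is my assumption, and `COCO_CLASSES` and `class_name` are names I made up for illustration, not part of Detectron.

```python
# Class names copied from the list above.
# Assumption: index 0 is the background placeholder ("BG"),
# and the remaining ids follow the order of the list.
COCO_CLASSES = [
    "BG", "person", "bicycle", "car", "motorcycle", "airplane",
    "bus", "train", "truck", "boat", "traffic light",
    "fire hydrant", "stop sign", "parking meter", "bench", "bird",
    "cat", "dog", "horse", "sheep", "cow", "elephant", "bear",
    "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie",
    "suitcase", "frisbee", "skis", "snowboard", "sports ball",
    "kite", "baseball bat", "baseball glove", "skateboard",
    "surfboard", "tennis racket", "bottle", "wine glass", "cup",
    "fork", "knife", "spoon", "bowl", "banana", "apple",
    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza",
    "donut", "cake", "chair", "couch", "potted plant", "bed",
    "dining table", "toilet", "tv", "laptop", "mouse", "remote",
    "keyboard", "cell phone", "microwave", "oven", "toaster",
    "sink", "refrigerator", "book", "clock", "vase", "scissors",
    "teddy bear", "hair drier", "toothbrush",
]


def class_name(class_id):
    """Return the label for an integer class id (hypothetical helper)."""
    return COCO_CLASSES[class_id]


# Drop the background entry to get only the real object categories.
object_classes = COCO_CLASSES[1:]

print(len(COCO_CLASSES), "entries including BG")
print(len(object_classes), "object categories")
print(class_name(1), "/", class_name(80))
```

So the list has 81 entries, but only 80 of them are actual object categories.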

And that’s with 200K labeled images containing 1.5 million object instances, so it would be a tall order to come up with your own training set! I’ve read that once you have a trained model it’s less work to add new objects, but I don’t quite understand how that works. In any case, I look forward to your future write-ups.
