[Week 2 — How We Collected and Labeled Our Data]

Mümin Can Yılmaz
Published in bbm406f16, Dec 16, 2016

We are going to build a system that can tell which ingredients a pizza has. Since not much work like this has been done before, it is hard to find a dataset of labeled pizza photos. So we decided to collect our own dataset.

There are some “recipe” websites where we can easily collect pizza photos along with their ingredients. There are also “pizza-chain” websites that show pizza photos with their ingredients. Unfortunately, these websites are fairly limited, so we also need to find arbitrary pizza photos and label them manually.

Luckily for us, Instagram has lots of pizza photos that can easily be found with the #pizza hashtag. There are also pizza-chain accounts with lots of pizza photos, and even accounts that post nothing but pizza photos! Photos on Instagram are also easy to parse and download, so we managed to collect at least 20,000 pizza photos from Instagram alone. But how do we label them?
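Once the photo URLs have been gathered, the download step itself is short. The Python sketch below assumes a list of direct image URLs is already available (extracting them from the #pizza tag pages is not shown, since that depends on Instagram's page format). The function name `download_photos` and the `pizza_00000.jpg` naming scheme are hypothetical, not taken from our actual scraper.

```python
import urllib.request
from pathlib import Path


def download_photos(urls, out_dir, prefix="pizza"):
    """Download each image URL into out_dir as prefix_00000.jpg, prefix_00001.jpg, ...

    Hypothetical helper: assumes `urls` are direct image links that were
    already extracted from Instagram. Returns the list of saved paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        # Number files by how many were saved so far, so names stay contiguous.
        target = out / f"{prefix}_{len(saved):05d}.jpg"
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                target.write_bytes(resp.read())
        except (OSError, ValueError):
            # Dead links and timeouts (URLError is an OSError) or malformed
            # URLs (ValueError) are skipped instead of stopping the whole run.
            continue
        saved.append(target)
    return saved
```

Sequential file names keep each photo easy to reference later, when its labels are recorded.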

Inspecting every photo one by one and writing its ingredients into a text file, while keeping track of each file name, is very time-consuming. So we built a simple web application that speeds up the labeling process. This is what it looks like:

It can be used for any kind of photo tagging, because the “tag system” is not hard-coded. Users can enter the tags (labels) they are going to use and start tagging photos!
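At its core, a tool like this keeps a mapping from each file name to its tags, checked against the tag list the user entered. The sketch below is a hypothetical illustration: the `LabelStore` class and its JSON layout are assumptions, since the post does not describe how our tool stores labels. It also shows one common way to turn the tags into a multi-hot vector for training a multi-label ingredient classifier.

```python
import json
from pathlib import Path


class LabelStore:
    """Hypothetical label store: file name -> set of tags from a user-defined list."""

    def __init__(self, tags):
        # The tag vocabulary comes from the user, not from the code.
        self.tags = list(dict.fromkeys(tags))  # drop duplicates, keep order
        self.labels = {}

    def tag(self, file_name, *chosen):
        # Reject typos or tags that are not in the user's vocabulary.
        unknown = [t for t in chosen if t not in self.tags]
        if unknown:
            raise ValueError(f"unknown tags: {unknown}")
        self.labels.setdefault(file_name, set()).update(chosen)

    def multi_hot(self, file_name):
        """0/1 vector in vocabulary order; unlabeled files give all zeros."""
        present = self.labels.get(file_name, set())
        return [1 if t in present else 0 for t in self.tags]

    def save(self, path):
        data = {"tags": self.tags,
                "labels": {f: sorted(ts) for f, ts in self.labels.items()}}
        Path(path).write_text(json.dumps(data, indent=2))

    @classmethod
    def load(cls, path):
        data = json.loads(Path(path).read_text())
        store = cls(data["tags"])
        for file_name, ts in data["labels"].items():
            store.tag(file_name, *ts)
        return store
```

Keying everything by file name removes the manual bookkeeping of a free-form text file, and validating against the user's tag list catches misspelled ingredients early.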

Of course, the #pizza tag on Instagram also contains a lot of non-pizza photos, so we need to delete them. Our tool has a small button for that, which is more convenient than a right-click & delete operation.

An example of a useless “pizza” photo
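The post only describes a delete button; a variant of that button could instead move rejected photos into a separate folder, so a misclick is recoverable. The sketch below assumes that design. The `reject_photo` function and the trash-folder layout are hypothetical, not part of our tool.

```python
import shutil
from pathlib import Path


def reject_photo(photo, trash_dir):
    """Move a non-pizza photo into trash_dir instead of deleting it for good.

    Hypothetical helper. Returns the new path; if a file with the same name is
    already in the trash, a numeric suffix (_1, _2, ...) is added.
    """
    photo = Path(photo)
    trash = Path(trash_dir)
    trash.mkdir(parents=True, exist_ok=True)
    target = trash / photo.name
    n = 1
    while target.exists():
        target = trash / f"{photo.stem}_{n}{photo.suffix}"
        n += 1
    # shutil.move returns the destination path.
    return Path(shutil.move(str(photo), str(target)))
```

Emptying the trash folder once a labeling session is finished gives the same end result as deleting right away.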

Some photos also contain a lot of “noise”: there is a pizza in them, but also lots of other objects we don’t want. To get rid of those, we even added a cropping feature to our tool.
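Much of a cropping feature comes down to turning the rectangle the user drags into a valid crop box. The sketch below assumes the browser reports two corner points in image pixel coordinates; `crop_box` is a hypothetical helper, not code from our tool. It returns a `(left, upper, right, lower)` tuple, the box format that Pillow's `Image.crop` expects.

```python
def crop_box(x0, y0, x1, y1, width, height):
    """Turn a dragged rectangle into a (left, upper, right, lower) box inside the image.

    Hypothetical helper: (x0, y0) and (x1, y1) are any two opposite corners,
    possibly fractional or outside the image.
    """
    # Sort so it does not matter which corner the user started dragging from.
    left, right = sorted((x0, x1))
    upper, lower = sorted((y0, y1))

    def clamp(v, hi):
        # Round to whole pixels and keep the value within [0, hi].
        return max(0, min(int(round(v)), hi))

    box = (clamp(left, width), clamp(upper, height),
           clamp(right, width), clamp(lower, height))
    if box[2] <= box[0] or box[3] <= box[1]:
        raise ValueError("crop region is empty after clamping")
    return box
```

With Pillow installed, `Image.open(path).crop(box)` would then produce the cropped image, which can be saved over the original or next to it.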

This tool is very primitive and was written in just one day, so I am a bit ashamed to put it on my GitHub account. But if you are interested in using it, feel free to send me a message!
