Crowdsourcing Image Labeling
This post is the continuation of Anime or Cartoon? — Let the AI decide. In it I describe the project I’m collecting the data for.
In deep learning we often work with huge datasets, and this project is no exception. I collected around 300,000 images from anime and cartoons. Processing this many images is not an easy task, though.
I designed this simple web app to crowdsource the labeling of these images. I was heavily inspired by GalaxyZoo, which was built for classifying different kinds of galaxies.
If you just want to try it out, look no further: http://dawars.me/anime
Now let’s talk about the problem. I wanted to go through every image so that I could filter out the problematic ones, but doing that alone would have been impossible. I found four main features that I wanted to assign to each image:
- Some images contain text. This is bad because I especially don’t want the network to associate Japanese with anime and English with cartoons.
- Some images contain the logo of the TV channel. This is similar to the previous case: I don’t want specific TV channels to be associated with either category.
- Some images contain one or more characters. This is good because it is easier to classify based on the style of the characters (the backgrounds can be more similar between the two categories).
- Sometimes the images are completely black or unrecognizable. In this case I don’t want the neural network to learn from them.
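To make the four features concrete, here is a minimal Python sketch of how one image's labels could be stored and used for filtering. The names `ImageLabels` and `is_clean` are made up for this illustration and are not taken from the project's code. The filtering rule (drop anything with text, a logo, or unrecognizable content) is also just one possible reading of the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageLabels:
    """Hypothetical record holding the four features for one image."""
    has_text: bool
    has_logo: bool
    has_characters: bool
    is_unrecognizable: bool


def is_clean(labels: ImageLabels) -> bool:
    """Return True if the image has none of the problematic features.

    Assumption: text, logos and unrecognizable content disqualify an image.
    Characters are a positive signal but are not required here.
    """
    return not (labels.has_text or labels.has_logo or labels.is_unrecognizable)


if __name__ == "__main__":
    frame = ImageLabels(has_text=False, has_logo=True,
                        has_characters=True, is_unrecognizable=False)
    print(is_clean(frame))  # the logo makes this frame problematic
```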
For the “empty” images with minimal detail (only one color) I ended up checking the file size and deleting everything below ~1 kB. JPEG compression makes single-color images very small, so this was a quick and pretty effective method.
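The file size check could look roughly like the sketch below. It is not the script I actually ran: the function name `find_tiny_images`, the 1024-byte threshold and the list of extensions are assumptions for the example. It only returns the matching paths, so you can look through them before deleting anything.

```python
from pathlib import Path


def find_tiny_images(image_dir, min_bytes=1024, extensions=(".jpg", ".jpeg")):
    """Return image files under image_dir that are smaller than min_bytes.

    min_bytes=1024 is an assumed stand-in for the "~1 kB" cutoff.
    """
    tiny = []
    for path in sorted(Path(image_dir).rglob("*")):
        # Only consider regular files with a JPEG extension (case-insensitive).
        if path.is_file() and path.suffix.lower() in extensions:
            if path.stat().st_size < min_bytes:
                tiny.append(path)
    return tiny


if __name__ == "__main__":
    # "images" is a placeholder directory name.
    for path in find_tiny_images("images"):
        print(path)
        # path.unlink()  # uncomment to actually delete the file
```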
I made a tutorial to teach users what to look for in the images.
You can read this part on my blog at http://dawars.me/crowdsourcing-image-labeling/#used-technologies
I’m very proud of myself for finishing a project of this scale, from backend to frontend, all by myself. I was very excited about Android Instant Apps, which became public just recently. Unfortunately, support for them is very limited and even restricted to Nexus devices at this time, so I gave up on the idea. It would have given a native feel to mobile users.
The project is on GitHub, if you want to check it out: https://github.com/Dawars/Anime-Image-Labeling
I’m going to release the dataset once the labeling is finished.