Crowdsourcing Image Labeling

Web App for labeling images

Dávid Komorowicz

This post is the continuation of Anime or Cartoon? — Let the AI decide, in which I describe the project I’m collecting the data for.

In Deep Learning we often work with huge datasets, and this project is no exception: I collected around 300,000 images from both anime and cartoons. Processing this many images is not an easy task, though.


I designed this simple web app to crowdsource the labeling of these images. I was heavily inspired by GalaxyZoo, which crowdsources the classification of different kinds of galaxies.

If you just want to try it out, look no further:

Labeling an image in the app

The Problem

Now let’s talk about the problem. I wanted to go through every image so that I could filter out the problematic ones, but doing that manually would have been impossible. I identified 4 main features that I wanted to assign to each image:

  • Some images contain text. This is bad, because I especially don’t want the network to associate Japanese text with anime and English text with cartoons.
  • Some images contain the logo of a TV channel. As with text, I don’t want specific TV channels to be associated with either category.
  • Some images contain one or more characters. This is good, because it is easier to classify an image based on the style of its characters (the backgrounds can look more alike between the two categories).
  • Some images are completely black or unrecognizable. I don’t want the neural network to learn from these.
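The post doesn’t show how the collected labels are stored or used. As a minimal sketch, assuming each image ends up with one yes/no answer per feature (the class and function names below are hypothetical, not the project’s actual code), the four features could drive the filtering like this: text, logos and empty frames disqualify an image, while characters only move it up the list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ImageLabels:
    """Hypothetical per-image record holding the four features."""
    has_text: bool
    has_logo: bool
    has_character: bool
    is_empty: bool  # completely black or unrecognizable


def is_usable(labels: ImageLabels) -> bool:
    """Reject images with text, a TV channel logo, or no usable content."""
    return not (labels.has_text or labels.has_logo or labels.is_empty)


def select_training_images(labeled: dict[str, ImageLabels]) -> list[str]:
    """Return usable image names, with character-containing images first."""
    usable = [name for name, labels in labeled.items() if is_usable(labels)]
    # False sorts before True, so images with characters come first;
    # sorted() is stable, so ties keep their original dict order.
    return sorted(usable, key=lambda name: not labeled[name].has_character)
```

For example, an image with a character but also a caption is dropped, and a clean character shot is ranked ahead of a clean background-only frame.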

For the “empty” images with minimal detail (a single solid color), I ended up checking the file size and deleting everything below ~1 kB, since JPEG compresses such flat images into very small files. This was a quick and pretty effective method.
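The size check takes only a few lines of Python. This is a minimal sketch, assuming the images are `.jpg`/`.jpeg` files somewhere under one root directory and taking 1024 bytes as the “~1 kB” cutoff. The function names and the dry-run default are illustrative choices, not the project’s actual code.

```python
from pathlib import Path

# Assumed cutoff for "~1 kB"; the exact threshold used in the project is not stated.
SIZE_THRESHOLD_BYTES = 1024
JPEG_SUFFIXES = {".jpg", ".jpeg"}


def find_tiny_jpegs(root, threshold=SIZE_THRESHOLD_BYTES):
    """Return JPEG files under `root` (recursively) smaller than `threshold` bytes."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file()
        and path.suffix.lower() in JPEG_SUFFIXES
        and path.stat().st_size < threshold
    )


def delete_tiny_jpegs(root, threshold=SIZE_THRESHOLD_BYTES, dry_run=True):
    """Delete the tiny JPEGs (only when dry_run is False) and return their paths."""
    tiny = find_tiny_jpegs(root, threshold)
    if not dry_run:
        for path in tiny:
            path.unlink()
    return tiny
```

Running it with `dry_run=True` first lets you inspect which files would be removed before deleting anything.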

I made a tutorial to teach users what to look for in the images.

Character tutorial

Used Technologies

You can read this part on my blog at

Final words

I’m very proud to have finished a project of this scale, from backend to frontend, all by myself. I was very excited about Instant Android apps, which became public just recently, but unfortunately their support is still very limited (currently restricted to Nexus devices), so I gave up on them. They would have given mobile users a native feel.

The project is on GitHub, if you want to check it out:

I’m going to release the dataset once the labeling is finished.

If you want to be notified of new posts, follow me here on Medium or on Twitter.

Dávid Komorowicz


Machine Learning Engineer interested in Deep Learning and Virtual Reality.
