A Riddikulus Dataset

Building a Harry Potter dataset for beginners

Gant Laborde
Google Developer Experts

--

Idea: The user draws an animal & gets sorted into their Hogwarts house.

TL;DR: The dataset is available here; please upvote it. The rest of this post is the story behind it, plus an example site.

The four Hogwarts houses each have an animal associated with them. So when the user draws an animal, the AI becomes a sorting hat.

  • Lions to Gryffindor
  • Snakes to Slytherin
  • Ravens to Ravenclaw
  • Badgers to Hufflepuff

I don’t need the data to be perfect. Part of the fun of getting sorted in Harry Potter is the occasional surprise. However, it’s boring — if not useless — to deal with complete randomness. If drawings could be sorted into the correct house around 80% of the time, this could be a pretty fun dataset for beginners!

To train an AI to classify drawings, you’re going to need some drawings. Sure, I could draw a badger a few hundred times, but that’s no dataset. Fortunately, Google ran a drawing AI experiment (Quick, Draw!) a while ago, where millions of players contributed millions of doodles across 345 categories. While we only want a few categories, we can take a page from MNIST and create a modified subset that fits our needs.

This is the Google Dataset

Of course, the drawings from Google didn’t perfectly cover the animals we needed, but that’s OK. If you review the categories available, there’s a large set of animals that are CLOSE to the house animals, close enough for a poorly drawn comparison to work. For example, a drawing of a bird is good enough for a raven, and it looks nothing like a badger, right?

The dataset has all kinds of things I don’t need, like timing data and drawing order. This is a pretty cool dataset, but I’ll just need the drawings of the animals, thanks.

Choosing Drawings

So the original choices for subcategories were:

  • Gryffindor: Lion, Tiger
  • Slytherin: Snake, Snail
  • Ravenclaw: Bird, Owl, Parrot
  • Hufflepuff: Mouse, Raccoon, Squirrel

At first glance, Hufflepuff is left in the cold (classic Hufflepuff), because there’s no official “Badger” category, but in my experience, it’s OK for one class to be the odd one out.

However, after some inspection of the Google data, the mouse was a no-go. Far too many drawings were of computer mice, so the mouse category was removed from Hufflepuff.

Everyone but Ravenclaw is down to two possible classes at about 10,000 drawings per class.

Out of the corner of my eye, I noticed some pretty neat little skulls. So why not throw in Death Eaters!? They can combo with Slytherin, OR just be their own class altogether.

Now we have 10 classes to help us sort drawings into four houses.

Honestly, I can’t identify a few of these drawings and I’m a human!

Another cool thing is that there’s a simplified dataset where the timing info has been stripped out and the images have been simplified using the “Ramer–Douglas–Peucker” algorithm. I now have everything I need!

so fancy!

One question you might have is “Should you keep the dataset’s classes separate in 10 categories, or should you combine same-house classes like snake and snail?” That’s a good question, so let’s check the viability of the dataset.

Checking Viability

Hopefully, the classes are different enough from one another that training a model to reach 80% accuracy is feasible.

One quick way to check is t-SNE. T-distributed stochastic neighbor embedding (t-SNE) is a fancy method for visualizing a dataset’s features in an embedded space. Welcome to the world of nonlinear dimensionality reduction; check out this tutorial here. If that last sentence sounded like a gibberish Harry Potter spell, don’t worry, the idea is simple: a healthy t-SNE shows nice, separate clusters. Handwritten digits are purposely visually distinct, so they should serve as an example of a very well-clustered t-SNE.

Here’s the t-SNE on images of numbers from 0 to 9:

I highly doubt we’ll get anything like the number clusters, but it’s worth taking a look.
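If you want to try this yourself, a digit t-SNE like the one above can be sketched in a few lines. This assumes scikit-learn and its small built-in digits dataset; the QuickDraw classes would be embedded the same way, one color per class.

```python
# Sketch: t-SNE embedding of the scikit-learn digits set (a small
# MNIST-like dataset). Assumes scikit-learn is installed.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()          # 1,797 images of digits 0-9, 8x8 pixels each
X = digits.data[:500]           # keep the demo quick
y = digits.target[:500]

# Project the flattened images down to 2 dimensions.
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=42).fit_transform(X)

print(embedding.shape)          # one 2-D point per image, ready to scatter-plot
```

Plotting `embedding` colored by `y` (e.g. with matplotlib) gives the clustered picture shown above.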

Unfortunately, the data provided by Google is in vector form, which makes it a bit of a headache to load into tensors for evaluation. Fortunately, someone converted all the images to 28x28 rasterizations, which makes them immediately consumable.
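Loading those rasterized files is nearly a one-liner with NumPy. In this sketch, a random stand-in array takes the place of an actual class file; the only assumption is the layout of the real files, which hold one class each as a flattened `(N, 784)` uint8 array.

```python
# Sketch of loading one class from the pre-rasterized bitmap files.
# The stand-in array below takes the place of e.g.
# np.load("some_class_bitmap.npy"), which yields a (N, 784) uint8 array.
import numpy as np

flat = np.random.randint(0, 256, size=(1000, 784), dtype=np.uint8)  # stand-in

# Reshape to (N, 28, 28, 1) and scale to [0, 1] for a conv net.
images = flat.reshape(-1, 28, 28, 1).astype("float32") / 255.0

print(images.shape)   # (1000, 28, 28, 1)
```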

I’m a bit dubious of how usable 28x28 pixels could be for drawings that are not standardized. However, let’s let the t-SNE decide. Our t-SNE for the 28x28:

t-SNE attempt

Firstly, groupings like Skull are off by themselves, and they look fantastic. Parrot and Owl are near each other, but that’s not a problem because they belong to the same house.

The grouping that seems to be a problem is the cluster at the top. Squirrel and Snake are nearly on top of one another. Also, there seem to be quite a few other classes between Snake and Snail.

It seems even my modest hope of 80% accuracy might be quite lofty. I don’t mind a few surprises, but drawing a snake and getting tossed into Hufflepuff is a bit concerning.

Do we give up?

“It was important, Dumbledore said, to fight, and fight again, and keep fighting, for only then could evil be kept at bay.” — Harry Potter and the Half-Blood Prince

Training on 28x28

So how well does the 28x28 data do? I engineered a small model to see what kind of performance I could expect. To my surprise, I actually got 81%!!!!

The model was quite simple and was only 400KB in TensorFlow.js. It consisted of three convolution and pooling layers followed by one hidden layer of 128 nodes. The JavaScript code was short and sweet.
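For reference, here’s roughly what that architecture looks like, sketched in Keras rather than the original TensorFlow.js. The filter counts are assumptions; only the overall shape (three convolution-and-pooling blocks, a 128-node hidden layer, 10 output classes) comes from the description above.

```python
# A hedged Keras sketch of the small model described above.
# Layer widths (16/32/64 filters) are guesses, not the author's values.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),      # the single hidden layer
    layers.Dense(10, activation="softmax"),    # 10 drawing classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
print(model.count_params())
```

`model.summary()` shows the per-layer shapes and the total parameter count, which stays small enough for browser-sized weights.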

Did we just land a perfect dataset!? Should we just take this and go?

Well… maybe 80% wasn’t asking enough. Maybe we should up our goal? Maybe we can get 90% with some improvement? Are we being greedy?

The Gringotts Warning Poem

Improving the Dataset — Ideas

One item that bothered me was that I only had a 28x28 image of each vector drawing. The classic 28x28 works great for classifying digits, but it strikes me as minuscule for drawings. I’d like to see if this problem persists with larger images, which means I’ll have to convert the vectors to rasterized images myself.

Secondly, some drawings were… how do I put this politely? … A bit outside of talent and reason.

If you say so.

This means we can do some cleaning of the data and hopefully end up with a decent dataset to train on.

Attempt 1: Larger Rasterizations of Vectors

I’d like to rasterize 256x256 pixel versions of the images. Even if this isn’t the final size used for training models, it’s a larger size that users could shrink as needed. Secondly, the cleaned set of vectors has been normalized to fit in 256x256, so there’s no resizing step required.
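For illustration, here’s a sketch of rasterizing one vector drawing directly with Pillow; a drawing is a list of strokes, each a pair of `[xs, ys]` lists with coordinates already normalized to 0–255. The sample strokes below are made up, and this is just the idea; the pipeline I actually used went through SVGs, described next.

```python
# Sketch: rasterize one simplified vector drawing to a 256x256 image.
# The stroke coordinates here are invented for the demo.
from PIL import Image, ImageDraw

drawing = [[[10, 120, 245], [200, 20, 210]],   # stroke 1 (made up)
           [[30, 220], [128, 128]]]            # stroke 2 (made up)

img = Image.new("L", (256, 256), color=255)    # white grayscale canvas
pen = ImageDraw.Draw(img)
for xs, ys in drawing:
    # Each stroke becomes a connected polyline of black segments.
    pen.line(list(zip(xs, ys)), fill=0, width=2)

print(img.size)   # (256, 256), ready to save or shrink as needed
```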

I rendered the data to SVGs with a command called ndjsonTosvg and then converted those resulting SVGs to JPGs with a command-line tool called svgexport. The entire process took 20ish hours running on a 2019 MacBook Pro (no M1 chip).

Once the larger 256x256 images were made, I immediately ran out of RAM. I can get around this with an ImageDataGenerator, but that significantly cuts down on the dataset’s utility for beginners who might be processing it with a simpler setup and less RAM.
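The idea behind an ImageDataGenerator can be sketched with a plain Python generator: yield small batches from a memory-mapped array instead of loading everything into RAM. The file name and batch size here are arbitrary, with a throwaway file standing in for the real dataset.

```python
# Minimal sketch of streaming batches off disk instead of loading all at once.
import numpy as np

def batches(path, batch_size=32):
    data = np.load(path, mmap_mode="r")        # nothing pulled into RAM yet
    for start in range(0, len(data), batch_size):
        # Only this slice is actually read from disk.
        yield np.asarray(data[start:start + batch_size],
                         dtype="float32") / 255.0

# Demo with a small stand-in file.
np.save("demo.npy", np.random.randint(0, 256, (100, 28, 28), dtype=np.uint8))
sizes = [len(b) for b in batches("demo.npy", batch_size=32)]
print(sizes)  # [32, 32, 32, 4]
```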

Regardless, I worked around the RAM limitation to prove my concept by downsizing the 256x256 images to 128x128, which barely fit in my available RAM.

The downsizing caused some serious deterioration of the lines, but the images were still distinguishable, so it wasn’t something I had to worry about.

Time to train the same model on 128x128 versions of the images! Let’s see some success! I trained the same model with the same number of epochs, and then…

NOOOOOO! The accuracy didn’t budge. If anything, it went down. So what gives? Apparently, I was able to OVERFIT THE HELL OUT OF THE DATA! Look at that 99.84% accuracy on the training data!

So I littered the model with dropouts and normalizations to try to fix the overfitting. It did NOT work. So I guess the size of the images wasn’t the problem.

If you think you can do a better job, you can grab the 256x256 images here.

Attempt 2: Removing Poor Images

One of the things that bothered me in the t-SNE was that it thought birds looked like snakes. After review, there was an obvious culprit. The classic sunset bird!

Not bad for Pictionary players, though

This might actually be where the problems come from. I’ve found all kinds of improper data.

After a single pass at cleaning up the data, I saw some significant separation.

So how did cleaning affect accuracy?

I got accuracy up to 90% on validation data! I think that makes it a perfect dataset for beginners, and a fun one! You can use this dataset to train, teach, and share.

Hopefully this will be a great tool for people learning the basics of image recognition! Hey, if it can identify my terrible drawings, I know it can handle yours:

And don’t forget to draw a skull 😜. I know you probably have questions, or you’re interested in digging further, so here are the links for you!

Share the website:

Get the data (please upvote if you have an account)

Learn from me how to build websites that use TensorFlow.js:

Interested in learning how to bring AI and Machine Learning to the web with TensorFlow.js?

JavaScript lets you create front-end websites that can leverage the power of AI directly in the browser. Learn from scratch with this book.

Reserve your copy on Amazon

Lastly,

I got to tell a variation of this story at TensorFlow Everywhere North America. Check out the video here:

Kudos to the folks who made this possible

Thanks to the Google team for open-sourcing the QuickDraw data, and for featuring me in TF Everywhere.

Thanks to Harry Potter fans for being everywhere and being fun.

Special thanks to Zaid Alyafeai who’s been a constant inspiration and educational resource.

Kudos to Ali Shakiba and Stephen Thompson who created conversion libraries that were essential to the vector to JPG conversions.

And as always, thanks to the amazing people at Infinite Red who empower everyone to be creative and open. Best consulting company eva!

Gant Laborde is a co-owner and Chief Innovation Officer at Infinite Red, published author, adjunct professor, worldwide public speaker, and mad scientist in training. Clap/follow/tweet or visit him at a conference.
