Demystifying Machine Learning with Anime Characters

Iskandar Setiadi
Published in henngeblog · May 24, 2019

Hi there, I am Iskandar from the Cloud Product team at HENNGE!

As a caveat, I am not an expert in machine learning, but this field has piqued my interest over the past few years. The sophistication of AlphaGo and Tesla's self-driving cars inspired me to learn a bit more about machine learning. When learning something new, popular advice says that studying things you love yields the best results. Since I'm a big fan of Japanese animation, I decided to use anime character recognition as my learning topic.

If you are interested in my journey of learning this field, I wrote several articles about it in the past:

TL;DR of the posts above: as my learning platform, I built a simple website that recognizes 100 anime characters (MoeFlow):

MoeFlow: anime character recognition with transfer learning

As machine learning gets more powerful, it's not rare to see the technology outperform humans under specific conditions yet fail in simple, obvious cases. Hopefully, this blog post can give you some insight into how researchers are trying to interpret machine learning models.

The current state of machine learning for anime characters

If you are a new reader, you might be curious about what machine learning can do in anime-related fields. I handpicked two of the most interesting projects and showcased them here.

If you haven't heard of it, there is a popular website, make.girls.moe, which uses a Generative Adversarial Network (GAN) to generate facial images of anime characters.

make.girls.moe: generate facial images of anime characters with GAN

Preferred Networks also developed PaintsChainer, which uses a CNN to colorize your sketches semi-automatically.

PaintsChainer: upload a sketch to PaintsChainer for automatic colorization

For a full list of the current state of the art, I recommend checking the curated list here: https://github.com/soruly/awesome-acg#image-processing.

The famous “Husky vs Wolf” problem

A human with enough knowledge of the field can explain the difference between a husky and a wolf in a few sentences.

For example, wolves are taller, lankier, and leggier than huskies. Or, wolves' ears are low triangles.

This leads to a question: why did our machine learning model say this, or that? Human experts can confidently say that the image above is a husky. However, our classifier thinks differently: it says the picture above is a wolf. That gives us something to ponder: what should we do with our model or dataset?

MoeFlow + Visualization

Recently, I made a visualization of how close anime characters are to each other (https://freedomofkeima.github.io/MoeFlow-visualization/). Most of the time, I agree with my model.

MoeFlow Visualization: Create a graph of similarities between anime characters

In other cases, I start to wonder why the model thinks two characters are similar to each other.

MoeFlow Visualization: Who’s Rem?

The reason I'm doing this is that I know MoeFlow's results are far from perfect, since it is only correct around 75% of the time. The model must therefore prioritize features in ways that are not quite how the human brain works.

Interpretable Models: Linear regression

One of the simplest ways to explain a machine learning model is to use linear regression. The idea is quite simple: each input feature is weighted according to its importance.

Feature effect (https://christophm.github.io/interpretable-ml-book/limo.html)

For example, in a bike rental problem, we want to know which attributes actually affect the number of rented bicycles under specific conditions. From the graph above, we can see that temperature plays an important role: colder temperatures lead to a lower number of rented bicycles.
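To make this concrete, here is a minimal sketch of reading feature effects from a fitted linear model with scikit-learn. The tiny dataset below is invented purely for illustration; it is not the bike rental data behind the graph.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: temperature (deg C), windspeed (km/h), holiday (0/1).
df = pd.DataFrame({
    "temperature": [5, 12, 20, 28, 15, 8],
    "windspeed":   [20, 10, 5, 8, 12, 25],
    "holiday":     [0, 0, 1, 0, 1, 0],
    "rented":      [120, 340, 560, 610, 300, 150],
})

X, y = df[["temperature", "windspeed", "holiday"]], df["rented"]
model = LinearRegression().fit(X, y)

# Each coefficient is the estimated change in rentals per unit change
# of that feature, holding the other features fixed.
for name, coef in zip(X.columns, model.coef_):
    print(f"{name}: {coef:+.1f} rentals per unit")
```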

But here is the problem: some features are boolean (yes/no), e.g. holiday, while others are floating-point numbers, e.g. temperature. How do we apply a single numeric formula across such different features? Is it accurate? In addition, reality is complex, and one feature often affects another: high wind speed tends to come with colder temperatures, and warm weather in a cold season plays out differently from cold weather in a warm season.

To address these limitations, several extensions have been proposed: Generalized Linear Models (GLMs), Generalized Additive Models (GAMs), etc. Since I'm not an expert on these topics, I will not explain the details here.

Non-linear effects: GLMs, GAMs (https://christophm.github.io/interpretable-ml-book/extend-lm.html)

One disadvantage of this approach is that it actually makes the model less interpretable. These methods rely on assumptions about the data-generating process; while that is useful for a specific task, it is hard to generalize to other problems.

Model-Agnostic methods

In the community, the long-term goal is to build model-agnostic methods: we want to separate the explanations from the machine learning model. To put it simply, we need an interpretation method that can be used with any machine learning model and that supports different feature representations (text, images, etc.).

To name a few, we have the Partial Dependence Plot (PDP), Individual Conditional Expectation (ICE), and so on. The idea is that we pick one or two features and see what happens to the prediction when we adjust the values of those chosen features.
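Below is a rough sketch of how ICE curves can be computed by hand, reusing the hypothetical `model` and `X` from the linear regression sketch above (any fitted model with a `.predict()` method would do). Averaging the curves gives the PDP; scikit-learn also provides `sklearn.inspection.partial_dependence` for that.

```python
import numpy as np
import pandas as pd

def ice_curves(model, X, feature, grid_points=20):
    """For every row in X, vary one feature over a grid and record predictions."""
    grid = np.linspace(X[feature].min(), X[feature].max(), grid_points)
    curves = []
    for _, row in X.iterrows():
        X_rep = pd.DataFrame([row] * grid_points)  # copy the row grid_points times
        X_rep[feature] = grid                      # overwrite only the chosen feature
        curves.append(model.predict(X_rep))
    return grid, np.array(curves)                  # shape: (n_rows, grid_points)

# One curve per data point; their average is the partial dependence plot (PDP).
grid, curves = ice_curves(model, X, "temperature")
pdp = curves.mean(axis=0)
```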

ICE (Individual Conditional Expectation)

These methods come with their own disadvantages. Mostly, we are limited to one or two features, since visualizing an n-dimensional plot is not intuitive. In addition, creating these visualizations is notably slow, since we feed many possible value combinations of a single datum to our model.

LIME

LIME, Local Interpretable Model-agnostic Explanations, is another approach to interpreting a machine learning model. Instead of trying all possible combinations globally, it creates local perturbations and observes how they affect an individual prediction.

For image datasets, the idea is to segment an image into "superpixels" and turn these superpixels on and off. "Off" here can be as simple as painting the superpixel with a user-defined color such as grey.
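A small sketch of the superpixel idea (the file name below is hypothetical; any RGB image works):

```python
from skimage.io import imread
from skimage.segmentation import slic

image = imread("character.png")[:, :, :3]              # drop alpha channel if present
segments = slic(image, n_segments=50, compactness=10)  # label each pixel with a superpixel id

# "Turn off" one superpixel by painting it a user-defined color (grey).
perturbed = image.copy()
perturbed[segments == 0] = 127

# LIME repeats this for many random on/off combinations of superpixels and
# fits a simple local model on how the classifier's prediction changes.
```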

Taken from LIME paper: https://arxiv.org/pdf/1602.04938.pdf

From the original image above, we can see how LIME thinks Google’s Inception model is behaving. Moving back to our original “Husky vs Wolf” problem, the following image is produced:

Wolf is basically snow!

In this case, we realize that our model is wrong because of the snow! Instead of learning features from the actual animal, the model treats snow as the key differentiator between husky and wolf. If you search on Google, you can see that roughly 90% of wolf images contain snow, while only ~50% of husky images do.

LIME: Explain MoeFlow to me!

Now, it’s showtime!

It's time to run the MoeFlow model with LIME and see what it thinks.
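Here is a hedged sketch of how such a run can look with the `lime` package; `moeflow_predict` is a hypothetical stand-in for the actual MoeFlow classifier, and `image` is assumed to be a single RGB image like the one loaded in the superpixel sketch above.

```python
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries

def moeflow_predict(images: np.ndarray) -> np.ndarray:
    """Placeholder: should take a batch (N, H, W, 3) and return class probabilities."""
    raise NotImplementedError  # call your real model here, e.g. model.predict(images)

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    image,              # a single (H, W, 3) image
    moeflow_predict,
    top_labels=3,       # explain the 3 most likely characters
    hide_color=127,     # grey out superpixels that are turned "off"
    num_samples=1000,   # number of perturbed images to generate
)

# Highlight the superpixels that argue *for* the top predicted character.
temp, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=True, num_features=5
)
overlay = mark_boundaries(temp / 255.0, mask)
```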

LIME explanations of MoeFlow

Unsurprisingly, hair is the most important part of an anime character. This actually makes sense, since anime art style depends heavily on the artist. One of the most consistent ways to recognize an anime character is by hairstyle and hair color.

Original LoveLive! characters, taken from https://github.com/inlmouse/LoveLiveFaceRecognition

If you're up for the challenge, you can try telling the LoveLive! characters above apart. For human beings, in case you missed my previous post, we have facial landmarks that differentiate us from each other. Unfortunately, anime art style is very flexible, and these facial landmarks cannot be used as they are.

LIME: Disadvantage

Despite its capability, LIME has its own problems. Since it trains local surrogate models, the kernel width that defines the "correct" neighborhood matters. In addition, LIME has an instability factor, as each run might give different results.
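If results look unstable, two knobs worth pinning down are the kernel width and the random seed; a sketch, assuming the same `lime_image` explainer as above:

```python
explainer = lime_image.LimeImageExplainer(
    kernel_width=0.25,  # smaller = the surrogate model focuses on a tighter neighborhood
    random_state=42,    # fix the seed so repeated runs on the same image agree
)
```

It can also help to increase num_samples in explain_instance and to compare explanations across several runs before drawing conclusions.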

LIME is a very promising method but one should use it with great care as the method itself is quite new and unstable.

Conclusion

By knowing how our model behaves (or goes wrong), we can do feature engineering to improve our model's performance. For example, in the "Husky vs Wolf" problem, we know that snow plays an important role in breaking our model. Therefore, we can improve our dataset and model by:

  • Increasing the number of images without a snow background
  • If we still don't have enough data, applying data augmentation (rotating, cropping, or changing the brightness of our original images; a sketch follows this list)
  • Adding supervised features
  • Or even trying out some other machine learning classifiers
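A minimal data-augmentation sketch with Keras (one possible approach among many; the random arrays below are hypothetical stand-ins for the real training images and labels):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical stand-in for the real training set: 8 random 64x64 RGB images.
x_train = np.random.randint(0, 256, size=(8, 64, 64, 3)).astype("float32")
y_train = np.zeros(8)

augmenter = ImageDataGenerator(
    rotation_range=15,            # rotate up to +/-15 degrees
    width_shift_range=0.1,        # shift horizontally by up to 10%
    height_shift_range=0.1,       # shift vertically by up to 10%
    zoom_range=0.1,               # random zoom, similar to cropping
    brightness_range=(0.7, 1.3),  # darken or brighten the image
    horizontal_flip=True,
)

# Yields batches of randomly transformed images during training.
batches = augmenter.flow(x_train, y_train, batch_size=4)
```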

Demystifying machine learning is a very exciting research field. We don't want to ride a self-driving car without understanding how it works internally. In the future, the training process of machine learning models will require little or no human intervention. In addition, machine learning models will come with their own interpretability: they should explain themselves!

As for tooling, model-agnostic interpretability methods will get more attention since they require less domain-specific knowledge. Currently, LIME and SHAP are the two most commonly used techniques for model-agnostic explainability.

If you are interested in this field, I recommend reading the "Interpretable Machine Learning" book by Christoph Molnar. If you want to get more technical about LIME, check out the official tutorial here.

Thank you for reading and see you next time!


Software Engineer from Indonesia. Currently working in Japan. Animation Enthusiast. https://github.com/freedomofkeima