Understanding a deep learning based anime auto-color project — Style2Paints (Part 1)

Jiasheng Tang
AI2 Labs
Mar 20, 2020 · 6 min read
[Cover image: a line sketch (left) colorized by Style2Paints from a few color hints (right). Source]

Style2Paints is an all-star project: it has over 10k stars on GitHub, and according to this post, it has been ranked 4th among the most popular machine learning projects on GitHub. The picture above is one example shown in the repository. With just a few color hints, the AI model converts the line sketch on the left into the fabulous color image on the right. I can't help but think that I could be an artist too.


If you have not tried the tool, I recommend giving it a try and beholding its power. The official Style2Paints repository is here, and good news: the author has compiled an exe version for easy usage (how cool is that!).

Curiously, though, it is not easy to find relevant information about this project. The official repository provides inference code but no training code, and the code is not well documented, which lends the project an air of mystery.

This post attempts to give a better understanding of the project based on the released paper. A disclaimer: Style2Paints has been continuously evolving and has reached v4.5, but the released code and paper only cover v3, so to be specific, this post is about Style2Paints v3.

Without further ado, let’s dive in!

Two-Stage Sketch Colorization


The author of Style2Paints has published the paper Two-stage Sketch Colorization. As the paper reveals, training is supervised, which means the training dataset consists of line sketches and their corresponding color images. If you have experience in training AI models, you know that training usually requires thousands of data points, so how can we find thousands of line-sketch and color-image pairs?

The Data

The answer is that we can't find them; instead, we produce them.

For human beings, it is easier to imagine the line sketch from a color image than the other way around, so intuitively it should be an easier task for an AI model too. The author of Style2Paints thinks the same, and he has kindly provided the tool he uses to generate a line sketch from a color image: SketchKeras. There is no paper for it, so I do not really know how it works, but it works well, and our focus here is the auto-coloring process.
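Just for intuition (this is not SketchKeras, only a classical image-processing trick), a blurred-grayscale "color dodge" blend already produces a rough sketch-like image from a color illustration:

```python
import cv2

# A rough classical approximation of line-sketch extraction (NOT SketchKeras itself):
# dividing the grayscale image by its blurred version keeps dark line work and
# suppresses flat color regions ("color dodge" trick).
def approx_sketch(color_img_path, out_path="sketch.png"):
    img = cv2.imread(color_img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (0, 0), sigmaX=3)
    sketch = cv2.divide(gray, blur, scale=255)
    cv2.imwrite(out_path, sketch)
    return sketch
```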

The dataset used is an open-source anime dataset called Danbooru2019. For those who are not so familiar with the anime world, please take note that there is an SFW download option for this dataset.

As the title of the paper suggests, the sketch goes through two stages to become colorized: the first stage produces a rough draft, and the second stage does the delicate refinement.

The first stage, the draft stage

The task of the first stage is to take a sketch and the color hints given by the user, concatenate them along the channel dimension, and feed them into the generator model to get a crude but colorized image.

During training, there is no user providing color hints for our thousands of images, so we have to generate them. The author does this by randomly sampling colors from the color image.
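To make this concrete, here is a rough sketch of how such hint sampling and input assembly could look. The patch size, hint count, and channel layout are my assumptions for illustration, not the paper's exact settings:

```python
import numpy as np

def sample_hints(color_img, max_hints=30, patch=4):
    """Simulate user color hints by copying small patches from the target color image."""
    h, w, _ = color_img.shape
    hints = np.zeros((h, w, 3), dtype=np.float32)
    mask = np.zeros((h, w, 1), dtype=np.float32)   # marks where a hint is present
    for _ in range(np.random.randint(0, max_hints + 1)):
        y = np.random.randint(0, h - patch)
        x = np.random.randint(0, w - patch)
        hints[y:y+patch, x:x+patch] = color_img[y:y+patch, x:x+patch]
        mask[y:y+patch, x:x+patch] = 1.0
    return hints, mask

def make_stage1_input(sketch, color_img):
    """Concatenate sketch + hint colors + hint mask channel-wise.
    sketch is assumed to be a (H, W) grayscale array."""
    hints, mask = sample_hints(color_img)
    return np.concatenate([sketch[..., None], hints, mask], axis=-1).astype(np.float32)  # (H, W, 5)
```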

Once the input is fed into the generator model and a color image is generated, it is compared with the target color image to get an L1 loss, and both the target color image and the generated image go through the discriminator to get the GAN loss.
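In code, a simplified Pix2Pix-style version of these two losses might look like this (the L1 weight is a placeholder, not necessarily the value Style2Paints uses):

```python
import torch
import torch.nn.functional as F

def generator_loss(fake, target, d_fake_logits, l1_weight=100.0):
    # L1 term pulls the generated image toward the target color image.
    l1_term = F.l1_loss(fake, target)
    # GAN term rewards fooling the discriminator.
    gan_term = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    return gan_term + l1_weight * l1_term

def discriminator_loss(d_real_logits, d_fake_logits):
    real = F.binary_cross_entropy_with_logits(
        d_real_logits, torch.ones_like(d_real_logits))
    fake = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.zeros_like(d_fake_logits))
    return 0.5 * (real + fake)
```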

The generator model used here has a U-Net structure, and the training setup is the same as Pix2Pix. If you would like to know more, you can check the information about GANs here, and information about U-Net and Pix2Pix here.
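For readers who want a feel for the structure, below is a bare-bones U-Net skeleton showing the encoder/decoder with a skip connection. The actual Style2Paints generator is larger and modified, as we will see in the next post; channel counts here are illustrative.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-Net: downsample twice, upsample twice, with one skip connection."""
    def __init__(self, in_ch=5, out_ch=3):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2))
        self.down2 = nn.Sequential(nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2))
        self.up1 = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU())
        self.up2 = nn.ConvTranspose2d(32 + 32, out_ch, 4, stride=2, padding=1)  # 32 extra from skip

    def forward(self, x):
        d1 = self.down1(x)                       # H/2
        d2 = self.down2(d1)                      # H/4
        u1 = self.up1(d2)                        # back to H/2
        out = self.up2(torch.cat([u1, d1], 1))   # skip connection from the encoder
        return torch.tanh(out)
```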

Colorizing an anime sketch is not an easy task. Unlike a monochrome photo, which has texture and gradients serving as hints for colorization, an anime sketch mostly consists of just black lines on an empty white canvas, and thus provides much less information for colorization. According to the paper, the result of the first stage looks like the draft an artist produces during quick digital painting: color splashed onto the canvas without caring about the details. It needs refinement, and that is why there is a second stage.

The second stage, the refinement stage

The task of the second stage is to take a sketch and the user's color hints as inputs, and refine the draft generated by the first stage into a good-looking image.

Here is where the interesting part comes in. If we actually used the output of stage 1 as the input to stage 2 during training, we wouldn't get much improvement. Why? Because it would be equivalent to jointly training the two models together. They would then behave like a single model, which defeats the purpose of having two separate stages.

To avoid this, the author proposes generating the draft images synthetically: pasting patches onto the target color image, applying random transforms, and adding random color sprays.
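A rough illustration of what such draft simulation might look like is below. The patch sizes, counts, and blending are my own guesses, and the paper's random spatial transforms are omitted for brevity:

```python
import numpy as np

def simulate_draft(color_img, n_patches=8, n_sprays=6, patch=64):
    """Roughly mimic a stage-1 'draft' by pasting random patches of the image onto
    itself and spraying random color blobs. Parameters are illustrative only."""
    h, w, _ = color_img.shape
    draft = color_img.copy()
    # Random patch pasting: copy a region to another location to distort structure.
    for _ in range(n_patches):
        sy, sx = np.random.randint(0, h - patch), np.random.randint(0, w - patch)
        dy, dx = np.random.randint(0, h - patch), np.random.randint(0, w - patch)
        draft[dy:dy+patch, dx:dx+patch] = color_img[sy:sy+patch, sx:sx+patch]
    # Random color spray: blend a random solid color into a random region.
    for _ in range(n_sprays):
        cy, cx = np.random.randint(0, h - patch), np.random.randint(0, w - patch)
        color = np.random.randint(0, 256, size=3).astype(np.float32)
        region = draft[cy:cy+patch, cx:cx+patch].astype(np.float32)
        draft[cy:cy+patch, cx:cx+patch] = (0.5 * region + 0.5 * color).astype(draft.dtype)
    return draft
```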

When I first read the paper, I did not think this idea of draft generation would work. Based on my experience, the output of stage 1 is likely to have different defects from the ones generated intentionally. In other words, the distribution of stage-1 outputs may be quite different from the distribution of the generated drafts, which would cause the model to fail miserably at test time.

However, when I tried their pre-trained model, it turned out that most sketches actually produce rather good results. I have to admit that this draft generation is very successful.

The training flow is similar to stage 1, but instead of using only the concatenated sketch and color hints, there is another input coming into the model: the draft. The generator is a modified version of U-Net; I will give the details of the modification in the next post.
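In other words, the draft is simply appended as extra input channels, something like this (channel counts are again assumptions, mirroring the stage-1 example):

```python
import torch

sketch = torch.rand(1, 1, 256, 256)   # line sketch
hints  = torch.rand(1, 4, 256, 256)   # RGB hints + hint mask
draft  = torch.rand(1, 3, 256, 256)   # simulated draft (e.g. from simulate_draft)
x2 = torch.cat([sketch, hints, draft], dim=1)  # (1, 8, 256, 256) -> refinement generator
```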

Conclusion

I am rather inspired by the author's approach. He effectively converts a research problem into an engineering problem. There is nothing fanciful about the model structure, but with this two-stage approach and the data processing behind draft generation, it achieves results that even state-of-the-art algorithms cannot match. Once again, this shows how data (and data processing) can matter more than algorithms.

My initial plan was to continue probing into the code and explain the model structure the author used, but I realized that there is too much content to cover in one post. I have therefore decided to move that into the next post, for those who are keen on the technical details.

See you next time!
