From humans to Manga and other digressions
Image translation is a very hot topic. With recent deep learning tools, it is now possible to transfer features between pairs of images with relative ease: convert a horse into a zebra, an orange into an apple, swap genders or even transfer an artistic style. See for instance http://genekogan.com/works/style-transfer/ or https://www.ostagram.ru to experiment.
Generative Adversarial Networks (GANs) were introduced in 2014 and brought a whole new perspective to image processing. A recent neural network architecture, called CycleGAN, allows us to perform very powerful image translation almost for free. CycleGANs work by learning the features of two sets of images and the corresponding mapping between them.
CycleGANs are made of two generators and two discriminators (to translate images A into images B and vice versa) competing with each other to generate realistic mappings. Two additional loss terms are used to assess the self-consistency of the predictions. To train the networks we don't need aligned images, just two sets of images A and B with different explicit (gender, etc.) or implicit (zebra stripes) attributes. The network will learn how to create the correspondence between them.
The idea behind CycleGANs is quite simple: translate one set of images into the other set, translate back, and check that the reconstructed images are consistent with the original ones.
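The translate-and-translate-back idea above is the cycle-consistency loss. A minimal PyTorch sketch of it follows; the single convolutional layers are stand-ins for the real ResNet generators, and all names (`G_AB`, `G_BA`, `lambda_cyc`) are illustrative, not the repo's API:

```python
import torch
import torch.nn as nn

# Hypothetical generators: G_AB maps domain A -> B, G_BA maps B -> A.
# A single conv layer stands in for the real ResNet generators.
G_AB = nn.Conv2d(3, 3, kernel_size=3, padding=1)
G_BA = nn.Conv2d(3, 3, kernel_size=3, padding=1)

l1 = nn.L1Loss()
lambda_cyc = 10.0  # cycle-loss weight used in the original CycleGAN paper

real_A = torch.randn(1, 3, 256, 256)  # a batch from domain A (e.g. faces)
real_B = torch.randn(1, 3, 256, 256)  # a batch from domain B (e.g. manga)

# Translate and translate back: the reconstruction should match the input.
fake_B = G_AB(real_A)   # A -> B
rec_A = G_BA(fake_B)    # back to A; should look like real_A again
fake_A = G_BA(real_B)   # B -> A
rec_B = G_AB(fake_A)    # back to B

cycle_loss = lambda_cyc * (l1(rec_A, real_A) + l1(rec_B, real_B))
cycle_loss.backward()   # gradients flow into both generators
```

This term is what lets training work without aligned image pairs: each generator is penalized if its counterpart cannot undo its translation.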

This technique has several advantages; one of them is the ability to generalize from very few examples and discover the hidden generative process without an explicit formulation.
CycleGANs solve three types of problems: A) style transfer (repaint a photo in a Van Gogh or Monet style), B) explicit attribute swap (gender, hair colour, etc.) and C) implicit attribute swap (adding zebra stripes to horses).
I used a manga crawler to extract 30,000 manga characters, plus the celebrity dataset (200,000 examples of faces), to test CycleGANs on two types of tasks: 1) manga generation and 2) gender swap.
Manga Generation
To generate manga I used two setups: experiment A (4,000 manga images and 4,000 celebrity images), trained for 50 epochs, and experiment B (10,000 manga images of only female characters and 10,000 celebrity faces of young women). Experiment B was designed to make the problem easier for the network, as less diversity is expected, and to test the accuracy limits.
Manga images had to be cropped, as they included the full body while the celebrity dataset only contains faces. All images were resized to 250x250 pixels.
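The crop-and-resize step can be sketched with Pillow. This is a minimal illustration, not the actual preprocessing script: the crop box would in practice come from a face detector, and here it is passed in manually.

```python
from PIL import Image

def preprocess(path, crop_box=None, size=(250, 250)):
    """Crop to the face region (if a box is given) and resize.

    crop_box is a (left, upper, right, lower) pixel tuple; in practice it
    would come from a face detector -- here it is a hypothetical manual box.
    """
    img = Image.open(path).convert("RGB")
    if crop_box is not None:
        img = img.crop(crop_box)
    return img.resize(size, Image.BICUBIC)
```

For the manga set, a box covering roughly the top of a full-body image isolates the head; the celebrity images can skip the crop and only be resized.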




Experiment A:
To train I used the PyTorch code and the default network parameters, as explained in https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.
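With the data laid out in the repo's unaligned format, training with default parameters is a single command. The dataset and experiment names below are illustrative:

```shell
# Assumes images are organised as datasets/celeb2manga/{trainA,trainB},
# the unaligned layout the repo expects. Flags are the repo's documented ones.
python train.py --dataroot ./datasets/celeb2manga \
                --name celeb2manga_cyclegan \
                --model cycle_gan
```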
The next image shows the algorithm learning to convert an image into manga at epochs 15 and 30. As the training evolves, images start to get more manga-stylized: long hair, big eyes, strong colours, small mouth.

Some strange behaviours often occur, as the network memorizes manga elements and tries to transfer them to human faces, like the face appearing below the left eye of this woman:

Initially we may also observe some weird colorization while the network is learning how to transfer colour to human pictures.

The next figure shows all six losses as a function of the number of epochs. Note that the cycle A and cycle B losses are always higher.
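Besides the two cycle losses sketched earlier, the other four losses are the adversarial terms for each generator/discriminator pair. The repo's default is a least-squares GAN loss; a minimal sketch follows, with random tensors standing in for the PatchGAN discriminator's score maps:

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()  # CycleGAN's default adversarial loss is least-squares

# Placeholder PatchGAN score maps; in training these would be
# D_B(G_AB(real_A)) and D_B(real_B).
pred_fake = torch.rand(1, 1, 30, 30)  # discriminator score on a generated image
pred_real = torch.rand(1, 1, 30, 30)  # discriminator score on a real image

# The generator tries to make the discriminator output 1 ("real") on fakes.
loss_G = mse(pred_fake, torch.ones_like(pred_fake))

# The discriminator tries to output 1 on reals and 0 on fakes.
loss_D = 0.5 * (mse(pred_real, torch.ones_like(pred_real)) +
                mse(pred_fake, torch.zeros_like(pred_fake)))
```

The same pair of terms exists in both directions (A to B and B to A), which together with the two cycle losses gives the six curves in the plot.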

Applying the algorithm to wider panoramic pictures may still work, but we may miss the details of the faces as colours and details get blurred, as we can see in the next picture.



Experiment B
In this experiment we used a bigger dataset and only considered young, attractive females from the celebrity dataset. This was done to facilitate the training and to explore how well the algorithm generalizes. We can see in the next picture that the algorithm still generalizes quite well, even if it fantasizes, generating a female manga character on top of the head.


These networks are not fully deterministic. Different runs may sometimes result in widely different images, as in the next case, which starts from the same initialization as shown before.

Below are some examples of images generated by experiment B, from the validation set:



Some insights:
1. Use colour images: black and white does not have enough contrast.
2. Use photos with a full face (ID-photo style); other parts will get blurred (the relative size of the face affects quality).
3. Use balanced light conditions, especially on the eyes; otherwise one eye will disappear or get misaligned.
4. Do not use too much make-up.
Some images are very robust to the initial photo, while others are very sensitive to the initialization or to small perturbations in the original image. The next sequence, for instance, was generated by just adding a few colour marks on the lips.


Some failure cases still occur after 50 epochs (I'm running the model for another 50 epochs), for instance the following picture. I found this unexpected, as the face is artificial and very clean. Is this an indication that the algorithm doesn't expect perfection?


As a final note, although I was not interested in that direction, it turns out that reconstructing human faces from manga is a much tougher task, as I was expecting:


My takeaways from playing with these models:
- Use the same type of data for A and B (same gender, style, region of interest, etc.).
- Keep learning rates low.
- Do not use the identity loss.
- Avoid mode collapse with the Wasserstein metric.
- Balance the data.
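On the Wasserstein point: the idea is to replace the standard GAN loss with a critic that estimates the Wasserstein distance, which tends to give smoother gradients and fewer collapsed modes. A minimal sketch of one critic update (the original weight-clipping variant) follows; the linear layer and all sizes are placeholders for a real convolutional critic:

```python
import torch
import torch.nn as nn

# Toy critic over flattened 64x64 RGB images; a real critic would be a CNN.
critic = nn.Linear(3 * 64 * 64, 1)
opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

real = torch.randn(8, 3 * 64 * 64)  # placeholder batch of real images
fake = torch.randn(8, 3 * 64 * 64)  # placeholder generator output

# The critic maximises E[critic(real)] - E[critic(fake)], so we minimise
# the negation. Note there is no sigmoid or log, unlike the standard GAN loss.
loss_D = critic(fake).mean() - critic(real).mean()
opt.zero_grad()
loss_D.backward()
opt.step()

# Enforce the Lipschitz constraint by clipping the critic's weights.
for p in critic.parameters():
    p.data.clamp_(-0.01, 0.01)
```

Later variants replace the weight clipping with a gradient penalty, which usually trains more stably.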
Gender swap experiment
This proved to be a much harder problem to solve than I expected: the CycleGAN loss decreased at a much lower rate and I had to use 10,000 examples. This was surprising, as I was expecting it to be easier than manga generation, since both sets are human faces.
I tried with 20,000 examples for 10 epochs to check the relevance of the training set size, without a noticeable improvement.

This experiment is still running; expect results soon.
Applications
These networks allow a new kind of machine learning application, as they are not simple classification algorithms but instead learn the generative process.
We are exploring the application of this technique in medical imaging: transforming an X-ray image from a specific machine into an image from a different piece of equipment, or data augmentation, generating new data to enrich the existing dataset. For instance, generating samples with tumour cells based on samples of healthy tissue: the network will learn what the tissue would look like if it contained a tumour.
Takeaways
- Avoid mode collapse and memorization of the training data by using a large and diverse training set. Make sure images A and B capture the same elements. Some examples of mode collapse:
- Be careful about data asymmetry: going from manga to humans is harder.
- You need time: these models take several days to train on a Titan X GPU, so be patient.
I’m creating a webpage where you can “mangify” your photos. Have fun.
