Training a Machine to Draw

A review of methods I’ve used to train a machine to draw. What worked, what didn’t work, and future directions.

Avi Latner
The Startup
5 min read · Dec 4, 2020


— modified 12/08/2020 —

Summary

In this set of experiments, I used 4000 flat art illustrations from various artists. Most images contained a person doing something. To train, I used a PyTorch pix2pix implementation and a few other libraries.

Training objectives were:

  1. To improve the machine’s ability to color images without color smearing.
  2. To train the machine to add and complete lines. That’s one method to take simple doodles and turn them into more elaborate drawings.
  3. To train the machine to fix proportions of input drawings. For most of us, keeping the proportions right is probably the thing we struggle with the most when we draw. So if we can get a machine to fix proportions it can be a huge step towards machine learning assisted drawing.
  4. To create a model that uses photos as a source for drawings. This could be useful because in photos, objects already come in the right proportions.
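The post doesn't name the exact pix2pix implementation. Assuming the widely used junyanz/pytorch-CycleGAN-and-pix2pix repo, aligned training pairs are stored as a single image with the input (A) on the left half and the target (B) on the right half. A minimal sketch of that pairing step:

```python
import numpy as np

def make_pair(input_img, target_img):
    """Concatenate input (A) and target (B) side by side.

    Aligned pix2pix datasets store each training pair as one
    image with A on the left half and B on the right half.
    """
    if input_img.shape != target_img.shape:
        raise ValueError("A and B must have the same size")
    return np.concatenate([input_img, target_img], axis=1)

# a 256x256 edge drawing (A) and its colored original (B),
# here stand-in arrays instead of real illustrations
a = np.zeros((256, 256, 3), dtype=np.uint8)
b = np.full((256, 256, 3), 255, dtype=np.uint8)
pair = make_pair(a, b)
print(pair.shape)  # (256, 512, 3)
```

Each experiment below only changes how the A half is produced: edge drawings for coloring and line completion, distorted drawings for proportions, pose skeletons for the photo-based approach.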

Coloring was successful, and completing missing lines was also largely successful. Fixing proportions, however, did not work well. Finally, using photos as input looks promising, but they have to be combined with another input source.

1. Coloring without smearing

In previous training, I showed the ability to color illustrations. There is also a dedicated pix2pix and CycleGAN photo-colorization training setup (which I did not use), so it is not a big surprise that coloring works. This time, with a much larger data set, ‘coloring’ worked very well, usually without smearing or going out of line.

In the set of pictures below, the left-most image ‘real_A’ is what the machine got as input in a validation set. The right-most image ‘real_B’ is what the expected result should be. The middle picture ‘fake_B’ is what the machine created. You can see in pictures 1 to 3 that the machine-generated images closely resemble the expected result.

Picture 1 — source image Pablo Stanley
Picture 2 — source image Pablo Stanley
Picture 3 — source image Pablo Stanley

2. The capacity to supplement missing parts

I produced the input images with ImageMagick's implementation of canny edge detection. In order to teach the machine to supplement missing parts, I created a set of edge drawings with a lower threshold (sigma=2). That way, these input images had some of the outline missing.
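The post generated its inputs with ImageMagick's Canny detector; as a simplified stand-in, the same effect can be sketched with a plain gradient-magnitude edge map in numpy. The point it illustrates is the one above: a stricter detector drops faint lines, leaving gaps in the outline for the model to learn to fill.

```python
import numpy as np

def edge_map(img, threshold):
    """Gradient-magnitude edge detector (a simplified stand-in
    for the Canny detector the post uses via ImageMagick)."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy) > threshold

# synthetic drawing: one strong outline and one faint outline
img = np.zeros((40, 40))
img[5:15, 5:35] = 1.0   # strong shape
img[25:35, 5:35] = 0.3  # faint shape

loose = edge_map(img, 0.1)   # keeps both outlines
strict = edge_map(img, 0.4)  # the faint outline disappears
print(loose.sum() > strict.sum())  # True
```

Pairing the gappy `strict`-style edges as input with the complete drawing as target is what teaches the network to supplement missing parts.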

In ‘picture 4’ below the person’s whole upper body is missing. The machine knew how to complete the upper body’s shape and even added a hand.

In ‘picture 5’ below, the back of the coat, the right sleeve, and the shoes are missing. The machine knew how to complete them, even if the connection between the right sleeve and the arm is not entirely convincing.

In ‘picture 6’, the upper part of the body is again missing, and the machine knew to complete it.

Picture 4 — source image Katerina Limpitsouni
Picture 5 — source image Pablo Stanley
Picture 6 — source image Katerina Limpitsouni

3. Fixing proportions

To teach the machine to fix proportions, I created a set of distorted drawings. In picture 7 below, the top-left corner shows a well-proportioned drawing; that illustration is the expected outcome of this training. I created the distorted inputs by compressing or stretching part of the image or the whole image. Going clockwise from the top-right, picture 7 shows illustrations with an upper body that’s too short, a lower body that’s too short, and a whole body that’s stretched.
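The post doesn't say which tool produced the distortions; as a hypothetical sketch, one of them (the "upper body too short" case) can be reproduced by resampling only the top half of the image with nearest-neighbor row indexing:

```python
import numpy as np

def stretch_rows(img, factor):
    """Vertically resample an image by `factor` using
    nearest-neighbor indexing (hypothetical helper; the post
    doesn't say which tool produced the distortions)."""
    h = img.shape[0]
    new_h = max(1, int(round(h * factor)))
    idx = np.minimum((np.arange(new_h) / factor).astype(int), h - 1)
    return img[idx]

def shorten_upper_body(img, factor=0.6):
    """Compress only the top half, as in the 'upper body too
    short' distortion, then reattach the untouched lower half."""
    h = img.shape[0]
    top = stretch_rows(img[: h // 2], factor)
    return np.concatenate([top, img[h // 2 :]], axis=0)

img = np.arange(100 * 60).reshape(100, 60)
distorted = shorten_upper_body(img, 0.6)
print(distorted.shape)  # (80, 60)
```

The distorted version becomes the A half of each training pair and the original drawing the B half.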

Picture 7 — source image Pablo Stanley

It did not work. Picture 8 shows a sample of what the machine created (middle image), and it is evident that the machine did not pick up the patterns. If I had used just one type of distortion, for example vertical stretching, I’m sure the results would have been better. In a previous experiment, I got the machine to change outlines with a more homogeneous data set. But fixing just one type of distortion is not particularly useful.

Picture 8 — source image Pablo Stanley

In all the image-translation data sets I’ve seen used in articles, the translated objects had similar proportions (e.g. horse-to-zebra, apple-to-orange, facades, city maps). Perhaps image-translation algorithms are not suited for proportion fixing, and another approach is needed here.

4. Deriving proportions from photos

Fixing proportions didn’t work. What if the source for the proportions were a photo? Posing for the camera is easy for the user, a photo always has correct proportions, and with OpenPose the pose can be derived.

Pictures 9, 10, and 11 are from the training set, and the pose skeleton was derived from an illustration. These pictures show that the machine is quite good at translating a pose into a drawing. They also expose the limitation of this technique: the pose skeleton carries no information about the art style.
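OpenPose outputs 2D keypoint coordinates rather than an image, so the skeleton has to be rasterized before it can serve as the pix2pix input. A minimal sketch of that step, using a hypothetical handful of keypoints and limbs (OpenPose's real BODY_25 model has far more):

```python
import numpy as np

# hypothetical keypoints (x, y); a tiny subset of what
# OpenPose actually outputs
KEYPOINTS = {"head": (32, 8), "neck": (32, 16),
             "hip": (32, 40), "foot": (24, 60)}
LIMBS = [("head", "neck"), ("neck", "hip"), ("hip", "foot")]

def draw_skeleton(keypoints, limbs, size=64):
    """Rasterize limb segments into a binary conditioning image
    by interpolating points along each segment."""
    canvas = np.zeros((size, size), dtype=np.uint8)
    for a, b in limbs:
        (x0, y0), (x1, y1) = keypoints[a], keypoints[b]
        n = max(abs(x1 - x0), abs(y1 - y0)) + 1
        xs = np.linspace(x0, x1, n).round().astype(int)
        ys = np.linspace(y0, y1, n).round().astype(int)
        canvas[ys, xs] = 255
    return canvas

skeleton = draw_skeleton(KEYPOINTS, LIMBS)
print(skeleton.sum() > 0)  # True
```

Because this conditioning image is only stick figures, everything about style has to come from the training set, which is exactly the limitation noted above.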

Picture 9 — source image ‘Freepik’ StorySet Amico
Picture 10 — source image John D. Saunders
Picture 11 — source image unknown (scraped from the web)

In this training, I used just 250 paired images; with a larger training set, the machine’s illustrations would be crisper. Still, using photos in the validation set showed this approach has potential (picture 12).

Picture 12 — photos of me turned into flat illustrations

Another approach?

MixNMatch, released in 2020, is promising. The pose can be derived from photos with OpenPose, the illustration style from a particular artist, the color scheme from a color picker, and the background from a fourth source.

Picture 13: Source Yuheng Li, Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee
UC Davis, published in CVPR 2020

Anecdotes and some other things to improve

The machine can’t make up facial features yet (picture 14) and doesn’t know a fire is red and yellow (picture 15). The fire, for example, can be fixed by adding a few more fire pictures to the training set. Similarly, a training set of drawings with outlined facial features can be used to create machine-made images with faces.

Picture 14 — source image ‘Freepik’ StorySet Pana
Picture 15 — source image Katerina Limpitsouni



At the intersection of technology, design, and strategy, I build innovative products. Linkedin.com/in/avilatner; Building https://sloyd.ai