This eye does not exist — Generating the dataset from unlabeled image data

Gonçalo Abreu

Published in

The Startup

4 min readMay 27, 2019

https://thiseyedoesnotexist.com/

This eye does not exist

All the images shown in this website are generated by a computer without user input. The machine learning algorithm…

thiseyedoesnotexist.com

First part of this series!

To make thiseyedoesnotexist I had to generate a dataset suitable for generative adversarial training. This meant I had to find a way to sort thousands of unlabeled images using an automatic unsupervised method. I didn’t know that in the beginning, but I discovered in the process. This story is a recipe for that procedure.

I like to use and create unique datasets so I can develop my intuition the fullest.

Since I had zero experience with generative adversarial networks, I thought I should document some problems I had to overcome.

Quoting Wikipedia: “A generative adversarial network (GAN) is a class of machine learning systems. Two neural networks contest with each other in a zero-sum game framework. This technique can generate photographs that look at least superficially authentic to human observers, having many realistic characteristics. It is a form of unsupervised learning.”

I’m not doing any introduction about how a GAN works since there are a lot of materials online with far better insights than the ones I could give. I actually think the original paper by Ian J. Goodfellow et al. is very good at explaining it.

Ian Goodfellow recently appeared on Lex Fridman MIT Artificial Intelligence podcast. In case you want to know more about the history of GANs you should watch it: https://youtu.be/Z6rxFNMGdn0

Initial Plan

This was my first plan:

gather images about makeup related stuff
train a DCGAN on the dataset

This adventure started with the gathering of 200k publicly available images related to makeup. There are multiple methods to do this and I will leave that to your imagination.

Here is a sample from the images gathered,

I happily followed the tutorial on the Pytorch website, regarding the DCGAN implementation and started the training using a 1080ti.

When the training finished I was shocked with how bad the results were. I actually run the thing quite some time trying to understand what went wrong.

E.g.,

The fake images actually got worse with more epochs of training,

The generator and discriminator loss could confirmed this,

At this moment I realized I was lacking some intuition about GANs and their problems so I started reading a lot about them. Eventually I understood that the distribution I was trying to model was too rich and that GANs suffer from something called mode collapse.

Since I had already trained a DCGAN I had the following thought:

What if I use the discriminator as a feature extractor to try to separate the images into classes? The discriminator must have built some sort similarity measure on the last layer. If I use a subset of images, all very similar to each other, I might achieve better results.

I devised a second plan:

Second Plan

Since I already had trained the DCGAN and I wanted to make the distribution easier to learn I made the following plan:

Use the features generated by the last layer of the DCGAN discriminator
Reduce every image feature representation to 50 dimensions using PCA
Make a t-sne of top of that (reduced it further to two dimension)
Use one of the newer GAN architectures such as ProGAN

After converting all the images to their “last layer discriminator representation”, reduced them to 50 components using PCA and applied t-sneto a 2 dimensional representation, the plot I got was the following:

At this point I felt really excited. I quickly used Kmeans to understand what was in each of the clusters. Each of the image grid corresponds to images sampled from the annotated red cluster(s):