Using GAN for fashion

Kirill Bondarenko
Published in Analytics Vidhya
6 min read · Jan 8, 2020

How can a GAN improve the fashion industry?

How does GAN learn design

INTRODUCTION

Hello everyone! If you opened this article, you are probably interested in topics like fashion, machine learning, deep learning, artificial intelligence, and customization.

Nowadays every internet (or real-life) user wants a product designed specifically for his or her needs and wants. How can we improve the fashion industry and move it toward a fully customized state? One idea is to use a GAN to generate clothes from their text descriptions.

So, let’s begin!

CONTENT

  1. Introduction
  2. GAN understanding
  3. GAN for fashion
  4. Conclusion

GAN understanding

DCGAN structure

The picture above shows the structure of a DCGAN, or deep convolutional generative adversarial network.

How does it work?

First of all, we need to define the task. Our task is to generate images from text descriptions: we will have some encoded input text and expect an image in return. You may wonder: why not just use transposed convolutions, or some kind of autoencoder with a modified input shape? Of course we could; there is no limit to imagination. But that task would be much more complex, because we would need to compute a loss between the output image and a real image (MSE, for example). It is harder to make that work than to understand the GAN principle.
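To make the "decoder-style" generator concrete, here is a minimal PyTorch sketch of a DCGAN-type generator that upsamples a latent (or encoded-text) vector into an image with transposed convolutions. The layer sizes and the 64x64 output are illustrative assumptions, not the exact architecture from the figure:

```python
import torch
import torch.nn as nn

# Illustrative DCGAN-style generator: maps a latent/text-encoding vector
# to a 3-channel 64x64 image. All sizes here are assumptions.
class Generator(nn.Module):
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),  # 4x4
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),    # 8x8
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),      # 16x16
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(True),       # 32x32
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),                                # 64x64
        )

    def forward(self, z):
        # Reshape the flat vector into a 1x1 "image" with z_dim channels.
        return self.net(z.view(z.size(0), -1, 1, 1))

g = Generator()
fake = g(torch.randn(2, 100))
print(fake.shape)  # torch.Size([2, 3, 64, 64])
```

Each transposed convolution doubles the spatial resolution, which is exactly the upsampling path drawn in the DCGAN diagram.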

Let’s make it simple. How should we understand the GAN architecture and its training idea?

The image above shows an analogy with a real-life example. Imagine a situation: there is a counterfeiter who wants to make fake money of really good quality, and there is a police officer whose aim is to distinguish real money from fake. In the 90's this was a rather popular idea in our world.

So, how do they interact? I will retell this story from an ML point of view, step by step.

Plan of action for initial interaction:

  1. The counterfeiter reads some material (generator input: noise / an encoded instruction)
  2. The counterfeiter makes a fake bill (generator output)
  3. The police officer receives this bill and an example of a real one (the discriminator’s two inputs)
  4. The police officer decides it is a fake (discriminator output = 0)
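The four steps above amount to a single forward pass. A minimal sketch, with toy linear networks standing in for the real convolutional generator and discriminator:

```python
import torch
import torch.nn as nn

# Toy stand-ins: the "counterfeiter" (generator) and the "police officer"
# (discriminator). Real models would be convolutional.
gen = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 64))
disc = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

noise = torch.randn(4, 16)        # 1. material the counterfeiter reads
fake_bill = gen(noise)            # 2. counterfeiter makes a fake bill
real_bill = torch.randn(4, 64)    # 3. officer also holds a real example
p_fake = disc(fake_bill)          # 4. officer's verdict: probability "real"
p_real = disc(real_bill)
print(p_fake.shape, p_real.shape)  # both torch.Size([4, 1]), values in (0, 1)
```

Nothing is learned yet: the verdicts are just numbers between 0 and 1 produced by untrained networks.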

That was just a forward pass. How do we make the generator and the discriminator cleverer?

For example counterfeiter made this :)

First of all, we need to evaluate the discriminator (the police officer). The officer needs to do two things: know what real money looks like (real loss) and be proficient at spotting fakes (fake loss). In our case the police officer’s mark (or maybe a daily bonus for crime detection) is calculated as: (Loss(real money | officer’s decision) + Loss(fake money | officer’s decision)) * 1/2.

Now some math, and then we will continue our story.

We will use BCE loss (binary cross-entropy), with the general formula shown in the picture below.

Cross entropy in general view

Let’s imagine the police officer reports a probability that a bill is real or fake (0–100%, or from 0 to 1). The officer tells us: the real money is 0.9 real and 0.1 fake (ground truth: 1 real, 0 fake), and the fake money is 0.2 real and 0.8 fake (ground truth: 0 real, 1 fake). Now let’s calculate the loss.

Police officer loss = 1/2 * ((-log 0.9 * 1 - log 0.1 * 0) + (-log 0.2 * 0 - log 0.8 * 1)) = 1/2 * (0.105 + 0.223) ≈ 0.164 (using natural logarithms, as standard BCE implementations do).
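This arithmetic can be checked directly. The sketch below uses the natural logarithm, which is the convention in standard BCE implementations:

```python
import math

def bce(p, y):
    # Binary cross-entropy for a single prediction p with ground truth y.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Officer says the real money is 0.9 real, the fake money is 0.2 real
# (i.e. 0.8 fake). Ground truths are 1 (real) and 0 (fake).
real_loss = bce(0.9, 1)   # -ln 0.9 ≈ 0.105
fake_loss = bce(0.2, 0)   # -ln 0.8 ≈ 0.223
officer_loss = (real_loss + fake_loss) / 2
print(round(officer_loss, 3))  # 0.164
```

Note that the label zeroes out one of the two terms each time, so a confident correct answer gives a small loss.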

And it’s kind of a good loss! Great, officer!

But what about the counterfeiter? He or she also needs to get better at making fake money (in real life, of course not, but we are mathematicians and may use any analogy that aids understanding). In our case we actually want the counterfeiter to become really good at making fake money! Let’s help him or her!

We will tell the counterfeiter the result of the police officer’s expertise (the loss) and close the officer’s eyes for some time (freezing the discriminator weights).

Now we will train the counterfeiter (the generator). We flip the labels, presenting the generator’s fake data as real, and use the frozen discriminator to work for the generator’s purpose! A kind of trick.

So now we train the counterfeiter to make better money. For example, the counterfeiter’s result is shown in the picture below.

Generator loss = -log 0.1 * 1 - log 0.9 * 0 ≈ 2.3 (a rather large loss; we report it to the generator and update its weights). Then the process repeats with a forward pass as described before.

The plan of action for training (metaphoric):

  1. Evaluate the police officer’s ability to recognize real money (real loss on real data)
  2. Evaluate the police officer’s ability to spot fake money (fake loss on fake data)
  3. Calculate the officer’s average evaluation error (the general loss as the half-sum of the real and fake losses)
  4. Improve the officer’s knowledge with this evaluation result (backpropagation for the discriminator)
  5. Close the officer’s eyes (freeze the discriminator)
  6. Make the counterfeiter believe the fakes are real money by exploiting the “blind” officer’s expertise (flip the labels, calculate the generator’s BCE loss, and update the generator)
  7. Open your eyes, officer! The counterfeiter can now make better fake bills! (unfreeze the discriminator and make a forward pass again)
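The seven metaphoric steps map onto one standard GAN training step. A hedged PyTorch sketch with toy linear models and illustrative hyperparameters:

```python
import torch
import torch.nn as nn

gen = nn.Sequential(nn.Linear(16, 64))                # counterfeiter
disc = nn.Sequential(nn.Linear(64, 1), nn.Sigmoid())  # police officer
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.randn(8, 64)                  # a batch of "real money"
noise = torch.randn(8, 16)
ones, zeros = torch.ones(8, 1), torch.zeros(8, 1)

# Steps 1-4: evaluate and update the officer (discriminator).
fake = gen(noise)
d_loss = (bce(disc(real), ones) +               # real loss
          bce(disc(fake.detach()), zeros)) / 2  # fake loss (detach: no grad to G here)
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Steps 5-6: the officer's weights stay frozen (only opt_g steps),
# and we flip the labels so the "blind" officer works for the counterfeiter.
g_loss = bce(disc(fake), ones)                  # pretend the fakes are real
opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# Step 7: in a real loop, repeat with a fresh forward pass.
print(float(d_loss), float(g_loss))
```

Note how "freezing" needs no explicit weight locking: the generator update simply never calls the discriminator's optimizer.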

Returning to machine learning: we want the generator to be awesome, yes? In most cases we do not need a cool discriminator as the end result of our work. We need to make fake data!

Now let’s work with the fashion example.

GAN for fashion

What do the results look like?

The images above show examples where the trained GAN takes text as input and produces an image as the result. It is a kind of prototype, with low-quality 128x128x3 images.

To make this project real, 15 GB of images were collected, each with a corresponding text description (gender, style, color, description, etc.).

The image data was resized to 128x128x3 (RGB), and the text data was transformed with the TF-IDF technique into 256-dimensional vectors.
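The text side of that pipeline can be approximated with scikit-learn's TfidfVectorizer; this is only a sketch with made-up captions, and the project's actual preprocessing may differ:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy captions standing in for the real dataset's descriptions.
captions = [
    "women blue denim jeans casual",
    "men black leather jacket street style",
    "unisex red cotton t-shirt summer",
]

# Cap the vocabulary at 256 terms so each caption maps to at most a
# 256-dimensional vector; absent terms are simply zero.
vectorizer = TfidfVectorizer(max_features=256)
vectors = vectorizer.fit_transform(captions).toarray()
print(vectors.shape)  # (3, n_terms) with n_terms <= 256
```

These fixed-length vectors are what gets fed to the generator in place of (or together with) random noise.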

In some cases the model guesses the main idea, such as the color or the men/women/unisex category. What about some new data (a new text description)?

Some kind of horror movie. We see black and red colors and what may be a standing human figure (maybe a man), and that is all.

GANs are extremely data-hungry and slow to train. The generator in particular is a slow learner: training this model took 72 hours on a GeForce GTX 1070 Ti, over 300,000 epochs, with 12,000 images and their corresponding text captions.

What next? DCGAN has its limits: it produces no better results than 128x128x3 pictures. The model needs to become more complex.

In this case it is better to use ProGAN, a progressively growing generative adversarial network. More about it in the next article.

CONCLUSION

A GAN is a good technology for making neural networks able to “imagine” the way humans do: when somebody hears “blue jeans”, images appear in the imagination. This is not like a database search; it is always a process of creation and fantasy.

But the architecture must be really complex to produce images at a good resolution. In ProGAN, for example, the networks are trained step by step, from meaningless 4x4x3 images up to 1024x1024x3 (RGB) images of good quality and deep sense.

Thank you for reading! I hope you found something interesting in this article, improved your knowledge of GANs, and found new ideas to develop.

With the best wishes,

Bondarenko K., machine learning engineer.
