Edge to Face with Pix2Pix GANs

Nitisarath
Jun 21, 2022 · 6 min read


Introduction

Have you ever wanted to draw realistic faces? No? Well, if you ever did, you can now do it with Edge to Face, a model that takes drawn edges as input and outputs realistic-looking faces using Pix2Pix GANs.

Motivation

I'll be honest: this wasn't the project I originally planned on making, but I'm still glad I made it. Turning drawings of edges into realistic faces is really cool in my opinion, and I learned a lot and had fun doing so.

Dataset and preparing the data

I needed datasets that clearly show faces. These are the ones I used:

Kaggle Human Faces dataset

Human Faces Dataset from kaggle

CelebA

CelebA dataset
CelebA Dataset Sample

I chose them since they both have high-quality images of faces.

Then I needed to crop the faces so that the background would not interfere. To do that, we first detect the face, then crop it out (since we don't want the background interfering), and finally apply Canny edge detection to it, like so:

I was able to achieve this using OpenCV and a Haar cascade.

I used about 9,000 images for training and validating the model, and I used this snippet of code to process the data:

Model

For the model I used the Pix2Pix GAN architecture, but first, a little introduction to GANs, or generative adversarial networks.

A generative adversarial network is a machine learning framework designed by Ian Goodfellow in June 2014.

Two models, a generator and a discriminator, compete against each other:

Generator👨‍🎨: generates fake artwork

Discriminator👮: identifies which artwork is generated and which isn't

I'm trying to explain it as simply as I can😅. Hopefully you've gained a basic understanding of it.

GAN architecture

Pix2Pix GAN architecture🏗️

Pix2Pix is a conditional GAN that takes one picture as the input and another as the label.

pix2pix architecture

Pix2Pix will then try to convert the input into the label, or as close to it as it can.

So basically, we put the edge input through the generator (a U-Net), and it outputs a generated image of a human face. Then we pair the generated image with the edge input and feed it, alongside the real face that is also paired with the edge input, into the discriminator so it can guess which of the pairs is real and which is fake. We then put the discriminator's predictions into a binary cross-entropy loss function so it can calculate the loss.

GAN binary cross-entropy loss
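The pairing and the discriminator loss above can be sketched numerically with NumPy (the channel counts and toy discriminator scores here are illustrative, not taken from my model):

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy averaged over all predictions."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

# The discriminator never sees a face alone: the edge map is stacked
# with the (real or generated) face along the channel axis.
edge = np.random.rand(1, 256, 256)           # 1-channel edge input
face = np.random.rand(3, 256, 256)           # 3-channel face image
pair = np.concatenate([edge, face], axis=0)  # 4-channel discriminator input

# Toy discriminator scores for a real pair and a fake (generated) pair.
d_real = np.array([0.9])  # should be pushed toward 1
d_fake = np.array([0.2])  # should be pushed toward 0
d_loss = bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))
```

The discriminator's loss is low when it scores real pairs near 1 and fake pairs near 0.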

Then we use the loss to update both the discriminator's and the generator's weights so they can generate/classify better.

The generator also has another loss function, called L1 loss. It looks at how different the generated image is from the real one and calculates the loss accordingly.

L1 loss function
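Putting both generator terms together in a NumPy sketch (the discriminator score and image sizes are toy values; the weight of 100 on the L1 term follows the pix2pix paper, not necessarily my exact setting):

```python
import numpy as np

def l1_loss(generated, real):
    """Mean absolute pixel difference between generated and real images."""
    return np.mean(np.abs(generated - real))

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy averaged over all predictions."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

generated = np.random.rand(3, 64, 64)
real = np.random.rand(3, 64, 64)
d_on_fake = np.array([0.3])  # discriminator's score for the generated pair

lam = 100.0  # L1 weight; the pix2pix paper uses lambda = 100
# The generator wants the discriminator to answer "real" (target 1)
# AND the generated pixels to stay close to the ground-truth face.
g_loss = bce(d_on_fake, np.ones_like(d_on_fake)) + lam * l1_loss(generated, real)
```

The BCE term makes the output look plausible; the L1 term keeps it faithful to the target face.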

U-Net || The Generator 👨‍🎨

Ok, let's talk more about the generator.

U-Net is a convolutional neural network usually used for image segmentation: it takes in a high-resolution image and outputs a segmentation of it. It was originally used for medical imaging.

So the U-Net consists of two parts:

Contracting path

It consists of the repeated application of two 3×3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU), and a 2×2 max pooling operation with stride 2 for downsampling. At each downsampling step we double the number of feature channels.

Expansive path

An upsampling of the feature map followed by a convolution ("up-convolution") that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two convolutions, each followed by a ReLU. The cropping is necessary due to the loss of border pixels in every convolution. At the final layer, a 1×1 convolution is used to map each 64-component feature vector to the desired number of classes.
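To make the two paths concrete, here is a minimal two-level U-Net sketch in PyTorch. This is not my exact generator: it assumes padded convolutions, so the skip connection can be concatenated without any cropping, and the channel counts are illustrative.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """A two-level U-Net sketch with padded convolutions (no cropping needed)."""
    def __init__(self, in_ch=1, out_ch=3):
        super().__init__()
        def block(ci, co):
            return nn.Sequential(
                nn.Conv2d(ci, co, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(co, co, 3, padding=1), nn.ReLU(inplace=True))
        # Contracting path: channels double at each downsampling step.
        self.down1 = block(in_ch, 64)
        self.down2 = block(64, 128)
        self.pool = nn.MaxPool2d(2)
        # Expansive path: up-convolution halves the channels, then the
        # skip connection from the contracting path is concatenated.
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec = block(128, 64)
        self.head = nn.Conv2d(64, out_ch, 1)  # final 1x1 convolution

    def forward(self, x):
        s1 = self.down1(x)             # skip-connection feature map
        b = self.down2(self.pool(s1))  # bottleneck at half resolution
        u = self.up(b)                 # upsample back to input resolution
        return self.head(self.dec(torch.cat([u, s1], dim=1)))
```

For edge-to-face, the input is the 1-channel edge map and the output is a 3-channel face image at the same resolution.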

How the weights are updated (generator)

image from https://www.tensorflow.org/tutorials/generative/pix2pix

PatchGAN || The Discriminator👮

PatchGAN is a type of discriminator for generative adversarial networks that only penalizes structure at the scale of local image patches. The PatchGAN discriminator tries to classify whether each N×N patch in an image is real or fake.

The output of the network is a single feature map of real/fake predictions that can be averaged to give a single score. A patch size of 70×70 was found to be effective across a range of image-to-image translation tasks.
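A minimal PatchGAN-style discriminator sketch in PyTorch (the layer sizes are illustrative and shallower than a real 70×70 PatchGAN; the 4 input channels assume the edge map is stacked with the face image):

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Strided convs only: the output is a grid of per-patch real/fake logits."""
    def __init__(self, in_ch=4):  # edge map (1 ch) + face image (3 ch)
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, stride=1, padding=1))  # one logit per patch

    def forward(self, pair):
        return self.net(pair)

d = PatchDiscriminator()
scores = d(torch.zeros(1, 4, 256, 256))
# "scores" is a map of patch predictions; averaging it gives a single score.
single_score = scores.mean()
```

Because the output is a feature map rather than one number, the same network judges every local patch of the image independently.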

How the weights are updated (discriminator)

image from https://www.tensorflow.org/tutorials/generative/pix2pix

Results

After training with 9,000 pictures for about 90+ epochs (I kind of lost count), here are the results:

(I'll put up more results later.)

Deployment

So I made a Streamlit app, but it turns out my weights file was too big (600 MB) to upload, so after trying for a day I finally gave up (I'll probably try again after midterms).

This was the Streamlit app:

So I put it in a Google Colab instead so people can try it out.

Code and Deployment

Google Colab <<< click (deployment is temporary)

Github Repo << click (it's kinda messy)

Special Thanks

Big thanks to my mentors P Ping and P Tj for helping me with my project, and to the teaching assistants in AI Builders for teaching and helping me make this project possible. Special thanks to everyone in AI Builders for helping make such a wonderful community to learn in.

Special thanks to P Charin for giving us a wonderful community to make friends and learn in. I've learned so much in these 10 weeks and had fun doing so, and for that I could not thank you enough.

And special thanks to the sponsors of AI Builders, a program for kids who want to build good AI.

References

pix2pix explainer video: https://www.youtube.com/watch?v=UcHe0xiuvpg&t=459s

U-Net: https://paperswithcode.com/method/u-net

PatchGAN: https://paperswithcode.com/method/patchgan#:~:text=PatchGAN%20is%20a%20type%20of,image%20is%20real%20or%20fake.
