Managing ‘Face Generation using GANs’ project using MLflow

Chirag Bajaj
Intel Student Ambassadors
Dec 26, 2019

What is MLflow?

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It tackles three primary functions:

  • Tracking experiments to record and compare parameters and results (MLflow Tracking).
  • Packaging ML code in a reusable, reproducible form in order to share with other data scientists or transfer to production (MLflow Projects).
  • Managing and deploying models from a variety of ML libraries to a variety of model serving and inference platforms (MLflow Models).

Installing MLflow

You install MLflow by running:

pip install mlflow

Using the Tracking API

The MLflow Tracking API lets you log metrics and artifacts (files) from your data science code and see a history of your runs.

Wherever you run your program, the tracking API writes data to files in an mlruns directory. You can then run MLflow’s Tracking UI to browse the logged runs:

mlflow ui

Now we’ll see how to integrate MLflow with our Face Generation project.

What are Generative Adversarial Networks (GANs)?

Generative Adversarial Networks (GANs) are a type of neural network used for unsupervised learning. A GAN generally consists of a pair of neural networks (namely the Generator and the Discriminator) that compete against each other, so that the Generator learns to produce outputs that are indistinguishable from the real training data.

Generative Adversarial Networks Framework

A GAN consists of two models:

  • A discriminator D estimates the probability that a given sample comes from the real dataset. It works as a critic and is optimized to tell the fake samples from the real ones.
  • A generator G outputs synthetic samples given a noise variable input z (z brings in potential output diversity). It is trained to capture the real data distribution so that its generated samples are as realistic as possible, or in other words, so that they trick the discriminator into assigning them a high probability of being real.
Loss Function
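
The figure for this heading is not reproduced here. For reference, the standard minimax objective from the original GAN formulation can be written as below; it is presumably what the figure showed, but that is an assumption. D, G, and z are the discriminator, generator, and noise input described above, while p_data (the real data distribution) and p_z (the noise distribution) are the usual notation rather than names taken from this article.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator maximizes V by pushing D(x) toward 1 on real samples and D(G(z)) toward 0 on generated ones, while the generator minimizes V by pushing D(G(z)) toward 1.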

Training a GAN

First, the Generator is kept idle (it is only forward-propagated to produce fake samples, and its weights are not updated) while the Discriminator is trained on a batch of real data as well as fake data for n epochs. This teaches the Discriminator to differentiate between real and fake data.

Next, we train the Generator. The Discriminator scores the generated samples, and the resulting loss is back-propagated to the Generator, which in turn helps it generate more realistic data that can fool the Discriminator.
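
The two losses behind these alternating steps can be sketched in plain Python. This is a minimal illustration rather than the project's actual training code: it assumes the Discriminator outputs probabilities, uses binary cross-entropy, and uses the common non-saturating Generator loss. The function names are made up for the example.

```python
import math


def bce(prob: float, label: float) -> float:
    """Binary cross-entropy for a single predicted probability."""
    eps = 1e-12  # clamp to avoid log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def discriminator_loss(real_probs, fake_probs) -> float:
    """Discriminator step: real samples are labeled 1, fake samples 0."""
    real = sum(bce(p, 1.0) for p in real_probs) / len(real_probs)
    fake = sum(bce(p, 0.0) for p in fake_probs) / len(fake_probs)
    return real + fake


def generator_loss(fake_probs) -> float:
    """Generator step: it is rewarded when fakes are scored as real (1)."""
    return sum(bce(p, 1.0) for p in fake_probs) / len(fake_probs)


if __name__ == "__main__":
    # Discriminator scores D(x) on real samples and D(G(z)) on fakes.
    print(discriminator_loss([0.9, 0.8], [0.2, 0.1]))
    print(generator_loss([0.2, 0.1]))
```

In a real training loop, the first loss would update only the Discriminator's weights and the second only the Generator's.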

All training for this project was done on Intel AI DevCloud.

Face Generation

Face generation is the task of generating new faces from an existing dataset of faces.

This task is achieved using Deep Convolutional Generative Adversarial Networks (DCGANs). A DCGAN is a Generator-Discriminator pair built from deep convolutional layers. The Generator and the Discriminator have mirrored architectures: the Generator upsamples noise into an image, and the Discriminator downsamples an image into a single score.

Dataset

celebA dataset

CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA offers large diversity, large quantity, and rich annotations, including:

  • 10,177 identities,
  • 202,599 face images, and
  • 5 landmark locations and 40 binary attribute annotations per image.

Integrating our project with MLflow

MLflow Tracking

The MLflow Tracking component is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code and for later visualizing the results.

Logging Parameters

MLflow Projects

MLflow Projects are just a convention for organizing and describing your code to let other data scientists (or automated tools) run it. Each project is simply a directory of files, or a Git repository, containing your code. MLflow can run some projects based on a convention for placing files in this directory (for example, a conda.yaml file is treated as a Conda environment), but you can describe your project in more detail by adding an MLproject file, which is a YAML-formatted text file.

conda.yaml is used to share your environment with someone else, for example so they can re-create a test that you have done. To allow them to quickly reproduce your environment, with all of its packages and versions, give them a copy of your conda.yaml file. To create a .yaml file of your environment, perform the following steps:

  • Activate the environment to export: conda activate myenv
  • Export your active environment to a new file: conda env export > conda.yaml

An MLproject file can specify the following properties:

  • Name: The name of the project.
  • Entry Point: Any .py or .sh file in the project can be an entry point. When no MLproject file is included, entry points have no declared parameters, although parameters can still be supplied at runtime via the mlflow run command. In an MLproject file, you can declare parameters for each entry point, including data types and default values.
MLproject file
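
The project's actual MLproject file is not reproduced here. A hypothetical sketch, assuming a single entry point named main that wraps GAN.py, might look like this. The project name and the command layout are assumptions; the parameter names and default values are taken from the run command in this article. MLproject parameter types are string, float, path, and uri, so the integer-valued settings are declared as float.

```yaml
name: face-generation-gan   # hypothetical project name

conda_env: conda.yaml       # environment file exported earlier

entry_points:
  main:
    parameters:
      learning_rate: {type: float, default: 0.0001}
      epochs: {type: float, default: 10}
      noise_dim: {type: float, default: 100}
      num_examples_to_generate: {type: float, default: 16}
      buffer_size: {type: float, default: 202599}
      batch_size: {type: float, default: 256}
    command: >-
      python GAN.py
      --learning_rate {learning_rate}
      --epochs {epochs}
      --noise_dim {noise_dim}
      --num_examples_to_generate {num_examples_to_generate}
      --buffer_size {buffer_size}
      --batch_size {batch_size}
```

With an entry point named main, the -e flag can be omitted, since main is the default entry point for mlflow run.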

Ready to Run!

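To run the project, use the following command. The leading ! runs it as a shell command from a notebook cell; omit it when running in a terminal.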
!mlflow run . -e GAN.py -P learning_rate=0.0001 -P epochs=10 -P noise_dim=100 -P num_examples_to_generate=16 -P buffer_size=202599 -P batch_size=256
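
When the entry point is a .py file that is not declared in an MLproject file, MLflow runs it with Python and passes each -P parameter using --key value syntax. A sketch of how GAN.py could read those values with argparse is shown below. The argument names mirror the command above, but the parsing code itself is an assumption about how the script might be written, not the project's actual code.

```python
import argparse


def parse_args(argv=None):
    """Parse the hyperparameters that `mlflow run -P key=value` passes in."""
    parser = argparse.ArgumentParser(description="Train a GAN for face generation")
    parser.add_argument("--learning_rate", type=float, default=0.0001)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--noise_dim", type=int, default=100)
    parser.add_argument("--num_examples_to_generate", type=int, default=16)
    parser.add_argument("--buffer_size", type=int, default=202599)
    parser.add_argument("--batch_size", type=int, default=256)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(vars(args))
```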

Code and Repository

Check out some generated images here: https://thispersondoesnotexist.com/
