Managing ‘Face Generation using GANs’ project using MLflow
What is MLflow?
MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It tackles three primary functions:
- Tracking experiments to record and compare parameters and results (MLflow Tracking).
- Packaging ML code in a reusable, reproducible form in order to share with other data scientists or transfer to production (MLflow Projects).
- Managing and deploying models from a variety of ML libraries to a variety of model serving and inference platforms (MLflow Models).
Installing MLflow
You install MLflow by running:
pip install mlflow
Using the Tracking API
The MLflow Tracking API lets you log metrics and artifacts (files) from your data science code and see a history of your runs.
Wherever you run your program, the tracking API writes data to files in an mlruns
directory. You can then launch MLflow's Tracking UI:
mlflow ui
Now we’ll see how to integrate MLflow with our Face Generation project.
What are Generative Adversarial Networks (GANs)?
Generative Adversarial Networks (GANs) are a type of neural network used for unsupervised learning. A GAN generally consists of a pair of neural networks (a Generator and a Discriminator) that compete against each other, with the goal of generating outputs that are indistinguishable from the real inputs.
GAN consists of two models:
- A discriminator D estimates the probability of a given sample coming from the real dataset. It works as a critic and is optimized to tell the fake samples from the real ones.
- A generator G outputs synthetic samples given a noise variable input z (z brings in potential output diversity). It is trained to capture the real data distribution so that its generative samples can be as real as possible, or in other words, can trick the discriminator to offer a high probability.
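The two objectives above can be illustrated numerically. This is a toy NumPy sketch of the standard GAN losses, not the project's code; `d_real` and `d_fake` stand in for the discriminator's outputs on real and generated samples:

```python
import numpy as np

def bce(probs, labels):
    # binary cross-entropy, averaged over samples
    eps = 1e-9
    return -np.mean(labels * np.log(probs + eps)
                    + (1 - labels) * np.log(1 - probs + eps))

# hypothetical discriminator outputs: D(x) on real data, D(G(z)) on fakes
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.2, 0.05])

# D is rewarded for scoring real samples near 1 and fakes near 0
d_loss = bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))
# G is rewarded when D scores its fakes near 1 (i.e. when D is fooled)
g_loss = bce(d_fake, np.ones_like(d_fake))
```

Here the discriminator is doing well (low `d_loss`), so the generator's loss is high, which is exactly the signal that drives the generator's updates.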
Training a GAN
First, the Generator is kept idle and the Discriminator is trained on batches of both real and fake data for n epochs. In this phase only the Discriminator's weights are updated (the Generator is merely forward-propagated to produce the fakes), so the Discriminator learns how well it can differentiate real data from fake.
Next, the Generator is trained while the Discriminator is kept idle. The Discriminator's feedback on the generated samples is back-propagated through the Generator, which in turn helps the Generator produce more realistic data that can fool the Discriminator.
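The alternating scheme described above can be sketched as a loop skeleton (the two step functions are stand-ins for real gradient updates, not the project's training code):

```python
def discriminator_step(real_batch, fake_batch):
    # stand-in: a real GAN back-propagates D's loss on real and
    # fake batches here, updating only the Discriminator's weights
    return "updated D"

def generator_step(discriminator_feedback):
    # stand-in: back-propagates through G using D's output,
    # updating only the Generator's weights
    return "updated G"

history = []
for epoch in range(3):
    history.append(discriminator_step("real", "fake"))  # phase 1: G idle
    history.append(generator_step("D feedback"))        # phase 2: D idle
```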
All the training related to this project was done on Intel AI Devcloud.
Face Generation
Face generation is the task of generating new faces from an existing dataset of faces.
This task is achieved by using Deep Convolutional Generative Adversarial Networks (DCGAN). A DCGAN is essentially a Generator-Discriminator pair built from deep convolutional layers, where the Generator and Discriminator have mirrored architectures.
Dataset
CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA has large diversity, large quantity, and rich annotations, including:
- 10,177 identities,
- 202,599 face images, and
- 5 landmark locations and 40 binary attribute annotations per image.
Integrating our project with MLflow
MLflow Tracking
The MLflow Tracking component is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code and for later visualizing the results.
MLflow Projects
MLflow Projects are just a convention for organizing and describing your code so that other data scientists (or automated tools) can run it. Each project is simply a directory of files, or a Git repository, containing your code. MLflow can run some projects based on a convention for placing files in this directory (for example, a conda.yaml file is treated as a Conda environment), but you can describe your project in more detail by adding an MLproject file, which is a YAML-formatted text file.
conda.yaml is used to share your environment with someone else, for example so they can re-create a test that you have done. To let them quickly reproduce your environment, with all of its packages and versions, give them a copy of your conda.yaml file. To create such a file from your environment, perform the following steps.
- Activate the environment to export:
conda activate myenv
- Export your active environment to a new file:
conda env export > conda.yaml
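The exported file might look something like this (the package names and versions are illustrative, not the project's actual environment):

```yaml
name: myenv
channels:
  - defaults
dependencies:
  - python=3.7
  - pip
  - pip:
      - mlflow
      - tensorflow
```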
A project file consists of the following properties:
- Name: It’s the name of the project.
- Entry Point: Any .py or .sh file can be an entry point. When no MLproject file is included, entry points do not take any parameters by default; parameters can still be supplied at runtime via the mlflow run command. With an MLproject file you can also specify parameters for entry points explicitly, including data types and default values.
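Putting these properties together, an MLproject file for this project might look like the following. The contents are illustrative: the entry-point name and parameter list are modeled on the run command used below, not copied from the project's repository:

```yaml
name: face-generation-gan

conda_env: conda.yaml

entry_points:
  GAN.py:
    parameters:
      learning_rate: {type: float, default: 0.0001}
      epochs: {type: int, default: 10}
      batch_size: {type: int, default: 256}
    command: >
      python GAN.py --learning_rate {learning_rate}
      --epochs {epochs} --batch_size {batch_size}
```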
Ready to Run!
To run the project, use the following command (the leading ! is for running inside a Jupyter notebook):
!mlflow run . -e GAN.py -P learning_rate=0.0001 -P epochs=10 -P noise_dim=100 -P num_examples_to_generate=16 -P buffer_size=202599 -P batch_size=256
Code and Repository
Check out some generated images here: https://thispersondoesnotexist.com/