Stanford Cars Dataset — https://ai.stanford.edu/~jkrause/cars/car_dataset.html

Generative Modeling of the Stanford Cars Dataset — the final project

Nuno Edgar Fernandes
Published in The Startup
Jul 2, 2020


In this project I use the Stanford Cars Dataset, available on the Kaggle platform, to develop a generative model able to generate images resembling the input dataset. I decided to run this generative modeling project for the completion of the course Deep Learning with PyTorch — Zero to GANs, and it seemed a sensible choice given the relevance of cars to applications of advanced Computer Vision with Deep Learning, such as monitoring traffic within cities or the self-driving cars of the future!

Image: https://ai.stanford.edu/~jkrause/cars/car_dataset.html

GANs are generative adversarial networks of the type explained during the final lecture of the course. When built on convolutional neural networks, the architecture that dominates image classification, they become well suited to image processing. That is the outline of this project.

Image: https://www.mdpi.com/2072-4292/12/7/1149

Loading the Stanford Cars Dataset

Doing a brief overview of the Stanford Cars Dataset, we can see that it contains 16,185 images of 196 classes of cars. The data is split into 8,144 training images and 8,041 testing images, with each class divided roughly 50–50 between the two splits. Classes are typically at the level of Make, Model, Year, e.g. 2012 Tesla Model S or 2012 BMW M3 coupé. Please feel free to check this link for further details: https://ai.stanford.edu/~jkrause/cars/car_dataset.html

Our dataset has two folders, cars_test and cars_train. We will use cars_train as our single dataset for the unsupervised learning task of generating images from an input dataset, as in the loading sketch below.
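
Here is a minimal sketch of how this loading might look in PyTorch, assuming a Kaggle-style path (the exact folder layout, batch size and worker count are illustrative, not necessarily those of my notebook). Normalizing to the [-1, 1] range matches the Tanh output of the generator described later:

    import torchvision.transforms as T
    from torchvision.datasets import ImageFolder
    from torch.utils.data import DataLoader

    image_size = 32
    batch_size = 128
    stats = (0.5, 0.5, 0.5), (0.5, 0.5, 0.5)  # per-channel mean, std

    # Path is an assumption; on Kaggle the images sit in a nested
    # cars_train/cars_train folder, which ImageFolder treats as a
    # single (unused) class.
    train_ds = ImageFolder('../input/stanford-cars-dataset/cars_train',
                           transform=T.Compose([
                               T.Resize(image_size),
                               T.CenterCrop(image_size),
                               T.ToTensor(),
                               T.Normalize(*stats)]))

    train_dl = DataLoader(train_ds, batch_size, shuffle=True,
                          num_workers=2, pin_memory=True)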

The Discriminator Network — training and model predictor

In this project I am developing a Deep Convolutional Generative Adversarial Network (DCGAN). It has the usual architecture of a discriminator and a generator, but both models are built from 2D convolutions, a function ready to use within the PyTorch collection of libraries. Convolutional neural networks are ‘compressors’ of input images, reducing pixels to a single number, vector or tensor. Hence they are state-of-the-art classification architectures for image processing and machine vision, and they can be combined with GANs. The discriminator is used to classify and distinguish images: it compares and determines a sort of ‘distance metric’ between images from the input set and images that come from the other part of the architecture, the generator. The generator is the clever, innovative bit of GANs. It literally creates an image set out of nothing and then, metaphorically, stores it in ‘memory’.

What followed was the process of designing our model. In the model I decided to follow the standard practice of implementing batch normalization. This is a method which helps the training process of the entire architecture, stabilizes the design of the layers in the neural network and speeds up the whole process. Batch normalization is clearly useful, even if the relatively obscure nature and lack of explainability of the technique remain an issue…

This is an architecture of my own making. I’ve chosen an input image size of 32x32 and a kernel_size of 4. The nn.Sequential container implements the batch normalization referred to earlier, but also introduces a new activation function to these types of architectures: the so-called Leaky Rectified Linear Unit, a rectifier which allows a small positive gradient when the unit is inactive:
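
As a concrete sketch of such a discriminator (the channel widths 64/128/256 and the 0.2 negative slope are my illustrative choices, following common DCGAN practice rather than a prescription):

    import torch.nn as nn

    discriminator = nn.Sequential(
        # in: 3 x 32 x 32
        nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(64),
        nn.LeakyReLU(0.2, inplace=True),
        # out: 64 x 16 x 16
        nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(128),
        nn.LeakyReLU(0.2, inplace=True),
        # out: 128 x 8 x 8
        nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(256),
        nn.LeakyReLU(0.2, inplace=True),
        # out: 256 x 4 x 4
        nn.Conv2d(256, 1, kernel_size=4, stride=1, padding=0, bias=False),
        # out: 1 x 1 x 1
        nn.Flatten(),
        nn.Sigmoid())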

The final flattened output is fed to a sigmoid function, which outputs a probability value between 0 and 1. In this way, the output number gives us a measure of the probability of the input image being a real one rather than a generated (fake) one. This type of setup is known as a binary classification model.

The Generator Network

The generator network is normally a construct whose input is a vector or matrix of random numbers. This matrix of random numbers is also usually referred to as a latent tensor. The generator will convert a latent tensor of a certain shape, say for instance (64, 1, 1), into an image tensor of shape 3x32x32. PyTorch provides a class to achieve this, ConvTranspose2d, which performs a transposed convolution, also called a deconvolution.

The activation functions used in this generator network are the normal rectified linear unit (ReLU) in the hidden layers, and a hyperbolic tangent (Tanh) activation function in the final output layer, as sketched below.
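
A sketch of such a generator, mirroring the discriminator above (channel widths again illustrative):

    import torch.nn as nn

    latent_size = 64

    generator = nn.Sequential(
        # in: latent_size x 1 x 1
        nn.ConvTranspose2d(latent_size, 256, kernel_size=4, stride=1, padding=0, bias=False),
        nn.BatchNorm2d(256),
        nn.ReLU(True),
        # out: 256 x 4 x 4
        nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(128),
        nn.ReLU(True),
        # out: 128 x 8 x 8
        nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(64),
        nn.ReLU(True),
        # out: 64 x 16 x 16
        nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1, bias=False),
        nn.Tanh())
        # out: 3 x 32 x 32

The Tanh squashes outputs into the [-1, 1] range, which is why the real images are normalized to the same range when loaded.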

Training of Discriminator and Generator

As was mentioned earlier, the discriminator is a binary classification model. Therefore it can be trained using a binary cross entropy loss function.

How does the training of the discriminator work?

  • First we expect the discriminator to output 1 if the image comes from the original input Stanford Cars Dataset and 0 if it comes from the generator.
  • We pass a batch of real images into the discriminator and compute the loss with the F.binary_cross_entropy function, setting the target labels to 1.
  • Then we pass a batch of fake images (generated using the generator) into the discriminator and compute the loss, setting the target labels to 0.
  • To adjust the weights of the discriminator, but not the weights of the generator, we add the two losses and use the overall loss in a normal gradient descent routine.

It is important to keep in mind that we are only adjusting the weights of the discriminator in this framework, as our optimizer_d only affects the discriminator weights. A sketch of this step follows.
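
Assuming the discriminator, generator, latent_size and a device (e.g. device = torch.device('cuda'), with both models moved to it) are defined as in the earlier sketches, the step could look like this, with opt_d playing the role of optimizer_d:

    import torch
    import torch.nn.functional as F

    def train_discriminator(real_images, opt_d):
        opt_d.zero_grad()

        # Real images should be scored close to 1.
        real_preds = discriminator(real_images)
        real_targets = torch.ones(real_images.size(0), 1, device=device)
        real_loss = F.binary_cross_entropy(real_preds, real_targets)

        # Fake images from the generator should be scored close to 0.
        latent = torch.randn(real_images.size(0), latent_size, 1, 1, device=device)
        fake_images = generator(latent)
        fake_preds = discriminator(fake_images)
        fake_targets = torch.zeros(fake_images.size(0), 1, device=device)
        fake_loss = F.binary_cross_entropy(fake_preds, fake_targets)

        # Only opt_d steps here, so the generator weights stay untouched.
        loss = real_loss + fake_loss
        loss.backward()
        opt_d.step()
        return loss.item(), real_preds.mean().item(), fake_preds.mean().item()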

Now comes the clever bit of innovative thinking supporting GANs. Our generator model outputs images, in contrast with the discriminator, which outputs a single number (or vector, matrix, tensor). But how can we train a model whose output is images? We need some sort of scalar signal derived from those images for the optimization routines to work on. The trick in GANs is to use the discriminator model as a component of the loss function for the generator.

  • First we generate a batch of images from the generator network and pass it to the discriminator.
  • Next we compute the loss as we did for the discriminator, but setting the target labels to 1. This makes sense, since the generator is trying to fool the discriminator.
  • Finally we use the loss to perform gradient descent, i.e. change the weights of the generator, so it gets better at generating real-like images that “fool” the discriminator. A sketch of this step follows the list.
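
Under the same assumptions as before, a minimal sketch of the generator’s training step:

    def train_generator(opt_g):
        opt_g.zero_grad()

        # Generate a batch of fake images and score them.
        latent = torch.randn(batch_size, latent_size, 1, 1, device=device)
        fake_images = generator(latent)
        preds = discriminator(fake_images)

        # Targets are 1: the generator is rewarded for fooling
        # the discriminator.
        targets = torch.ones(batch_size, 1, device=device)
        loss = F.binary_cross_entropy(preds, targets)

        loss.backward()
        opt_g.step()
        return loss.item()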

The Training Loop

In this section we will define the training loop for both the discriminator and generator models. This is done with a fit function in which the discriminator and the generator are both trained, in turn, on each batch of training data.

Next we create our optimizers. We will use Adam optimizers, one for each network, but this project remains open to experimenting with other optimizers. This is the space for creative experimentation, as well as for the tuning of hyperparameters.
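
A hedged sketch of how the fit function and the Adam optimizers might be wired together; betas=(0.5, 0.999) follows the usual DCGAN recommendation, and the learning rate and epoch count in the final call are illustrative only:

    from torch.optim import Adam

    def fit(epochs, lr):
        opt_d = Adam(discriminator.parameters(), lr=lr, betas=(0.5, 0.999))
        opt_g = Adam(generator.parameters(), lr=lr, betas=(0.5, 0.999))

        losses_d, losses_g = [], []
        for epoch in range(epochs):
            for real_images, _ in train_dl:
                real_images = real_images.to(device)
                # Both networks are updated on every batch.
                loss_d, real_score, fake_score = train_discriminator(real_images, opt_d)
                loss_g = train_generator(opt_g)
            losses_d.append(loss_d)
            losses_g.append(loss_g)
            print(f"Epoch [{epoch+1}/{epochs}], loss_d: {loss_d:.4f}, loss_g: {loss_g:.4f}")
        return losses_d, losses_g

    history = fit(epochs=25, lr=0.0002)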

Now we are ready to train the model. It is a question of trying several learning rates, numbers of epochs and optimizers. Once this process is complete, the model can start generating images, and we can study the behavior of the discriminator’s predictions on real images, the discriminator and generator losses, and the images generated ‘out of nothing’ by the generator. These images can then be logged on the Jovian platform and used to generate a video of sampled images.

Issues with the full training loop

At this precise moment of the project I am still getting some issues with the full training loop and the fit function. I think the subroutine of the fit function that computes the losses and saves the images produced by the generator is hitting some bug I could not yet identify. There is also the possibility that the overall GAN architecture size I’ve chosen is not suitable for this Stanford Cars Dataset. This issue prevents me from showing how the images generated by the generator evolved during training; the plots for the losses of both the discriminator and the generator, and the respective scores, also show up blank, indicating that no real training took place… This is strange, since I have an intuition that the generator really did perform its job of generating fake images… Maybe the trouble is with the discriminator… Anyway, I will do another submission with this final result. All in all, I have already had a productive experience going through the whole procedure for this project, and writing both this blog and the entire notebook on the Kaggle platform by myself. Writing on its own is something I am somewhat experienced with, but doing it alongside a real code experiment, with PyTorch and within a peer-to-peer course project, was one of the few times I have done so.

Conclusions

In this project I managed to go through all the relevant steps of a real-life experience of setting up a Data Science/Machine Learning algorithm design with a cutting-edge software library for deep learning, PyTorch. This was an important experience for me. I already have some experience analysing Machine Learning/Deep Learning research and research papers with formal code supporting them. But this was my first real experience with a data science platform where I had to load and prepare the data, check whether the code matched the goal in mind, and experiment, with my own imagination and will, to find the best architecture for the desired goal. That in itself is already very valuable experience, indeed!

Everything went right until the very last parts of this process, when I stumbled upon issues with the full training loop subroutines. As mentioned above, I have some intuitions as to what the problem might be, but as of this moment I have not been able to figure it out entirely. I will upload below the final notebook I saved to the Jovian.ML platform, and I hope it will be enough as a vindication of the work I have done and all I have learned!
