An End to End Application of Deep Learning Models using PyTorch
As part of expanding my skill set, I decided to learn PyTorch and how to apply deep learning models with it. That is when I fortunately came across Aakash and his team from Jovian.ml, who were teaching a free online course on PyTorch named “Deep Learning with PyTorch: Zero to GANs”. Over the 6-week-long course, we were taken from the basic fundamentals and feed-forward neural networks through CNNs, transfer learning models and GANs. For the final course project, we were asked to choose any public dataset and apply everything we had learned. So I decided to participate in a live hackathon conducted by AnalyticsVidhya. The competition link is HERE.
Fatalities due to traffic delays of emergency vehicles such as ambulances and fire brigades are a serious problem. In daily life, we often see emergency vehicles struggling to pass through traffic. Classifying a vehicle into the emergency or non-emergency category can therefore be an important component of traffic monitoring as well as self-driving car systems, since reaching their destination on time is critical for these services.
The task: develop a model that classifies vehicle images as belonging to either the emergency or non-emergency vehicle category.
The data consists of images, and the task is a binary classification problem.
1. train.zip: contains 2 CSVs and 1 folder containing image data
a) train.csv — [‘image_names’, ‘emergency_or_not’] contains the image name and correct class for the 1646 (70%) train images
b) images — contains 2352 images covering both the train and test sets
2. test.csv — [‘image_names’] contains just the image names for the 706 (30%) test images
3. sample_submission.csv — [‘image_names’, ‘emergency_or_not’] contains the exact format for a valid submission (1 — For Emergency Vehicle, 0 — For Non-Emergency Vehicle)
The evaluation metric for this competition is Accuracy.
Let us check the counts for each class.
We do not observe any class imbalance.
Some example images for each category class are plotted using matplotlib.
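A minimal sketch of such a plot is below. It assumes a pandas DataFrame loaded from train.csv (columns `image_names` and `emergency_or_not`, as in the dataset description) and an `images/` folder; the function and argument names are my own, not necessarily those from the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from PIL import Image
import os

def show_examples(df, image_dir, label, n=4):
    """Plot the first n images whose class equals `label` (1 = emergency)."""
    names = df.loc[df["emergency_or_not"] == label, "image_names"].head(n)
    fig, axes = plt.subplots(1, n, figsize=(3 * n, 3), squeeze=False)
    for ax, name in zip(axes[0], names):
        ax.imshow(Image.open(os.path.join(image_dir, name)))
        ax.set_axis_off()
    fig.suptitle("Emergency Vehicle" if label == 1 else "Non-Emergency Vehicle")
    return fig
```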
The images shown above are with class category “Emergency Vehicle”.
The images with class category “Non-Emergency Vehicle” are shown below.
A modular Dataset class is created whose task is to read the train CSV file, which lists the images and their labels, and to apply any transformations if necessary.
In this way we create a separate dataset for each of the train and test data.
Training and Validation Dataset
The data is split into training and validation sets with an 80:20 ratio, using a batch size of 32.
Once the data preparation is done for each of the train/validation/test sets, we define a network class “Net”, composed of CNN layers followed by regularization layers such as dropout and BatchNorm. This is the base network class we refer to when we run model training.
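A sketch of a “Net” of this kind is shown here: two conv blocks with BatchNorm and max-pooling, dropout before the classifier, and a 2-way output. The layer sizes are illustrative assumptions, not the exact architecture from the notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(16)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(32)
        self.pool = nn.MaxPool2d(2)
        self.dropout = nn.Dropout(0.25)
        self.fc1 = nn.Linear(32 * 56 * 56, 128)
        self.fc2 = nn.Linear(128, 2)

    def forward(self, x):                                # x: (N, 3, 224, 224)
        x = self.pool(F.relu(self.bn1(self.conv1(x))))   # -> (N, 16, 112, 112)
        x = self.pool(F.relu(self.bn2(self.conv2(x))))   # -> (N, 32, 56, 56)
        x = self.dropout(torch.flatten(x, 1))
        x = F.relu(self.fc1(x))
        return self.fc2(x)                               # raw logits
```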
Since we are solving a binary classification problem with a 2-class output, the loss function is CrossEntropyLoss. The optimizer used is Adam with a learning rate of 0.0001.
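The setup stated above looks like this in code; the `nn.Linear` here is only a stand-in for the CNN defined earlier.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)  # placeholder; use the Net CNN defined earlier
criterion = nn.CrossEntropyLoss()                      # 2-class output
optimizer = optim.Adam(model.parameters(), lr=0.0001)  # Adam, lr = 1e-4
```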
The evaluation metric for this competition is “accuracy”. Hence we define this function as:
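A minimal version of that accuracy function, taking the argmax of the logits and comparing against the labels:

```python
import torch

def accuracy(outputs, labels):
    """Fraction of predictions whose argmax class matches the label."""
    preds = torch.argmax(outputs, dim=1)
    return torch.sum(preds == labels).item() / len(labels)
```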
The entire training of the model is driven by a loop in which we set parameters such as the number of epochs and measure the loss value at each epoch. There is an option to train the model on a GPU, but since the dataset here is small and I have not used any pre-trained models or transfer learning at this stage, I trained the model only on the CPU.
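A sketch of that training loop is below. The function and variable names are my own; the original notebook may structure this differently (e.g. with per-epoch validation).

```python
import torch

def fit(model, train_loader, criterion, optimizer, epochs=10):
    """Train for `epochs` epochs; return the average loss per epoch."""
    history = []
    for epoch in range(epochs):
        model.train()
        total_loss = 0.0
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        avg_loss = total_loss / len(train_loader)
        history.append(avg_loss)
        print(f"epoch {epoch + 1}: loss = {avg_loss:.4f}")
    return history
```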
Once we have the trained model ready, let us look at the loss and accuracy values for both the training and validation datasets.
Let us visualize the classifications made on the validation dataset.
Prediction on Test Dataset
The accuracy on the final test set is ~58%. You can find my work up to the application of simple ConvNets in this notebook. Click HERE.
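For the test-set predictions, a sketch of the inference step is below; the function name is an assumption, and it accepts loaders that yield either bare image batches or (image,) tuples, since the test dataset has no labels. The predictions can then be written out with pandas in the `sample_submission.csv` format (`image_names`, `emergency_or_not`).

```python
import torch

@torch.no_grad()
def predict(model, loader):
    """Return the predicted class (0 or 1) for every image in `loader`."""
    model.eval()
    preds = []
    for batch in loader:
        images = batch[0] if isinstance(batch, (list, tuple)) else batch
        preds.extend(torch.argmax(model(images), dim=1).tolist())
    return preds
```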
Now, to further increase the accuracy, I used transfer learning with pre-trained models (ResNet18, ResNet34).
We load these pre-trained models and replace the final fully-connected layer, which has 512 input features, with a 2-class output head.
I trained the model using the ResNet18/ResNet34 architectures with the Adam optimizer, 10 epochs, a learning rate of 0.001, and CrossEntropyLoss as the criterion/loss function. Here we need to make sure the model is trained on a GPU, which gives far shorter run/execution times compared to a CPU.
We can clearly see that within just the few epochs trained, accuracy increased to more than 90%. Finally, when we ran the trained model on the test dataset, the private leaderboard showed an accuracy of ~95%. That was an awesome increase in score.
Save and load models
Once the final models are ready, we can save them for later use.
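A sketch of the standard PyTorch save/load pattern via a `state_dict`; the filename and the placeholder model are my own choices, standing in for the trained ResNet.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stands in for the trained network
torch.save(model.state_dict(), "emergency_classifier.pth")  # save weights

restored = nn.Linear(4, 2)  # must have the same architecture
restored.load_state_dict(torch.load("emergency_classifier.pth"))
restored.eval()  # switch to inference mode before predicting
```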
My work applying pre-trained models is shown in this notebook. Click HERE.
Image classification problems trained with a plain ConvNet already produce decent performance, but to push the metric further and be competitive, applying pre-trained models is the go-to strategy, as we have shown here by comparing the ResNet architectures against a custom ConvNet.
The major learnings for me on this project, all of which help towards accurate predictions, are:
- Identifying the right architecture
- Identifying the hyperparameters (number of layers, batch size, epochs, learning rate with weight decay)
- The concept of gradient clipping to mitigate vanishing and exploding gradients
- Fine-tuning the model — it is possible to fine-tune all the layers of the ConvNet, or to keep some of the earlier layers fixed (due to over-fitting concerns) and only fine-tune some higher-level portion of the network. This is motivated by the observation that the earlier layers of a ConvNet contain more generic features (e.g. edge detectors or color-blob detectors) that are useful for many tasks, while later layers become progressively more specific to the details of the classes contained in the original data.
- Using GPUs for training larger datasets which results in faster run/execution time
- For performance improvement — ensemble models with multiple network architectures will be helpful
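To illustrate the gradient-clipping point above, here is a minimal sketch: after `backward()`, the gradient norm is capped before the optimizer step. The model, data, and `max_norm` value are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x, y = torch.randn(16, 8), torch.randint(0, 2, (16,))
optimizer.zero_grad()
criterion(model(x), y).backward()
# Cap the total gradient norm at 1.0 to guard against exploding gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```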
All that being said, the toughest part is getting all the parameters and network choices right, and this improves once we get our hands dirty putting theory into practice, which I highly recommend to everyone for any future application. Alongside the many advancements happening in the NLP area, computer vision is also showing a lot of promise and reshaping the future of various industries such as the automobile, healthcare, and financial industries.
Anyway, I am hopeful that this article helps you understand some concepts of image classification problems using PyTorch.