Mars Rover Image Classification
Abstract
In this article, we will discuss how to complete a multi-class image classification task using convolutional neural networks (CNNs) on the Mars rover dataset provided by NASA. Through this discussion, we will also demonstrate how we achieved 97.5% accuracy using TensorFlow.
Introduction
In recent years, we've seen rapid advances in applications of convolutional neural networks (CNNs), especially in image classification, speech recognition, and natural language processing (NLP). CNNs are widely adopted for their high accuracy and computational efficiency. In this article, we will define convolutional neural networks, explain how to apply a CNN to tackle a multi-class image classification problem using the NASA Mars Curiosity rover image dataset, and outline some of the issues we encountered along the way, such as class imbalance, which we addressed with image augmentation.
Deep Learning
Deep learning is a branch of machine learning that loosely mimics the layered abstraction of the human brain using artificial neural networks. Through this, we can train a model to learn to differentiate between different images and data, in a way analogous to neural circuits in the brain.
In deep learning for images, we first convert each image into an array of numerical RGB pixel values in the range 0–255. Each pixel value is then fed into a neural network, transformed layer by layer, and interpreted. Whenever a new training image is fed into the network, it is broken down into the same pixel format and analyzed. The network compares the numerical values and patterns it has learned, and outputs a softmax confidence score for each image class.
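As a minimal sketch of the image-to-numbers step described above (using a tiny synthetic array rather than a real rover image), decoding an image yields a height × width × channels grid of 0–255 values, which is typically rescaled to [0, 1] before being fed to a network:

```python
import numpy as np

# A tiny synthetic 2x2 RGB "image": each pixel holds three channel
# values in the 0-255 range, exactly as a decoded JPEG/PNG would.
image = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [128, 128, 128]]],
    dtype=np.uint8,
)

# Networks train more stably on small inputs, so pixel values are
# commonly rescaled from [0, 255] to [0, 1] before being fed in.
normalized = image.astype(np.float32) / 255.0

print(normalized.shape)  # (2, 2, 3): height, width, channels
print(normalized[0, 0])  # the red pixel becomes [1. 0. 0.]
```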
Convolutional Neural Networks
Convolutional neural networks are a form of neural network in which small learned kernels, called filters, slide across the image and compute local weighted sums. The model then combines the responses of all the filters to determine an output, or prediction. The main benefit of CNNs is that they preserve spatial structure while sharing parameters across the image, which in turn leads to faster, more efficient processing and lends itself well to parallel computation.
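To make the sliding-filter idea concrete, here is a minimal NumPy sketch of a single convolution. The kernel below is a fixed, hand-written vertical-edge detector purely for illustration; in a real CNN the kernel weights are learned during training:

```python
import numpy as np

# Slide a 3x3 kernel over the image, taking a weighted sum at each
# position. This is what one convolutional filter computes.
def conv2d(image, kernel):
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A toy image with a sharp dark-to-bright vertical edge in the middle.
image = np.array([
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
])

# Vertical-edge kernel: responds where left and right columns differ.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]])

response = conv2d(image, kernel)
print(response)  # large values exactly where the edge sits
```

The output is strongest at the positions whose 3×3 window straddles the edge, which is how stacked filters let a CNN build up shape information.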
Dataset
We used the Mars rover dataset from the NASA archives. The images were taken by the Curiosity rover over a span of three years, from 2013 to 2016. The dataset contains hundreds of different images of the ground, horizon, and rover, spanning 24 classes. The main problem we encountered with this dataset was large class imbalance, which made for a tedious and challenging preprocessing step.
EDA
Process
Step 1: Preprocessing
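A quick way to surface the class imbalance during preprocessing is to count samples per class. The sketch below assumes a one-folder-per-class layout (a common convention, not confirmed by the NASA archive itself) and uses a tiny synthetic directory tree in place of the real dataset:

```python
import pathlib
import tempfile
from collections import Counter

# Assumption: images are organized one folder per class,
# e.g. dataset/ground/img001.jpg. Counting files per folder
# reveals the class imbalance before any training happens.
def class_counts(root):
    root = pathlib.Path(root)
    return Counter({d.name: sum(1 for _ in d.glob("*.jpg"))
                    for d in root.iterdir() if d.is_dir()})

# Tiny synthetic layout standing in for the real archive.
root = pathlib.Path(tempfile.mkdtemp())
for cls, n in [("ground", 5), ("horizon", 2), ("wheel", 1)]:
    d = root / cls
    d.mkdir()
    for i in range(n):
        (d / f"img{i}.jpg").touch()

counts = class_counts(root)
print(counts.most_common())  # [('ground', 5), ('horizon', 2), ('wheel', 1)]
```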
Step 2: Building baseline model
We will build a baseline model using the ResNet50 transfer-learning framework. The framework requires a [256, 256, 3] image input, so the images need to be resized to that shape. We set weights to None and include_top to False. Next, we apply 2D global average pooling to collapse the spatial dimensions, add a Dense layer of 128 units with a ReLU activation, and an output layer of 24 units with a softmax activation. Lastly, to compile the model, we use the Adam optimizer and sparse categorical cross-entropy as our loss function, since the labels are integers and not one-hot encodings.
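A minimal Keras sketch of the baseline just described (randomly initialized ResNet50 backbone, global average pooling, a 128-unit ReLU layer, and a 24-way softmax head) might look like this:

```python
import tensorflow as tf

# ResNet50 backbone with no pre-trained weights and no
# classification head, accepting 256x256 RGB inputs.
base = tf.keras.applications.ResNet50(
    weights=None, include_top=False, input_shape=(256, 256, 3))

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(24, activation="softmax"),  # 24 classes
])

# Labels are integers (not one-hot), hence sparse categorical
# cross-entropy as the loss.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)  # (None, 24)
```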
As you can see, our baseline model performs with an accuracy of 0.2158. The confusion matrix is shown below.
Not surprisingly, our baseline model performs poorly. To improve our accuracy, we will modify our dataset as well as adopt pre-trained weights.
Step 3: Eliminating under-represented classes
We noticed that many classes in our dataset are under-represented, with fewer than 80 samples each. We decided to remove those image classes, since not enough data is present for our prediction task. After eliminating the under-represented classes, we are left with 14 image classes.
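The filtering rule itself is a one-liner. The class names and counts below are hypothetical placeholders, purely to illustrate the 80-sample cutoff:

```python
# Keep only classes with at least 80 samples, mirroring the
# filtering step above. Counts here are made up for illustration.
MIN_SAMPLES = 80

counts = {"ground": 500, "horizon": 320, "drill hole": 45,
          "scoop": 130, "sun": 12}

kept = {cls for cls, n in counts.items() if n >= MIN_SAMPLES}
print(sorted(kept))  # ['ground', 'horizon', 'scoop']
```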
Step 4: Transfer Learning + Image Augmentation
Once we eliminate the under-represented classes from the data, we are ready to apply transfer learning to the updated dataset. But first, we use image augmentation to balance the dataset and introduce more noise and variance so the model can learn better: random rotations, flips, and zooms generate additional training samples, and rescaling normalizes the pixel values. For the model itself, we use the same architecture as our ResNet50 baseline, but with ImageNet-trained weights.
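The augmentation pipeline can be expressed with Keras preprocessing layers, as in this sketch (the rotation and zoom factors of 0.1 are illustrative choices, not values confirmed by the article). The model is then the same as the baseline, swapping weights=None for weights="imagenet" in the ResNet50 constructor:

```python
import tensorflow as tf

# Rescale to [0, 1], then apply random flips, rotations, and zooms.
# With training=True the random transforms are active.
augment = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

# A random batch standing in for real rover images.
batch = tf.random.uniform((4, 256, 256, 3), maxval=255.0)
out = augment(batch, training=True)
print(out.shape)  # (4, 256, 256, 3): shape is preserved
```

Because these are layers, the same pipeline can be prepended to the model itself so augmentation runs on-GPU during training.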
Step 5: Model Results + Confusion Matrix
As you can see, after dropping under-represented classes, utilizing pre-trained weights, and adopting image augmentation to alleviate class imbalance, we achieve a much higher test accuracy of 97.5%. From the confusion matrix, we observe that image class 12, "portion box," is most often misclassified as class 2, which is "scoop."
Conclusion
In our EDA of the NASA Mars dataset, we noticed that the classes were heavily imbalanced. For our initial model, we used a baseline ResNet50 with no pre-trained weights, which led to very poor predictions. To improve our model's accuracy, we adopted three main approaches. First, we dropped classes with low representation (fewer than 80 samples), leaving 14 classes. Then, we applied image augmentation, using horizontal flips, random rotations, and random zooms, to counter class imbalance in the remaining data, which had a positive effect on accuracy. Lastly, we applied pre-trained weights to our model and added a dropout layer to reduce overfitting. By implementing these additions to our initial model, we were able to reach a satisfactory accuracy.
Written by Raymond Wang, Wenbo Hu, Sahil Shah, and Tyler Ngo