Classification on CIFAR-10

The AI Guy · Published in Nerd For Tech · 8 min read · Jun 6, 2021


In my previous blogs we discussed Deep Learning and neural networks, and we are on the path to mastering this algorithm: first we built the intuition, then the math behind it, and now we are finally going to see its implementation in Python.

An overview of how our implementation process is generally carried out.

Source : www.canva.com

Understanding the Data

To understand the implementation of a neural network, we first have to see what type of problem we are solving and what type of data we are actually training our network on.

The CIFAR-10 dataset is a collection of images and is a very basic and popular dataset for Machine Learning and Computer Vision practice. It contains 60,000 (32x32) color images in 10 different classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks, with 6,000 images per class. So our plan is to implement a network that classifies a test image into one of these 10 classes.

Source : CNN VS. CIFAR-10 by jesse_geman

The images in the CIFAR-10 dataset have a (32 x 32) resolution and are color images, which means they are in RGB format. So every image has shape (32, 32, 3), where 3 is the number of channels: RED, GREEN, and BLUE. Every image in this dataset is a combination of these 3 channels. All the images are stored as pixels; in this particular dataset, each sample is a 32 x 32 matrix of pixel values for each of the 3 channels.

Source : geeksforgeeks matlab-rgb-image-representation

So we conclude that the data we are going to use for our Neural Network implementation has 60,000 images of shape (32, 32, 3) and 10 output classes to classify into. Now we'll see this implementation in Python.

Required Libraries

A Python library is a reusable chunk of code that you may want to include in your programs or projects. Unlike in languages like C or C++, Python libraries are not tied to any specific context. So, to perform the heavy lifting, we make use of some highly compatible and easy-to-use libraries. The libraries we require for this problem are listed below.
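A minimal sketch of the imports that cover every step in this walkthrough (the exact list in the original snippet may differ slightly):

```python
import numpy as np                # numerical arrays and argmax
import matplotlib.pyplot as plt   # plotting and visualization
import tensorflow as tf           # dataset loading, model building, training
from tensorflow.keras import layers, models
```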

Load Data

Our first step for any ML or DL problem should always be loading and visualizing the data we aim to train on and make predictions about. Here we load the cifar10 dataset from tensorflow.keras.datasets, which we imported in the previous step. The data is loaded directly into training and testing sets, so no further external splitting is required.
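A minimal sketch of that loading step; CIFAR-10 ships pre-split into 50,000 training and 10,000 test images:

```python
# Load CIFAR-10, already split into training and testing sets.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

print(x_train.shape)  # (50000, 32, 32, 3)
print(x_test.shape)   # (10000, 32, 32, 3)
```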

Visualize Input Data

Here we simply visualize the input data: the code snippet below creates a subplot grid that plots the first 10 images in the training data.
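A sketch of that snippet; the class_names list simply maps the numeric labels to the 10 class names:

```python
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Plot the first 10 training images in a 2 x 5 grid.
plt.figure(figsize=(10, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(x_train[i])
    plt.title(class_names[y_train[i][0]])  # labels are stored as (n, 1) arrays
    plt.axis('off')
plt.show()
```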

Subplot created for images

Normalizing the Data

Now that the dataset is loaded, we have to normalize it for uniform features and better predictions. We also cast the data to the "float32" type, since float32 is the standard input dtype that TensorFlow works with.
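A minimal sketch of the normalization step, scaling pixel values from [0, 255] down to [0, 1]:

```python
# Cast to float32 and scale every pixel into the [0, 1] range.
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
```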

Building the model

Now comes the most important part of our program: building a model that fulfills our aim of classifying images into 10 different classes. Because this type of problem has a single input and a single output, we can add layers and proceed to our classification in a defined sequence; we don't need to give multiple, separate inputs or outputs, so the complete model can work sequentially. This example model uses a total of 9 layers of the types Input, Conv2D, MaxPooling2D, Flatten, and Dense. Every layer has its own hyper-parameters; I'll try to cover them deeply in coming blogs, but I would suggest you go to the official TensorFlow documentation page:

Go through it for a better understanding of all the layers and their different hyper-parameters; it will help you a lot and will surely expand your thinking and knowledge of each layer. Coming back to our example model, you can see that the last layer is a Dense layer given the argument 10, which means the output after passing through that layer has shape (10,). We have to classify into 10 classes, so this layer produces a vector of length 10 containing one score per class. Passing these scores through a Softmax function converts each value of the vector into a probability, so it becomes easy to pick the highest probability and predict the corresponding class index; here that conversion happens inside the loss function rather than in the model itself, as we will see in the compile step. Finally, 'model.summary()' is used to print a complete summary of the built model.
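A sketch of such a 9-layer Sequential model; the filter counts and the hidden Dense size are illustrative assumptions, not values taken from the original snippet:

```python
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),              # 32x32 RGB input
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10)   # 10 raw scores (logits); no softmax here
])

model.summary()  # print a complete layer-by-layer summary
```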

This part of the code plots the built model (layer-by-layer) in a very effective way. It also saves the plot to a file of the chosen format (e.g. .png, .jpg) in the directory you are working in, which can be used later for model visualization.
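A sketch of that step using tf.keras.utils.plot_model (this utility needs the pydot and graphviz packages installed):

```python
# Save a layer-by-layer diagram of the model into the working directory.
tf.keras.utils.plot_model(model, to_file='model.png',
                          show_shapes=True, show_layer_names=True)
```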

Model

Compile Model

We have built the model for the classification; the next step is the model.compile function, where we specify the metrics, loss, and optimizer to be used while training the model. Here I have used the SparseCategoricalCrossentropy loss and, crucially, passed (from_logits = True) as an argument. This is set to True because we did not use Softmax as the activation function for the last layer of our model; the argument tells the loss that the output of the last layer is not in probability format, so it must be converted first before proceeding. We also pass an optimizer argument: optimizers are algorithms or methods used to update the attributes of a Neural Network, such as its weights. Finally, we initialize the accuracy metric to keep a check on the accuracy values.
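A sketch of the compile step; the choice of the Adam optimizer here is an assumption:

```python
model.compile(
    optimizer='adam',  # assumed optimizer; updates the network's weights
    # from_logits=True: the last layer outputs raw logits, so the loss
    # applies softmax internally before computing the cross-entropy.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)
```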

Training Model

Now we have to train our model. For this, TensorFlow has a function named model.fit, which is responsible for training and takes some crucial arguments, including the training data, batch size, and epochs. The batch size is a hyper-parameter that defines the number of samples to work through before updating the internal model parameters, whereas the number of epochs is a hyper-parameter that defines the number of times the learning algorithm will work through the entire training dataset. This trains our model on the input data for the given number of epochs and returns loss and accuracy values after each epoch. We store the result in a variable called model_history, so that we can access the accuracy and loss values afterwards for visualization.
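A sketch of the training call; the batch size and epoch count shown are illustrative values:

```python
# Train the model; the returned History object records the loss and
# accuracy after every epoch.
model_history = model.fit(x_train, y_train, batch_size=64, epochs=10)
```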

Visualizing

This is the most important and fascinating part. After implementing every snippet you will understand how it actually works, but visualization makes everything simpler to grasp. Remember we saved the training history in model_history; we did that because now we will visualize those results (accuracy and loss) for each epoch. For visualization, Matplotlib is a very efficient and useful library.
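A sketch of the accuracy plot, reading the per-epoch values recorded in model_history.history:

```python
plt.plot(model_history.history['accuracy'])
plt.title('Model accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.show()
```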

Model accuracy graph for Training data

Similarly, we can also visualize the loss generated after each epoch.
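A matching sketch for the loss curve:

```python
plt.plot(model_history.history['loss'])
plt.title('Model loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()
```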

Loss on Training Data

We have implemented a small model on the CIFAR-10 dataset and reached a maximum accuracy of 70% on the training data. This is not a great result, but it is acceptable for understanding the implementation of Neural Networks. There are many more concepts that can be included while building the model, such as implementing callbacks, using the dataset API, and visualizing the output after each layer, but I have not included them in this blog because I wanted to keep this one a basic implementation of a Neural Network. I will surely come up with an exciting blog that covers all of these too.

Making Predictions

We have implemented the complete model and visualized the accuracy and loss too. Having made a model to classify images into 10 classes, we should also see how we can predict the class of a random input image.

Here we predict the output label for the input x_test[1], an image from the test dataset. The model.predict function passes this image through our network and returns the final array of class scores. Then the 'argmax' function is used to find the index of the maximum value in that array, which we store as n. This n is the index of the predicted label, which we look up in the list created for the labels.
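A sketch of that prediction step; note that model.predict expects a batch axis, hence the x_test[1:2] slice:

```python
# Pass one test image through the network and get its 10 class scores.
logits = model.predict(x_test[1:2])
n = np.argmax(logits)       # index of the highest-scoring class
print(class_names[n])       # look the index up in the label list
```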

Conclusion

In this blog I tried to cover the entire implementation of a basic Neural Network and visualized accuracy and loss for the model on the training data. At the end we also made a prediction for a random input, and our model performed accurately and gave the exact prediction. When we learn concepts of Deep Learning we have to dig deep into them to completely understand. This was a very elementary model, which I used for this particular blog to teach you as much as possible. In coming blogs I will try to cover some more complex network architectures, look at ways to improve our accuracy, and understand them. I am also providing the link to the complete code so that you can re-implement it and understand it properly.

I hope I was able to deliver something valuable. Please do leave your responses, and Happy Learning!!
