Classifying CIFAR-10 using a simple CNN

Afshin Shahrestani
Published in Analytics Vidhya · 8 min read · Aug 9, 2021

In this article we will discuss in simple terms what deep learning is, what convolutional neural networks (CNNs) are, and how we can build a simple CNN model. This article assumes that you have some basic knowledge of AI, machine learning, and Python programming.

What is Deep Learning?

Deep Learning is a subset of Machine Learning and Artificial Intelligence. Deep Learning methods try to imitate how the human brain processes data and finds patterns in it. Deep Learning can be used for both supervised and unsupervised learning. It uses artificial neural networks, designed with the functionality of neurons in the brain in mind, to carry out different tasks. Deep Learning methods are used in many different areas, such as image classification, text generation, and weather forecasting.

In this article we will be creating a convolutional neural network to classify some images.

Convolutional Neural Network:


Convolutional Neural Networks or CNNs are a type of Deep Learning method usually used for image classification and feature extraction. To understand how CNNs work we first need to understand some concepts:

  • Image Data
  • Convolutional Layers
  • Pooling Layers
  • Dense Layers

Image Data: Each image consists of 3 components: height, width, and channels. The number of channels represents the depth of the image and corresponds to the colors used to create it. For example, RGB images have 3 channels, one for each primary color. So, for each pixel, we have 3 values (one per channel), each between 0 and 255.

3 Channel Image
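For illustration, here is a minimal sketch of such an array in NumPy (the pixel values are made up just to show the shape):

```python
import numpy as np

# A tiny 2x2 RGB image: height 2, width 2, 3 channels, values between 0 and 255.
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

print(image.shape)  # (2, 2, 3) -> (height, width, channels)
```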

Dense Layers: Dense layers are fully connected layers in neural networks. Each neuron in the dense layer receives information from all neurons in the previous layer. Dense layers are the most commonly used layers in neural networks. The output of a dense layer with M neurons is an M dimensional vector. This type of layer is usually used at the end of the neural network to determine which class the image belongs to.
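As a small sketch of this idea (the layer size and the random input here are arbitrary), a dense layer with 10 neurons maps any input vector to a 10-dimensional output:

```python
import numpy as np
from tensorflow.keras import layers

# A dense layer with 10 neurons produces a 10-dimensional output per sample.
dense = layers.Dense(10)
x = np.random.rand(1, 64).astype('float32')  # a batch of one 64-value input

print(dense(x).shape)  # (1, 10)
```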

Convolutional Layers: Each convolutional neural network consists of one or more convolutional layers. These layers are the main building blocks of CNNs and are tasked with finding patterns in the images that can be used for classification. While dense layers find features globally across the image, convolutional layers detect patterns locally: each node in a densely connected layer sees all the data from the previous layer, so it can only analyze the data in a global capacity. Convolutional layers instead use filters to achieve local pattern detection. Each convolutional layer consists of several filters of the same size, each looking for different information inside the image. A filter is an m x n pattern of pixels that we are looking for in an image. The output of the convolutional layer has a depth equal to the number of filters used in that layer. If you want to know more about filters and how they work, you can take a look at this article.

An example of a convolutional layer filter
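To make the point about output depth concrete, here is a small sketch (the filter count and the random input are arbitrary choices for illustration, not part of the model we build later):

```python
import numpy as np
from tensorflow.keras import layers

# A random batch of one 32x32 RGB "image", used only to inspect output shapes.
x = np.random.rand(1, 32, 32, 3).astype('float32')

# A convolutional layer with 8 filters of size 3x3: the output depth equals the filter count.
conv = layers.Conv2D(filters=8, kernel_size=(3, 3))

print(conv(x).shape)  # (1, 30, 30, 8) with the default 'valid' padding
```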

Pooling Layers: Pooling layers are used to downsample the output of the convolutional layers and reduce its dimensions. There are 3 types of pooling: max pooling, min pooling, and average pooling. Pooling is usually done using a 2x2 window with a stride of 2, which halves the height and width of the input.
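Here is a tiny sketch of max pooling on a made-up 4x4 input, showing how a 2x2 window with stride 2 halves the height and width:

```python
import numpy as np
from tensorflow.keras import layers

# A toy 4x4 single-channel "image" (batch of 1); the values are arbitrary.
x = np.array([[1, 3, 2, 0],
              [4, 6, 1, 1],
              [0, 2, 5, 7],
              [1, 2, 3, 4]], dtype='float32').reshape(1, 4, 4, 1)

# A 2x2 max-pooling window with stride 2 keeps the largest value in each window.
pooled = layers.MaxPooling2D(pool_size=(2, 2), strides=2)(x)

print(pooled.numpy().reshape(2, 2))
# [[6. 2.]
#  [2. 7.]]
```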

Now that we are familiar with what the building components of a convolutional neural network are, we can discuss its architecture and how we can create one.

Creating the model:

We will be using Keras and TensorFlow in this article to create our very own image classifier. This image classifier is going to classify the images in the CIFAR-10 dataset into one of the 10 available classes. The dataset includes 60,000 32x32 images, with each class having 6,000 images. The labels for these classes are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

As mentioned above, we are going to use Keras and TensorFlow to create our model. You can either install Keras and TensorFlow on your own device or use a platform such as Google Colaboratory to write and run the code for your model.

If you are using a Google Colab notebook, like I am, the first thing you should do is enable hardware acceleration with a GPU inside your notebook. This allows your model to use the GPU for the training and testing phases, which speeds up the process by a huge amount. To activate hardware acceleration, click on the Edit menu at the top of your screen, then click on Notebook Settings, and in the new window choose GPU as the hardware accelerator. If you previously ran anything in the notebook, you need to reset and rerun the code.

To check whether TensorFlow is connected to the GPU or not, you can run a snippet like the one below in your notebook.
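A check along these lines works (an empty list means TensorFlow cannot see a GPU):

```python
import tensorflow as tf

# Lists the GPUs visible to TensorFlow; an empty list means no GPU is connected.
print(tf.config.list_physical_devices('GPU'))
```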

Now we are going to load our dataset. To do so, we are going to use the Keras API to load the CIFAR-10 dataset.
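A loading snippet along these lines does the job (the variable names are just my choice):

```python
from tensorflow.keras import datasets

# Download CIFAR-10 and split it into training and testing sets.
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

print(train_images.shape)  # (50000, 32, 32, 3)
print(test_images.shape)   # (10000, 32, 32, 3)
```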

By running the code above, we have downloaded the CIFAR-10 dataset and split it into training and testing segments. The training segment is used for training the model, while the testing portion of the data is used to evaluate the accuracy of our model. To see what is actually inside the dataset, you can run a snippet like the one below to plot some of the images.
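A plotting sketch like this shows a 5x5 grid of training images with their labels (the class_names list follows the CIFAR-10 label order):

```python
import matplotlib.pyplot as plt

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(train_images[i])
    # Each label is stored as a 1-element array, so index into it for the class id.
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()
```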

You should get an output like this showing pictures of the data in each class.

Since the images in the CIFAR-10 dataset are 32x32, the output images are not high quality. One more thing we should do before creating the model and feeding it the dataset is to normalize the images’ pixel values to between 0 and 1. We do this so that during the backpropagation phase we don’t over- or under-compensate when correcting the weights of the neurons.
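Since the pixel values range from 0 to 255, dividing by 255 is enough:

```python
# Scale pixel values from the 0-255 range down to the 0-1 range.
train_images = train_images / 255.0
test_images = test_images / 255.0
```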

With the dataset ready for processing, we are going to create a simple CNN from scratch to process this data. As mentioned before, we will be using Keras to create this model. Keras acts as an interface to the TensorFlow machine learning library, making the development of models much faster and simpler. What we will be creating here using Keras is a sequential model. Sequential models are basically a linear stack of layers. First we import layers and models from the Keras library. Then we create a new sequential model and add the different layers to the model.
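A sketch of such a model, matching the layer-by-layer description below, could look like this (the filter counts of 32, 64, and 64 are typical choices for a small CIFAR-10 CNN):

```python
from tensorflow.keras import layers, models

# A small sequential CNN: three convolutional layers, two pooling layers,
# then a dense layer and a 10-way classification layer.
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))

model.summary()
```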

Let’s see what each part of the code does. As you can see, we created a Sequential model and added several layers to it. The first added layer is a 2D convolutional layer. The first argument inside the layer is the number of filters we want this convolutional layer to have, and the next argument, (3,3), defines the size of these filters. input_shape is only used for the first layer in the sequential model and defines the shape of the input data, which in this case is a 3-channel 32x32 image. Lastly, we have the activation function for the layer. Activation functions determine the output of each node or neuron inside a layer. To understand what activation functions do and what different activation functions there are, you can read this article. Here, we will be using ReLU as this layer’s activation function.

Next we have a pooling layer to reduce the size and dimensions of the convolutional layer’s output. We will be using a MaxPooling2D layer with a size of 2x2 that picks the largest number in each 2x2 window of its input. We have 2 more convolutional layers and 1 more pooling layer in this model. The output of the last convolutional layer is 4x4x64. We flatten this output into a 1D vector of 1,024 values and feed it to the dense layer after it, which has 64 neurons. The last layer is another dense layer with size 10. This layer is basically the classifier of the model, determining which of the 10 classes the input image belongs to. By using the model.summary() line we can see the architecture of the model we just created.

You can see what each layer does to the shape of the data and how many trainable parameters we have in each layer. Now that we have defined the architecture of our model, we need to compile and run it. To compile the model we run the line of code below.
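A compile call along these lines works; since the last Dense(10) layer in the sketch above outputs raw scores (logits), the loss is created with from_logits=True:

```python
import tensorflow as tf

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
```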

Optimizers are methods used to minimize the loss function of our model. They determine how the weights of the neural network are adjusted to reduce the loss. There are many optimizers, with one of the most commonly used being Adam. We use sparse categorical cross-entropy as our loss function here. We also choose a list of metrics to be evaluated by the model during training and testing. Our model is ready. All we have to do now is train it on our training data and see how well it classifies the test images.
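A training call could look like this (training for 10 epochs is an illustrative choice, and passing the test set as validation data lets us watch the accuracy after each epoch):

```python
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
```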

In this function, epochs determines how many times we want the whole training set to be passed through the model. Too many epochs can cause overfitting, while too few epochs might not let the model reach its best accuracy. You should be getting around 70% accuracy, which is pretty decent for a simple model like this.
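To measure the final accuracy on the test set:

```python
# Evaluate the trained model on the held-out test images.
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(f'Test accuracy: {test_acc:.3f}')
```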

You can change the activation functions in the layers, the optimizer and number of epochs to see what happens to the end result.

I hope this article has helped you learn and understand more about these concepts.

