Convolutional Neural Networks

Tathagat Dasgupta
Published in the ML blog · Feb 20, 2018

Convolutional neural networks (CNNs) are a sub-category of neural networks that have proved extremely effective at challenging tasks such as image recognition and classification. The first CNN, LeNet-5, was built by Yann LeCun, but the real interest in CNNs was sparked by Alex Krizhevsky's ImageNet success, which brought the classification error down from 26% to a record 15%.


In this article, I will walk you through the steps of applying a CNN to the MNIST digit-classification task.

Our CNN is built from four main types of layers:

  1. convolution layers
  2. max pooling layers
  3. fully-connected layers
  4. dropout layers

I will explain the role and the concept behind each layer as we progress through the code.

The CNN we are going to implement today is based on the model developed by Alex Krizhevsky and named after him: AlexNet.

We will build the model with Lasagne, a lightweight neural-network library built on top of Theano, so we will be working with these two libraries extensively, along with scikit-learn and matplotlib to visualize our predictions.

The MNIST dataset can be downloaded with the urllib module and loaded with pickle. We define a function load_dataset():
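Below is a minimal sketch of what load_dataset() might look like, assuming the pickled MNIST archive mnist.pkl.gz is downloaded with urllib; the URL, file handling, and reshaping details are illustrative assumptions rather than the original snippet:

import gzip
import os
import pickle
import urllib.request  # on Python 2, use urllib2 instead
import numpy as np

def load_dataset(filename='mnist.pkl.gz'):
    # Assumed mirror of the pickled MNIST archive
    url = 'http://deeplearning.net/data/mnist/mnist.pkl.gz'
    if not os.path.exists(filename):
        print('Downloading MNIST...')
        urllib.request.urlretrieve(url, filename)
    with gzip.open(filename, 'rb') as f:
        # The archive holds three (images, labels) tuples: train, validation, test
        train_set, val_set, test_set = pickle.load(f, encoding='latin1')
    def to_images(split):
        X, y = split
        # Reshape flat 784-pixel rows to 1x28x28 so the conv layers see 2D images
        return X.reshape(-1, 1, 28, 28).astype(np.float32), y.astype(np.int32)
    X_train, y_train = to_images(train_set)
    X_val, y_val = to_images(val_set)
    X_test, y_test = to_images(test_set)
    return X_train, y_train, X_val, y_val, X_test, y_test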

We can display an image from our dataset using matplotlib (with matplotlib.pyplot imported as plt and matplotlib.cm as cm):

plt.imshow(X_train[0][0], cmap=cm.binary)

The output we get is the following:

[Figure: visualizing a sample digit from the dataset]

Next, we have to create the structure of our neural network, i.e. the layers. Before doing that, we define some variables that are essential for the functioning of our layers:
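Here is a minimal sketch of these definitions, assuming Theano's symbolic tensor types; the exact values mirror the explanation that follows and are illustrative, not the original code:

import theano.tensor as T

batch_size = 100                # number of examples per mini-batch
output_size = 10                # ten digit classes
data_size = (None, 1, 28, 28)   # (batch, channels, rows, cols); None lets the batch size vary

input_var = T.tensor4('input')     # symbolic 4D tensor that will hold a batch of images
output_var = T.ivector('targets')  # symbolic vector of integer class labels

net = {}  # dictionary that will hold the layers of the model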

The batch_size refers to the number of training examples in each batch for our mini-batch gradient descent. The output_size is set to 10, the number of classes predicted by the output layer, i.e. the ten MNIST digits. The data_size is the shape of the input data: each image is a 28x28 grid of pixels, and because the images are grayscale rather than RGB the channel dimension is 1, giving a shape of (batch, 1, 28, 28). The input_var and output_var are Theano TensorType objects that will hold the input and output data. Finally, net is the name of our model, a simple Python dictionary that maps layer names to layers.

This is the structure of our neural net, AlexNet:
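As a rough sketch of such a structure in Lasagne, scaled down to 28x28 MNIST inputs, the layer stack could look like the following; the filter counts and sizes here are illustrative assumptions, not necessarily the exact configuration of the original post:

from lasagne.layers import (InputLayer, Conv2DLayer, Pool2DLayer,
                            DenseLayer, DropoutLayer)
from lasagne.nonlinearities import softmax

# net is the empty dictionary created earlier
net['input'] = InputLayer(data_size, input_var=input_var)       # raw 1x28x28 images
net['conv1'] = Conv2DLayer(net['input'], num_filters=32, filter_size=5)
net['pool1'] = Pool2DLayer(net['conv1'], pool_size=2, mode='max')
net['conv2'] = Conv2DLayer(net['pool1'], num_filters=64, filter_size=3)
net['pool2'] = Pool2DLayer(net['conv2'], pool_size=2, mode='max')
net['fc1'] = DenseLayer(net['pool2'], num_units=256)             # fully-connected layer
net['drop'] = DropoutLayer(net['fc1'], p=0.5)                    # dropout before the output
net['out'] = DenseLayer(net['drop'], num_units=output_size,
                        nonlinearity=softmax)                    # softmax over the 10 digits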

The input layer simply takes the input data and its shape as arguments.

The first layer is a convolution layer, which applies a filter, i.e. a 2D matrix of fixed size F, that slides across the NxN input with a stride S. With zero-padding P, the output has size (N - F + 2P)/S + 1 along each dimension. For example, a 5x5 filter on a 28x28 input with no padding and stride 1 gives (28 - 5 + 0)/1 + 1 = 24.

Next, pooling layers reduce the spatial size of the output by replacing the values inside each pooling window with a single summary of those values, i.e. this layer shrinks the image. Here we will be using lasagne.layers.Pool2DLayer(). A max-pooling layer divides the matrix into pools and produces a smaller matrix containing the maximum value from each pool (sliding over the input in the same way as the convolution filter discussed above).
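As a quick illustration of max pooling (plain NumPy, separate from the Lasagne model), a 2x2 pool with stride 2 shrinks a 4x4 matrix to 2x2 by keeping the maximum of each block:

import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [9, 2, 1, 0],
              [3, 4, 6, 5]])
# Group the matrix into non-overlapping 2x2 blocks and take the max of each
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 8]
#  [9 6]]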

The fully-connected layers are hidden layers in which every input neuron is linked to every neuron in the next layer, i.e. all neurons are connected.

The dropout layer is positioned just before the output layer. Dropout sets a proportion p of the activations (neuron outputs) passed on to the next layer to zero, where p is the dropout probability. The zeroed-out outputs are chosen at random on every training pass.

  • What happens if we set the dropout parameter to 0?

This reduces overfitting by forcing the network to produce the right output even when some of its activations are randomly dropped, so it cannot rely too heavily on any single neuron.

Finally, our output layer is a softmax layer. In probability theory, the output of the softmax function can be used to represent a categorical distribution, that is, a probability distribution over n different possible outcomes. As we have 10 classes, the 10 handwritten digits, the softmax layer has ten neurons.
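Concretely, softmax exponentiates each score and normalises so that the outputs sum to one; here is a small NumPy illustration (separate from the Lasagne model):

import numpy as np

def softmax(z):
    # Subtracting the max improves numerical stability without changing the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))  # approximately [0.659 0.242 0.099], which sums to 1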

Keep in mind that the filter_size for each convolution layer has to be set carefully, otherwise the shape of the data may become negative from excessive convolving (do the math by hand!).

To better understand the various layers and their arguments, see http://lasagne.readthedocs.io/en/latest/modules/layers.html

Next, to train the neural network we need to set an update rule for our model. The first step is to define the loss (cost) function; we will use the mean categorical cross-entropy.

As you can see, we have added an L2 regularization penalty on the network weights to discourage overfitting. The second step is to define the update rule. Stochastic gradient descent (SGD) is one of the most widely used and effective weight-update rules; here I have used Adam, an adaptive variant of SGD.
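A sketch of the loss, L2 penalty, and Adam updates using Lasagne's built-ins, continuing from the snippets above (the regularization weight 1e-4 and learning rate 1e-3 are assumed values, not taken from the original code):

from lasagne.objectives import categorical_crossentropy
from lasagne.regularization import regularize_network_params, l2
from lasagne.updates import adam
from lasagne.layers import get_output, get_all_params

# Mean categorical cross-entropy between the softmax output and the true labels
prediction = get_output(net['out'])
loss = categorical_crossentropy(prediction, output_var).mean()
# L2 penalty on all network weights to discourage overfitting
loss += 1e-4 * regularize_network_params(net['out'], l2)

# Adam update rule applied to every trainable parameter
params = get_all_params(net['out'], trainable=True)
updates = adam(loss, params, learning_rate=1e-3)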

Following this, we define the Theano functions that will train and test the model. Using them we can extract quantities such as the loss, accuracy, and test error.

The Theano layer functions also let us gather values from within the layers of the network, i.e. hidden-layer representations, which are very useful if you are aiming to extract features from an image set.
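A sketch of how these functions might be compiled, continuing from the definitions above (the evaluation pass is made deterministic so the dropout layer is switched off; the exact returned values are assumptions):

import theano
import theano.tensor as T
from lasagne.layers import get_output

# Training function: computes the loss and applies the Adam updates in one call
train_fn = theano.function([input_var, output_var], loss, updates=updates)

# Deterministic (dropout-free) pass for validation and testing
test_prediction = get_output(net['out'], deterministic=True)
test_loss = categorical_crossentropy(test_prediction, output_var).mean()
test_acc = T.mean(T.eq(T.argmax(test_prediction, axis=1), output_var),
                  dtype=theano.config.floatX)
val_fn = theano.function([input_var, output_var], [test_loss, test_acc])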

Next, we simply train the model over our training data using mini-batches of size 100 (remember, batch_size = 100) and test the model to display the test error:
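A sketch of the training loop in the style of the Lasagne examples, continuing from the snippets above (the iterate_minibatches helper and the epoch count are assumptions):

import numpy as np

def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    # Yield successive mini-batches of the given size
    indices = np.arange(len(inputs))
    if shuffle:
        np.random.shuffle(indices)
    for start in range(0, len(inputs) - batchsize + 1, batchsize):
        excerpt = indices[start:start + batchsize]
        yield inputs[excerpt], targets[excerpt]

num_epochs = 10  # assumed; training longer generally lowers the test error
for epoch in range(num_epochs):
    train_err, train_batches = 0.0, 0
    for X_batch, y_batch in iterate_minibatches(X_train, y_train, batch_size, shuffle=True):
        train_err += train_fn(X_batch, y_batch)
        train_batches += 1
    print('Epoch %d  training loss: %.5f' % (epoch + 1, train_err / train_batches))

# Evaluate on the held-out test set
test_err, test_acc, test_batches = 0.0, 0.0, 0
for X_batch, y_batch in iterate_minibatches(X_test, y_test, batch_size):
    err, acc = val_fn(X_batch, y_batch)
    test_err += err
    test_acc += acc
    test_batches += 1
print('Test Error: %.5f' % (1 - test_acc / test_batches))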

I ran the program on my humble CPU (will post the results on GPU later) and got the following output:

Test Error: 0.01470    Time: 4134.223 seconds

Thus we got 98.53% accuracy on our test set. The model can be applied to larger datasets and real-world image processing applications.

Till next time!
