Getting Started With Kaggle Digit Recognizer Competition

What’s this competition all about???

MNIST (“Modified National Institute of Standards and Technology”) is the de facto “Hello World” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.

In the Digit Recognizer competition hosted by Kaggle, your goal is to correctly identify digits from a dataset of tens of thousands of handwritten images. Link to the competition: https://www.kaggle.com/c/digit-recognizer.

Currently, I hold 41st position on the public leaderboard of the competition, which puts me in the top 2% of contestants. It took me about a year to reach a score of 1.00000, so I thought this would be a good time to share my experience with the competition.

How to start???

The first approach that I could think of was the classic LeNet-5.

Originally proposed in the 1990s by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner as a convolutional neural network (CNN) architecture for handwritten and machine-printed character recognition, it was used at large scale to automatically classify handwritten digits on bank cheques in the United States. Nowadays CNNs are the state-of-the-art deep learning approach to computer vision. These networks are built upon three main ideas: local receptive fields, shared weights and spatial sub-sampling. Local receptive fields with shared weights are the essence of the convolutional layer, and most architectures described below use convolutional layers in one form or another.

LeNet-5 Architecture

When I made my first submission, I had implemented LeNet-5 using only NumPy in Python. That code was quite messy and I did not want to reproduce it, so I recreated the solution using TensorFlow with the help of my friend Sohom Dey. The notebook for the solution can be found at

So let’s get started with the actual code.

On With the Code….

We did not use a wide range of libraries: just NumPy, pandas, Matplotlib and TensorFlow, all of which are available by default in the Kaggle Kernels environment.
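
The imports and the input-directory listing look something like this (a minimal sketch following the standard Kaggle Kernels layout):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

import os
print(os.listdir("../input"))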

Output:

['train.csv', 'sample_submission.csv', 'test.csv']

We will now implement a couple of utility functions that will be used in both the model training and visualization phases.
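
Here is a sketch of what such utilities might look like; the helper names next_batch and display_digit are stand-ins of my own, not necessarily the exact functions from the notebook:

def next_batch(batch_size, data, labels):
    # Sample a random mini-batch of images and labels (used during training)
    idx = np.random.choice(len(data), batch_size, replace=False)
    return data[idx], labels[idx]

def display_digit(image, label=None, size=28):
    # Render one flattened image as a grayscale digit (used for visualization)
    plt.imshow(image.reshape(size, size), cmap="gray")
    if label is not None:
        plt.title("Label: {}".format(label))
    plt.axis("off")
    plt.show()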

Exploring the Data

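Loading the CSVs with pandas and checking their shapes might look like this (a sketch; train.csv carries a label column followed by 784 pixel columns, per the competition's data description):

train_df = pd.read_csv("../input/train.csv")
test_df = pd.read_csv("../input/test.csv")

X_train = train_df.drop("label", axis=1).values.astype(np.float32)
y_labels = train_df["label"].values
X_test = test_df.values.astype(np.float32)

print("Number of images in training dataset:", X_train.shape[0])
print("Number of pixels in each image in training dataset:", X_train.shape[1])
print("Number of images in test dataset:", X_test.shape[0])
print("Number of pixels in each image in test dataset:", X_test.shape[1])
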
Output:

Number of images in training dataset: 42000
Number of pixels in each image in training dataset: 784
Number of images in test dataset: 28000
Number of pixels in each image in test dataset: 784

Preprocessing the Data

The images in the dataset are provided at 28 x 28 resolution. Since the LeNet-5 architecture expects 32 x 32 inputs, we will convert the given images to 32 x 32 by applying extra zero padding around them.

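A minimal NumPy sketch of this step: each flat 784-pixel row is reshaped to 28 x 28, padded with 2 pixels of zeros on every side, and flattened back to 1024 values.

def pad_images(X):
    # 784-pixel rows -> 28 x 28 images -> zero-padded 32 x 32 -> 1024-pixel rows
    images = X.reshape(-1, 28, 28)
    padded = np.pad(images, ((0, 0), (2, 2), (2, 2)), mode="constant")
    return padded.reshape(-1, 32 * 32)

X_train = pad_images(X_train)
X_test = pad_images(X_test)
print((X_train.shape, X_test.shape))
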
Output:

((42000, 1024), (28000, 1024))

We also need to convert the target labels into their one-hot encoded format.

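One straightforward way is the NumPy identity-matrix trick (a sketch; the notebook may use a library helper instead):

print("Shape of Training Labels:", y_labels.shape)
y_train = np.eye(10)[y_labels]  # pick one row of the 10 x 10 identity matrix per label
print("Shape of y_train after encoding:", y_train.shape)
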
Output:

Shape of Training Labels: (42000,)
Shape of y_train after encoding: (42000, 10)

Building LeNet-5

We will first declare the training parameters and hyperparameters for the neural network.

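The exact values below are assumptions inferred from the training log shown later: the accuracy figures move in steps of 1/128, which points to a batch size of 128, and the log runs to epoch 10000 with a report every 500 steps.

learning_rate = 0.001  # assumed Adam learning rate
epochs = 10000         # training iterations, reported every display_step
batch_size = 128
display_step = 500

n_input = 1024   # 32 x 32 images, flattened
n_classes = 10   # digits 0-9
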
We will start building the model by creating the placeholders. Placeholders are a type of TensorFlow object that is not initialized with any value; instead, they receive their values during the execution of the TensorFlow graph, inside a TensorFlow session, through a feed dictionary. The placeholders we declare will correspond to the images and the one-hot encoded training labels.

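A sketch of the two placeholders:

x = tf.placeholder(tf.float32, [None, n_input])    # flattened 32 x 32 images
y = tf.placeholder(tf.float32, [None, n_classes])  # one-hot encoded labels
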
We will now declare and initialize the weights and biases corresponding to each layer of the network; these are the variables that will be optimized during training.

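A sketch of the parameter dictionaries, with shapes following the standard LeNet-5 layout (the C5 layer is written as a fully connected layer here, which is equivalent when its input has shrunk to 5 x 5):

weights = {
    "c1": tf.Variable(tf.truncated_normal([5, 5, 1, 6], stddev=0.1)),   # C1: 6 filters of 5 x 5
    "c3": tf.Variable(tf.truncated_normal([5, 5, 6, 16], stddev=0.1)),  # C3: 16 filters of 5 x 5
    "f5": tf.Variable(tf.truncated_normal([400, 120], stddev=0.1)),     # C5/F5: 5*5*16 -> 120
    "f6": tf.Variable(tf.truncated_normal([120, 84], stddev=0.1)),      # F6: 120 -> 84
    "out": tf.Variable(tf.truncated_normal([84, 10], stddev=0.1)),      # Output: 84 -> 10
}
biases = {
    "c1": tf.Variable(tf.zeros([6])),
    "c3": tf.Variable(tf.zeros([16])),
    "f5": tf.Variable(tf.zeros([120])),
    "f6": tf.Variable(tf.zeros([84])),
    "out": tf.Variable(tf.zeros([10])),
}
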
Now we will build the actual model. We will create two utility functions, for 2D convolution and 2D max-pooling with valid padding, and then create the layers of the neural network.

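A sketch of these helpers and the model itself; I use tanh activations as in the original LeNet-5 paper, though the notebook may well use ReLU instead:

def conv2d(x, W, b):
    # 2D convolution, stride 1, valid padding, followed by bias and tanh
    x = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="VALID")
    return tf.nn.tanh(tf.nn.bias_add(x, b))

def maxpool2d(x):
    # 2 x 2 max-pooling with stride 2 and valid padding
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")

def lenet5(x, weights, biases):
    x = tf.reshape(x, [-1, 32, 32, 1])            # restore the 2D image shape
    c1 = conv2d(x, weights["c1"], biases["c1"])   # 32x32x1 -> 28x28x6
    s2 = maxpool2d(c1)                            # 28x28x6 -> 14x14x6
    c3 = conv2d(s2, weights["c3"], biases["c3"])  # 14x14x6 -> 10x10x16
    s4 = maxpool2d(c3)                            # 10x10x16 -> 5x5x16
    flat = tf.reshape(s4, [-1, 400])              # flatten for the dense layers
    f5 = tf.nn.tanh(tf.matmul(flat, weights["f5"]) + biases["f5"])
    f6 = tf.nn.tanh(tf.matmul(f5, weights["f6"]) + biases["f6"])
    return tf.matmul(f6, weights["out"]) + biases["out"]  # raw logits
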
Now we will build the TensorFlow graph for all of our operations. We will start by declaring the logits as the output of our model, the loss operation (softmax cross-entropy with logits) and the optimizer (Adam). Then we will create the training operation, which uses the optimizer to minimize the loss. We also create an operation to calculate the model accuracy, and a global variables initializer that initializes the global variables, which in this case are the weights and biases.

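Putting the graph together might look like this (a sketch; judging by the very large cost values in the log below, the actual notebook presumably scales its cost differently, e.g. without averaging or pixel normalization):

logits = lenet5(x, weights, biases)

loss_op = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=y))
train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss_op)

correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

init = tf.global_variables_initializer()
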
Training LeNet-5

We will train the network inside a TensorFlow session.

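A sketch of the training loop, reusing the assumed next_batch helper from earlier; it also records the per-epoch loss and accuracy for the plots that follow, and captures the optimized parameters for the prediction step:

history = {"loss": [], "acc": []}

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(1, epochs + 1):
        batch_x, batch_y = next_batch(batch_size, X_train, y_train)
        _, loss, acc = sess.run([train_op, loss_op, accuracy],
                                feed_dict={x: batch_x, y: batch_y})
        history["loss"].append(loss)
        history["acc"].append(acc)
        if epoch % display_step == 0:
            print("Epoch {}, Cost: {}, Accuracy: {} %".format(epoch, loss, acc * 100))
    print("-" * 70)
    print("Optimization Finished")
    # evaluate training accuracy in chunks to keep memory usage modest
    accs = [sess.run(accuracy, feed_dict={x: X_train[i:i + 1000], y: y_train[i:i + 1000]})
            for i in range(0, len(X_train), 1000)]
    print("Accuracy on Training Data: {} %".format(np.mean(accs) * 100))
    # capture the optimized parameter values as NumPy arrays for later reuse
    trained_weights = {k: sess.run(v) for k, v in weights.items()}
    trained_biases = {k: sess.run(v) for k, v in biases.items()}
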
Output:

Epoch 500, Cost: 28595476.4296875, Accuracy: 73.4375 %
Epoch 1000, Cost: 5947898.984375, Accuracy: 86.71875 %
Epoch 1500, Cost: 7127918.71875, Accuracy: 88.28125 %
Epoch 2000, Cost: 3046355.96875, Accuracy: 92.96875 %
Epoch 2500, Cost: 3755678.59375, Accuracy: 93.75 %
Epoch 3000, Cost: 1928981.6875, Accuracy: 92.1875 %
Epoch 3500, Cost: 769532.8125, Accuracy: 96.875 %
Epoch 4000, Cost: 1833259.3125, Accuracy: 93.75 %
Epoch 4500, Cost: 1317497.5, Accuracy: 96.09375 %
Epoch 5000, Cost: 1188782.34375, Accuracy: 93.75 %
Epoch 5500, Cost: 267834.515625, Accuracy: 98.4375 %
Epoch 6000, Cost: 1112221.875, Accuracy: 96.09375 %
Epoch 6500, Cost: 467607.857421875, Accuracy: 94.53125 %
Epoch 7000, Cost: 400827.03125, Accuracy: 97.65625 %
Epoch 7500, Cost: 22324.25, Accuracy: 99.21875 %
Epoch 8000, Cost: 394928.5625, Accuracy: 98.4375 %
Epoch 8500, Cost: 71348.0625, Accuracy: 99.21875 %
Epoch 9000, Cost: 0.0, Accuracy: 100.0 %
Epoch 9500, Cost: 24381.53125, Accuracy: 99.21875 %
Epoch 10000, Cost: 10489.375, Accuracy: 98.4375 %
----------------------------------------------------------------------

Optimization Finished

Accuracy on Training Data: 98.4071433544159 %

Let’s visualize the training history, that is, the change in loss and accuracy during each epoch.

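A sketch of the plotting code, using the history dictionary recorded during training:

plt.figure(figsize=(10, 5))
plt.plot(history["loss"])
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training Loss")
plt.show()

plt.figure(figsize=(10, 5))
plt.plot(history["acc"])
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Training Accuracy")
plt.show()
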
Output: (plot of training loss per epoch)

Output: (plot of training accuracy per epoch)

Making Predictions

In order to get the predictions, we first have to reinitialize the weight and bias variables with their optimized values. Then we run forward propagation through the neural network using these optimal weights and biases learnt during training.

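A sketch of this step, assuming the trained parameter values were captured as NumPy arrays (trained_weights, trained_biases) at the end of the training session:

with tf.Session() as sess:
    sess.run(init)
    # assign the optimized values back to the graph variables
    for name in weights:
        sess.run(weights[name].assign(trained_weights[name]))
    for name in biases:
        sess.run(biases[name].assign(trained_biases[name]))
    # forward-propagate the test images in chunks and take the argmax
    preds = np.concatenate([
        sess.run(logits, feed_dict={x: X_test[i:i + 1000]})
        for i in range(0, len(X_test), 1000)
    ]).argmax(axis=1)

preds
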
Output:

array([2, 0, 9, ..., 3, 9, 2])

Making Kaggle Submission

We first need to save our predictions for the test images in the format given in the sample submission file, which we will then upload as our submission.

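A sketch of writing the submission file; the ImageId column is 1-indexed and the column names match sample_submission.csv:

submission = pd.DataFrame({
    "ImageId": np.arange(1, len(preds) + 1),
    "Label": preds,
})
submission.to_csv("submission.csv", index=False)
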
This solution, being one of my earliest, does not fetch a very good position on the leaderboard. Given the current state of the competition leaderboard, it will land you somewhere around the top 80%. Still, I think LeNet-5 is a good way to get started with the Digit Recognizer competition in particular and convolutional neural networks in general.

In my subsequent articles I will share better techniques that will help you achieve much better positions on the leaderboard. So, stay tuned for more articles :)