My Journey to Top 10 of the Digit Recognition Challenge on Kaggle

Skilled Roots · 7 min read · Jul 7, 2019

The field of Deep Learning & Computer Vision is progressing by leaps and bounds. Google's and Facebook's image recognition algorithms have already been beating humans for a couple of years now. To add to that, they can now identify someone even if the face is partially covered.

The field of image recognition had generated interest among scientists and researchers as early as the 1960s. The problem, however, was that the systems of that era, even the best supercomputers, weren't powerful enough.

Researchers realized early on that a good image recognition machine would need larger and deeper neural networks. But the bigger and deeper the neural network, the more resources it requires for both storage and computation.

However, things changed from the 2000s onward as processors became faster and memory cheaper and larger. Things really started looking up when companies like Google/Alphabet got involved in developing their own models that could beat humans at the task of image recognition.

In 2015, Google released TensorFlow, an open source library for data-flow programming that works seamlessly on both CPU(s) and GPU(s). The release of TensorFlow made this extremely powerful tool available to all, and since then the domain of image recognition has been on the rise.

Other libraries followed, such as MXNet, Keras, and PyTorch, which have given everybody plenty of options to train and test their deep learning models.

My Journey:

Last year in July I ended up with a fractured knee: I couldn't go to the office and had no office laptop to work on. I was getting bored to hell and decided to seriously learn something on Kaggle, where I had been registered for over a year.

I looked at the Knowledge competitions and one caught my eye: the Handwritten Digit Recognition competition. It was completely beyond my expertise (or lack thereof). So I decided that in the 3–4 weeks I had, with limited mobility, I would post a decent score on the Kaggle Leaderboard.

I downloaded the datasets and started on this perilous journey of getting acquainted with image recognition.

Before we begin, here are my PC specs -


Core i5 6600K overclocked to 4.4 GHz, 16 GB DDR4 RAM, two GTX 980 Tis (I have since upgraded to 1080 Tis for lower training times), and RStudio with the latest available version of base R. I will be posting my code for each of the steps I followed, so you can go ahead and run them yourself.

So let’s begin -

Loading the datasets:

Once you have registered and accepted the terms and conditions of the competition, you can download the datasets. They are in CSV format. Each row has a ‘Label’ and the values of 784 grayscale pixels, so each image is 28×28 pixels. Our aim is to predict the ‘Label’ from the pixel information. To load the datasets -

train = read.csv("E:/Documents/R/KaggleData/Digit Recognizer/train.csv", stringsAsFactors = F)
test = read.csv("E:/Documents/R/KaggleData/Digit Recognizer/test.csv", stringsAsFactors = F)

Plotting the data:

We can reconstruct the images from the pixel information, plot a few of them, and compare them against their labels.

# 4x3 grid of square plots with minimal margins and no axis ticks
par(mfrow=c(4,3), pty='s', mar=c(1,1,1,1), xaxt='n', yaxt='n')
all_img = array(dim=c(10, 28*28))
for(di in 0:9)
{
  # composite image of digit di: sum all its rows, then rescale to 0-255
  all_img[di+1,] = apply(train[train[,1]==di,-1], 2, sum)
  all_img[di+1,] = all_img[di+1,]/max(all_img[di+1,])*255
  z = array(all_img[di+1,], dim=c(28,28))
  z = z[,28:1] ## right side up
  image(1:28, 1:28, z, main=di)
}

This is the output that you are going to see -

First Try — Random Forest:

We need to start somewhere to get a baseline so that we can keep track of our progress. I decided to go with the trusty old Random Forest. My Random Forest model got me to around 96.7% accuracy and somewhere in the 1000s on the Leaderboard.

library(randomForest)
set.seed(100)
model = randomForest(as.factor(label)~., data = train, ntree = 2000)
pred = predict(model, newdata = train)   # in-sample predictions
importance(model)                        # variable importance scores
varImpPlot(model)
table(pred, train$label)                 # confusion matrix on train
test$label = predict(model, newdata = test)
submission_data = data.frame(ImageId = 1:nrow(test), Label = test$label)
write.csv(submission_data, "submission.csv", row.names = FALSE)  # file name assumed

Here is code to plot the test images; you can use it to check the predicted labels against the images for various models -

par(mfrow=c(4,3), pty='s', mar=c(1,1,1,1), xaxt='n', yaxt='n')
all_img = array(dim=c(10, 28*28))
for(i in 1:10)
{
  # column 785 is the predicted label we added above; the rest are pixels
  all_img[i,] = as.numeric(test[i, -785])
  number <- test[i, 785]
  z = array(all_img[i,], dim=c(28,28))
  z = z[,28:1] ## right side up
  image(1:28, 1:28, z, main=number)
}

This is just basic code that gets the work done. I will leave it to you to dabble in the realm of hyperparameter tuning; trust me, it's a great tool and will give you insight into the workings of the model.
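If you want a starting point for that tuning, here is a minimal sketch using randomForest::tuneRF to search for a good mtry value (the number of variables tried at each split). The starting value, tree count, and step factor below are illustrative assumptions, not what I used in the competition.

library(randomForest)
# search for a good mtry: start near the classification default
# (sqrt(784) is 28), double/halve by stepFactor, and keep going
# while the OOB error improves by at least 'improve'
set.seed(100)
tuned = tuneRF(x = train[, -1], y = as.factor(train$label),
               mtryStart = 28, ntreeTry = 200,
               stepFactor = 2, improve = 0.01, trace = TRUE)
tuned  # OOB error for each mtry tried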

Once you have the predicted labels for the test dataset, you can plot a few of the images to compare against the predicted labels. This will give you an idea of whether your model is making the right predictions.

Here we see that, at least for the first 10 test images, our model gives the right prediction.

Second Try — Gradient Boosting:

My second model of choice was Gradient Boosting, which in most cases performs similarly to RF. Here again I am just posting the code to get GBM running; it's up to you to tune the hyperparameters. I got slightly lower accuracy compared to RF, about 96.1%.

library(gbm)
set.seed(100)
model1 = gbm(as.factor(label)~., data = train, n.trees = 2000, shrinkage = 0.11,
             distribution = "multinomial", interaction.depth = 7,
             bag.fraction = 0.9, cv.folds = 10, n.minobsinnode = 50)
pred1 = predict(model1, newdata = train, n.trees = 2000, type = "response")
pred1 = apply(pred1, 1, which.max) - 1   # which.max gives 1-10; digits are 0-9
table(pred1, train$label)
probs = predict(model1, newdata = test, n.trees = 2000, type = "response")
test$label = apply(probs, 1, which.max) - 1
submission_data = data.frame(ImageId = 1:nrow(test), Label = test$label)
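For the tuning I am leaving as an exercise, here is a minimal sketch of a cross-validated grid search over a few GBM hyperparameters using the caret package. The grid values are illustrative assumptions; a full grid on 42,000 images will be slow, so you may want to tune on a subsample first.

library(caret)
set.seed(100)
# candidate values are assumptions for illustration, not tuned results
grid = expand.grid(n.trees = c(500, 1000, 2000),
                   interaction.depth = c(5, 7, 9),
                   shrinkage = c(0.05, 0.1),
                   n.minobsinnode = 50)
ctrl = trainControl(method = "cv", number = 5)
# caret expects a factor outcome for classification
fit = train(x = train[, -1], y = as.factor(train$label),
            method = "gbm", trControl = ctrl, tuneGrid = grid, verbose = FALSE)
fit$bestTune   # best combination found by cross-validation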

Here we see that, at least for test images 11–20, our model gives the right prediction.

Third Try — Neural Network:

I then tried a Neural Network, since these are the models on which deep learning is based. I managed to get an accuracy similar to RF.

library(neuralnet)
m = model.matrix(~., data = train)
n = names(train)
f = as.formula(paste("label ~", paste(n[!n %in% "label"], collapse = " + ")))
model2 = neuralnet(f, data = m, hidden = 2, threshold = 0.1, stepmax = 1000000,
                   rep = 2, algorithm = "rprop+", err.fct = "sse",
                   linear.output = FALSE)
plot(model2)
res = neuralnet::compute(model2, as.data.frame(train[,2:785]))
pred2 = round(res$net.result)
table(pred2, train$label)

Bonus Tip: Each of the models above will give you a similar sort of accuracy on test data. What you can do is take a number of them and stack them. The stacked model will give you slightly better predictions than the base models. You can try a number of base models, such as AdaBoost, Treebag, XGBoost, etc., besides the ones above; a minimal sketch follows below.
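Here is a minimal sketch of the idea, assuming you have kept the test-set predictions from the models above (the variable names rf_pred, gbm_pred, and nn_pred are illustrative). This version uses a simple majority vote, the easiest form of model combination; a full stack would instead train a meta-model on out-of-fold predictions of the base models.

# assumed: rf_pred, gbm_pred, nn_pred are vectors of predicted digits (0-9)
# for the test set, one per base model
ensemble = data.frame(rf = rf_pred, gbm = gbm_pred, nn = nn_pred)

# majority vote per row; three-way ties fall back to the RF prediction
vote = apply(ensemble, 1, function(p) {
  tab = table(p)
  if (max(tab) > 1) as.integer(names(tab)[which.max(tab)]) else p[1]
})

submission_data = data.frame(ImageId = 1:nrow(ensemble), Label = vote)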

Final Try — Convolutional Neural Network:

After having tried our hand at the conventional models, we are ready to roll with the big boys. It's time for some convolution. Let me know in the comments if you want a detailed article on CNNs, how they work, and where they can be used. This is the code that gave me an accuracy of over 99.9% and put me smack in the top 20 of the Leaderboard.

library(DiagrammeR)
library(mxnet)
train.x = train[,-1]       # pixel columns
train.y = train[,1]        # labels
train.x = t(train.x/255)   # scale to [0,1]; one example per column
test = t(test/255)         # assumes a freshly loaded test set with only pixel columns

Now that we have the data in the format we need, let's build our layers -

# CNN - RELU
data = mx.symbol.Variable('data')
# first conv
conv1 = mx.symbol.Convolution(data=data, kernel=c(5,5), num_filter=20)
tanh1 = mx.symbol.Activation(data=conv1, act_type="relu")
pool1 = mx.symbol.Pooling(data=tanh1, pool_type="max", kernel=c(2,2), stride=c(2,2))
# second conv
conv2 = mx.symbol.Convolution(data=pool1, kernel=c(5,5), num_filter=50)
tanh2 = mx.symbol.Activation(data=conv2, act_type="relu")
pool2 = mx.symbol.Pooling(data=tanh2, pool_type="max", kernel=c(2,2), stride=c(2,2))
# first fullc
flatten = mx.symbol.Flatten(data=pool2)
fc1 = mx.symbol.FullyConnected(data=flatten, num_hidden=500)
tanh3 = mx.symbol.Activation(data=fc1, act_type="relu")
# second fullc
fc2 = mx.symbol.FullyConnected(data=tanh3, num_hidden=10)
# loss
lenet = mx.symbol.SoftmaxOutput(data=fc2)
train.array = train.x
dim(train.array) = c(28, 28, 1, ncol(train.x))
test.array = test
dim(test.array) = c(28, 28, 1, ncol(test))
devices = list(mx.gpu(0))
mx.set.seed(0)
tic = proc.time()

Now that we have the layers in place, let's train our model -

model = mx.model.FeedForward.create(lenet, X=train.array, y=train.y,
                                    kvstore = "device", ctx=devices,
                                    num.round=20, array.batch.size=100,
                                    learning.rate=0.05, wd=0.00001, momentum=0.9,
                                    eval.metric=mx.metric.accuracy,
                                    epoch.end.callback=mx.callback.log.train.metric(100))
preds = predict(model, train.array)
pred.label = max.col(t(preds)) - 1     # class probabilities -> digit 0-9
table(pred.label)
table(train.y, pred.label)             # confusion matrix on train
preds1 = predict(model, test.array)
pred1.label = max.col(t(preds1)) - 1
table(pred1.label)
# the Kaggle test set has no labels, so we go straight to the submission file
submission_data = data.frame(ImageId = 1:length(pred1.label), Label = pred1.label)

Where to go from here:

We have now already got 99.9% accuracy on test data. However, if you look at the Kaggle Leaderboard, there are submissions with 100% accuracy. This can be achieved in two ways:

  1. Apply transformations to the train data, such as slightly rotating or shifting the images to the left and right. This gives you more data to train with (see the sketch after this list).
  2. Get more data from the MNIST website and train on that as well.
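As a minimal sketch of the first idea, here is one simple base-R transformation: shifting each image one pixel to the right. Rotations would need an image-processing library; the shift below is just an assumed, illustrative example of how to grow the training set, not the exact augmentation the top submissions use.

# shift a 28x28 image (given as a 784-long, row-major pixel vector)
# one pixel right, padding the vacated column with zeros (black)
shift_right = function(px) {
  img = matrix(as.numeric(px), nrow = 28, ncol = 28, byrow = TRUE)
  shifted = cbind(0, img[, 1:27])
  as.vector(t(shifted))   # flatten back row-major
}

# build an augmented copy of the training set with the same labels
aug = t(apply(train[, -1], 1, shift_right))
colnames(aug) = names(train)[-1]
train_aug = rbind(train, data.frame(label = train$label, aug))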

You can try both these approaches and see where you end up on the Leaderboard for the Handwritten Digit Recognition challenge on Kaggle.

Hope you all learnt something from this article. Do let us know in the comments if it helped you get started, and if you have any questions. Always happy to help.

Author Profile

Praveen Kumar Singh
I am currently working as a Manager in Risk Analytics with Standard Chartered Bank. I have 5 years of experience in Credit Risk Analytics. I love dabbling in Machine Learning and Advanced Analytics. Beyond work I love travelling, playing games on my PC, and tinkering around with Arduino.

Originally published at https://www.skilledroots.com on July 7, 2019.
