How to get top 2% position on Kaggle’s MNIST — Digit Recognizer

Applying CNN

This tutorial shows how to build and train a convolutional neural network model with Keras on top of TensorFlow.

In this first part I'll focus on the machine learning model, its parameters, and the results. In the second part I'll explain how to deploy this model as an API.

I trained the model on the MNIST dataset provided by Kaggle to produce good results at recognizing handwritten digits.

“The MNIST database is a large database of handwritten digits.”

For those who are new to TensorFlow and Keras, I recommend starting with some tutorials, playing around with the code, and then coming back here.
For those who are entirely new to deep learning, I recommend checking out fast.ai, Cognitive Class AI, Siraj, or Sentdex.

Before we start: here are some hints on how I set up my workstation for this tutorial. Please make sure you have all of this running on your computer before you continue. The installation process might take some time, so be sure it doesn't clash with the plans of your significant other.

Prerequisites for this tutorial:

Alright, if you made it to this point, some of this will already be familiar to you, which is really good!

The Convolutional Neural Network Model

A convolutional neural network can have tens or hundreds of layers, each of which learns to detect different features of an image. Filters are applied to each training image at different resolutions, and the output of each convolved image is used as the input to the next layer.

In our case, each data point represents a 28-by-28-pixel image, for a total of 784 pixels. Each pixel has a single pixel value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel value is an integer between 0 and 255, inclusive.
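As a quick illustration (this is my own sketch, not the original code), here is how such a flattened row of 784 pixel values can be reshaped into a 28x28 image and normalized to the [0, 1] range that neural networks prefer:

```python
import numpy as np

# Hypothetical example: one flattened 784-pixel row, as provided
# in Kaggle's train.csv (values are integers in [0, 255])
pixels = np.random.randint(0, 256, size=784)

# Reshape to 28x28 and scale pixel values from [0, 255] to [0, 1]
image = pixels.reshape(28, 28).astype("float32") / 255.0

print(image.shape)  # (28, 28)
```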

After some experiments and research, I arrived at this model:

CNN Model — Input Layer + Hidden Layers + Fully Connected + Output Layer

Let's code this model.

I prefer to keep the code organized, so I put the model in a separate file and import it later in the main program.
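Since the original code embed is not shown here, below is a sketch of what such a model file could look like. The overall shape follows the caption above (input layer, hidden convolutional layers, a fully connected layer, and a softmax output), but the specific filter counts, kernel sizes, dropout rates, and dense units are my assumptions, not necessarily the author's final configuration:

```python
# model.py -- a sketch of a Keras CNN for MNIST; layer sizes and
# hyperparameters here are illustrative assumptions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Dropout,
                                     Flatten, Dense)

def build_model():
    model = Sequential([
        # Input + first convolutional block
        Conv2D(32, kernel_size=(3, 3), activation="relu",
               input_shape=(28, 28, 1)),
        Conv2D(32, kernel_size=(3, 3), activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),
        Dropout(0.25),
        # Second convolutional block
        Conv2D(64, kernel_size=(3, 3), activation="relu"),
        Conv2D(64, kernel_size=(3, 3), activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),
        Dropout(0.25),
        # Fully connected head
        Flatten(),
        Dense(256, activation="relu"),
        Dropout(0.5),
        # Output layer: one probability per digit class (0-9)
        Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```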

You can play with some parameters (filter size, kernel size, padding, optimizer, learning rate, dense size, and others). Let me know your results.

Okay, now that the model is ready, it's time to code the main part.

This is the program:
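The embedded program is not reproduced here, so the following is a sketch of a typical main script for this competition: load Kaggle's `train.csv` and `test.csv`, preprocess, train, and write a submission file. The file names follow the Digit Recognizer competition; the epoch count, batch size, and the `model.py`/`build_model` names for the separate model file are my assumptions:

```python
# main.py -- sketch of the training pipeline; hyperparameters are
# illustrative assumptions, not the author's exact values.
import numpy as np
import pandas as pd
from tensorflow.keras.utils import to_categorical

def preprocess(df, has_label=True):
    """Split a Kaggle MNIST dataframe into normalized images
    (and one-hot labels when present)."""
    if has_label:
        y = to_categorical(df["label"].values, num_classes=10)
        X = df.drop(columns=["label"]).values
    else:
        y = None
        X = df.values
    # Reshape to (n, 28, 28, 1) and scale to [0, 1]
    X = X.reshape(-1, 28, 28, 1).astype("float32") / 255.0
    return X, y

if __name__ == "__main__":
    # Assumes the model lives in a separate file, as described above
    from model import build_model

    train = pd.read_csv("train.csv")
    test = pd.read_csv("test.csv")
    X_train, y_train = preprocess(train)
    X_test, _ = preprocess(test, has_label=False)

    model = build_model()
    model.fit(X_train, y_train, epochs=30, batch_size=64,
              validation_split=0.1)

    # Predict and write the Kaggle submission file
    preds = model.predict(X_test).argmax(axis=1)
    pd.DataFrame({"ImageId": np.arange(1, len(preds) + 1),
                  "Label": preds}).to_csv("submission.csv", index=False)
```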

You might have noticed that I've included graphs, elapsed time, and other visual results. I hope you have enjoyed this first part.
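In the same spirit as those visual results (again, a sketch rather than the original code), training curves can be plotted from the Keras `History` object and elapsed time measured with the standard library:

```python
# Sketch: plotting training curves and timing a run. Function and file
# names here are my own, chosen for illustration.
import time
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs without a display
import matplotlib.pyplot as plt

def plot_history(history_dict, out_file="training_curves.png"):
    """Plot train/validation accuracy from a Keras History.history dict."""
    fig, ax = plt.subplots()
    ax.plot(history_dict["accuracy"], label="train accuracy")
    ax.plot(history_dict["val_accuracy"], label="validation accuracy")
    ax.set_xlabel("epoch")
    ax.set_ylabel("accuracy")
    ax.legend()
    fig.savefig(out_file)
    plt.close(fig)
    return out_file

# Timing a (placeholder) training run
start = time.perf_counter()
# ... model.fit(...) would go here ...
elapsed = time.perf_counter() - start
print(f"Elapsed time: {elapsed:.1f} s")
```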

Results


Follow me on Medium or on my GitHub; I'll post the second part soon.

References:

https://www.mathworks.com/discovery/deep-learning.html

https://www.kaggle.com/c/digit-recognizer/

https://www.tensorflow.org/versions/r1.1/get_started/mnist/beginners

https://www.tensorflow.org/versions/r1.1/get_started/mnist/pros
