Digit Recognizer

Mukesh Chaudhary
4 min read · Feb 18, 2020


Deep Learning (Convolutional Neural Network (CNN))

I am trying to build a Convolutional Neural Network (CNN) model for recognizing handwritten digits. I will try to explain as many of the points used while building the model as possible, because that helps to understand how a convolutional neural network (CNN) actually works internally. Let’s start from the basics. Deep learning is a subset of machine learning in Artificial Intelligence (AI). It is also called a deep neural network. There are several types of neural network algorithms, and the convolutional neural network (CNN) is one of them. Let’s first look at the basic CNN architecture before talking more.

In the picture above we can see that the CNN algorithm takes an image as input, performs feature extraction with convolution and pooling, and finally feeds the result into the fully connected classification layers. CNNs are used especially for image and video recognition. Let’s see another picture.

The picture above shows the input image, two convolution layers with kernels (filters), two max-pooling layers, a fully connected network with ReLU activation, and an output layer with softmax activation. Before going further, I will explain some important points a little bit.

What is the input?

The input is an image, which is a matrix with three RGB channels. The image could be of any pixel size, for example (4×4) with 3 RGB channels, i.e. a 3D matrix of shape (4×4×3). Let’s see a picture:
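As a small illustration of this (my own example, not from the original post), such an image can be represented as a NumPy array:

```python
import numpy as np

# A hypothetical 4x4 RGB image: height x width x channels = (4, 4, 3)
image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

print(image.shape)   # (4, 4, 3)
print(image[0, 0])   # the R, G, B values of the top-left pixel
```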

What are the Convolutional Layer and Pooling?

Image dimensions = 5 (height) × 5 (breadth) × 1 (number of channels; 1 for grayscale, 3 for RGB).

In the figure below, the green part is the input image. The convolution is performed by a kernel/filter, represented by the yellow part. The kernel shifts 9 times because the stride length is 1, and at every position an element-wise multiplication and sum is performed.
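To make those 9 shifts concrete, here is a minimal NumPy sketch (my own illustration, not the author's code) of a valid convolution of a 5×5 single-channel image with a 3×3 kernel at stride 1, which gives a 3×3 output, i.e. 9 kernel positions:

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Valid (no padding) 2-D convolution of a single-channel image."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # element-wise multiply, then sum
    return out

image = np.arange(25).reshape(5, 5)        # 5x5 input (the green part)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])             # 3x3 filter (the yellow part)

feature_map = convolve2d(image, kernel)
print(feature_map.shape)                    # (3, 3) -> the kernel shifted 9 times
```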

Similar to the convolutional layer, the pooling layer is responsible for reducing the spatial size of the convolved feature, using either max pooling or average pooling.
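And a similarly minimal sketch of 2×2 max pooling (again my own illustration), which halves the spatial size of a feature map:

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """2x2 max pooling: keep the largest value in each window."""
    oh = (feature_map.shape[0] - size) // stride + 1
    ow = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [3, 4, 8, 6]])

print(max_pool2d(fm))
# [[6. 4.]
#  [7. 9.]]
```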

Let’s look at a little bit of the mathematical formulation, which is the actual core concept of the algorithm. The loss function is

loss(𝑦̂ ,𝑦)=−(𝑦log(𝑦̂ )+(1−𝑦)log(1−𝑦̂ ))

where 𝑦̂ is the predicted label for the X_test features and y is the true label of the test data.

𝑦̂ =𝜎(𝑤 * 𝑥+𝑏)

𝑤:=𝑤− 𝛼 * 𝑑𝐽(𝑤)/𝑑𝑤 and 𝑏:=𝑏−𝛼 * 𝑑𝐽(𝑏)/𝑑𝑏

where w is the weight, b is the bias constant, and 𝛼 is the learning rate. The parameters are updated at every iteration (backpropagation).
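To see how these formulas work together, here is a tiny NumPy sketch (my own illustration, for a single sigmoid unit rather than the full CNN) of gradient-descent updates using the cross-entropy loss above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 samples, 2 features, binary labels
X = np.array([[0.5, 1.2], [1.5, 0.3], [0.2, 0.8], [1.0, 1.0]])
y = np.array([0, 1, 0, 1])

w = np.zeros(2)
b = 0.0
alpha = 0.1                                          # learning rate

for _ in range(100):
    y_hat = sigmoid(X @ w + b)                       # y_hat = sigma(w * x + b)
    loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    dw = X.T @ (y_hat - y) / len(y)                  # dJ/dw
    db = np.mean(y_hat - y)                          # dJ/db
    w = w - alpha * dw                               # w := w - alpha * dJ/dw
    b = b - alpha * db                               # b := b - alpha * dJ/db

print(loss, w, b)
```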

Rectified Linear Unit (ReLU)

The rectified linear unit (ReLU) activation function helps overcome the vanishing gradient problem, allowing models to learn faster and perform better. It is very useful during back-propagation.
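ReLU is simply max(0, x); here is a one-line NumPy version of it and of its gradient (my own sketch):

```python
import numpy as np

def relu(x):
    """ReLU activation: pass positive values, zero out negatives."""
    return np.maximum(0, x)

def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```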

We also have other traditional, popular activation functions (a small code sketch follows the list):

  • sigmoid activation function
  • hyperbolic tangent function
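A small sketch of these two functions (my own illustration):

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: squashes inputs into (0, 1); gradients vanish for large |x|."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Hyperbolic tangent: squashes inputs into (-1, 1)."""
    return np.tanh(x)

x = np.array([-3.0, 0.0, 3.0])
print(sigmoid(x))   # ~[0.047 0.5   0.953]
print(tanh(x))      # ~[-0.995  0.     0.995]
```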

Let’s go to the coding section.

Import all the necessary libraries for the CNN algorithm.
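The original post shows the imports as an image; a plausible equivalent set (assuming the Keras API bundled with TensorFlow, which may differ from the author's exact setup) would be:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
```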

The data used here is from kaggle.com (the Digit Recognizer competition). The raw data shape is (42000, 785). It is split into train and test sets. After that, it is reshaped into (-1, 28, 28, 1): the leading -1 keeps all observations (rows), and (28×28×1) gives the pixel dimensions and the single grayscale channel. Finally, the data is prepared for modeling by scaling, where each pixel is divided by 255 (the color range is 0-255, i.e. 2⁸ possible values).
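A hedged reconstruction of those preprocessing steps, continuing from the imports above (the file name train.csv and the 80/20 split are my assumptions):

```python
# Load the Kaggle Digit Recognizer training file: 42000 rows x 785 columns
# (column 0 is the label, the remaining 784 columns are the 28x28 pixels).
data = pd.read_csv("train.csv")

y = data["label"].values
X = data.drop(columns=["label"]).values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Reshape to (samples, 28, 28, 1) and scale pixels from 0-255 to 0-1.
X_train = X_train.reshape(-1, 28, 28, 1) / 255.0
X_test = X_test.reshape(-1, 28, 28, 1) / 255.0

# One-hot encode the 10 digit classes for categorical_crossentropy.
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
```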

Plot and image of the X_train data

A Sequential model is built with ReLU activations, and the output layer uses softmax activation.

fig : Sequential CNN model
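The figure shows the model as an image; below is a sketch of a comparable Sequential model, continuing from the snippets above. The filter counts and dense-layer size are my assumptions; only the two convolution/pooling blocks, ReLU, and the softmax output are taken from the description.

```python
model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, kernel_size=(3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dense(10, activation="softmax"),   # 10 digit classes
])

model.summary()
```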

The model is compiled with a loss (categorical_crossentropy), an optimizer (Adam(lr=1e-4)), and a metric (accuracy). We could use SGD (Stochastic Gradient Descent) as the optimizer, but it is slower in terms of running time. After that, the model is fit with 20 epochs and a batch_size of 28 (which happens to match the pixel width of the image); the learning rate is the 1e-4 set in the optimizer.

fig: fit model
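A sketch of the compile and fit step as described above, continuing from the model sketch (newer Keras versions spell the argument learning_rate rather than lr):

```python
model.compile(loss="categorical_crossentropy",
              optimizer=Adam(learning_rate=1e-4),
              metrics=["accuracy"])

history = model.fit(X_train, y_train,
                    epochs=20,
                    batch_size=28,
                    validation_data=(X_test, y_test))
```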

Loss and Accuracy of train and test data

Graph of loss and accuracy over epochs.

loss and accuracy graph with epochs
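A minimal plotting sketch using the history object returned by fit above (the val_ metric names assume validation_data was passed as in the previous snippet):

```python
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Loss curves for training and test (validation) data
ax1.plot(history.history["loss"], label="train loss")
ax1.plot(history.history["val_loss"], label="test loss")
ax1.set_xlabel("epoch")
ax1.set_ylabel("loss")
ax1.legend()

# Accuracy curves for training and test (validation) data
ax2.plot(history.history["accuracy"], label="train accuracy")
ax2.plot(history.history["val_accuracy"], label="test accuracy")
ax2.set_xlabel("epoch")
ax2.set_ylabel("accuracy")
ax2.legend()

plt.show()
```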

Evaluation:

Heatmap of the confusion matrix of predicted labels vs. test labels
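A sketch of how such a heatmap could be produced with scikit-learn and seaborn, continuing from the snippets above (my own reconstruction, not the author's exact code):

```python
from sklearn.metrics import confusion_matrix

y_pred = model.predict(X_test).argmax(axis=1)   # predicted digit per row
y_true = y_test.argmax(axis=1)                  # back from one-hot to labels

cm = confusion_matrix(y_true, y_pred)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.show()
```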

Conclusion:

Here I just tried to explain a basic convolutional neural network model for digit recognition, with an input image of 28×28 pixels and a single grayscale channel. I think it helps with understanding at the beginner stage.
