Classification of Handwritten Digits Using MATLAB (CNN)
Digitization of documents has become a priority in developing countries. It not only helps bring the grey economy on record but also keeps track of infrastructure development in the digital world. This includes scanning available handwritten and printed text and producing an electronic, computer-generated version of it.
This article shows how to use the MNIST dataset to train a basic neural network model in MATLAB to recognize handwritten digits. Such a model can be deployed to create a digitized version of manually written scripts by scanning them.
Dataset:
The National Institute of Standards and Technology (NIST)'s modified database (MNIST) has been a standard training dataset for digit recognition for more than a decade. It comprises 60K training and 10K testing images for machine learning models. Each image is 28x28 pixels with a single grey channel, whereas typical colour images have much larger dimensions and multiple RGB channels.
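As a concrete starting point, here is a minimal sketch of loading a 28x28 grayscale digit dataset in MATLAB. It uses the Deep Learning Toolbox sample DigitDataset as a stand-in; the actual MNIST images can be imported from the raw IDX files or a prepared image folder in the same way, and the split size of 750 images per class is an assumption.

% Load digit images into an imageDatastore, labelled by folder name.
digitDatasetPath = fullfile(matlabroot, 'toolbox', 'nnet', ...
    'nndemos', 'nndatasets', 'DigitDataset');
imds = imageDatastore(digitDatasetPath, ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');

% Split into training and validation sets (750 images per class here).
[imdsTrain, imdsValidation] = splitEachLabel(imds, 750, 'randomize');

% Each image is 28x28 grayscale, matching the MNIST format.
img = readimage(imdsTrain, 1);
size(img)   % returns [28 28]

splitEachLabel keeps the class balance intact when carving out the validation set.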
Model Architecture:
Convolutional Neural Networks (CNNs) are well suited to multiclass classification because they can learn non-linear decision boundaries between data points. A CNN is generally combined with softmax, pooling, and fully connected layers. This fully connected network tunes its parameters during training and validation of the dataset, and the performance of the network is then tested on classifying the handwritten digits. The main goal is to find the optimal parameters and to compare how important those parameters are for accuracy and the cost function.
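A minimal sketch of such a layer stack and its training call is shown below. The filter sizes, filter counts, and training options are illustrative assumptions rather than the article's exact values, and the datastores come from the earlier snippet.

% Two convolution blocks (conv + batch norm + ReLU + pooling),
% followed by a fully connected layer, softmax, and classification.
layers = [
    imageInputLayer([28 28 1])

    convolution2dLayer(3, 8, 'Padding', 'same', 'Name', 'conv1')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)

    convolution2dLayer(3, 16, 'Padding', 'same', 'Name', 'conv2')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)

    fullyConnectedLayer(10)      % one output per digit class (0-9)
    softmaxLayer
    classificationLayer];

% Assumed training options; 'training-progress' shows the accuracy/loss plots.
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 0.01, ...
    'MaxEpochs', 4, ...
    'ValidationData', imdsValidation, ...
    'Plots', 'training-progress');

net = trainNetwork(imdsTrain, layers, options);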
Feature Extraction:
Feature engineering is important for any machine learning model to converge. With supervised learning, it becomes easy to keep track of the weights learned in each channel. The learned filters of conv1 and conv2 are taken from the layer outputs. When the network is given the same sample image, the activations of the first convolutional layer are not as clear as those of the second layer. In practice, an activation is a filter output, and it can distinguish features of different images. Theoretically, early layers learn more slowly than later layers. The weights learned by both layers are provided below.
It can be seen how both layers focus on the curves of the digit zero and adjust their weights and biases during backpropagation. A view of the weights from the first and the second convolution layers is shown below.
The size of each weight matrix depends on the parameters we set when formulating the network. These matrices are further reshaped to make them easier to view as a montage. Hidden layers with a greater number of outputs have a larger set of weights and biases, so they return matrices of higher dimensions. Adding more layers increases the training time.
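A sketch of how such a montage and the activations can be produced in MATLAB, assuming the network and layer names from the earlier snippets (layer 2 of the stack is conv1):

w1 = net.Layers(2).Weights;            % conv1 weights, size [3 3 1 8]
w1 = rescale(w1);                      % scale the weights into [0,1] for display
figure; montage(w1, 'Size', [2 4]);    % tile the eight 3x3 filters
title('Weights learned by the first convolution layer');

% Activations of conv1 for a single sample image.
img = readimage(imdsValidation, 1);
act1 = activations(net, img, 'conv1');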
Normalization Layer:
The main goal of adding a batch normalization layer is to normalize the activations passed between successive layers so that they stay on a comparable scale; this speeds up gradient-based learning and reduces the network's sensitivity to initialization. In this way, the performance of the network improves.
The training plot also becomes much steadier: the accuracy stays within a bounded range and does not vary too much, so it is much easier for the network to reach its highest value and better performance. The effect is most noticeable when training with a high learning rate, where unbounded variance in the activations leads to an unstable accuracy plot. We increase the learning rate to 0.15; the plots with and without this layer are compared below (a sketch of the comparison follows).
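A sketch of that comparison, assuming the layer stack and datastores from the earlier snippets; the 0.15 learning rate follows the article, the remaining options are assumptions.

optionsHighLR = trainingOptions('sgdm', ...
    'InitialLearnRate', 0.15, ...
    'MaxEpochs', 4, ...
    'ValidationData', imdsValidation, ...
    'Plots', 'training-progress');

% With batch normalization (layer stack as defined earlier).
netBN = trainNetwork(imdsTrain, layers, optionsHighLR);

% Without batch normalization: drop the batchNormalizationLayer entries.
layersNoBN = layers(~arrayfun(@(l) ...
    isa(l, 'nnet.cnn.layer.BatchNormalizationLayer'), layers));
netNoBN = trainNetwork(imdsTrain, layersNoBN, optionsHighLR);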
Results:
Finally, the validation accuracy is 99.48%, which means that the network classified most of the new images correctly; this is very good performance in general. For some image recognition applications, however, even this is not acceptable because of the delicate nature of use in sensitive tasks.
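For reference, a short sketch of computing that validation accuracy, assuming the variable names from the earlier snippets:

YPred = classify(net, imdsValidation);   % predicted labels for the validation set
YTrue = imdsValidation.Labels;           % ground-truth labels
accuracy = mean(YPred == YTrue)          % fraction classified correctly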
Comments on Results:
As expected, in the first few epochs the cost is high and the accuracy is very low because the network does not yet have the optimal weights and biases. At the beginning, the weights and biases are chosen randomly, so the chance of a correct output is low; but as we go on through further epochs, the accuracy increases quickly and then remains almost steady.
We selected a random image from the test set for handwritten digit classification, and the result given by the network matched the number shown in the image.
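A minimal sketch of that check, again assuming the variable names from the earlier snippets:

idx = randi(numel(imdsValidation.Files));    % pick a random test image
img = readimage(imdsValidation, idx);
predictedLabel = classify(net, img);         % run the trained network

figure; imshow(img, 'InitialMagnification', 'fit');
title(['Predicted digit: ' char(predictedLabel)]);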
Further Intuitions on Network Parameters:
1. Increasing the filter size in the convolution layers enlarges the local receptive field and has a noticeable effect on performance and training time (see the sketch after this list).
2. Increasing the number of fully connected layers can raise the validation accuracy at the cost of training time: the network takes longer to train because it has many more parameters. Vanishing gradients certainly become another issue.
3. A minor variation in batch size does not particularly change the accuracy.
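Illustrative (assumed) ways these parameters enter the MATLAB setup: a larger 5x5 convolution filter, an extra fully connected layer, and an explicit mini-batch size in the training options.

convLarger   = convolution2dLayer(5, 8, 'Padding', 'same');   % larger filter size
extraFC      = fullyConnectedLayer(64);                       % additional fully connected layer
optionsBatch = trainingOptions('sgdm', ...
    'MiniBatchSize', 64, ...                                  % mini-batch size to vary
    'InitialLearnRate', 0.01, ...
    'MaxEpochs', 4);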
Conclusion:
Neural networks are generally prone to overfitting; to prevent this, techniques such as dropout and regularization are often deployed in the algorithm. Neural networks are universal approximators that work well for forecasting and image processing problems. If training does not go well, the model may show one of two abnormalities: underfitting or overfitting. Underfitting is simply high bias and low variance, whereas overfitting is the more common case. These characteristics can be observed in the learning graphs.
Moreover, artificial neural networks are being used to solve many complex problems, and demand for them is increasing over time. Neural networks handle a large number of applications, including image processing, face recognition, and forecasting.