Handwritten Digit Recognition Using CNN Classification (Process Explanation)

Published in Analytics Vidhya, Aug 22, 2020

“Artificial Intelligence is a branch of science which deals with helping machines find solutions to complex problems in a more human-like fashion” (Sridhar, 2020). In other words, Minsky defines AI as “the science of making machines do things that would require intelligence if done by men”.
AI is a concerted effort to understand the complexity of human experience in terms of information processing. It addresses not only how to scientifically represent and use complex and incomplete information but also how to see (vision), move (robotics), communicate (natural language, speech) and understand (memory, thought, classification).

Problem Domain

Handwritten digit recognition is the ability of a computer system to recognize handwritten inputs such as digits and characters from a wide variety of sources such as emails, documents, images and letters. It has been a topic of research for decades. Application areas include signature verification, bank check processing and interpretation of postal addresses on envelopes, all of which become easier and more convenient with digit recognition methods. Several machine learning classification methods have been developed and used for this purpose, such as K-Nearest Neighbors, SVM classifiers and Random Forest classifiers, but these methods, while reaching an accuracy of 97%, are not adequate for real-world applications.
In recent years, the research community has shown significant interest in deep learning-based approaches to a range of supervised, unsupervised and reinforcement learning problems. One of the most common and widely used techniques is the convolutional neural network (CNN), a kind of neural network that can extract relevant features automatically from input data. Here, we use the well-known MNIST data set to predict and recognize handwritten digits using deep learning techniques and machine learning algorithms. This is not a new topic, but after several decades the MNIST data set is still very common and important for testing and validating new algorithms. The project requires several libraries, including basic ML libraries, deep learning libraries and EDA (Exploratory Data Analysis) tools; TensorFlow is used as the back end with Keras throughout the development process.

Background

Generally, Handwriting Character Recognition (HCR) is divided into six phases: image acquisition, pre-processing of the input image, segmentation, feature extraction, classification and post-processing.

Block Diagram of Handwriting Character Recognition

A. Image Acquisition
The input image is provided to the recognition system at the image acquisition stage. The input can be a file in an image format such as JPEG or BMP, a scanned image, a digital camera capture or any other suitable digital input device, or it can be drawn on a canvas in the user interface.
B. Pre-Processing
The second phase, known as pre-processing, is the entry point for character recognition and is very important in determining the recognition quality. Pre-processing normalizes strokes and removes variations that can reduce the accuracy rate. It mainly deals with distortions such as irregular text size, missing points during pen movement, jitter, left-right slant and uneven spacing.
C. Segmentation
Segmentation converts an input image containing many characters into individual characters. The techniques used are word, line and character segmentation. Typically, it is performed by isolating single characters from a word image. The content is processed in a tree-like manner: first, the row histogram is used to segment the lines; then, at each level, the characters are retrieved using the same histogram technique.
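The histogram idea can be illustrated with a small sketch. It assumes a clean binary image in which text lines and characters are separated by completely empty rows and columns; the `segment_runs` helper and the toy `page` array are hypothetical and are not taken from the project code.

```python
import numpy as np

def segment_runs(profile):
    """Return (start, end) index pairs for each run of non-zero values."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                  # a run of ink begins here
        elif value == 0 and start is not None:
            runs.append((start, i))    # the run ended just before index i
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

# Hypothetical binary page: 1 = ink, 0 = background.
page = np.zeros((12, 20), dtype=int)
page[1:4, 2:6] = 1      # first line, first character
page[1:4, 9:13] = 1     # first line, second character
page[7:10, 3:8] = 1     # second line, single character

# Row histogram: amount of ink per row; empty rows separate the text lines.
lines = segment_runs(page.sum(axis=1))

# Column histogram inside each line separates its characters.
characters = [segment_runs(page[top:bottom].sum(axis=0)) for top, bottom in lines]
```

Real handwriting often has touching or overlapping characters, so a practical system would need more than empty-gap detection; the sketch only shows the line-then-character order described above.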
D. Feature Extraction
The aim of feature extraction is to extract the pattern that is most relevant for classification. Feature extraction techniques such as Principal Component Analysis (PCA), Scale-Invariant Feature Transform (SIFT), Linear Discriminant Analysis (LDA), histograms, Chain Code (CC), zoning and gradient-based features can be used to extract the features of individual characters. All of these features are used to train the system. Each segmented image has a size of 28 x 28 pixels.
By flattening the array into a vector of 28 * 28 = 784 numbers, the image is reduced to a compact one-dimensional array with 784 cells. The image thus becomes an n-dimensional array (a tensor).
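As a minimal sketch of the flattening step, the snippet below uses NumPy and a random array as a stand-in for a real digit image (the random data is an assumption made only for illustration).

```python
import numpy as np

# Hypothetical 28x28 grayscale image with values 0..255 (random stand-in for a digit).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten row by row into a single vector of 28 * 28 = 784 values.
vector = image.reshape(-1)

# With row-major flattening, pixel (row, col) ends up at index row * 28 + col.
row, col = 3, 5
same_pixel = vector[row * 28 + col] == image[row, col]
```

Flattening discards the explicit two-dimensional layout, which is why convolutional layers are usually applied before this step rather than after it.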
E. Classification
Decision-making takes place during the classification phase. The extracted features are used to identify the characters. Different classification algorithms are used, such as SVM and neural networks. The classifier compares the given input features with stored patterns and finds the best-matching class for the input, for which softmax regression is used. Softmax regression assigns a probability to each possible result, which makes classification easy. It essentially adds up all the evidence it obtains from the features and then converts that evidence into probabilities.
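The conversion from evidence to probabilities can be sketched as follows. The score values are made up for illustration, and subtracting the maximum before exponentiating is a standard numerical-stability step rather than something stated in the text.

```python
import numpy as np

def softmax(scores):
    """Convert a vector of class scores (evidence) into probabilities."""
    shifted = scores - np.max(scores)      # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical evidence for the ten digit classes 0..9.
scores = np.array([1.0, 0.5, 0.1, 3.2, 0.0, -1.0, 0.3, 2.1, 0.7, 0.2])
probabilities = softmax(scores)

# The class with the largest probability is taken as the prediction.
predicted_digit = int(np.argmax(probabilities))
```

Because the exponential is monotonic, the class with the highest score always receives the highest probability; softmax only rescales the scores so that they are positive and sum to 1.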
F. Post-Processing
Post-processing is the final phase of character recognition. It is the procedure in which natural language knowledge is used to correct misclassified output. It processes the output obtained after shape recognition. If the shape is recognized purely on its own, the accuracy can then be increased using language knowledge. Shape recognizers behave differently for different handwriting inputs.
Since 1998, researchers have been studying the problem of handwritten digit recognition with almost all the algorithms developed by then and up until now. The test error rate decreased from 12% in 1998 with a linear classifier to 0.23% in 2012 with convolutional networks, and more and more data scientists and machine learning experts are trying to develop and validate unsupervised learning methods such as auto-encoders and deep learning models.
The MNIST database consists of 70,000 handwritten digit images distributed across ten classes. The dataset is split into 60,000 images for training, and the remaining 10,000 are reserved for the test set. In this work, the gray level of each pixel is coded in the interval [0, 1], using a value of 0 for white pixels and 1 for black pixels.
In the MNIST dataset, the data is already well prepared: the images have been centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field. The training set consists of 30,000 patterns from SD-3 and 30,000 patterns from SD-1, and the test set consists of 5,000 SD-3 patterns and 5,000 SD-1 patterns.
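The centering step can be sketched roughly as below. This is not the original MNIST preprocessing code: the helper names are hypothetical, the shift is rounded to whole pixels, and `np.roll` is used on the assumption that the image border is empty (it wraps pixels around the edges otherwise).

```python
import numpy as np

def center_of_mass(image):
    """Intensity-weighted mean (row, col) position of the pixels."""
    rows, cols = np.indices(image.shape)
    total = image.sum()
    return (rows * image).sum() / total, (cols * image).sum() / total

def center_image(image):
    """Shift the image so its center of mass sits at the middle of the field."""
    target_row = (image.shape[0] - 1) / 2
    target_col = (image.shape[1] - 1) / 2
    com_row, com_col = center_of_mass(image)
    shift = (int(round(target_row - com_row)), int(round(target_col - com_col)))
    # np.roll wraps around the edges, which is harmless only if the border is empty.
    return np.roll(image, shift, axis=(0, 1))

# Hypothetical digit: a small blob of ink near the top-left corner.
image = np.zeros((28, 28))
image[2:6, 3:9] = 1.0

centered = center_image(image)
centered_row, centered_col = center_of_mass(centered)
```

After the shift, the center of mass lies within half a pixel of the middle of the 28x28 field, which is the property the dataset description refers to.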

Methodology (Using CNN Classifier)

Handwritten digit recognition has become a topic of interest among researchers. A large number of papers and articles on this subject are being published these days. Research shows that deep learning algorithms such as multilayer CNNs using Keras with TensorFlow give the highest accuracy compared to the most commonly used machine learning algorithms such as SVM, KNN and RFC (Siddique et al., 2019). Because of its high accuracy, a Convolutional Neural Network (CNN) is used for this project.
One of the earliest rankings was published by LeCun et al. (1998) themselves, and it includes references up to 2012 (Baldominos et al., 2019). “It can be easily verified that the early machine learning approaches used by (LeCun et al., 1998) included linear classifiers (whose error rate ranges from 7.6 to 12%), K-nearest neighbors approaches (K-NN, ranging from 1.1 to 5%), non-linear classifiers (about 3.5%), support vector machines (SVM, from 0.8 to 1.4%), neural networks (NN, from 1.6 to 4.7%) and convolutional neural networks (CNN, from 0.7 to 1.7%)”. It is notable that results improve with additional data; the best result achieved by a convolutional neural network without distortion or preprocessing in LeCun et al. (1998) was 0.95% (Baldominos et al., 2019).
CNNs are a useful class of models for both supervised and unsupervised learning paradigms. By comparison, K-NN and SVM, as used for digit classification, are supervised learning methods. In the supervised learning mechanism, the system inputs and outputs (true labels) are known and a mapping between the two is learned. In the unsupervised learning mechanism, the true labels for a given set of inputs are not known, and the model aims to estimate the underlying distribution of the input data samples (Khan et al., 2018).
The characteristics of convolutional neural networks, including their ability to extract information from multidimensional sources, make them a very useful option for solving computer vision problems. Since this project is about extracting features from an image and is related to computer vision problems, a convolutional neural network is used (Baldominos et al., 2019).
To recognize the handwritten digits, a seven-layered convolutional neural network with one input layer followed by hidden layers and two output layers is designed and illustrated below.

Algorithm

A CNN object classification model takes, analyzes and classifies an input image, which in our case is a digit, under a certain category. Deep learning allows CNN models to be trained and tested. The image is passed through a series of convolution layers with filters (kernels), pooling layers and fully connected layers, and softmax is used to classify an object with probabilistic values between 0 and 1.

A CNN consists of many layers. These layers, when used repeatedly, form a deep neural network. The fundamental types of layers used to build a CNN are:
1. Input
This layer holds the raw pixel values of the image, represented as a grayscale image using a 28x28 matrix of pixels.
2. Convolutional Layer
This layer computes the outputs of the neurons that are connected to local regions of the input. The number of filters to be used in this layer is defined here. Each filter may be a 5x5 window that slides over the input data and outputs a weighted sum of the pixels under it.
3. Rectified Linear Unit (ReLU) Layer
This layer applies an element-wise activation function to the image data. The ReLU function, max(0, x), keeps positive values unchanged and sets negative values to zero. It has no trainable parameters, so it is not changed by backpropagation.
4. Pooling Layer
A down-sampling operation along the spatial dimensions (width, height) is applied in this layer, resulting in a reduced volume.
5. Fully Connected Layer
This layer computes the class scores, that is, which class has the maximum score for the input digit. The class label with the highest probability is chosen as the final classification by the network and shown in the output.
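To make the convolution, ReLU and pooling layers listed above concrete, here is a minimal NumPy sketch for a single-channel image and a single filter. The random image and kernel are placeholders, the helper names are hypothetical, and, as in most deep learning libraries, the "convolution" is computed without flipping the kernel.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output is a weighted sum of one window."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Element-wise max(0, x): negatives become zero, positives pass through."""
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    """Keep the largest value of every non-overlapping 2x2 block."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    blocks = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((28, 28))             # stand-in for a normalized digit image
kernel = rng.standard_normal((5, 5))     # stand-in for one learned 5x5 filter

feature_map = conv2d_valid(image, kernel)   # shape (24, 24)
activated = relu(feature_map)
pooled = max_pool_2x2(activated)            # shape (12, 12)
```

Note that taking the maximum of a window is what the pooling layer does, whereas the convolutional layer computes a weighted sum; in a trained network the kernel weights are learned rather than random.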

Implementation

A. TensorFlow
TensorFlow is used as the back end in this project. TensorFlow is a dataflow-based machine learning library made by the Google Brain team and open-sourced in 2015. It is designed to be easy to use and widely applicable to both numeric and neural network problems as well as other domains. TensorFlow is essentially a low-level toolkit for complex math, aimed at experts who know what they are doing and want to build experimental learning architectures, play around with them and turn them into running software. For the most part, it can be thought of as a programming system in which computations are represented as graphs. Nodes in the graph represent math operations, and the edges represent the multidimensional data arrays (tensors) communicated between them.
B. Python
Python is used throughout the implementation of the project, where several lines of code were written to accomplish the project requirements. Python is a widely used high-level programming language. It was designed with an emphasis on code readability, and its syntax allows programmers to express concepts in fewer lines of code. Python is a programming language that lets you work quickly and integrate systems more efficiently.
C. Anaconda
Anaconda is used as the development environment throughout the implementation of the project. Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.). Anaconda comes with over 1,400 packages as well as the Conda package and virtual environment manager and Anaconda Navigator, so it removes the need to work out how to install each library independently.
D. Keras
Keras is used to build the model and arrange its layers in this project. Keras is a high-level neural network API written in Python that can run on top of TensorFlow, CNTK or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
Keras allows for easy and fast prototyping (through user friendliness, modularity and extensibility). It supports both convolutional networks and recurrent networks, as well as combinations of the two, and runs seamlessly on CPU and GPU.
E. NumPy
NumPy is used for mathematical calculations and to print out the predicted data in this project. NumPy is the fundamental package for scientific computing with Python. It provides a powerful N-dimensional array object, sophisticated (broadcasting) functions, tools for integrating C/C++ and Fortran code, and useful linear algebra, Fourier transform and random number capabilities.
F. Matplotlib
Matplotlib is used to plot model accuracy and loss graphically in this project. Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, Jupyter notebooks, web application servers and four graphical user interface toolkits.

Achieved Results

First, the required libraries and packages are imported.

Then the MNIST dataset object is created and loaded through Keras with the TensorFlow back end. After that, the number of training and test samples and their dimensions are printed to show how many elements the dataset contains.

After the dataset is loaded, one of the images from the training set is displayed in grayscale using the matplotlib library.

Then the training and test sets are normalized so that the image pixel values lie between 0 and 1.
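The article does not say which normalization method was used. One common way, shown here as an assumption, is to divide the 8-bit pixel values by 255; the random arrays below merely stand in for the real MNIST images.

```python
import numpy as np

# Hypothetical stand-in for MNIST images: uint8 values in the range 0..255.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
x_test = rng.integers(0, 256, size=(20, 28, 28), dtype=np.uint8)

# Convert to float first, then divide by the largest possible pixel value.
x_train_norm = x_train.astype("float32") / 255.0
x_test_norm = x_test.astype("float32") / 255.0
```

The same scaling has to be applied to both sets, since the model only ever sees inputs in the range it was trained on.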

After normalizing the data, a CNN model is created using the Keras library. A Flatten layer is added to the model, and the input, hidden and output layers are built using the CNN algorithm.

After the model is built, it is compiled with the Adam optimization algorithm, which is used for training deep neural networks (DNNs).
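For readers curious about what Adam does at each training step, the following is a minimal NumPy sketch of its update rule applied to a toy quadratic problem. The `adam_minimize` helper, the learning rate of 0.05 and the toy objective are illustrative assumptions; beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are the commonly used default values.

```python
import numpy as np

def adam_minimize(grad_fn, x0, steps=1000, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimal Adam loop for a parameter vector x, given its gradient function."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)   # running mean of gradients (first moment)
    v = np.zeros_like(x)   # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)   # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy problem: minimize f(x) = sum((x - 3)^2), whose gradient is 2 * (x - 3).
solution = adam_minimize(lambda x: 2 * (x - 3.0), x0=[0.0, 10.0])
```

In the project itself this loop is handled by the deep learning library; the sketch only shows how the per-parameter step size is derived from running averages of the gradient and its square.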

After compiling, the model is fit to the training data set; during training, the cross-entropy loss and accuracy of the model are reported.
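The two quantities reported during training can be computed by hand as in the sketch below. The probability table and labels are made-up values for three samples and three classes, and the helper names are hypothetical.

```python
import numpy as np

def cross_entropy(probabilities, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class of each sample."""
    picked = probabilities[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + eps)))

def accuracy(probabilities, labels):
    """Fraction of samples whose most probable class equals the true label."""
    return float(np.mean(np.argmax(probabilities, axis=1) == labels))

# Hypothetical softmax outputs for 3 samples over 3 classes (each row sums to 1).
probabilities = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
])
labels = np.array([0, 1, 2])

loss = cross_entropy(probabilities, labels)
acc = accuracy(probabilities, labels)
```

The third sample is misclassified (class 1 is predicted, class 2 is true), so the accuracy is two out of three, while the loss also penalizes how little probability was given to the true class.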

After training is complete, the performance of the model is evaluated on the test data set, yielding its accuracy and loss.

After evaluation, predictions are made on the test data set.

Then the predicted data is printed and displayed using the array index.
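The build, compile, fit, evaluate and predict steps described above can be sketched with Keras as follows. This is not the article's exact code: the layer stack, filter count, dense layer size, batch size and single epoch are assumptions chosen to keep the example small, and random arrays with MNIST's shapes stand in for the real data so the sketch runs without downloading anything.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Random stand-in data with MNIST's shapes; swap in the real dataset for meaningful scores.
rng = np.random.default_rng(0)
x_train = rng.random((256, 28, 28, 1)).astype("float32")
y_train = rng.integers(0, 10, size=256)
x_test = rng.random((64, 28, 28, 1)).astype("float32")
y_test = rng.integers(0, 10, size=64)

# Assumed architecture (not the article's exact layer stack).
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=(5, 5), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),   # one probability per digit class
])

# Integer labels 0..9 pair with the sparse variant of categorical cross-entropy.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train; the returned history holds per-epoch loss and accuracy values.
history = model.fit(x_train, y_train, epochs=1, batch_size=32,
                    validation_data=(x_test, y_test), verbose=0)

# Evaluate on held-out data: returns the loss followed by the compiled metrics.
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)

# Predict: one row of 10 class probabilities per image; argmax gives the digit.
predictions = model.predict(x_test, verbose=0)
predicted_digits = np.argmax(predictions, axis=1)
first_prediction = int(predicted_digits[0])
```

On random data the accuracy stays near chance level; with the real MNIST images and more epochs the same steps are what produce the scores discussed in this section.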

After the predictions are made, every image in the test data set is reshaped to a width and height of 28.

After reshaping, one of the test images is displayed together with its label using the matplotlib library.

Then a graph is plotted to show the accuracy of the model on the training and test datasets.

A graph is also plotted to show the loss of the model on the training and test datasets.

After that, images are plotted in grayscale along with their predicted label, true label and accuracy. Colors are also defined to distinguish the prediction outcomes.

Since the data set is very large, it would be difficult to display all of its images. Therefore, a number of rows and columns is set to display a limited number of images from the data set.
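A grid of this kind can be sketched with matplotlib as below. The random images, labels and probabilities are placeholders for the real test data and model output, the 2 x 4 grid size is arbitrary, and the blue/red scheme for correct and incorrect predictions is an assumed color choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for test images, true labels and softmax outputs.
rng = np.random.default_rng(0)
images = rng.random((100, 28, 28))
true_labels = rng.integers(0, 10, size=100)
probabilities = rng.random((100, 10))
probabilities /= probabilities.sum(axis=1, keepdims=True)

num_rows, num_cols = 2, 4          # only num_rows * num_cols images are shown
fig, axes = plt.subplots(num_rows, num_cols, figsize=(2 * num_cols, 2.4 * num_rows))

for i, ax in enumerate(axes.flat):
    predicted = int(np.argmax(probabilities[i]))
    confidence = 100 * float(np.max(probabilities[i]))
    # Assumed color scheme: blue when the prediction matches the label, red otherwise.
    color = "blue" if predicted == true_labels[i] else "red"
    ax.imshow(images[i], cmap="gray")
    ax.set_title(f"{predicted} {confidence:.0f}% ({true_labels[i]})", color=color, fontsize=9)
    ax.axis("off")

fig.tight_layout()
# fig.savefig("predictions_grid.png")  # or plt.show() with an interactive backend
```

Each title shows the predicted digit, the model's confidence and the true label in parentheses, so misclassified samples stand out at a glance.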

Conclusion

The goal of the project is just to create a model that can recognize digits using the MNIST dataset; however, it can be extended to letters and then to a person’s handwriting. It can be used by countless organizations, schools and banks, and even for household activities.
Handwritten digit recognition will be useful for government bodies or any other organization to identify citizenship identification numbers, which helps in automation. Likewise, the license card number of any person can be recognized through this system. It can also be used for educational purposes, where students can learn and understand a real-world solution by using this system. Similarly, recognition of postal addresses and bank cheque digits can be made easier through automation using this system.