Hindi Character Recognition

Mallugari santosh reddy
5 min read · Feb 10, 2022


Table of Contents

Introduction

Machine Learning Formulation

Constraints

Dataset Overview

Performance Metrics

Exploratory Data Analysis

Modeling

Results

Model Deployment

Future Work


Introduction

Character recognition is the process by which computers recognize written or printed characters, such as numbers or letters, and convert them into a form that computers can use.

As part of this case study, we are going to recognize Hindi characters.

Machine Learning Formulation

This is a character recognition problem in computer vision, where our task is to predict the Hindi character present in a given image.

Constraints

The model should recognize the character present in an image in real time, so its latency should be low.

Dataset Overview

The dataset consists of images belonging to 46 different Hindi characters.

It contains 78,200 train images and 13,800 test images across these 46 classes.

Performance Metrics

Since this is a multi-class classification problem and (as we will see in the EDA) the classes are balanced, we will use accuracy as our metric.

Exploratory Data Analysis (EDA)

First, let's look at a few images from the dataset.

The image represents the Training Dataset
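A grid like the one above can be produced with a short matplotlib sketch. The data/train/<class_name>/ folder layout below is my assumption about how the images are stored, not a detail from the original write-up.

```python
import os
import random
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import load_img

TRAIN_DIR = "data/train"  # assumed layout: data/train/<class_name>/*.png

# Show one random sample image from each of six random classes
classes = random.sample(sorted(os.listdir(TRAIN_DIR)), 6)
fig, axes = plt.subplots(1, 6, figsize=(12, 3))
for ax, cls in zip(axes, classes):
    fname = random.choice(os.listdir(os.path.join(TRAIN_DIR, cls)))
    img = load_img(os.path.join(TRAIN_DIR, cls, fname), color_mode="grayscale")
    ax.imshow(img, cmap="gray")
    ax.set_title(cls, fontsize=8)
    ax.axis("off")
plt.show()
```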

Now let's check if the Train dataset is balanced or not.
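One way to compute and plot the per-class counts is a sketch like the following (again assuming a class-wise folder layout):

```python
import os
import matplotlib.pyplot as plt

TRAIN_DIR = "data/train"  # assumed class-wise folder layout

# Count how many images each class folder contains
classes = sorted(os.listdir(TRAIN_DIR))
counts = [len(os.listdir(os.path.join(TRAIN_DIR, c))) for c in classes]

plt.figure(figsize=(14, 4))
plt.bar(classes, counts)
plt.xticks(rotation=90, fontsize=6)
plt.xlabel("Class")
plt.ylabel("Number of images")
plt.title("Images per class in the Train dataset")
plt.tight_layout()
plt.show()
```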

The image represents the count of images for each class

From the above bar plot, we can clearly see that the Train dataset is well balanced.

The images in the dataset have high dimensionality, so we will reduce the dimensionality and visualize the data points using T-SNE.

Since the dataset is large and T-SNE takes a lot of time to reduce the dimensionality, we will visualize only a random sample of 25,000 images.
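Here is a minimal sketch of the 2-D reduction. Loading the images as flattened grayscale vectors and the 32x32 image size are assumptions; only the 25,000-sample figure comes from the write-up.

```python
import os
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from sklearn.manifold import TSNE

TRAIN_DIR = "data/train"  # assumed class-wise folder layout

# Load every training image as a flattened grayscale vector with an integer label
X, y = [], []
for label, cls in enumerate(sorted(os.listdir(TRAIN_DIR))):
    for fname in os.listdir(os.path.join(TRAIN_DIR, cls)):
        img = Image.open(os.path.join(TRAIN_DIR, cls, fname)).convert("L")
        X.append(np.asarray(img, dtype="float32").ravel())
        y.append(label)
X, y = np.array(X), np.array(y)

# Reduce a random sample of 25,000 images to 2 dimensions and scatter-plot them
idx = np.random.choice(len(X), size=25000, replace=False)
X_2d = TSNE(n_components=2, random_state=42).fit_transform(X[idx])

plt.figure(figsize=(8, 8))
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y[idx], cmap="tab20", s=2)
plt.title("2-D T-SNE of 25,000 training images")
plt.show()
```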

A 2-Dimensional representation of T-SNE

The image represents 2-Dimensional T-SNE

A 3-Dimensional representation of T-SNE

The image represents 3-Dimensional T-SNE

From the above plots, we can clearly observe that all the images are mixed together without forming clear clusters of similar classes.

Modeling

In this section, we are going to build a Convolutional Neural Network architecture using convolutional layers, max-pooling layers, and dense layers.

What is a Convolutional Neural Network?

A Convolutional Neural Network, also known as a ConvNet, is a deep neural network applied to various computer vision problems, especially image problems.

The building block of a ConvNet is the Convolutional Layer.

What does a Convolutional Layer do?

The convolutional layer applies the convolution operation to the input images using kernel filters, extracting fundamental features from the image.

The most common type of convolution is the 2D convolution layer, usually abbreviated as Conv2D. A filter (or kernel) in a Conv2D layer “slides” over the 2D input data, performing an element-wise multiplication and summing the results into a single output pixel. The kernel performs the same operation for every location it slides over, transforming one 2D matrix of features into another 2D matrix of features.

The image represents the functionality of the Convolutional Layer
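A tiny runnable illustration of a Conv2D layer in Keras; the 32x32x1 input shape is my assumption about the dataset's image size.

```python
import tensorflow as tf

# 32 filters of size 3x3 slide over a 32x32 grayscale image; "same" padding
# keeps the spatial size, so we get one 32x32 feature map per filter.
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3),
                              activation="relu", padding="same")

dummy_batch = tf.random.uniform((1, 32, 32, 1))  # one fake grayscale image
feature_maps = conv(dummy_batch)
print(feature_maps.shape)                        # (1, 32, 32, 32)
```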

What does a Max-Pooling Layer do?

Pooling is an operation commonly used in Convolutional Neural Network architectures. The main idea behind a pooling layer is to “accumulate” features from the maps generated by convolving a filter over an image. Formally, its function is to progressively reduce the spatial size of the representation, which reduces the number of parameters and the computation in the network.

There are many types of pooling; Max-Pooling is one of them. A Max-Pooling layer takes the maximum pixel value from each sub-matrix of the defined ‘pool_size’.

In the below image the pool_size is 2x2.

The image represents the functionality of the Max-Pooling Layer
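A minimal sketch of the same 2x2 pooling in Keras, continuing the assumed 32x32 feature-map shapes from the Conv2D example:

```python
import tensorflow as tf

# A 2x2 Max-Pooling keeps the maximum value of each 2x2 block,
# halving the height and width of the feature maps.
pool = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))

feature_maps = tf.random.uniform((1, 32, 32, 32))  # e.g. the output of a Conv2D layer
print(pool(feature_maps).shape)                    # (1, 16, 16, 32)
```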

What is a Dense Layer?

A Dense Layer is a simple layer of neurons in which each neuron receives input from all the neurons of the previous layer. Dense layers are used to classify images based on the output of the convolutional layers.
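A short sketch of how the convolutional output can be flattened and classified into the 46 classes; the 128-unit hidden layer is an assumption for illustration.

```python
import tensorflow as tf

# Flatten the pooled feature maps, pass them through a fully connected layer,
# and output a softmax probability for each of the 46 character classes.
flat = tf.keras.layers.Flatten()
hidden = tf.keras.layers.Dense(128, activation="relu")  # 128 units is an assumption
out = tf.keras.layers.Dense(46, activation="softmax")

x = tf.random.uniform((1, 16, 16, 32))  # pooled feature maps from the previous step
print(out(hidden(flat(x))).shape)       # (1, 46)
```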

What is Dropout?

Dropout is a technique used to prevent a model from overfitting. It works by randomly setting the outgoing edges of hidden units (i.e., the neurons that make up hidden layers) to 0 at each update of the training phase.

The Image representing the Dropout technique
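A tiny demonstration of Keras Dropout with a rate of 0.5 (the rate here is just an example value):

```python
import tensorflow as tf

# Dropout(0.5) zeroes roughly half of the incoming activations during training
# (the kept values are scaled up by 1/(1-rate)); at inference time it is a no-op.
drop = tf.keras.layers.Dropout(rate=0.5)

x = tf.ones((1, 8))
print(drop(x, training=True).numpy())   # about half the entries are 0, the rest are 2.0
print(drop(x, training=False).numpy())  # unchanged
```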

Model Architecture:

The image represents the architecture of the model
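The exact layer sizes are the ones shown in the diagram above. As an illustration only, here is a minimal Keras sketch that stacks the layer types discussed in this section; the filter counts, dense widths, and 32x32 input shape are my assumptions, not the values from the diagram.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(46, activation="softmax"),  # one output per Hindi character class
])

# Assuming one-hot encoded labels; use sparse_categorical_crossentropy for integer labels
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```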

Results

After training the model with the above architecture, the minimum loss I could achieve was 0.126 and the best accuracy was 96.2%.

The image represents “Epoch vs Loss”
The image represents “Epoch vs Accuracy”
The image represents the Confusion Matrix

From the above confusion matrix, we can observe that our model’s predictions are very good.

Here are a few model predictions for the given images.

The images representing the ‘Actual’ & ‘Predicted’ classes

Model Deployment

Finally, with the help of the Streamlit framework, I created an interactive web app and deployed the model on an AWS EC2 instance.
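A minimal sketch of what such a Streamlit app could look like; the model file name, class-folder layout, and 32x32 preprocessing are assumptions, not the exact deployed code.

```python
# app.py -- minimal Streamlit sketch for serving the trained model
import os
import numpy as np
import streamlit as st
import tensorflow as tf
from PIL import Image

model = tf.keras.models.load_model("hindi_char_model.h5")  # hypothetical saved model
CLASS_NAMES = sorted(os.listdir("data/train"))             # assumed class-wise folders

st.title("Hindi Character Recognition")
uploaded = st.file_uploader("Upload an image of a Hindi character", type=["png", "jpg"])

if uploaded is not None:
    img = Image.open(uploaded).convert("L").resize((32, 32))
    st.image(img, caption="Input image", width=150)
    x = np.array(img, dtype="float32")[None, ..., None] / 255.0  # shape (1, 32, 32, 1)
    probs = model.predict(x)
    st.write("Predicted character:", CLASS_NAMES[int(np.argmax(probs))])
```

The app can then be started on the EC2 instance with `streamlit run app.py`.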

Click Here for the Deployment App

Video representing “Hindi Character Recognition Predictions”

Future Work

In the future, we can try to extend our character recognition model to a word recognition model and recognize Hindi words instead of single Hindi characters.


Here is my LinkedIn Profile; please feel free to contact me.
