Hindi Character Recognition with Machine Learning and Deep Learning

Suraj Yadav
31 min read · Oct 9, 2022


Image by Author

The widespread application of successful deep learning techniques has helped solve a large number of pattern recognition problems. The field of artificial intelligence and machine learning has recently experienced a sharp uptick in technological innovation. Self-learning through convolved filters is at the heart of deep learning, which places an emphasis on the autonomous extraction of features. This self-learning requires a significant amount of training data, so that the recognition system can learn from a variety of samples and cope with variations in characters at test time.

Over the past few decades, human beings have become increasingly tech aware, and they now want computer systems that can read, write, and comprehend documents in their native language, just as they read digital media. The need for automation was a driving force behind the development of pattern recognition and machine learning as fields of study. Research in image processing and computer vision, particularly Optical Character Recognition (OCR), has seen an uptick in recent years as a result of the increasing trend of digitization. The sorting of mail in post offices by address recognition, biometric verification by automatic signature verification, the digitization of forms and surveys in academic institutions, and the restoration of historical records are all made easier with the assistance of automatic feature extraction.

Business problem:

Introduction

Handwritten Hindi character recognition (HCR) remains largely unsolved despite advances in object recognition technology, owing to the presence of numerous easily confused handwritten characters and extremely cursive Hindi handwriting. Even the most advanced known methods do not deliver sufficient HCR performance in practice.

The ability of CNN methods to extract discriminative features from raw data and represent those features in a way that is highly invariant to the distorted state of the object being analyzed is the primary benefit of using CNN methods. The results of the experiments show that CNN models have a better performance than the other popular object recognition approaches, which suggests that CNN could be a good candidate for the construction of an automatic Hindi Character Recognition system for use in practical applications.

Challenges

Recognition of handwritten characters is more difficult than recognition of printed forms of the same character for a number of reasons, including the following:

(1) Handwritten characters written by different authors are not only non-identical but also vary in aspects such as size and shape.

(2) The many differences in the writing styles of individual characters make the task of recognition difficult.

(3) The similarities of different characters in shape and the interconnections of neighboring characters complicate the problem of character recognition.

(4) The character recognition problem is further complicated by the fact that some characters overlap.

Databases play an important part in the development of robust character recognition systems, alongside feature extraction and classification algorithms. The database must contain a wide range of characters and be large enough for the system to pick up new information as it goes through its learning process.

Because deep learning has provided a solution to practically all of the challenges associated with image classification, it is crucial that the same methodology be applied to the recognition of Hindi characters. Deep learning approaches stress the importance of a big dataset, as these algorithms automate the process of feature extraction and learn by observing examples. To put deep learning strategies into practice, extremely large datasets containing a wide variety of characters are required.

Applications:

Optical character recognition (OCR) innovation is a business solution for automatically extracting data from written or printed text from a scanned document or image file, and then transforming the content into a machine-readable form so that it can be used for data processing like editing or searching. Object recognition technology can be found in a wide variety of applications, including document scanning and image processing.

Numerous applications exist for Hindi Character Recognition, including national ID number recognition systems, automatic license plate recognition for vehicle and parking management, post office automation, and online banking. Figures 1 and 2 depict some example images of these applications.

(System for the Recognition of Hindi Character from ID card)
(Automatic license plate recognition)

Dataset:

This dataset was obtained from the UCI machine learning repository. It contains approximately 92,000 images of handwritten Hindi characters across 46 character classes, including Hindi alphabets and digits.

The dataset is split into a training set comprising 85% of the total dataset and a test set comprising the remaining 15%. The images are saved in .png format, and their dimensions are 32 × 32 pixels.

Below you will find some examples of images from the dataset

Image by Author

One character from each class is listed below.

Image by Author

Please click on the following link if you need further information regarding the dataset:

https://archive.ics.uci.edu/ml/datasets/Devanagari+Handwritten+Character+Dataset

  • Data size: 53.4 MB
  • The file structure follows the standard directory layout used for image classification datasets

To be more specific:

A train directory that includes all of the images that are part of the training dataset, along with subdirectories that are named after different classes and contain images that belong to those classes.

A test directory with a structure identical to that of the train directory.

An example of a file structure

DevanagariHandwrittenCharacterDataset   <- top-level folder
├───train                               <- training images
│   ├───character_1_ka
│   │       1340.png
│   │       1341.png
│   │       ...
│   ├───character_2_kha
│   │       2771.png
│   │       2772.png
│   │       ...
│   ├───character_3_ga
│   │       3710.png
│   │       3712.png
│   │       ...
│   └───...
└───test                                <- testing images
    ├───character_1_ka
    │       1339.png
    │       1342.png
    │       ...
    ├───character_2_kha
    │       2786.png
    │       2787.png
    │       ...
    ├───character_3_ga
    │       3711.png
    │       3713.png
    │       ...
    └───...
  • There are 46 sub-directories in the train directory, and each sub-directory has 1,700 images of one character.
  • There are 46 sub-directories in the test directory, and each sub-directory has 300 images of one character.
  • We have 78,200 images (1,700 × 46) that can be used for training the model.
  • We have 13,800 images (300 × 46) that can be used to test the model.
  • The shape of each image is 32 × 32 pixels.
Image by Author
Image by Author
Image by Author

Loss function:

A loss function is a method for assessing “how well your algorithm models your dataset.” If your predictions are completely inaccurate, the loss function returns a large number; if they are decent, it returns a smaller one. As you tune your algorithm to improve your model, the loss function tells you whether or not you are making progress.

When we are faced with a task that requires the classification of multiple classes, one of the loss functions that we are able to use is the Categorical Cross Entropy (CCE). If we are going to use the Categorical Cross Entropy loss function, then the number of output nodes has to be the same as the number of classes.
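
As a minimal sketch (not the exact model used later in this post), this is how a Keras classifier can be compiled with categorical cross-entropy; note that the output layer has one node per class:

import tensorflow as tf

NUM_CLASSES = 46  # one output node per character class

# A toy classifier, only to illustrate the loss setup.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(32, 32, 1)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Categorical cross-entropy expects one-hot encoded labels; with integer
# labels, "sparse_categorical_crossentropy" is the drop-in equivalent.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])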

Key metric (KPI) to optimize:

A metric is a function that is utilized in the process of evaluating the effectiveness of a model. It is a parameter that our model relies on to determine how well it operates. This problem can be classified as a multi-class problem because the task at hand is to construct a model that can recognize Hindi characters from the images that are provided.

  • Accuracy
  • Confusion Matrix
  • Precision
  • Recall
  • F1 score

Accuracy

The simplest and most intuitive performance metric is accuracy, which is just the ratio of correctly predicted observations to the total number of observations. Accuracy tells us right away whether a model is being trained correctly and gives a general idea of how it will perform, but on its own it does not say much about how well the model solves the problem.

Accuracy is a wonderful measure, but we can only use it effectively when the dataset is balanced, and ours is: every character class has 1,700 training images. When the data are not evenly distributed, accuracy is not the most useful measure.

Confusion Matrix

The confusion matrix is one of the most straightforward metrics for determining the correctness and accuracy of a model. It is applied to classification problems in which the output may have two or more classes; there are 46 unique classes in our dataset.

The numbers of correct and incorrect predictions are easily visualized through a confusion matrix. It shows where the model confuses one class for another by comparing the predictions against the actual values.

Precision

It is the proportion of relevant instances among all retrieved instances. When the cost of a false positive is high and the cost of a false negative is low, precision is the better metric to use.

Recall

It is the proportion of actual positive instances that were correctly predicted out of the total number of actual positive instances. It is also referred to as ‘sensitivity’.

It is more advantageous to employ recall as an evaluation metric in situations where the cost of a false negative is high and the cost of a false positive is low.

Requirements:

Hardware requirements

  • Computer system
  • 8 or 16 GB RAM
  • Intel i3 or better processor
  • HDD/SSD with at least 256 GB of storage

Software requirements

  • VS Code or any other code editor
  • Python 3
  • Python libraries such as NumPy, Pandas, Matplotlib, etc.
  • Anaconda Navigator, Jupyter Notebook
  • Streamlit
  • Scikit-learn, TensorFlow
  • The client OS can be either Windows or Linux
  • Web browser

Get subset data:

Whenever we work on a data science project, one of the first steps is to download some data and load it into memory so that we can process it.

When we do that, we run the risk of encountering a few problems, one of which is the presence of an excessive amount of data that requires processing. If the amount of data that we have is greater than the amount of memory that we have available (RAM), we may run into some issues when attempting to complete the project.

The second potential challenge for us is the amount of time required for training; the greater the dataset, the longer it would take to train the model.

If we have to test a number of various models using a large dataset, then it will take a significant amount of time for us to determine which model is the most suitable for the data that has been provided.

What we can do is take a small subset of the original data, train our various models on this subset, select the top models that perform best on it, and then train those top models on the original dataset.

So, the dataset that we have contains 92,000 images: 78,200 for training the model (1,700 × 46) and 13,800 for testing it (300 × 46). From the Dataset section above, we know that there are 46 classes (characters) and that the training set has 1,700 images per class. We will take 10% of the images from each class, i.e. 170 images per class, to construct our more manageable training dataset.

For the purpose of validating the model, we will use all of the images, which total 13,800 images; there are 300 images in each class.

The way we constructed this subset of data is sketched below.
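
A minimal sketch, assuming the train directory layout shown earlier and a fixed random seed for reproducibility:

import os
import random
import shutil

SRC = "DevanagariHandwrittenCharacterDataset/train"            # assumed path
DST = "DevanagariHandwrittenCharacterDataset_10_percent/train"  # assumed path
FRACTION = 0.10  # 170 of the 1,700 images per class

random.seed(42)
for class_name in os.listdir(SRC):
    src_dir = os.path.join(SRC, class_name)
    dst_dir = os.path.join(DST, class_name)
    os.makedirs(dst_dir, exist_ok=True)
    # Randomly sample 10% of the images of this class and copy them over.
    images = os.listdir(src_dir)
    for fname in random.sample(images, int(len(images) * FRACTION)):
        shutil.copy(os.path.join(src_dir, fname),
                    os.path.join(dst_dir, fname))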

Link for the 10% data

  • G-Drive link :
  • GitHub link :

https://github.com/Suraj124/Hindi-Character-Recognition/blob/main/Dataset/DevanagariHandwrittenCharacterDataset_10_percent_train_100_percent_test.zi

This project is divided into three parts. The first part is dedicated to training and testing several models using traditional machine learning techniques. In the second part, we train and test several distinct state-of-the-art deep learning models. In the third part, we deploy the final model.

Classical Machine Learning Models:

Because our dataset is made up entirely of images, we cannot feed the images themselves into traditional machine learning models; we need to convert each image into vector form. Each image has a resolution of 32 × 32 pixels, so converting it into a vector yields a 1,024-dimensional vector per image.

Image by Author

I have written a function to perform this pixel-to-vector transformation. It receives the paths to the train and test images along with the names of all classes, and it returns a DataFrame in which every row is an image vector and the last column is the target (class name).

You will find a sketch of that function below.
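
Here is a minimal sketch of such a function; the column names and the use of Pillow for image loading are assumptions:

import os
import numpy as np
import pandas as pd
from PIL import Image

def images_to_dataframe(data_dir, class_names):
    # Every sub-directory of data_dir holds the images of one class;
    # each 32x32 image is flattened into a 1,024-dimensional row.
    rows = []
    for class_name in class_names:
        class_dir = os.path.join(data_dir, class_name)
        for fname in os.listdir(class_dir):
            img = Image.open(os.path.join(class_dir, fname)).convert("L")
            rows.append(list(np.asarray(img, dtype=np.float32).ravel())
                        + [class_name])
    columns = [f"pixel_{i}" for i in range(32 * 32)] + ["target"]
    return pd.DataFrame(rows, columns=columns)

# Hypothetical usage:
# class_names = sorted(os.listdir("train"))
# train_df = images_to_dataframe("train", class_names)
# test_df = images_to_dataframe("test", class_names)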

If we look at the training images in the DataFrame, they will look like this:

Image by Author

Hyper parameter tuning:

A machine learning model is a mathematical model with a number of parameters that must be learned from data. We fit the model parameters by training the model on existing data.

However, there is a different kind of parameter, known as hyperparameters, which cannot be acquired directly through the standard training procedure. In most cases, they are fixed before the actual training process begins. These parameters express important properties of the model, such as its complexity or how fast it should learn.

The following are some instances of hyperparameters for models:

  • The penalty applied in the Logistic Regression classifier, i.e. L1 or L2 regularization
  • The C and sigma hyperparameters of Support Vector Machines
  • The “k” in k-Nearest Neighbors
  • The learning rate for training a neural network

There are many different automatic optimization techniques available, and each one has its own set of benefits and downsides when applied to a particular category of problems.

The following is a list of some of them:

  • Scikit-learn
  • Optuna
  • Hyperopt
  • Ray Tune
  • BayesianOptimization

Here, we will employ Optuna for hyperparameter tuning.

Optuna can fully automate the process of optimizing these hyperparameters. It employs a variety of samplers, including grid search, random search, Bayesian, and evolutionary algorithms, to determine the ideal hyperparameter values automatically.

The following characteristics of Optuna convinced me to use it for hyperparameter tuning on this problem:

  • Eager dynamic search spaces
  • Efficient sampling and pruning algorithms
  • Easy integration
  • Good visualizations
  • Distributed optimization

Data Splitting:

Every machine learning challenge starts and ends with data. Without the right data, machine learning models are basically like bodies that are missing their souls. However, in this day and age of “big data,” the collection of data is no longer a significant challenge. Every day, we are either intentionally or unintentionally producing massive amounts of data. Having an abundance of data on hand, however, does not in and of itself address the problem. Not only do we need to feed ML models vast quantities of data, but we also need to make sure the data we feed them is of a high enough quality for them to produce accurate results.

Even though making sense of raw data is an art in and of itself and requires good feature engineering skills and domain knowledge (in certain cases), quality data is of no use until it is properly utilized. The primary challenge that practitioners of ML and DL face is determining how to partition the data for use in training and testing. Although it appears straightforward at first glance, only by delving deeply into the issue can one see how complicated it truly is. Inadequate training and testing sets can have unpredictable effects on the model's output; the data may be overfit or underfit, and as a result our model could end up producing biased results.

Since the dataset that we are given is already separated into train and test sets, we do not need to perform any data splitting ourselves.

What we can do at this stage is separate the feature columns from the target column for both the training and testing data, as sketched below.
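
A sketch of that separation, assuming the DataFrames produced by the pixel-to-vector function above:

# The last column ("target") holds the class name; the remaining
# 1,024 columns are the pixel features.
X_train = train_df.drop("target", axis=1)
y_train = train_df["target"]
X_test = test_df.drop("target", axis=1)
y_test = test_df["target"]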

Image by Author

Feature Scaling:

When it comes to machine learning, feature scaling is one of the most important data pre-processing tasks to complete before building a model. Scaling can be the component that decides whether a machine learning model is effective or ineffective.

The most common methods for scaling features are Normalization and Standardization.

Normalization is used when we want to make sure that our values are between [0, 1] or [-1, 1]. Standardization changes the data so that the mean is 0 and the standard deviation is 1.

Why is it necessary for us to scale?

The machine learning algorithm only sees numbers, and if there is a significant difference in range, for example some features ranging in the thousands and some in the tens, it makes the fundamental assumption that larger-ranging numbers carry some kind of greater importance. Consequently, these larger numbers begin to play a more decisive role while the model is being trained.

One more argument for applying feature scaling is that certain algorithms, such as the gradient descent of neural networks, converge far more quickly with feature scaling than they do without it.

Normalization on data:

Standardization of data:
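
A sketch of both steps with scikit-learn; note that the scalers are fitted on the training data only and then reused on the test data:

from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Normalization: rescale each feature into [0, 1].
norm = MinMaxScaler()
X_train_norm = norm.fit_transform(X_train)
X_test_norm = norm.transform(X_test)

# Standardization: zero mean and unit standard deviation per feature.
std = StandardScaler()
X_train_std = std.fit_transform(X_train)
X_test_std = std.transform(X_test)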

Predictive Modelling:

The primary objective of this project is to determine, given an image of a Hindi character, which character is present in the image. This is therefore a supervised classification problem, which we will tackle with techniques such as:

  • K-Nearest Neighbors
  • Support Vector Classifier
  • Random Forest
  • XGBoost

We will train and evaluate models using the above four algorithms, select the best two based on how well they perform, and then train those two on the complete dataset.

K-Nearest Neighbors

The k-nearest neighbors (KNN) algorithm categorizes a data point according to the likelihood that it belongs to the same group as the data points in its immediate vicinity.

Because it does not presume anything about the distribution of the data being analyzed, this technique is referred to as a non-parametric method. Simply put, KNN examines the data points in the immediate vicinity of a given data point in order to decide which group that data point belongs to.

I have written a function with which it is possible to obtain the accuracy, f1-score, precision, and recall all at once for the true and predicted labels; a sketch is given below.
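
A sketch of such a helper, using macro averaging so that all 46 classes are weighted equally:

from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

def evaluate(y_true, y_pred):
    # Returns the four metrics used throughout this post.
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall": recall_score(y_true, y_pred, average="macro"),
    }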

KNN used with Optuna to determine the optimal hyperparameters:
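
A sketch of the tuning loop; the search space is an assumption, and X_train_norm, y_train, X_test_norm, y_test come from the earlier scaling and splitting steps:

import optuna
from sklearn.neighbors import KNeighborsClassifier

def objective(trial):
    # Tune k and the weighting scheme.
    model = KNeighborsClassifier(
        n_neighbors=trial.suggest_int("n_neighbors", 1, 15),
        weights=trial.suggest_categorical("weights",
                                          ["uniform", "distance"]),
    )
    model.fit(X_train_norm, y_train)
    # Optuna maximizes this returned accuracy.
    return model.score(X_test_norm, y_test)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=25)
print(study.best_params, study.best_value)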

We achieve an accuracy of 79% using KNN with hyperparameter tuning, and for the other three measures, we achieve approximately the same level of performance.

Support Vector Classifier (SVC)

The Support Vector Machine (SVM) is a popular supervised learning algorithm used to solve both classification and regression problems, though its primary application is in classification.

The purpose of the Support Vector Machine (SVM) technique is to generate the best line or decision boundary that can partition an n-dimensional space into classes. This will allow us to conveniently place any new data points in the appropriate category in the future. This optimal decision-making boundary is referred to as a hyperplane.

SVM selects the extreme points/vectors that contribute to the formation of the hyperplane. These extreme examples are referred to as support vectors, and the corresponding technique is accordingly known as the Support Vector Machine.

SVC used with Optuna to determine the optimal hyperparameters:

Using SVC with hyperparameter tuning, we are able to reach an accuracy of 86%, and for the other three metrics, we achieve nearly the same level of performance.

Random Forest

The Random Forest technique is a powerful machine learning method that can be applied to a wide variety of problems, including regression and classification. It is an ensemble method: a random forest model is made up of a large number of smaller decision trees, referred to as estimators, each of which generates its own predictions. The random forest model combines the results of the various estimators to produce a more precise prediction.

Common decision tree classifiers have the drawback of being prone to overfitting the training set, which can lead to inaccurate results. The ensemble design of the random forest enables it to generalize effectively to data it has not seen before, including data with missing values. Random forests are also effective at dealing with huge, high-dimensional datasets.

RandomForest is applied in combination with Optuna to obtain the ideal hyperparameters.

We are able to get an accuracy of 81% by utilizing RandomForest with hyperparameter tuning, and for the other three metrics, we achieve approximately the same level of performance as well.

XGBoost

Recent applied machine learning and Kaggle contests for structured or tabular data have been dominated by XGBoost. XGBoost is a high-performance gradient-boosted decision tree implementation.

XGBoost is an extension of gradient boosted decision trees (GBM), a technique that was developed specifically to improve both speed and performance.

In order to find the optimal values for the hyperparameters, XGBoost is implemented in combination with Optuna.

By implementing XGBoost with hyperparameter tuning, we are able to obtain an accuracy of 78%, and for the other three metrics, we get around the same level of performance as well.

Picking the Two Most Outstanding Models:

Below is a DataFrame I have prepared that collects all four metrics for the models we have seen so far.

The DataFrame is sorted by f1-score in descending order so that the top two models can be read off directly.

As can be seen from the information presented above, Model-2 (SVM) and Model-3 (Random Forest) are the best models in terms of having the highest accuracy, f1-score, precision, and recall value.

Let’s now use these two models with the entire dataset. Previously we used only 10 percent of the data, and the results we obtained were reasonably good; let’s see what happens with the full dataset.

Support Vector Classifier (SVC)

Using SVC with the complete dataset, we were able to achieve a 96% accuracy rate, which is a very respectable accuracy rate.

Random Forest

We were able to reach a 92% accuracy rate using RandomForest on the entire dataset, which is a pretty acceptable accuracy rate.

Therefore, in conclusion, with the assistance of a classical machine learning algorithm, the Support Vector Classifier, we were able to obtain an accuracy of 96%.

In the following section, we will explore a number of different deep learning approaches to determine whether or not we can improve upon this accuracy.

Deep Learning Models:

Deep learning has seen a rapid rise in popularity in the field of scientific computing, and its algorithms are now being utilized extensively by industries that deal with difficult problems. To solve a variety of problems, each and every deep learning algorithm employs a unique neural network of some kind.

Deep learning employs artificial neural networks to perform complex computations on massive data sets. It is a form of machine learning that mimics the structure and operation of the human brain.

Convolutional neural networks, abbreviated CNNs or convnets, are one of the many kinds of artificial neural networks, which are used for a wide variety of applications and can process many different kinds of data. CNNs are a network architecture for deep learning algorithms and are ideal for image recognition and other tasks that require the processing of pixel data.

https://editor.analyticsvidhya.com/uploads/25366Convolutional_Neural_Network_to_identify_the_image_of_a_bird.png

There are multiple kinds of neural networks used in deep learning; however, convolutional neural networks (CNNs) are the network architecture of choice when it comes to the identification and recognition of objects. Because of this, they are excellent candidates for computer vision (CV) tasks and applications in which object recognition is an essential component, such as self-driving cars and facial recognition.

And since CNN architecture is the most appropriate choice for our Hindi Character Recognition project, we will experiment with various CNN-based algorithms in order to train the models.

The following are some of the deep learning algorithms we will be focusing on.

  • LeNet
  • LeNet + BatchNormalization + Dropout
  • VGG
  • VGG + BatchNormalization + Dropout
  • ResNet
  • ResNet + BatchNormalization + Dropout
  • Transfer Learning : VGG16
  • Transfer Learning : ResNet

Instead of only using 10% of the data, as was the case with traditional Machine Learning, we will use the entire dataset for all of our deep learning models.

Complete dataset link:

GitHub link for dataset:

https://github.com/Suraj124/Hindi-Character-Recognition/blob/main/Dataset/DevanagariHandwrittenCharacterDataset.zip

The entire dataset consists of 92,000 images; 78,200 of those images are assigned to the training set and 13,800 to the testing set.

Since we do not have any data for validation, we will carve a validation set out of the training data.

To accomplish this partitioning into train and validation sets, I created a function; a sketch of the idea is provided below.
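
A sketch of the idea with the Keras dataset utilities; the 10% validation fraction, batch size, and directory names are assumptions:

import tensorflow as tf

# Carve a validation split out of the training directory.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "DevanagariHandwrittenCharacterDataset/train",
    validation_split=0.1, subset="training", seed=42,
    image_size=(32, 32), batch_size=32, color_mode="grayscale")

val_ds = tf.keras.utils.image_dataset_from_directory(
    "DevanagariHandwrittenCharacterDataset/train",
    validation_split=0.1, subset="validation", seed=42,
    image_size=(32, 32), batch_size=32, color_mode="grayscale")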

The following will be implemented as a part of our plan to improve the overall performance of our Deep Learning models.

  • Caching
  • Prefetching

Caching:

The tf.data.Dataset.cache transformation can cache a dataset, either in memory or on local storage. This will save some operations (like file opening and data reading) from being executed during each epoch.

Prefetching:

We need the CPU to prepare the next batch of data while the GPU is busy with forward/backward propagation on the current batch. Since the GPU is the most expensive component, we need to ensure that it is always working at peak capacity throughout the training process. We refer to this as consumer/producer overlap, with the GPU serving as the consumer and the CPU as the producer.

You can accomplish this with tf.data by making a straightforward call to the dataset’s prefetch method: prefetch(1) at the very last stage of the pipeline (after batching). This ensures that one batch of data is always prefetched and ready to be used.

A sketch of the caching and prefetching for the training, validation, and testing datasets is provided below.
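
Cache after loading and prefetch after batching; test_ds is assumed to be loaded the same way as the train and validation datasets above:

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

train_ds = train_ds.cache().prefetch(AUTOTUNE)
val_ds = val_ds.cache().prefetch(AUTOTUNE)
test_ds = test_ds.cache().prefetch(AUTOTUNE)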

Callbacks:

Tensorflow callbacks can be thought of as either functions or chunks of code that are put into action at a predetermined point in time during the process of training a Deep Learning Model.

The process of training a deep learning model is something that all of us are familiar with. Because models are becoming increasingly complicated and resource-intensive, training time has also significantly increased, so it is not unusual for models to require hours of training. The standard procedure is to lock down all of the available options and parameters, such as the learning rate, optimizers, and losses, before beginning the actual training process.

Once the training procedure has begun, there is no way to interrupt it in order to modify some parameters. And when the model has been trained for several hours and we wish to modify some parameters at a later point, this is sometimes impossible. TensorFlow callbacks come to the rescue in this situation.

How to use Callbacks?

1. To begin, we will define the callbacks.

2. Pass the callbacks when calling model.fit().

Let us have a look at some of the callbacks that are going to prove to be the most beneficial to us during the course of our training.

TensorBoard: This is my favorite TensorFlow callback by far. This callback writes a log for TensorBoard, an amazing visualization tool provided by TensorFlow.

ModelCheckpoint: We use this callback in order to save our Model at different epochs. This gives us the ability to store weights at intermediate steps, giving us the flexibility to load weights at a later time if necessary.

EarlyStopping: For those who work in machine learning, overfitting is a nightmare. Putting an early stop to the procedure is one strategy you may use to avoid overfitting. The EarlyStopping method contains different metrics/arguments that you can adjust to set up when the training process should end.

ReduceLROnPlateau: When a certain statistic has reached a plateau and is no longer improving, this callback lowers the learning rate. This callback tracks a quantity and slows down learning if no improvement is shown after a specified amount of epochs, or “patience.”
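
Putting the four together, here is a sketch of the callback list and of how it is passed to model.fit(); the paths, patience values, and factor are assumptions, not the exact settings used here:

import tensorflow as tf

callbacks = [
    tf.keras.callbacks.TensorBoard(log_dir="logs"),
    tf.keras.callbacks.ModelCheckpoint("checkpoints/best_model.h5",
                                       save_best_only=True),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                         factor=0.1, patience=3),
]

# model.fit(train_ds, validation_data=val_ds,
#           epochs=30, callbacks=callbacks)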

Deep Learning Models:

Let’s go through the different models of deep learning one by one.

Model: 1 → LeNet

The LeNet-5 CNN architecture consists of seven layers: three convolutional layers, two subsampling layers, and two fully connected layers.
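
A common Keras adaptation of that architecture for our 32 × 32 grayscale characters might look like the sketch below; realizing the third convolution (C5) as the Dense(120) layer is one standard choice, not necessarily the exact model trained here:

from tensorflow.keras import layers, models

lenet = models.Sequential([
    layers.Conv2D(6, 5, padding="same", activation="tanh",
                  input_shape=(32, 32, 1)),            # C1
    layers.AveragePooling2D(),                         # S2
    layers.Conv2D(16, 5, activation="tanh"),           # C3
    layers.AveragePooling2D(),                         # S4
    layers.Flatten(),
    layers.Dense(120, activation="tanh"),              # C5
    layers.Dense(84, activation="tanh"),               # F6
    layers.Dense(46, activation="softmax"),            # output: 46 classes
])
# Integer labels from image_dataset_from_directory -> sparse CCE.
lenet.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])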

We are able to get an accuracy of 97.8% using LeNet.

I have written a function that lets us plot the training and validation accuracy, as well as the training and validation loss, for each epoch; a sketch follows.
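
A sketch of that plotting helper, given the History object returned by model.fit():

import matplotlib.pyplot as plt

def plot_history(history):
    # Left: training vs validation loss. Right: training vs validation accuracy.
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    ax1.plot(history.history["loss"], label="train loss")
    ax1.plot(history.history["val_loss"], label="validation loss")
    ax1.set_xlabel("epoch")
    ax1.legend()
    ax2.plot(history.history["accuracy"], label="train accuracy")
    ax2.plot(history.history["val_accuracy"], label="validation accuracy")
    ax2.set_xlabel("epoch")
    ax2.legend()
    plt.show()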

Let’s have a look at the plot for loss and accuracy for LeNet.

Let’s also take a look at the f1-scores of all of the different classes for LeNet.

Image by Author

Model: 2 → LeNet + BatchNormalization + Dropout

BatchNormalization:

Normalization is generally applied to the input data that is fed to the input layer of a neural network. But, normalization is not limited to the input layer. We can also apply normalization to the outputs of hidden layers in a neural network. These normalized outputs will become the inputs for the next hidden layer. So, the hidden layers also benefit from normalization. This is called batch normalization.

Advantages of implementing batch normalization:

1. After normalization, the values fall within a narrow range, so they are small and uniform (share the same range). This accelerates the learning (convergence) of the neural network.

2. It reduces the problem of vanishing gradients.

3. As the values of the inputs are passed through multiple hidden layers, they are subjected to a process in which they are multiplied by weights (scaled), and shifted by biases. Consequently, the distribution of the inputs to each layer will change throughout the training process. The covariate shift problem refers to the modification that has occurred in the layer’s input distribution. By adjusting the input values of hidden layers, batch normalization provides an efficient solution to this problem.

Dropout:

To implement dropout regularization, a small percentage of the network’s nodes are removed at random while it is being trained. The removed nodes do not take part in the parameter update procedure.

The probability value that we choose for dropout regularization determines the number of nodes that are eliminated from the network. For instance, if we give each node in a layer a dropout probability of 0.4, then each node has a 40% chance of being removed at every training iteration.

How does dropout regularization prevent neural network overfitting?

  • By randomly eliminating nodes from the network, parameters are updated using a smaller network. When the size of the network decreases, it provides less flexibility, which reduces overfitting.
  • By removing nodes from the network at random, the weights of those nodes are also switched off (set to zero), so the remaining weights have to take part in the learning process. When this happens at each iteration, the updates are spread out much more evenly: instead of a few weights being updated too much, all of them are updated in a balanced way, and the output of the network does not depend on a few big weights. This can stop neural networks from overfitting.
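
Putting both ideas together, here is a sketch of the LeNet variant with BatchNormalization after each convolution and Dropout before the classifier; the 0.4 rate and the layer placement are assumptions, not the exact configuration used here:

from tensorflow.keras import layers, models

lenet_bn = models.Sequential([
    layers.Conv2D(6, 5, padding="same", activation="relu",
                  input_shape=(32, 32, 1)),
    layers.BatchNormalization(),
    layers.AveragePooling2D(),
    layers.Conv2D(16, 5, activation="relu"),
    layers.BatchNormalization(),
    layers.AveragePooling2D(),
    layers.Flatten(),
    layers.Dense(120, activation="relu"),
    layers.Dropout(0.4),  # each node has a 40% chance of being dropped
    layers.Dense(46, activation="softmax"),
])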

With LeNet, BatchNormalization, and Dropout, we get an accuracy of 98.3%.

Let’s have a look at the plot for loss and accuracy for this model.

Model: 3 →VGG

The VGG architecture serves as a significant source of inspiration for Model 3.

We are able to get an accuracy of 98.7% using this VGG-like architecture.

Let’s have a look at the plot for loss and accuracy for VGG.

Model: 4 → VGG + BatchNormalization + Dropout

We achieve an accuracy of 99.28% with model 4, which is somewhat higher than what we achieve with model 3, which is 98.75%.

Let’s have a look at the plot for loss and accuracy.

Model: 5 → ResNet

We are able to get an accuracy of 99.28% with model 5, which is almost the same as what we were able to achieve with model 4 (VGG+BatchNormalization+Dropout).

Let’s have a look at the plot for loss and accuracy.

Model: 6 → ResNet + BatchNormalization Dropout

Model 6 achieves an accuracy of 99.55%, the highest we have obtained with any of the models we have developed; this is an excellent model.

Let’s have a look at the plot for loss and accuracy.

Model: 7

We have achieved an accuracy of 99.07% with model 7.

Let’s have a look at the plot for loss and accuracy.

Model: 8

With the help of model 8, we are able to achieve an accuracy of 98.95%, which is not poor.

Let’s have a look at the plot for loss and accuracy.

Model: 9 →Transfer Learning: VGG16
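
As a sketch, a VGG16 transfer-learning setup for this task could be wired up as follows; freezing the ImageNet base and repeating the grayscale channel three times are assumptions:

import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.VGG16(include_top=False,
                                   weights="imagenet",
                                   input_shape=(32, 32, 3))
base.trainable = False  # only the new classification head is trained

model_9 = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Lambda(tf.image.grayscale_to_rgb),  # 1 channel -> 3 channels
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(46, activation="softmax"),
])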

Model 9 has an accuracy of 93.16%, which is lower than any previous model we built.

It would appear that transfer learning was not fruitful.

Let’s have a look at the plot for loss and accuracy.

Model: 10 →Transfer Learning — ResNet part-1

In this ResNet Transfer Learning, only the last 20 layers are trainable.

This model’s accuracy of 81.79% is the lowest of any model that we obtained.

Model: 11 →Transfer Learning — ResNet part-2

In this ResNet transfer learning, all the layers are trainable.

Model 11 provides us with an accuracy of 82.10%, which is a minor step up from model 10's accuracy.

Let’s have a look at the plot for loss and accuracy.

Pick Best Performing Model:

The top three best performing models, as seen in the preceding DataFrame, are as follows:

Selection of model:

Model 6, ResNet with BatchNormalization and Dropout, will be used for deployment. This model has a training accuracy of 99.95%, a validation accuracy of 99.40%, and a test accuracy of 99.55%.

Future Work:

This entire effort handles only single-character recognition, so if we want to recognize a whole word, which is nothing more than a sequence of characters, our model will not work.

Model Deployment:

Deployment is the project’s final phase, in which we deploy the entire deep learning model into a production system, in a real-time scenario.

The model will be used in a situation where real-world images will be used as input, and it will predict the Hindi character in each image.

App Building:

This step involves the creation of a web application using Streamlit, which gives the client the ability to upload an input image.

I am going to build a Streamlit app as the client interface. Streamlit is an open-source Python framework used to create web applications for machine learning and data science. With Streamlit, you write the app just as you would write ordinary Python code, and the script is kept in .py format.

Image by Author
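
A minimal sketch of what such an app script can look like; the model path and class names below are placeholders, not the project's actual values:

# app.py
import numpy as np
import streamlit as st
import tensorflow as tf
from PIL import Image

model = tf.keras.models.load_model("model_6_resnet.h5")  # assumed path
class_names = [f"class_{i}" for i in range(46)]          # placeholder names

st.title("Hindi Character Recognition")
uploaded = st.file_uploader("Upload a character image", type=["png", "jpg"])
if uploaded is not None:
    # Convert to grayscale and resize to the 32x32 input the model expects.
    img = Image.open(uploaded).convert("L").resize((32, 32))
    x = np.asarray(img, dtype=np.float32)[None, :, :, None]
    probs = model.predict(x)
    st.image(img, width=150)
    st.write("Predicted character:", class_names[int(probs.argmax())])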

With the following command, we can run our web app locally with Streamlit (assuming the script above is saved as app.py):
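
streamlit run app.py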

We are developing a web app titled Hindi Character Recognition, with the menus Home, Prediction, Get test images, and Code, which are shown below.

Home Page:

Prediction Page:

Get test images page:

Code Page

Deploying machine learning models using Streamlit Cloud:

Releasing your application online increases its accessibility: after deployment, the app can be used from any computer or mobile device worldwide.

You can freely share your Streamlit web app over the internet with the help of Streamlit’s Streamlit Share feature; thanks to Streamlit Cloud, your apps can be deployed with just one click.

  1. Upload your app to GitHub:

Since Streamlit Cloud launches apps directly from your repository, the app’s code and dependencies must be on GitHub before you attempt to deploy it. Add a requirements file listing your Python dependencies to the project.

2. For deployment, follow the procedures below:

2.1) Go to https://share.streamlit.io/

2.2) Sign in using GitHub.

2.3) In the upper right corner of your workspace, select “New app.”

2.4) After hitting the New button, you will be sent to the Deploy your app page. Enter your repository, branch, and file location, then click “Deploy.”

2.5) Your app is now being deployed, and you may wait for it to launch.

2.6) After deployment, you will notice a URL, and that is the URL for your application.

https://suraj124-hindi-character-recognition-app-r1dhcr.streamlitapp.com/

When you click on the above link, you will be taken to my project Hindi Character Recognition.


Links:-

GitHub Link :- https://github.com/Suraj124/Hindi-Character-Recognition

Project Link :- https://suraj124-hindi-character-recognition-app-r1dhcr.streamlitapp.com/

Thank you for taking the time to read my blog ❤

Special thanks to Applied Roots Team (appliedaicourse.com) for their assistance in solving these Case studies.
