Hindi Character Recognition on Android using TensorFlow Lite

Nitin Tiwari · Published in Nerd For Tech · Sep 15, 2022

If you have ever wanted to build an image classifier for text recognition, you have probably implemented the classic Handwritten Digit Recognition application from TensorFlow’s official examples.

Often referred to as the ‘Hello World’ of Computer Vision, it’s a great starting point for an ML beginner to build a classifier application. Wouldn’t it be great to build your own custom classifier that can recognize any characters? Today, we’ll build a Hindi character recognizer, but feel free to pick a dataset of your own choice, or simply follow along. Sounds exciting, right?

We are going to build a Machine Learning model that can recognize Hindi characters, right from scratch. Not only will we build an ML model, but we will also deploy it on an Android mobile application. This article therefore serves as an end-to-end tutorial covering almost everything you need to build and deploy an ML application.

End-to-end flow

I will try to explain everything as simply and clearly as possible. So, are you excited? I certainly am.

Data Preparation

To train a Machine Learning model that yields good results, we need lots of data. You must have heard of the MNIST digit database, right? Let’s recall.

MNIST Digit Dataset

MNIST, short for “Modified National Institute of Standards and Technology”, is a popular database for handwritten digit recognition that contains 70,000 images of the digits 0–9 (60,000 for training and 10,000 for testing). It is important to understand the look and format of the MNIST database because we will be synthesizing an “MNIST-like” dataset for Hindi characters.

Every digit in the MNIST dataset is a 28 x 28 grayscale image, with the digit drawn in white on a black background.

Example of an MNIST digit
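
If you want to see this format for yourself, Keras ships the MNIST dataset. Here is a quick sketch to inspect its shape and one sample, assuming TensorFlow and Matplotlib are available (as they are in Colab):

import tensorflow as tf
import matplotlib.pyplot as plt

# Load MNIST directly from Keras to inspect the format we want to mimic.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()

print(x_train.shape)  # (60000, 28, 28) -- 28 x 28 grayscale images
print(y_train.shape)  # (60000,)        -- one integer label per image

# Display a single digit: white strokes on a black background.
plt.imshow(x_train[0], cmap = plt.cm.gray)
plt.xlabel(y_train[0])
plt.show()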

Okay, now that we have the idea, let’s synthesize our own dataset for Hindi characters. I have the dataset already saved in my GitHub repository. Feel free to clone the repository and download the dataset.

The dataset contains all the Hindi vowels, consonants, and numerals. These images have to be converted into NumPy arrays (.npz) so they can be fed to the model for training. The script below will help you with the conversion.

Import dependencies
We start by importing the required dependencies and libraries.

import tensorflow as tf
from tensorflow import keras
from PIL import Image
import os

import numpy as np
import matplotlib.pyplot as plt
import random

!pip install -q kaggle
!pip install -q kaggle-cli
print(tf.__version__)
os.environ['KAGGLE_USERNAME'] = "<your_kaggle_username>"
os.environ['KAGGLE_KEY'] = "<your_kaggle_key>"
!kaggle datasets download -d nstiwari/hindi-character-recognition --unzip
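
After the download finishes, it helps to confirm the folder layout before converting anything. Below is a small sketch of my own, assuming the dataset unzips into HindiCharacterRecognition/raw_images with one sub-folder per class, which is the layout the conversion script below expects:

import os

# List the class folders; each sub-folder should hold the raw JPG images for one class.
raw_dir = "/content/HindiCharacterRecognition/raw_images"
for class_dir in sorted(os.listdir(raw_dir)):
    count = len(os.listdir(os.path.join(raw_dir, class_dir)))
    print(class_dir, "->", count, "images")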

Convert JPG images into NPZ (NumPy array) format
Now, the training images need to be converted into the .npz format so that they can be passed as input to the model.

# Converts all the images inside HindiCharacterRecognition/raw_images/10 into NPZ format.
path_to_files = "/content/HindiCharacterRecognition/raw_images/10/"
vectorized_images = []

for file in os.listdir(path_to_files):
    image = Image.open(path_to_files + file)
    image_array = np.array(image)
    vectorized_images.append(image_array)

np.savez("./10.npz", DataX=vectorized_images)
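
The labels array is bundled the same way. Here is a minimal sketch of my own (not from the original repository) that assumes each class folder maps to an integer label and collects one label per image, in the same order as the images:

import os
import numpy as np

# Hypothetical example: assign an integer label to each class folder and
# append one label per image, mirroring the order used for the images.
raw_dir = "/content/HindiCharacterRecognition/raw_images"
class_folders = ["10"]  # add the other class folders you converted
vectorized_labels = []

for label, folder in enumerate(class_folders):
    folder_path = os.path.join(raw_dir, folder)
    vectorized_labels.extend([label] * len(os.listdir(folder_path)))

np.savez("./labels.npz", DataX=vectorized_labels)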

Load the training images NumPy array
The training images have been vectorized into NumPy arrays. In other words, the pixel values (between 0 and 255) of all the training images are bundled into a single ‘.npz’ file, which we now load.

path = "./HindiCharacterRecognition/vectorized_images/numeral_images.npz"
with np.load(path) as data:
    # Load DataX as train_images.
    train_images = data['DataX']

Load the training labels NumPy array
Similarly, the labels of the respective training images are also vectorized and bundled into a single ‘.npz’ file. Unlike the images array, the labels array contains discrete values from 0 to n-1, where n = no. of classes.

path = "./HindiCharacterRecognition/vectorized_labels/numeral_labels.npz"
with np.load(path) as data:
    # Load DataX as train_labels.
    train_labels = data['DataX']

In this example, I’m training the model on 5 classes: ३, अ, क, प, and न. The dataset covers all the vowels, consonants, and numerals, so feel free to choose any classes you like.

NO_OF_CLASSES = 5  # Change the no. of classes according to your custom dataset

Normalize the input images
Here, we normalize the input images by dividing every pixel value by 255 so that each pixel holds a value in the range [0, 1].

Pixels with the value 0 are completely dark (black), while those with the value 1 are white. Any value in between is a shade of grey, with its intensity depending on which end it is closer to.

Colour Scale between 0 and 1
# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images / 255.0
print('Pixels are normalized.')

Inspect the shape of the image and label arrays

  • The shape of the images array should be (X, 28, 28) where X = no. of images.
  • The shape of the labels array should be (X, ).

Note: The no. of images and no. of labels must be equal, of course.

print(train_images.shape)
print(train_labels.shape)
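
A quick sanity check (my own addition, not in the original notebook) confirms that the two arrays line up and that the labels really are the discrete values 0 to n-1:

# Verify that every image has a label and inspect the distinct label values.
assert len(train_images) == len(train_labels), "Images and labels must match in count."
print('Number of examples:', len(train_images))
print('Distinct labels:', np.unique(train_labels))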

Visualize the training data

# Show 50 images from the training dataset.
j = 0
plt.figure(figsize = (10, 10))
for i in range(550, 600): # Try playing with different ranges in intervals of 50. Example: range(250, 300)
    j = j + 1
    plt.subplot(10, 5, j)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap = plt.cm.gray)
    plt.xlabel(train_labels[i])
plt.show()
Dataset Preview

Phew, that was some work, but we are finally done with the first step. Our dataset looks perfect and is ready to be used for training.

Model Training

Okay, so far so good. The main game begins now. Let’s start with the model training.

In the cell below, we define the layers of the model and set the hyperparameters such as the optimizer, loss function, metrics to quantify the model performance, no. of classes, and epochs.

# Define the model architecture.
model = keras.Sequential([
    keras.layers.InputLayer(input_shape = (28, 28)),
    keras.layers.Reshape(target_shape = (28, 28, 1)),
    keras.layers.Conv2D(filters = 32, kernel_size = (3, 3), activation = tf.nn.relu),
    keras.layers.Conv2D(filters = 64, kernel_size = (3, 3), activation = tf.nn.relu),
    keras.layers.MaxPooling2D(pool_size = (2, 2)),
    keras.layers.Dropout(0.25),
    keras.layers.Flatten(),
    keras.layers.Dense(NO_OF_CLASSES)
])

# Define how to train the model.
model.compile(optimizer = 'adam',
              loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits = True),
              metrics = ['accuracy'])

# Train the character classification model.
model.fit(train_images, train_labels, epochs = 50)
model.summary()
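
The next section refers to the trained Keras model as an .h5 file; saving it is a one-liner (my own addition, with a hypothetical filename, in case you want to keep the full-precision model around):

# Optionally save the trained Keras model in HDF5 format before converting it.
model.save('hindi_character_model.h5')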

It took me around 30–45 minutes to train the model for 5 classes, with each class having approximately 200 images. The training time will vary depending upon the no. of classes and images per class you choose for your use case. While the model is training, go and have some coffee.

Quantization

We are halfway through this blog. The Keras model (.h5) is ready. However, for us to be able to use this model on a mobile application, we will need to quantize the model and convert it into the TF Lite format, a lighter version of the original TF model.

Quantization allows a respectable trade-off between the accuracy and size of the model. With a tiny decrease in the accuracy, the model size can be decreased drastically, thereby making its deployment easier.

Convert the TF model into the TF Lite model
Conversion of the Keras model into the TF Lite format requires several steps. First, the model is simply converted into TF Lite without performing quantization.

# Convert Keras model to TF Lite format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_float_model = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(tflite_float_model)

# Show model size in KBs.
float_model_size = len(tflite_float_model) / 1024
print('Float model size = %dKBs.' % float_model_size)

Now, we re-convert the model to TF Lite using quantization. This drastically reduces the model size.

# Re-convert the model to TF Lite using quantization.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quantized_model = converter.convert()

# Show model size in KBs.
quantized_model_size = len(tflite_quantized_model) / 1024
print('Quantized model size = %dKBs,' % quantized_model_size)
print('which is about %d%% of the float model size.' % (quantized_model_size * 100 / float_model_size))

The final TF Lite model is ready. Now, we need to export the model for us to be able to use it.

# Save the quantized model to a file so it can be downloaded.
f = open('mnist.tflite', "wb")
f.write(tflite_quantized_model)
f.close()

# Download the digit classification model
from google.colab import files
files.download('mnist.tflite')

print('`mnist.tflite` has been downloaded')
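
Before moving on to the app, you can sanity-check the quantized model with the TF Lite Interpreter. Below is a minimal sketch of my own, assuming train_images and train_labels are still in memory:

# Run one training image through the quantized model as a sanity check.
interpreter = tf.lite.Interpreter(model_content = tflite_quantized_model)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# The model expects a float32 batch of shape (1, 28, 28).
sample = np.expand_dims(train_images[0], axis = 0).astype(np.float32)
interpreter.set_tensor(input_details['index'], sample)
interpreter.invoke()

logits = interpreter.get_tensor(output_details['index'])
print('Predicted class:', np.argmax(logits), '| Actual label:', train_labels[0])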

Almost done. We now have the TF Lite model ready to be deployed on an Android app. You can find the entire Colab notebook here.

Deploy Model

I customized the original Digit Classifier application developed by TensorFlow to improve its look and feel. In Step 1, you may have already cloned the repository; in it, you should find the Android_App directory.

Copy the mnist.tflite model file inside the Hindi-Character-Recognition-on-Android-using-TensorFlow-Lite/Android_App/app/src/main/assets directory.

Next, open the project in Android Studio and let it build itself for some time. Once the project is built, open the DigitClassifier.kt file, and edit Line 333 by replacing <your_no_of_output_classes> with the no. of output classes in your model.

Again, in the DigitClassifier.kt file, edit Line 118 through Line 132 by setting the label names according to your custom dataset.

Finally, build the project again, install it on your Android phone, and enjoy your own custom-built Hindi character recognition app.

Final Application

So, that’s a wrap for this blog. To quickly summarize:

  • We started with data preparation to synthesize an MNIST-like dataset for Hindi characters consisting of vowels, consonants, and numerals; vectorized the images and labels for feeding into the neural network.
  • Next, we architected the model by adding the Keras layers, then configured the hyperparameters and started the model training.
  • After the TF model was trained, we quantized and converted it into TF Lite format to make it ready to deploy.
  • Finally, we built an Android app (although not from scratch as it was beyond the scope of this blog) and deployed our classifier model on it.

I hope you liked the blog as much as I enjoyed writing it. If you would like to talk more about this, feel free to connect with me on LinkedIn. Stay tuned for more interesting topics on Machine Learning where I will be covering end-to-end examples.
