Classification of Signature and Text Images Using CNN and Deploying the Model on Google Cloud ML Engine

For any document of legal importance, be it a contract, a consignment note or a simple form, the signature is an essential part. A signature provides identification as well as confirmation. A ready-made model that distinguishes signatures from printed text is currently not available, so the work presented here is about classifying signature and text images.

The classification model is built using Keras, a high-level API of TensorFlow, an open-source machine learning library. This classification model can also serve as a building block for a signature detection model for document images.


Data Preparation

The dataset has been generated by extracting text from documents and capturing signature samples from different documents. The data consists of two classes: Signature (class label 0) and Text (class label 1).

The text data consists of images of individual words with varying backgrounds, heights, widths and stroke thicknesses. The text is not restricted to a single language; it includes multilingual samples. This class contains around 2,000 images.

The signature data contains around 1,300 images of signatures, again with varying backgrounds, heights, widths and stroke thicknesses.

The data is stored on Google Cloud Storage. The preliminary data cleaning step involved dropping blurred images and realigning the text with appropriate padding and margins. To increase the effective size of the data, run-time augmentations such as rotation, rescaling and zooming were applied. The dataset is split 70% for training and 30% for validation. In addition, there is a separate unseen dataset on which the model's accuracy is tested.

Sample images from the dataset

The blog is organised as follows:

Part I: A standalone classification model that can be run on an individual system.

Part II: Making the model publicly accessible by deploying it on GCP ML Engine.


I. Classification model

A deep convolutional neural network is built using the Keras Sequential model. There are three convolution layers and a fully connected layer, followed by an output layer. The max-pooling size is set to (2, 2) and the kernel size to (3, 3). The number of filters is initially set to 32 (chosen experimentally) and doubles in each subsequent convolution layer.

The activation function used is ReLU, and the final layer uses a sigmoid activation. A dropout layer with dropout probability 0.5 is added. The architecture of the model is as follows:
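A minimal Keras sketch of this architecture (the 150x150 RGB input size and the size of the fully connected layer are assumptions; use the values from your own preprocessing):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

INPUT_SHAPE = (150, 150, 3)   # assumed input size

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=INPUT_SHAPE),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, (3, 3), activation='relu'),      # filters double in each subsequent layer
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(64, activation='relu'),               # fully connected layer (size is an assumption)
    Dropout(0.5),
    Dense(1, activation='sigmoid')              # binary output: signature vs text
])
model.summary()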

The model summary gives a detailed picture of each layer along with the number of parameters per layer. The model summary is shown below:

Summary of the model

Next, the model is compiled with accuracy as the evaluation metric, binary_crossentropy as the loss and the Adam optimizer.
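With the model object defined above, the compile step is a single line:

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])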

As the training data is limited, run-time image augmentations are added with the help of Keras's ImageDataGenerator. Augmentations such as rotation, rescaling and zooming are applied to the training dataset.
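A sketch of the augmentation setup (the augmentation ranges, directory layout and image size are assumptions):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1. / 255,      # rescaling
    rotation_range=15,     # rotation
    zoom_range=0.2)        # zooming

train_generator = train_datagen.flow_from_directory(
    'data/train',              # one sub-folder per class: signature/ and text/
    target_size=(150, 150),    # assumed input size
    batch_size=32,
    class_mode='binary')

# The generator is then passed to model.fit_generator() during training.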

To predict the output of the model for the test dataset, the predict method is used. The precision, recall and test accuracy are then calculated from the predictions using sklearn.metrics.
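A sketch of the evaluation step, assuming x_test and y_test hold the preprocessed unseen test images and labels:

from sklearn.metrics import accuracy_score, precision_score, recall_score

probs = model.predict(x_test)                 # sigmoid probabilities, shape (n, 1)
preds = (probs > 0.5).astype(int).ravel()     # threshold at 0.5 to get class labels

print('Test accuracy      :', accuracy_score(y_test, preds))
print('Signature precision:', precision_score(y_test, preds, pos_label=0))
print('Signature recall   :', recall_score(y_test, preds, pos_label=0))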

The final test accuracy after adding the image augmentations and the dropout layer is 94.29%. The precision for signature images is 96.55% and the recall is 97.22%. The table below shows how the results improve as the augmentations and the dropout layer are added.

Results of various experiments

II. Training and Deploying the model on Google Cloud ML Engine

Cloud ML Engine helps to train your machine learning models at scale, to host the trained model in the cloud, and to use the model to make predictions about new data.

Data

The data consists of signature and text images in different languages and with different backgrounds. The same preprocessing described earlier is applied. There are two classes: signature and text.

Packaging the model

The package structure of the model, ready to be deployed on ML Engine, is shown below.

Project structure for ML Engine

a) setup.py

The setup.py file lists the dependencies, along with their versions, that must be installed for the model to run on Cloud ML Engine. Cloud ML Engine has built-in TensorFlow support; all other requirements need to be installed.
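A minimal sketch of such a setup.py (the package name and dependency pins are assumptions, not necessarily the exact ones used in the original project):

from setuptools import setup, find_packages

setup(
    name='trainer',
    version='0.1',
    packages=find_packages(),
    include_package_data=True,
    install_requires=[
        # Illustrative pins; TensorFlow itself is provided by the ML Engine runtime.
        'Keras==2.2.4',
        'h5py',
        'Pillow',
        'scikit-learn'
    ],
    description='Signature vs text classification trainer package'
)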

b) task.py

The task.py file is the entry point of the trainer. It parses the arguments passed when the job is run, and it invokes the model and any other dependent files. The trained model is saved in .hdf5 format. The code for the task.py file is depicted here:
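A simplified sketch of task.py (the argument names match the commands shown later; helper names such as utils.load_data and model.keras_model are assumptions):

import argparse

from trainer import model, utils

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--train-dir', required=True,
                        help='local or GCS path to the training data')
    parser.add_argument('--job-dir', required=True,
                        help='local or GCS path for job outputs')
    args = parser.parse_args()

    # Load the preprocessed, labelled data (see utils.py).
    x_train, y_train = utils.load_data(args.train_dir)

    # Build, train and save the compiled Keras model.
    cnn = model.keras_model()
    cnn.fit(x_train, y_train, epochs=10, batch_size=32)   # hyperparameters are illustrative
    cnn.save('signature_text_model.hdf5')                 # trained model saved in .hdf5 format
    # In practice the .hdf5 file is then copied to args.job_dir so it ends up on GCS.

if __name__ == '__main__':
    main()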

Note: The saved model is in .hdf5 format. To deploy the model, we need it in the .pb (SavedModel) format, so we have to export the model for TensorFlow Serving.

c) model.py

model.py contains the actual model to be trained. It returns the compiled model to the calling function. The code of the model function is shown below.
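A sketch of model.py, assuming the function name keras_model; it wraps the same architecture and compile call shown in Part I:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def keras_model(input_shape=(150, 150, 3)):
    """Build and compile the CNN described in Part I."""
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Conv2D(128, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(64, activation='relu'),
        Dropout(0.5),
        Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model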

d) utils.py

This file contains the code for data preprocessing. The location of the directory containing the image files is passed in, and labelled data that can be fed to the model is generated. The data is saved in .npy files, which are then used for model training.
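A sketch of what utils.py could look like, assuming a local data directory with one sub-folder per class and a load_data helper (the names and the 150x150 size are assumptions):

import os

import numpy as np
from PIL import Image

IMG_SIZE = (150, 150)   # assumed input size, matching the model

def load_data(data_dir):
    """Read images from <data_dir>/signature and <data_dir>/text, label them and cache as .npy."""
    images, labels = [], []
    for label, class_name in [(0, 'signature'), (1, 'text')]:
        class_dir = os.path.join(data_dir, class_name)
        for fname in os.listdir(class_dir):
            img = Image.open(os.path.join(class_dir, fname)).convert('RGB').resize(IMG_SIZE)
            images.append(np.asarray(img, dtype=np.float32) / 255.0)
            labels.append(label)
    x, y = np.array(images), np.array(labels)
    np.save('x_data.npy', x)   # cached so preprocessing is not repeated on every run
    np.save('y_data.npy', y)
    return x, y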

Training the model

a) Training Locally

There are two ways to train the model on a local machine: using the python command or using the gcloud command-line tool.

$ export JOB_DIR=/path/to/job/dir
$ export TRAIN_DIR=/path/to/training/data/dir #either local or GCS
#Train using python
$ python -m trainer.task \
--train-dir=$TRAIN_DIR \
--job-dir=$JOB_DIR
#Train using the gcloud command-line tool
$ gcloud ml-engine local train \
--module-name=trainer.task \
--package-path=trainer/ \
--job-dir=$JOB_DIR \
-- \
--train-dir=$TRAIN_DIR

b) Submitting the job to Google Cloud

After successfully training the model locally, the next step is to submit the job to Cloud ML Engine. Run the command given below from the directory where your trainer package is located.

$ export BUCKET_NAME="your GCS bucket name"
$ export JOB_NAME="name of your job"
$ export OUTPUT_PATH=gs://$BUCKET_NAME/$JOB_NAME
$ export TRAIN_DATA=/path/to/dataset
$ export REGION=us-central1 #any available Compute Engine region
#gcloud command line
$ gcloud ml-engine jobs submit training $JOB_NAME \
--job-dir $OUTPUT_PATH \
--runtime-version 1.10 \
--module-name trainer.task \
--package-path trainer/ \
--region $REGION \
-- \
--train-dir $TRAIN_DATA \
--verbosity DEBUG

The logs can be checked from the Google Cloud ML Engine dashboard. After the job completes successfully, you will find an export folder in the OUTPUT_PATH of your GCS bucket.

Deploying the model

After training, it's time to deploy the model for production. The first step is to convert the saved model from .hdf5 format to .pb (the TensorFlow SavedModel format). A step-by-step guide with the necessary code and shell commands can be found in this notebook.
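For reference, a minimal sketch of such a conversion using tf.saved_model.simple_save (TensorFlow 1.x; the file name and tensor keys are assumptions):

import tensorflow as tf
from tensorflow import keras

keras.backend.set_learning_phase(0)           # inference mode: disables dropout
model = keras.models.load_model('signature_text_model.hdf5')

export_path = 'export/1'                      # directory where saved_model.pb will be written
sess = keras.backend.get_session()
tf.saved_model.simple_save(
    sess,
    export_path,
    inputs={'input_image': model.input},
    outputs={'prediction': model.output})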

Step 1 → Creating the model

The gcloud command for creating the model is as below.

$ export MODEL_NAME=<Name of the model>
$ export MODEL_PATH=gs://path/to/the/exported/model
#CREATE MODEL
$ gcloud ml-engine models create $MODEL_NAME

Step 2 → Creating version for the model you just created

Run the command below to create version version_1 of the model.

$ gcloud ml-engine versions create "version_1" \
--model $MODEL_NAME \
--origin $MODEL_PATH \
--python-version 3.5 \
--runtime-version 1.10

Step 3 → Serving the model for predictions

The prediction request can be sent to the model as a JSON file, here test_data.json. For this you need to convert your image into a JSON request as shown below.
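A minimal sketch of how such a request file could be built (the image path and the input key input_image are assumptions; the key must match the input name used when exporting the SavedModel):

import json

import numpy as np
from PIL import Image

IMG_SIZE = (150, 150)   # assumed input size, matching the training sketch

img = Image.open('sample_signature.png').convert('RGB').resize(IMG_SIZE)
pixels = (np.asarray(img, dtype=np.float32) / 255.0).tolist()   # same rescaling as training

# Each line of the file is one instance, keyed by the exported input tensor name.
with open('test_data.json', 'w') as f:
    json.dump({'input_image': pixels}, f)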

Online prediction can then be done with the following gcloud command.

$ gcloud ml-engine predict --model $MODEL_NAME --version version_1 --json-instances test_data.json

You can find the code files here. I hope you found this read helpful and that it provided some meaningful insights! Your valuable feedback is most welcome. Happy learning!