Summer Internship Experience - Part 2

Shivam Sahu
3 min read · Jul 19, 2019


In this post, I will briefly describe the architecture and workflow of the model.

Architecture of the initial model

We developed the first model as a classification model, which takes an image as input and outputs its label. Since the dataset used for developing the model was made by us, only a little preprocessing needed to be taken care of. The architecture of the model was roughly as shown in the figure below:

Graph on Tensorboard
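The TensorBoard graph is not reproduced here, but a minimal sketch of a comparable character classifier in Keras is shown below. The input size and the number of classes are assumptions for illustration, not the exact values we used:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 26  # assumed: one class per character in the dataset

# A small CNN of the kind used for single-character classification.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),          # assumed grayscale input size
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # outputs the label
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```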

The model performed well on images consisting of a single character, but it was of no use for drawing bounding boxes on images containing multiple pieces of text. Still, this model can be used to classify characters, for example to help small kids check the characters they write. Learning with fun. Isn't it?

Architecture of the final model

The final model uses the architecture of the EAST model. EAST (An Efficient and Accurate Scene Text Detector) was proposed in 2017 to directly predict words or text lines of arbitrary orientations and quadrilateral shapes in full images with a single neural network, eliminating unnecessary intermediate steps (e.g., candidate aggregation and word partitioning).

The pipeline utilizes a fully convolutional network (FCN) model that directly produces word or text-line level predictions, excluding redundant and slow intermediate steps. The produced text predictions, which can be either rotated rectangles or quadrangles, are sent to Non-Maximum Suppression to yield results.
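To make the suppression step concrete, here is a minimal sketch of standard axis-aligned NMS in Python. EAST itself uses a locality-aware variant over rotated rectangles/quadrangles, but the core idea of keeping the highest-scoring box and discarding heavily overlapping ones is the same:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping ones that overlap too much.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) array.
    Returns the indices of the boxes to keep.
    """
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with all remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Drop boxes whose overlap with the kept box exceeds the threshold.
        order = order[1:][iou <= iou_threshold]
    return keep
```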

Structure of text detection model

The model performs transfer learning: it takes the pretrained weights of a ResNet-50 model and trains further on top of them.
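As a rough illustration of that step, the sketch below loads ImageNet-pretrained ResNet-50 weights in Keras and attaches simplified EAST-style prediction heads. The real EAST feature-merging branch is richer; the head names and shapes here are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Pretrained ResNet-50 backbone (ImageNet weights), no classification head.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(512, 512, 3))
backbone.trainable = True  # continue training instead of freezing

# Simplified detection heads: EAST predicts a per-pixel text score map
# plus box geometry (4 edge distances and a rotation angle for RBOX).
x = backbone.output
score_map = layers.Conv2D(1, 1, activation="sigmoid", name="score")(x)
geometry = layers.Conv2D(5, 1, name="geometry")(x)

model = tf.keras.Model(backbone.input, [score_map, geometry])
```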

Workflow of the entire project

The ML model is integrated and hosted with the help of Firebase. The image on which prediction is to be done is uploaded to a bucket. A Cloud Function is triggered on each upload, which in turn calls the predict function. The predict function builds the JSON in the required format and sends it to Google AI Serving. The predicted image is sent back to the client.
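A minimal sketch of such a Cloud Function is shown below. The project and model names are hypothetical, and the exact JSON payload depends on the model's serving signature:

```python
import base64
from google.cloud import storage
from googleapiclient import discovery

PROJECT = "my-project"          # hypothetical project ID
MODEL = "east_text_detector"    # hypothetical deployed model name

def on_image_upload(data, context):
    """Background Cloud Function, triggered once per upload to the bucket."""
    bucket_name, blob_name = data["bucket"], data["name"]

    # Read the uploaded image from Cloud Storage.
    client = storage.Client()
    image_bytes = client.bucket(bucket_name).blob(blob_name).download_as_bytes()

    # Build the JSON payload AI Platform expects and request a prediction.
    body = {"instances": [
        {"image_bytes": {"b64": base64.b64encode(image_bytes).decode()}}
    ]}
    service = discovery.build("ml", "v1")
    name = f"projects/{PROJECT}/models/{MODEL}"
    response = service.projects().predict(name=name, body=body).execute()
    return response
```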

We first tried reading and writing the image through Firestore to make it fast, but Firestore documents must not exceed 1 MB in size. So we switched to Google Cloud Storage for reading and writing the images. The uploaded and predicted images were saved so that the model could later be trained on them to improve its accuracy.
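Writing a predicted image back to Cloud Storage is straightforward with the official client; the helper and object path below are hypothetical:

```python
from google.cloud import storage

def save_prediction(bucket_name, blob_name, image_bytes):
    """Persist the predicted image so it can feed later retraining."""
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(f"predictions/{blob_name}")
    blob.upload_from_string(image_bytes, content_type="image/png")
```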

Google AI Serving is very fast; the only limitations are that the model size must not exceed 250 MB and the saved model must be in the .pb (SavedModel) format. The architecture and flow are demonstrated by the following diagram:

Flow for online prediction
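On the export side, one way to produce the required .pb artifact in TensorFlow 2.x is the SavedModel API; the path below is just an example, and `model` stands for the trained Keras model from the earlier sketch:

```python
import tensorflow as tf

# Writes saved_model.pb plus a variables/ directory under the given path;
# this SavedModel layout is what Google AI Serving expects.
tf.saved_model.save(model, "export/east_model/1")
```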

In the next and last part of the series, I will be sharing my learning experience with you.

Thanks for reading! Continue reading:

Part 1 link: here

Part 3 link: here

