Tips & tricks for using Google Vision API for text detection.

What is Vision API?

The Google Cloud Vision API lets developers build vision-based machine learning applications, such as object detection and OCR, without any background in machine learning.

Making a Vision API call.

A DOCUMENT_TEXT_DETECTION request can be made using the Python client library.
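A minimal sketch of such a request, assuming the google-cloud-vision package (version 2 or later) is installed and application credentials are configured in the environment. The function name detect_document_text and the file name in the usage comment are placeholders chosen for illustration.

```python
def detect_document_text(image_path):
    """Send one local image to the Vision API and return the response."""
    # Imported inside the function so this module can be loaded even
    # where the google-cloud-vision package is not installed.
    from google.cloud import vision

    # Credentials are read from the environment
    # (for example GOOGLE_APPLICATION_CREDENTIALS).
    client = vision.ImageAnnotatorClient()

    with open(image_path, "rb") as image_file:
        content = image_file.read()

    image = vision.Image(content=content)
    response = client.document_text_detection(image=image)

    # Per-image failures are reported inside the response itself.
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response


# Example usage (placeholder file name):
# response = detect_document_text("statement.png")
# print(response.full_text_annotation.text)
```

For a sample image, the JSON form of the response begins as follows. The Python client exposes the same fields under snake_case names such as text_annotations and bounding_poly.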

{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "en",
          "description": "Wake up human!\n",
          "boundingPoly": {
            "vertices": [
              {
                "x": 29,
                "y": 394
              },
              {
                "x": 570,
                "y": 394
              },
              {
                "x": 570,
                "y": 466
              },
              {
                "x": 29,
                "y": 466
              }
            ]
          }
        }, ......
.....
  • You can use this representation to divide the text content into parts and process each part separately.
  • If you are extracting information from text documents with a fixed format, this representation makes the work easier.
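One way to divide the content can be sketched as follows. It assumes the response also carries the full_text_annotation hierarchy (pages, blocks, paragraphs, words, symbols) that the Python client returns for document text detection. The helper name paragraph_texts is hypothetical, and joining words with single spaces is a simplification that ignores the break information the API attaches to symbols.

```python
def paragraph_texts(response):
    """Return the text of each paragraph, in the order the API lists them."""
    texts = []
    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                # A word is a sequence of symbols (single characters).
                words = [
                    "".join(symbol.text for symbol in word.symbols)
                    for word in paragraph.words
                ]
                # Simplification: separate words with one space.
                texts.append(" ".join(words))
    return texts
```

Each returned string can then be handled on its own, for example by matching a paragraph against the field it is expected to contain in a fixed-format document.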

Plotting bounding boxes from the response.

The response can be used to draw bounding boxes around the feature we specify; in this case, that feature is a word.
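A hedged sketch of this step, assuming the word-level boxes come from the full_text_annotation part of the response and that Pillow is available for drawing. The function names word_polygons and draw_polygons, and the choice of Pillow, are assumptions made for this example.

```python
def word_polygons(response):
    """Collect one polygon per detected word as a list of (x, y) tuples."""
    polygons = []
    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    vertices = word.bounding_box.vertices
                    polygons.append([(v.x, v.y) for v in vertices])
    return polygons


def draw_polygons(image_path, polygons, output_path, color="red"):
    """Draw the outline of each polygon on a copy of the image and save it."""
    # Pillow is imported here so word_polygons can be used without it.
    from PIL import Image, ImageDraw

    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for polygon in polygons:
        draw.polygon(polygon, outline=color)
    image.save(output_path)
```

Collecting block.bounding_box or paragraph.bounding_box in the same loop would draw boxes around those larger features instead of words.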

The image on the left is the original, and the image on the right shows the bounding boxes plotted for each detected word.

Finding the location of a word

Suppose that, in the image above, you want to extract the amount for ‘Loans, including overdrafts’. To do that you need the location of a keyword, in this case ‘Overdrafts’. You can search the response for that word and then use its coordinates to extract the amount.
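The search can be sketched as below, assuming the word-level entries in text_annotations are used. The first entry of that list holds the entire detected text, so it is skipped. The function name find_word_location is hypothetical, and the comparison is an exact, case-insensitive match, so a token with attached punctuation would not be found.

```python
from itertools import islice


def find_word_location(response, target):
    """Return the polygon of the first word equal to target, or None."""
    wanted = target.lower()
    # Entry 0 is the full text of the image; the words start at entry 1.
    for annotation in islice(response.text_annotations, 1, None):
        if annotation.description.strip().lower() == wanted:
            return [(v.x, v.y) for v in annotation.bounding_poly.vertices]
    return None
```

The result is a list of (x, y) corner points for the matched word, or None when the word does not appear in the response.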

Finding words inside given bounding coordinates.

Now you have the coordinates of the word ‘Overdrafts’. To extract the amount, assume a box that starts at the right edge of the word and has the same width as the word. The text inside this box is what we need to extract.
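A minimal sketch of this step, assuming the keyword's location is available as a list of (x, y) corner points and that a word counts as inside the box only when all of its corners fall within it. The names box_right_of and text_within are hypothetical.

```python
from itertools import islice


def box_right_of(polygon):
    """Return (x_min, y_min, x_max, y_max) for a box that starts at the
    right edge of polygon and has the same width and height."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    width = max(xs) - min(xs)
    return (max(xs), min(ys), max(xs) + width, max(ys))


def text_within(response, box):
    """Join the words whose bounding polygons lie entirely inside box."""
    x_min, y_min, x_max, y_max = box
    found = []
    # Entry 0 of text_annotations is the full text, so start at entry 1.
    for annotation in islice(response.text_annotations, 1, None):
        vertices = annotation.bounding_poly.vertices
        inside = len(vertices) > 0 and all(
            x_min <= v.x <= x_max and y_min <= v.y <= y_max for v in vertices
        )
        if inside:
            found.append(annotation.description)
    return " ".join(found)
```

Strict containment is a deliberate simplification. On real scans the amount may sit slightly above or below the keyword's baseline, or extend past the assumed width, so the box may need some padding before it captures the value.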

Some Practical Use Cases of Vision API’s OCR capability.

  • Extracting data from user forms or identification documents.
  • Extracting text from scanned images containing text.
  • Scanning user passbooks, and many such use cases.

Searce Engineering

We identify better ways of doing things!

Monark Unadkat
