Cloud Vision API — OCR with Notebook on GCP

Thursy Satriani
Google Cloud Platform by Cloud Ace
4 min read · Jan 30, 2020

Documentation

Cloud Vision API is a Google Cloud service that includes Optical Character Recognition (OCR). This tutorial shows how to use the Vision API from a GCP Notebook. First, create a project on GCP; sign up here for free and create a project if you don’t have one. Then navigate to this link for further information about the Vision API. You can also try the feature in the web demo provided by Google here.

Here are the steps to use Vision API on a Notebook:
1. Enable the Vision API
2. Generate an API key
3. Upload the image into Google Cloud Storage
4. Create Notebook Instances, and
5. Use text detection (OCR methods) of the Vision API

Enable Vision API

To enable the Vision API, navigate to APIs & Services → Library. Type “Vision” in the search bar, select Vision API, and click the Enable button.

Click Enable on Vision API

Generate the API key

Navigate to APIs & Services → Credentials. Click Create credentials and select API key from the drop-down menu.

APIs & Services → Credentials
Generate the API key

Save the API key somewhere safe; you will need it later in the Notebook.

Upload the Image into Google Cloud Storage

Select any image that contains text to upload to Cloud Storage. In this example I will use the image below.

Target image

Create a bucket by navigating to Storage → Create Bucket. Give your bucket a globally unique name and click Create.

Creating the bucket with a globally unique name

To upload the file into Google Cloud Storage, click upload files and select the target image.

Click Upload Files

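After uploading the image to the bucket, click the three dots at the end of the image file's row and select Edit Permissions. Add a new entry that grants allUsers the Reader role, then click Save. This makes the image publicly readable, so a request authenticated only with an API key can access it.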
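Once the object is public, it can be addressed in two ways: a gs:// URI (the form used later in the Vision API request) and a public HTTPS URL that you can open in a browser to confirm the permission change worked. Here is a minimal sketch of both, assuming a bucket called my-ocr-bucket and an object called images/my sign.jpg. Both names are hypothetical, and the helper functions are my own rather than part of any Google library.

```python
from urllib.parse import quote


def gcs_uri(bucket, object_name):
    """Return the gs:// URI for an object stored in a bucket."""
    return f"gs://{bucket}/{object_name}"


def public_url(bucket, object_name):
    """Return the public HTTPS URL of a publicly readable object.

    The object name is percent-encoded (slashes are kept) so that
    spaces and similar characters are valid in the URL.
    """
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name)}"


# Hypothetical bucket and object names, for illustration only
print(gcs_uri("my-ocr-bucket", "images/my sign.jpg"))
print(public_url("my-ocr-bucket", "images/my sign.jpg"))
```

If the HTTPS URL opens in a private browser window without asking you to sign in, the allUsers permission is in place.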

Create Notebook Instances

Select Notebooks from AI Platform. Click New Instance and select TensorFlow 2.x without GPU. Wait a minute or so for the instance to start, then click Open JupyterLab to open the Notebook environment.

Notebook instances with preinstalled TensorFlow

In the Notebook environment, select Python 3 Notebook. Then we can start to write the code.

Use text detection (OCR methods) of the Vision API

First, declare your API key by replacing <YOUR_API_KEY> in the code below:

APIKEY = "<YOUR_API_KEY>"
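If you would rather not paste the key into a notebook that might be shared, one option is to read it from an environment variable. This is only a sketch: the variable name VISION_API_KEY and the helper load_api_key are my own choices, not something the Vision API requires.

```python
import os


def load_api_key(env_var="VISION_API_KEY", fallback="<YOUR_API_KEY>"):
    """Return the key stored in env_var, or the fallback placeholder if it is unset."""
    return os.environ.get(env_var, fallback)


# Falls back to the placeholder when the (hypothetical) variable is not set
APIKEY = load_api_key()
```

With this approach the rest of the notebook can keep using APIKEY exactly as before.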

Press Shift + Enter to run the cell. Because we are calling the API from Python (clients in many other languages are available), install the Python client package, which is not installed by default on the Notebook.

!pip install --upgrade pip
!pip install --upgrade google-api-python-client
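To confirm the installation worked before going further, you can check whether the module is importable. The google-api-python-client package is imported under the name googleapiclient; the helper name is_installed below is my own. This is a small sketch, and depending on your environment you may need to restart the kernel before a newly installed package is picked up.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Prints True once google-api-python-client is available to this kernel
print(is_installed("googleapiclient"))
```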

Create a request and generate the result using TEXT_DETECTION. Don’t forget to replace the image URI in the code below. To find the URI, click the target file you uploaded to Cloud Storage.

from googleapiclient.discovery import build

# gs:// URI of the uploaded image: bucket name, optional folder, file name
IMAGE = "gs://<bucket-name>/<folder>/<file-name>.jpg"

# Build a Vision API client authenticated with the API key
vservice = build('vision', 'v1', developerKey=APIKEY)

# Request text detection (OCR) on the image
request = vservice.images().annotate(body={
    'requests': [{
        'image': {
            'source': {
                'gcs_image_uri': IMAGE
            }
        },
        'features': [{
            'type': 'TEXT_DETECTION',
            'maxResults': 3,
        }]
    }],
})
responses = request.execute(num_retries=3)

# The first annotation holds the full text found in the image
print(responses['responses'][0]['textAnnotations'][0]['description'])

The output
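The print statement above indexes straight into the response, which raises an error if the image contains no text or the API reports a problem. Below is a slightly more defensive sketch. It assumes the usual shape of a TEXT_DETECTION response, where the first entry of textAnnotations holds the whole text and the later entries hold individual words with their bounding polygons. The helper names and the sample dictionary are my own; the sample is hand-written for illustration and is not real API output.

```python
def _first_result(api_response):
    """Return the first per-image result, or an empty dict if there is none."""
    results = api_response.get("responses") or [{}]
    return results[0]


def extract_text(api_response):
    """Return the full detected text, or an empty string when no text was found."""
    result = _first_result(api_response)
    if "error" in result:
        raise RuntimeError(result["error"].get("message", "Vision API error"))
    annotations = result.get("textAnnotations", [])
    return annotations[0]["description"] if annotations else ""


def extract_words(api_response):
    """Return (text, vertices) pairs for every annotation after the first."""
    annotations = _first_result(api_response).get("textAnnotations", [])
    return [
        (a["description"], a.get("boundingPoly", {}).get("vertices", []))
        for a in annotations[1:]
    ]


# Hand-written sample shaped like a TEXT_DETECTION response (not real output)
sample = {
    "responses": [{
        "textAnnotations": [
            {"description": "HELLO WORLD\n",
             "boundingPoly": {"vertices": [{"x": 0, "y": 0}]}},
            {"description": "HELLO",
             "boundingPoly": {"vertices": [{"x": 0, "y": 0}]}},
            {"description": "WORLD",
             "boundingPoly": {"vertices": [{"x": 60, "y": 0}]}},
        ]
    }]
}

print(extract_text(sample))
for word, vertices in extract_words(sample):
    print(word, vertices)
```

In the notebook you would call extract_text(responses) in place of the final print line.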

The text extracted from the image is printed. The same steps also apply when using the Vision API on Google Colab.
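The request body used above is a plain dictionary, so the same annotate call can cover several images by adding one entry per image to the requests list. Here is a sketch of a helper that builds such a body. The function name build_ocr_request and the bucket and file names are hypothetical, and the field names simply mirror the ones used in the code above.

```python
def build_ocr_request(image_uris, max_results=3):
    """Build an images.annotate body with one TEXT_DETECTION request per URI."""
    return {
        "requests": [
            {
                "image": {"source": {"gcs_image_uri": uri}},
                "features": [
                    {"type": "TEXT_DETECTION", "maxResults": max_results}
                ],
            }
            for uri in image_uris
        ]
    }


# Hypothetical image URIs, for illustration only
body = build_ocr_request([
    "gs://my-ocr-bucket/a.jpg",
    "gs://my-ocr-bucket/b.jpg",
])
print(len(body["requests"]))
```

The resulting dictionary can be passed as vservice.images().annotate(body=body), and the responses list in the result should then contain one entry per image.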
