Deep Dive into Extracting Text from Images Using the Google Cloud Vision API

Roya
13 min read · Aug 23, 2023


This is the second lab in the Deep Dive into Computer Vision Fundamentals with Google Cloud Labs series. The first lab can be found here.

Photo by Kina on Unsplash

Task 1. Visualize the flow of data

The following steps must be executed to extract text from images using the Google Cloud Vision API.

  1. An image that contains text in any language is uploaded to Cloud Storage. This is done in Task 5.
  2. A Cloud Function is triggered, which uses the Vision API to extract the text and detect the source language. (Cloud Function ocr-extract, Python function process_image)
  3. The text is queued for translation by publishing a message to a Pub/Sub topic. A translation is queued for each target language different from the source language. (Python function process_image)
  4. If a target language matches the source language, the translation queue is skipped, and the text is sent directly to the result queue, another Pub/Sub topic. (Python function detect_text)
  5. A Cloud Function uses the Translation API to translate the text in the translation queue. The translated result is sent to the result queue. (Cloud Function ocr-translate, Python function translate_text)
  6. Another Cloud Function saves the translated text from the result queue to Cloud Storage. (Cloud Function ocr-save, Python function save_result)
  7. The results are found in Cloud Storage as txt files for each translation.

Task 2. Prepare the application

In this stage, the user must create a bucket. In the previous lab, the bucket was created through the Cloud console; in this lab, it is created through the Google Cloud command-line interface (CLI).

gsutil mb gs://YOUR_IMAGE_BUCKET_NAME

The gsutil mb gs://YOUR_IMAGE_BUCKET_NAME command in Google Cloud Shell creates a new Cloud Storage bucket named YOUR_IMAGE_BUCKET_NAME. The bucket will be created in the default project for the current user.

The mb (make bucket) command creates new Cloud Storage buckets. The gs:// prefix marks the name as a Cloud Storage URI. The YOUR_IMAGE_BUCKET_NAME parameter sets the name of the bucket to create. The same command with a different bucket name creates a bucket for the results.

The next step is to create a Pub/Sub topic.

gcloud pubsub topics create YOUR_TRANSLATE_TOPIC_NAME

The gcloud pubsub topics create YOUR_TRANSLATE_TOPIC_NAME command in Google Cloud Shell creates a new Pub/Sub topic named YOUR_TRANSLATE_TOPIC_NAME. The topic will be created in the default project for the current user.

The pubsub command is used to manage Pub/Sub topics and subscriptions. The topics subcommand is used to manage topics. The create command is used to create new topics. The YOUR_TRANSLATE_TOPIC_NAME parameter specifies the name of the topic to create. The same command with a different topic name is used to create the topic for the result.

Once the topic is created, a developer can publish messages to it using the gcloud pubsub topics publish command. For example, the following command publishes the message Hello, world! to the topic my-translate-topic:

gcloud pubsub topics publish my-translate-topic --message="Hello, world!"

To finish this task, the sample repository is cloned from GitHub.

git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git

The git clone command is used to create a copy of a remote repository on the local machine. The remote repository can be hosted on GitHub, GitLab, or any other Git server. Here https://github.com/GoogleCloudPlatform/python-docs-samples.git is the repository to clone from.

To change into the directory that contains the sample code, the following command is used.

cd python-docs-samples/functions/ocr/app/

The cd command in the terminal is used to change the current directory.

Task 3. Understand the code, and Task 4. Deploy the functions

As suggested in this task, a deeper look into the code is necessary. For simplicity and clarity, the deployment steps planned for Task 4 are also explained here.

The code starts by importing its dependencies.

import base64
import json
import os

The import base64 statement in Python imports the base64 module. The base64 module provides functions for encoding and decoding binary data into ASCII strings. This is useful for transferring binary data over networks or storing it in files.

Two of the widely used functions in this module are:

  • b64encode(): Encodes a binary string into an ASCII string.
  • b64decode(): Decodes an ASCII string into a binary string.
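As a quick, standalone illustration of these two functions, here is a round trip from bytes to an ASCII string and back (the sample text is made up):

```python
import base64

# Encode raw bytes into an ASCII-safe string, then decode back.
data = b"Hello, Vision API!"
encoded = base64.b64encode(data).decode("ascii")
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == data)  # True
```

This same round trip is what Pub/Sub performs on message payloads: the triggered functions later in this lab receive their data base64-encoded and must decode it first.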

The import json statement in Python imports the json module. The json module provides functions for encoding and decoding JSON data. JSON is a lightweight data-interchange format that is easy for humans to read and write. It is also easy for machines to parse and generate.

Some of the most commonly used functions in this module are:

  • dumps(): Serializes a Python object into a JSON string.
  • loads(): Parses a JSON string into a Python object.
  • dump(): Serializes a Python object and writes it to a file.
  • load(): Reads JSON from a file and parses it into a Python object.
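A minimal round trip with dumps() and loads(), using a made-up message similar to the ones published later in this lab:

```python
import json

message = {"text": "Hello", "lang": "es"}

# dumps(): Python object -> JSON string
payload = json.dumps(message)

# loads(): JSON string -> Python object
restored = json.loads(payload)

print(payload)           # {"text": "Hello", "lang": "es"}
print(restored == message)  # True
```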

The import os statement in Python imports the os module. The os module provides functions for interacting with the operating system.

Some of the noteworthy functions in this module are:

  • getcwd(): Returns the current working directory.
  • listdir(): Lists the contents of a directory.
  • mkdir(): Creates a new directory.
  • rmdir(): Deletes an empty directory.
  • system(): Runs a system command.
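A small standalone sketch of the os functions this lab actually relies on; the environment variable name mirrors the one used later, and the fallback value is a made-up placeholder:

```python
import os

# Read an environment variable, with a fallback if it is unset.
# "my-demo-project" is a hypothetical default for illustration only.
project = os.environ.get("GCP_PROJECT", "my-demo-project")

# Inspect the current working directory and its contents.
cwd = os.getcwd()
entries = os.listdir(cwd)

print(project, cwd, len(entries))
```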

from google.cloud import pubsub_v1
from google.cloud import storage
from google.cloud import translate_v2 as translate
from google.cloud import vision

The from google.cloud import pubsub_v1 statement in Python imports the Pub/Sub v1 module from the Google Cloud client library for Python. The Pub/Sub v1 module provides classes and functions for working with the Pub/Sub API.

The Pub/Sub API is a fully-managed real-time messaging service that allows users to send and receive messages between independent applications. It is a reliable, scalable, and cost-effective way to decouple microservices and distributed systems.

The from google.cloud import storage statement in Python imports the Cloud Storage module from the Google Cloud client library for Python.

The Cloud Storage API is a fully-managed object storage service that allows users to store and retrieve data objects. It is a highly scalable and durable service well-suited for storing large amounts of data.

The from google.cloud import translate_v2 as translate statement in Python imports the Translate v2 module from the Google Cloud client library for Python.

The Google Cloud Translate API allows users to translate text between languages. It is a powerful tool for internationalizing your applications and making the content accessible to a global audience.

The from google.cloud import vision statement in Python imports the Vision module from the Google Cloud client library for Python.

The Google Cloud Vision API allows users to extract information from images, such as text, faces, and objects. It is a powerful tool for understanding and analyzing images.

vision_client = vision.ImageAnnotatorClient()

The vision_client = vision.ImageAnnotatorClient() statement in Python creates a client object for the Google Cloud Vision API. The ImageAnnotatorClient class is used to interact with the Vision API.

The ImageAnnotatorClient class has methods for detecting text, faces, and objects in images. It also has methods for extracting other information from images, such as labels, landmarks, and image properties.

translate_client = translate.Client()

The translate_client = translate.Client() statement in Python creates a client object for the Google Cloud Translate API. The Client() class is used to interact with the Translate API.

The Client() class has methods for translating text from one language to another. It also has methods for detecting the language of the text.

publisher = pubsub_v1.PublisherClient()

The publisher = pubsub_v1.PublisherClient() statement in Python creates a client object for the Google Cloud Pub/Sub API. The PublisherClient() class is used to publish messages to Pub/Sub topics.

The PublisherClient class has methods for creating topics and publishing messages to them; subscriptions are managed by the separate SubscriberClient class.

storage_client = storage.Client()

The storage_client = storage.Client() statement in Python creates a client object for the Google Cloud Storage API. The Client() class is used to interact with the Storage API.

The Client() class has methods for creating buckets, uploading blobs, and downloading blobs.

project_id = os.environ["GCP_PROJECT"]

The project_id = os.environ["GCP_PROJECT"] statement in Python gets the project ID from the environment variable GCP_PROJECT. The os.environ mapping provides access to environment variables.

Similar to what was done in the previous lab, in this lab, the project_id stores the GCP_PROJECT value for future use.

Next is to process the images.

The following function reads an uploaded image file from Cloud Storage and calls a function to detect whether the image contains text:

def process_image(file, context):
    """Cloud Function triggered by Cloud Storage when a file is changed.
    Args:
        file (dict): Metadata of the changed file, provided by the triggering
                     Cloud Storage event.
        context (google.cloud.functions.Context): Metadata of triggering event.
    Returns:
        None; the output is written to stdout and Stackdriver Logging
    """
    bucket = validate_message(file, "bucket")
    name = validate_message(file, "name")
    detect_text(bucket, name)
    print("File {} processed.".format(file["name"]))

In the previous code snippet (process_image), a function called validate_message is used.

from typing import Dict, TypeVar

T = TypeVar("T")


def validate_message(message: Dict[str, T], param: str) -> T:
    """
    Placeholder function for validating message parts.

    Args:
        message: message to be validated.
        param: name of the message parameter to be validated.

    Returns:
        The value of message['param'] if it's valid. Throws ValueError
        if it's not valid.
    """
    var = message.get(param)
    if not var:
        raise ValueError(
            "{} is not provided. Make sure you have "
            "property {} in the request".format(param, param)
        )
    return var

As validate_message suggests, the file event metadata has the structure of a Python dictionary. To validate such a message, this code confirms that values exist for keys such as bucket and name.
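The pattern can be seen in isolation with a simplified, untyped copy of the function and a made-up event dictionary (the bucket and file names below are hypothetical):

```python
def validate_message(message, param):
    # Return message[param] if present and truthy; raise otherwise.
    var = message.get(param)
    if not var:
        raise ValueError(
            "{} is not provided. Make sure you have "
            "property {} in the request".format(param, param)
        )
    return var


event = {"bucket": "my-image-bucket", "name": "menu.jpg"}
print(validate_message(event, "bucket"))  # my-image-bucket

try:
    validate_message(event, "text")  # key is absent, so this raises
except ValueError as err:
    print(err)
```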

The second function used in process_image is detect_text.

def detect_text(bucket, filename):
    print("Looking for text in image {}".format(filename))
    futures = []
    image = vision.Image(
        source=vision.ImageSource(gcs_image_uri=f"gs://{bucket}/{filename}")
    )
    text_detection_response = vision_client.text_detection(image=image)
    annotations = text_detection_response.text_annotations
    if len(annotations) > 0:
        text = annotations[0].description
    else:
        text = ""
    print("Extracted text {} from image ({} chars).".format(text, len(text)))

    detect_language_response = translate_client.detect_language(text)
    src_lang = detect_language_response["language"]
    print("Detected language {} for text {}.".format(src_lang, text))

    # Submit a message to the bus for each target language
    to_langs = os.environ["TO_LANG"].split(",")
    for target_lang in to_langs:
        topic_name = os.environ["TRANSLATE_TOPIC"]
        if src_lang == target_lang or src_lang == "und":
            topic_name = os.environ["RESULT_TOPIC"]
        message = {
            "text": text,
            "filename": filename,
            "lang": target_lang,
            "src_lang": src_lang,
        }
        message_data = json.dumps(message).encode("utf-8")
        topic_path = publisher.topic_path(project_id, topic_name)
        future = publisher.publish(topic_path, data=message_data)
        futures.append(future)
    for future in futures:
        future.result()

The code is a Python function that detects text in an image and publishes the text to a Pub/Sub topic for each target language.

The function takes two parameters:

  • bucket: The name of the bucket that the image is stored in.
  • filename: The name of the image file.

The function first uses the Google Cloud Vision API to detect text in the image. If text is detected, the function extracts the text and stores it in the variable text. The function then uses the Google Cloud Translate API to detect the language of the text. The language of the text is stored in the variable src_lang.

The function then iterates over the list of target languages, which is stored in the environment variable TO_LANG. The function publishes a message for each target language to a Pub/Sub topic. The message contains the text, the filename, the target language, and the source language.

The function uses the following Cloud APIs:

  • Google Cloud Vision API: To detect text in the image.
  • Google Cloud Translate API: To detect the language of the text.
  • Google Cloud Pub/Sub API: To publish messages to a Pub/Sub topic.

Initially, in main.py the Google Cloud Vision API client is imported (from google.cloud import vision). Here vision.Image is used to load an image from the bucket. Then the vision_client.text_detection() method runs text detection on the image. The annotations variable stores the results of the text detection.

If any text is detected, it is stored in the variable text. Then the translate_client.detect_language() method calls the language detection function, and the src_lang variable stores its result. Essentially, it determines the most probable language of the detected text (text).

For translation, the for loop iterates over the list of target languages. This list is provided to the code through the environment variable TO_LANG. For instance, at the time of deployment, TO_LANG=es,en,fr,ja.

The topic_name variable stores the name of the Pub/Sub topic to which the message is published. topic_name is provided to the code through another environment variable, TRANSLATE_TOPIC. If the target language matches the source language, or the source language is undefined, topic_name is set to RESULT_TOPIC instead. Two distinct topics were created in Task 2 for the translation and result queues; those topics are used here.

For each target language, the function creates a message that contains the text, the filename, the target language, and the source language, and publishes it to a Pub/Sub topic. The message_data variable stores the message serialized as JSON and encoded in UTF-8. Then the publisher.topic_path() method builds a topic path from project_id and topic_name, and the publisher.publish() method publishes the message to the topic, storing the result in the variable future. futures is a list of asynchronous publish request statuses; the future.result() calls wait for each publish operation to complete.
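The message-building and routing logic above can be simulated locally without any Cloud APIs. The values below are made-up stand-ins for what detect_text would compute, and TO_LANG is set in-process rather than by --set-env-vars:

```python
import json
import os

# Made-up stand-ins for the values detect_text would compute.
text, filename, src_lang = "Hola mundo", "menu.jpg", "es"

os.environ["TO_LANG"] = "es,en,fr"  # normally set at deploy time

for target_lang in os.environ["TO_LANG"].split(","):
    # Same routing rule as detect_text: a matching (or undetermined)
    # source language skips translation and goes to the result queue.
    queue = "result" if src_lang == target_lang or src_lang == "und" else "translate"
    message = {
        "text": text,
        "filename": filename,
        "lang": target_lang,
        "src_lang": src_lang,
    }
    # The exact byte payload that publisher.publish() would send.
    message_data = json.dumps(message).encode("utf-8")
    print(queue, target_lang, len(message_data))
```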

The following command must be executed in the sample code's directory to deploy the image processing function with a Cloud Storage trigger.

gcloud functions deploy ocr-extract \
--runtime python39 \
--trigger-bucket YOUR_IMAGE_BUCKET_NAME \
--entry-point process_image \
--set-env-vars "^:^GCP_PROJECT=YOUR_GCP_PROJECT_ID:TRANSLATE_TOPIC=YOUR_TRANSLATE_TOPIC_NAME:RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME:TO_LANG=es,en,fr,ja"

Here gcloud functions deploy creates or updates a Google Cloud Function. ocr-extract is the name of the deployed function. The function is written in Python 3.9 and is triggered when an image is uploaded to the bucket YOUR_IMAGE_BUCKET_NAME. The function's entry point is process_image(). The --set-env-vars flag sets the following environment variables:

  • GCP_PROJECT: The ID of the Google Cloud project that the function is running in.
  • TRANSLATE_TOPIC: The name of the Pub/Sub topic to which the function publishes translated text.
  • RESULT_TOPIC: The name of the Pub/Sub topic to which the function publishes the results of the OCR.
  • TO_LANG: A comma-separated list of languages to translate the text to.

To translate the text, the following function reads a message from the translation queue, translates its text, and queues the translated text to be saved back to Cloud Storage.

def translate_text(event, context):
    if event.get("data"):
        message_data = base64.b64decode(event["data"]).decode("utf-8")
        message = json.loads(message_data)
    else:
        raise ValueError("Data sector is missing in the Pub/Sub message.")
    text = validate_message(message, "text")
    filename = validate_message(message, "filename")
    target_lang = validate_message(message, "lang")
    src_lang = validate_message(message, "src_lang")
    print("Translating text into {}.".format(target_lang))
    translated_text = translate_client.translate(
        text, target_language=target_lang, source_language=src_lang
    )
    topic_name = os.environ["RESULT_TOPIC"]
    message = {
        "text": translated_text["translatedText"],
        "filename": filename,
        "lang": target_lang,
    }
    message_data = json.dumps(message).encode("utf-8")
    topic_path = publisher.topic_path(project_id, topic_name)
    future = publisher.publish(topic_path, data=message_data)
    future.result()

The function translate_text() takes two parameters:

  • event: The Pub/Sub event that triggered the function.
  • context: The Cloud Function context.

The function first checks if the Pub/Sub event has a data field. If it does, the function decodes the data field and loads it as a JSON object. The function then extracts the text, filename, target language, and source language from the JSON object.

The function then uses the Google Cloud Translate API to translate the text into the target language. The translate_client.translate() method calls the Translate API, and the translated text is stored in the variable translated_text. The function then creates a message (message) that contains the translated text, the filename, and the target language. The message_data variable stores the message serialized as JSON and encoded in UTF-8. Then the publisher.topic_path() method builds a topic path, and the publisher.publish() method publishes the message to the topic.

The function finally waits for the publish operation to complete (future.result()).
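Because Pub/Sub delivers the payload base64-encoded in event["data"], the decode step at the top of translate_text (and of save_result below) can be exercised locally with a simulated event; the payload values here are made up:

```python
import base64
import json

# Simulate the event a Pub/Sub-triggered function receives.
original = {"text": "Hola", "filename": "menu.jpg", "lang": "es", "src_lang": "en"}
event = {"data": base64.b64encode(json.dumps(original).encode("utf-8"))}

# The same decode logic used by translate_text and save_result.
message_data = base64.b64decode(event["data"]).decode("utf-8")
message = json.loads(message_data)

print(message == original)  # True
```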

The following command in the directory containing the sample code must be executed to deploy the text translation function with a Cloud Pub/Sub trigger.

gcloud functions deploy ocr-translate \
--runtime python39 \
--trigger-topic YOUR_TRANSLATE_TOPIC_NAME \
--entry-point translate_text \
--set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_TOPIC=YOUR_RESULT_TOPIC_NAME"

Here the function ocr-translate is created. The difference from the previously explained command is that here the trigger is the Pub/Sub topic YOUR_TRANSLATE_TOPIC_NAME and the entry point is the translate_text function explained above.

Finally, to save the translated text to Cloud Storage, the following function is used.

def save_result(event, context):
    if event.get("data"):
        message_data = base64.b64decode(event["data"]).decode("utf-8")
        message = json.loads(message_data)
    else:
        raise ValueError("Data sector is missing in the Pub/Sub message.")
    text = validate_message(message, "text")
    filename = validate_message(message, "filename")
    lang = validate_message(message, "lang")
    print("Received request to save file {}.".format(filename))
    bucket_name = os.environ["RESULT_BUCKET"]
    result_filename = "{}_{}.txt".format(filename, lang)
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(result_filename)
    print("Saving result to {} in bucket {}.".format(result_filename, bucket_name))
    blob.upload_from_string(text)
    print("File saved.")

The function save_result() takes two parameters:

  • event: The Pub/Sub event that triggered the function.
  • context: The Cloud Function context.

The function first checks if the Pub/Sub event has a data field. If it does, the function decodes the data field using the base64.b64decode() method, then loads the data as a JSON object with the json.loads() method.

The function then extracts the text, filename, and language from the JSON object. If any of these parameters do not exist, the validate_message will raise a ValueError.

Then the code reads bucket_name, which is provided to the code through an environment variable. The result file's name is the concatenation of the original filename, an underscore, the language code, and the .txt extension.
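For example, with hypothetical input values, an image named menu.jpg translated into French would be saved under this name:

```python
# Hypothetical input values; the pattern matches save_result above.
filename, lang = "menu.jpg", "fr"
result_filename = "{}_{}.txt".format(filename, lang)
print(result_filename)  # menu.jpg_fr.txt
```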

The function then saves the text to a Cloud Storage bucket. The line of code bucket = storage_client.get_bucket(bucket_name) gets a Cloud Storage bucket object by name. The get_bucket() method returns a bucket object if the bucket exists. If the bucket does not exist, the method raises an exception.

A blob is the Cloud Storage client library's representation of an object: a collection of data stored as a single entity that can be accessed through the Google Cloud Storage client.

Blob(
    name,
    bucket,
    chunk_size=None,
    encryption_key=None,
    kms_key_name=None,
    generation=None,
)

The bucket.blob() method creates a blob object for the given name (without making an API request). The line of code blob.upload_from_string(text) uploads the string text to that blob.

The function finally prints a message indicating that the file has been saved.

As explained previously, to deploy the function that saves results to Cloud Storage with a Cloud Pub/Sub trigger, the following command in the directory that contains the sample code must be executed.

gcloud functions deploy ocr-save \
--runtime python39 \
--trigger-topic YOUR_RESULT_TOPIC_NAME \
--entry-point save_result \
--set-env-vars "GCP_PROJECT=YOUR_GCP_PROJECT_ID,RESULT_BUCKET=YOUR_RESULT_BUCKET_NAME"

Task 5. Upload an image

To upload an image to Cloud Storage, the following command must be executed.

gsutil cp PATH_TO_IMAGE gs://YOUR_IMAGE_BUCKET_NAME

This command copies the file at the path PATH_TO_IMAGE to the Cloud Storage bucket named YOUR_IMAGE_BUCKET_NAME.

To verify the execution of these functions, the user can inspect the logs with the following command.

gcloud functions logs read --limit 100

This command reads the most recent 100 log entries for all Cloud Functions in the current project.

Upon correct execution, the saved translations can be found in the result Cloud Storage bucket.

Task 6. Delete the Cloud Functions

Deleting Cloud Functions does not remove any resources stored in Cloud Storage. However, to delete the Cloud Functions that were created, the following commands must be executed.

gcloud functions delete ocr-extract
gcloud functions delete ocr-translate
gcloud functions delete ocr-save

Disclaimer: To write this post, various tools have been utilized, including but not limited to Bard, ChatGPT, and Grammarly.


Roya

Enthusiastic young professional with a love for science, women's rights, and photography. Roya means "a sweet dream" in Farsi.