
Deep Dive into Extracting Text from Images Using the Google Cloud Vision API

Roya
13 min read · Aug 23, 2023


This is the second lab in a deep-dive series on Computer Vision Fundamentals with Google Cloud Labs. The first lab can be found here.

Photo by Kina on Unsplash

Task 1. Visualize the flow of data

Extracting text from images with the Google Cloud Vision API involves the following steps:

  1. An image that contains text in any language is uploaded to Cloud Storage. This is done in Task 5.
  2. A Cloud Function is triggered, which uses the Vision API to extract the text and detect the source language. (Cloud Function ocr-extract, Python Function process_image)
  3. The text is queued for translation by publishing a message to a Pub/Sub topic. A translation is queued for each target language different from the source language. (Python function process_image)
  4. If a target language matches the source language, the translation queue is skipped and the text is sent directly to the result queue, which is another Pub/Sub topic. (Python function detect_text)
  5. A Cloud Function uses the Translation API to translate the text in the translation queue. The translated result is sent to the result queue. (Cloud Function ocr-translate, Python function translate_text)
  6. Another Cloud Function saves…
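The routing decision in steps 3 and 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lab's actual code: the topic names, the message fields, and the `route_text` helper are hypothetical, and an in-memory dictionary stands in for the Pub/Sub publisher so the snippet runs without any Google Cloud setup.

```python
import json
from collections import defaultdict

# Hypothetical topic names; the lab's real topic names are not shown in this section.
TRANSLATE_TOPIC = "translate-topic"
RESULT_TOPIC = "result-topic"


def route_text(text, src_lang, target_langs, filename, publish):
    """Queue one message per target language (steps 3 and 4 of the flow)."""
    for target_lang in target_langs:
        # If the target matches the source language there is nothing to translate,
        # so the message skips the translation queue and goes to the result queue.
        topic = RESULT_TOPIC if target_lang == src_lang else TRANSLATE_TOPIC
        message = {
            "text": text,
            "filename": filename,
            "lang": target_lang,
            "src_lang": src_lang,
        }
        publish(topic, json.dumps(message).encode("utf-8"))


# In-memory stand-in for a Pub/Sub publisher (an assumption made for this sketch).
queues = defaultdict(list)


def fake_publish(topic, data):
    queues[topic].append(data)


# French text with three target languages: "en" and "es" need translation, "fr" does not.
route_text("Bonjour", "fr", ["en", "fr", "es"], "sign.jpg", fake_publish)
print({topic: len(messages) for topic, messages in queues.items()})
```

In a real deployment, `fake_publish` would be replaced by a call to a Pub/Sub publisher client, and the source language would come from the Vision API response rather than being passed in by hand.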


Written by Roya

A research scientist focused on responsible AI, with a love for nature, history, women's rights, and photography. Roya means "a sweet dream" in Farsi.
