Building a Serverless Image Text Extractor and Translator Using Google Cloud Pre-Trained AI
Introduction
Those who follow my Medium blog will know that I write quite a bit about architecture, strategy, and Google Cloud. And occasionally I post about Python.
But rarely do I actually build useful applications in Google Cloud. And so I thought it was about time I did a blog about an end-to-end development experience on Google Cloud.
In this blog I’ll talk about building a serverless AI application which takes a user-uploaded image, extracts any text it finds, and translates it if necessary. I’ll be making use of:
- Cloud Run to host the UI, in the form of a Python Flask application.
- Cloud Functions, to host the backend logic in response to the user uploading an image.
- Google’s pre-built Image and Translation AI Machine Learning APIs.
- Local development using Visual Studio Code, along with functions-framework for local Cloud Functions development, and Cloud Code for local Cloud Run development.
A Quick Overview of AI and AI Products
Artificial Intelligence
AI is a broad term that describes making use of machine automation to perform a task that normally requires human intelligence. E.g. speech recognition, visual perception, language translation, decision making.
Machine Learning
A specific subfield of AI, concerned with teaching machines to recognise patterns in data, so that they can make predictions and solve problems without explicitly coded solutions.
Generative AI (Gen AI)
A subclass of AI that is able to generate new data that is similar to — but not the same as — the data it was trained on. Gen AI relies on foundation models, such as large language models (LLMs), generative image models, and multimodal models. A multimodal model is a type of model that is able to process multiple types of input data (e.g. text, image and video), and generate multiple types of content.
My solution won’t be using Gen AI. But I’ve mentioned it here since Gen AI is so prevalent, and I wanted to make sure you understand how these models differ from the predictive models I’m using in my solution.
Google’s Pre-Trained Machine Learning APIs
These are ML models that have been pre-built and trained by Google to perform particular tasks. They are classified as predictive models, rather than generative. Examples include:
- Google Cloud Vision API — For tasks such as: classification, facial recognition, and text detection.
- Google Cloud Natural Language API — For understanding the meaning behind text. This includes identifying important elements of text, and also sentiment analysis.
- Google Cloud Translation API — For translation from one language to another.
- Google Cloud Video Intelligence API — For video analysis and annotation.
Motivation for the Application
I’ve been learning Ukrainian for a little while. It’s a beautiful language. I started by listening to the Ukrainian Lessons Podcast, created by Anna Ohoiko. From there, I discovered an active and thriving Ukrainian Learners community on Facebook. Occasionally, community members will post a meme in Ukrainian. Sometimes I’d understand the meme. Often I wouldn’t.
And so I thought… “I’d like an application where I can upload a meme (or any image), extract the text from it, and translate that text into my native language.”
Of course, there are other ways to do this. But I thought it was a great use case for building a new, serverless application in Google Cloud, from scratch.
Application Architecture
It’s a pretty simple application architecture, hosted on Google Cloud serverless services:
The Webapp UI — Cloud Run
Here I run a Flask web application as a container, on Google Cloud Run. The web application does a couple of things:
- It renders the frontend page containing the form used to capture the user input (e.g. language and image upload).
- It handles the request from the user, capturing the image, and sending it to the Cloud Function backend.
I’ve selected Cloud Run because:
- It provides a serverless way to host and run a containerised application. (I.e. our Python Flask web application.)
- It is well-suited for hosting simple stateless web applications, like this one.
- It automatically scales, and scales down to 0 instances, when there is no demand.
The Backend — Cloud Function
The Cloud Function receives the image from the user (via the web application UI) and then calls the respective Google Cloud APIs, in order to extract text from the image, and translate it.
I’ve selected Cloud Functions because:
- It is well-suited to performing short-lived processing, in response to events. Thus, ideal for running our extraction and translation task, in response to the user-uploaded image.
- It automatically scales, and scales down to 0 when there is no demand.
- We can decouple the image processing from the actual user frontend. Thus, if we wanted to use a different frontend, we could easily do so, without changing the code in the Cloud Function.
Text Extraction and Translation
I’m using Google’s pre-built Vision API and Translation API. But why not use a Generative AI model, like Gemini Pro Vision?
- The Vision and Translation APIs are specifically built for the tasks I want to perform.
- Conversely, the Gemini Pro Gen AI multimodal foundation model can achieve the same result, in response to natural language prompts. However, we have no need for natural language interactions here. Why? Because we know exactly what we want the APIs to do in response to an image upload.
- Although Gemini Pro Vision has more versatility as a multimodal foundation model, this power comes with a higher price tag. The Vision API allows 1,000 free invocations per month, and the Translation API provides free translation of the first half million characters per month.
My Dev Environment
WSL
I’m running Windows, with Windows Subsystem for Linux (WSL). For those not familiar with WSL, it is an out-of-the-box environment (included in Windows 10 and later) that lets you run a full Linux environment directly inside Windows. I happen to be using Ubuntu.
The advantage of working inside WSL is that I can write any necessary scripts in bash, which means my code will be more portable. For example, I can run the same scripts inside my own environment as I would inside Google Cloud Shell.
Visual Studio Code
VS Code is my code editor of choice. It is free and open source. It runs on Windows, Linux and Mac, and has many useful plugins, such as Git integration, and Google Cloud Code: a set of AI-assisted plugins (including Gemini Code Assist) for facilitating local development with Google Cloud services.
Dev Project Structure
If you want to check out the git repo, you can find it here.
The overall structure looks like this:
└── image-text-translator
    ├── docs/            - Documentation for the repo
    │
    ├── infra-tf/        - Terraform for installing infra
    │
    ├── scripts/         - For environment setup and helper scripts
    │   └── setup.sh     - Setup helper script
    │
    ├── app/             - The Application
    │   ├── ui_cr/                - Browser UI (Cloud Run)
    │   │   ├── static/           - Static content for frontend
    │   │   ├── templates/        - HTML templates for frontend
    │   │   ├── app.py            - The Flask application
    │   │   ├── requirements.txt  - The UI Python requirements
    │   │   ├── Dockerfile        - Dockerfile to build the Flask container
    │   │   └── .dockerignore     - Files to ignore in Dockerfile
    │   │
    │   └── backend_gcf/          - Backend (Cloud Function)
    │       ├── main.py           - The backend CF application
    │       └── requirements.txt  - The backend CF Python requirements
    │
    ├── testing/
    │   └── images/
    │
    ├── requirements.txt - Python requirements for project local dev
    └── README.md
One-Time Google Project Setup and Permissions
Create the Google Cloud Project
Create a Google Cloud project for your application. This is mine:
Perform the next steps from an account that has sufficient privileges, such as a Project Admin or Org Admin.
Enable APIs
Eventually, we’ll Terraform this configuration. But initially, these are the APIs you’ll need to enable:
# Authenticate to Google Cloud
gcloud auth list
# Check we have the correct project selected
export PROJECT_ID=<enter your project ID>
gcloud config set project $PROJECT_ID
# Enable Cloud Build API
gcloud services enable cloudbuild.googleapis.com
# Enable Cloud Storage API
gcloud services enable storage-api.googleapis.com
# Enable Artifact Registry API
gcloud services enable artifactregistry.googleapis.com
# Enable Eventarc API
gcloud services enable eventarc.googleapis.com
# Enable Cloud Run Admin API
gcloud services enable run.googleapis.com
# Enable Cloud Logging API
gcloud services enable logging.googleapis.com
# Enable Cloud Pub/Sub API
gcloud services enable pubsub.googleapis.com
# Enable Cloud Functions API
gcloud services enable cloudfunctions.googleapis.com
# Enable Cloud Translation API
gcloud services enable translate.googleapis.com
# Enable Cloud Vision API
gcloud services enable vision.googleapis.com
# Enable Service Account Credentials API
gcloud services enable iamcredentials.googleapis.com
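As an aside, `gcloud services enable` accepts multiple services in a single call, so you can batch all of the above into one command. Here is a sketch (shown as a dry-run that just echoes the command; remove the `echo` to actually run it):

```shell
# The full list of APIs to enable, as above
SERVICES="cloudbuild.googleapis.com storage-api.googleapis.com \
artifactregistry.googleapis.com eventarc.googleapis.com run.googleapis.com \
logging.googleapis.com pubsub.googleapis.com cloudfunctions.googleapis.com \
translate.googleapis.com vision.googleapis.com iamcredentials.googleapis.com"

# Enable them all in a single call (echoed here as a dry-run; remove 'echo' to execute)
echo gcloud services enable $SERVICES
```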
Service Account and Roles
Service accounts are the standard approach to managing authentication and authorisation for applications, rather than end users. Our Cloud Run application will need to authenticate to our Cloud Function, and our Cloud Function will need to authenticate to the Cloud Vision and Translation APIs.
So let’s create a service account:
# Make sure your PROJECT_ID variable is set before doing this!
export SVC_ACCOUNT=image-text-translator-sa
export SVC_ACCOUNT_EMAIL=$SVC_ACCOUNT@$PROJECT_ID.iam.gserviceaccount.com
# Attaching a user-managed service account is the preferred way to provide credentials to ADC for production code running on Google Cloud.
gcloud iam service-accounts create $SVC_ACCOUNT
Now we’ll bind a number of roles to our service account:
# Grant roles to the service account
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:$SVC_ACCOUNT_EMAIL" \
--role=roles/run.admin
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:$SVC_ACCOUNT_EMAIL" \
--role=roles/run.invoker
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:$SVC_ACCOUNT_EMAIL" \
--role=roles/cloudfunctions.admin
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:$SVC_ACCOUNT_EMAIL" \
--role=roles/cloudfunctions.invoker
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:$SVC_ACCOUNT_EMAIL" \
--role="roles/cloudtranslate.user"
# Grant the required role to the principal that will attach the service account to other resources.
gcloud iam service-accounts add-iam-policy-binding $SVC_ACCOUNT_EMAIL \
--member="group:gcp-devops@my-org.com" \
--role=roles/iam.serviceAccountUser
# Allow service account impersonation
gcloud iam service-accounts add-iam-policy-binding $SVC_ACCOUNT_EMAIL \
--member="group:gcp-devops@my-org.com" \
--role=roles/iam.serviceAccountTokenCreator
# Ensure your account has access to deploy to Cloud Functions and Cloud Run
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="group:gcp-devops@my-org.com" \
--role roles/run.admin
Local Development Environment Setup
(You will need to follow these steps on any machine where you plan to do development.)
Open a terminal. (I’m opening an Ubuntu Shell from Windows Terminal.) If you haven’t done so already, you’ll want to install Google gcloud CLI and supporting tools into your local environment:
# Install Google Cloud CLI in your local Linux environment.
# See https://cloud.google.com/sdk/docs/install
# Setup Python and pip in Gcloud CLI
# See https://cloud.google.com/python/docs/setup
# Install additional Google Cloud CLI packages for local dev
sudo apt install google-cloud-cli-gke-gcloud-auth-plugin kubectl google-cloud-cli-skaffold google-cloud-cli-minikube
Here (and from now on), we will authenticate with our developer account, rather than an org admin account. Why? I’m following the principle of least privilege.
# Authenticate to Google Cloud
gcloud auth login
When authenticating, click on the first link that is shown. You’ll then be prompted to provide your password.
Next, we’ll setup our application project folder and install some dependencies. If you’re following along and building the application from scratch, these are the next steps:
# This is where I keep my project
cd ~/localdev/gcp/image-text-translator
# Create a Python virtual env. For example...
python3 -m venv .venv
# And now ACTIVATE it
source .venv/bin/activate
# Add Python packages we need...
python3 -m pip install Flask
python3 -m pip install pillow # For image handling
python3 -m pip install functions-framework
python3 -m pip install google-cloud-storage google-cloud-translate google-cloud-vision google-auth
# And create the requirements.txt file
python3 -m pip freeze > requirements.txt
Alternatively, if you want to clone my git repo:
git clone https://github.com/derailed-dash/image-text-translator.git
cd image-text-translator
# Create a Python virtual env. For example...
python3 -m venv .venv
# And now ACTIVATE it. E.g.
source .venv/bin/activate
# Install the Python dependencies now
python3 -m pip install -r requirements.txt
Git Setup
If you’re building everything from scratch (without cloning my repo) you should now setup your Git and GitHub environment. Don’t forget to first create a .gitignore. Check out my repo to get an idea of what it should look like. Then follow these steps:
# Setup git in Cloud Shell, if you haven't done so before
git config --global user.email "bob@wherever.com"
git config --global user.name "Bob"
git config --global core.autocrlf input # really important if you're using WSL!
# Create local git repo.
# Before proceeding, make sure you have created .gitignore file
# to ignore .terraform dirs and local state, plans, etc.
git init
git add .
git commit -m "Initial commit"
# Let's authenticate the GitHub command line tool
# It is already installed on Cloud Shell
gh auth login
# Now let's use gh cli to create a remote repo in GitHub.
# You can make it private, if you prefer
gh repo create image-text-translator --public --source=.
git push -u origin master
Running VS Code
Let’s open VS Code from our project folder:
# From /path/to/your/image-text-translator
code .
VS Code is clever enough to configure any necessary WSL plugins required.
Setup Application Default Credentials (ADC)
ADC is a strategy that allows authentication libraries to automatically find credentials based on the current environment. This is useful because we can leverage ADC both in our local environment (with the Cloud SDK), but also in our target environment on Google Cloud.
ADC can be configured to use our service account credentials. There are two ways to do this:
- We can impersonate the service account using our own user identity.
- We can create a private key for our service account, and point ADC to the location of this key.
I tried to use impersonation initially:
gcloud auth application-default login --impersonate-service-account $SVC_ACCOUNT_EMAIL
Unfortunately, I struggled to make this work when authenticating my Cloud Function call from Cloud Run. So instead, I’ve created a service account key which we can point the ADC to:
gcloud auth application-default login
# If these are not already set...
export SVC_ACCOUNT=image-text-translator-sa
export SVC_ACCOUNT_EMAIL=$SVC_ACCOUNT@$PROJECT_ID.iam.gserviceaccount.com
# Create a service account key for local dev
gcloud iam service-accounts keys create ~/.config/gcloud/$SVC_ACCOUNT.json \
--iam-account=$SVC_ACCOUNT_EMAIL
# Configure the ADC environment variable
# which is automatically detected by client libraries
export GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/$SVC_ACCOUNT.json
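As a quick sanity check before starting a dev session, we can verify that the environment variable points at a real key file. (This is a minimal sketch; the helper name is mine and is not part of the application code.)

```python
import os

def adc_key_ok(env=os.environ) -> bool:
    """Return True if GOOGLE_APPLICATION_CREDENTIALS is set and points at an existing file."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    return bool(path) and os.path.isfile(os.path.expanduser(path))

if __name__ == "__main__":
    print("ADC key found" if adc_key_ok() else "ADC key missing - run the export above")
```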
We actually need to set this GOOGLE_APPLICATION_CREDENTIALS environment variable with each session. Which brings us to…
Setup for Every Session
You’ll need to run these commands with EVERY new terminal session. (Or you can run source <project_dir>/scripts/setup.sh to run them for you.)
export PROJECT_ID=$(gcloud config list --format='value(core.project)')
export REGION=europe-west4
export SVC_ACCOUNT=image-text-translator-sa
export SVC_ACCOUNT_EMAIL=$SVC_ACCOUNT@$PROJECT_ID.iam.gserviceaccount.com
export GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/$SVC_ACCOUNT.json
# Functions
export FUNCTIONS_PORT=8081
export BACKEND_GCF=https://$REGION-$PROJECT_ID.cloudfunctions.net/extract-and-translate
# Flask
export FLASK_SECRET_KEY=some-secret-1234
export FLASK_RUN_PORT=8080
echo "Environment variables configured:"
echo PROJECT_ID="$PROJECT_ID"
echo REGION="$REGION"
echo SVC_ACCOUNT_EMAIL="$SVC_ACCOUNT_EMAIL"
echo BACKEND_GCF="$BACKEND_GCF"
echo FUNCTIONS_PORT="$FUNCTIONS_PORT"
echo FLASK_RUN_PORT="$FLASK_RUN_PORT"
The Cloud Function Backend
Local Development of the Function
I won’t reproduce all the code, since you can check it out in GitHub. I’ll just point out a few key things.
In my backend_gcf folder, I create a requirements.txt. This defines the Python packages that must be installed; Cloud Functions installs them automatically when you deploy the function.
Then I create a main.py. Here is part of the function that acts as the entry point to our Cloud Function: extract_and_translate().
@functions_framework.http
def extract_and_translate(request):
    """Extract and translate the text from an image.
    The image can be POSTed in the request, or it can be a GCS object reference.
    If a POSTed image, enctype should be multipart/form-data and the file should be named 'uploaded'.
    If we're passing a GCS object reference, content-type should be 'application/json',
    with two attributes:
    - bucket: name of GCS bucket in which the file is stored.
    - filename: name of the file to be read.
    """
    # Check if the request method is POST
    if request.method == 'POST':
        # Get the uploaded file from the request
        uploaded = request.files.get('uploaded')  # Assuming the input filename is 'uploaded'
        to_lang = request.form.get('to_lang', "en")
        print(f"{uploaded=}, {to_lang=}")

        if not uploaded:
            return flask.jsonify({"error": "No file uploaded."}), 400

        if uploaded:  # Process the uploaded file
            file_contents = uploaded.read()  # Read the file contents
            image = vision.Image(content=file_contents)
        else:
            return flask.jsonify({"error": "Unable to read uploaded file."}), 400
It’s pretty self-explanatory.
- We check whether the function has received a POST. (Later, we’ll create our Cloud Run application to POST the request.)
- If so, we look in the request for an object called uploaded. (Our Cloud Run application will attach this to the request.)
- If we find this object attached, we read it as binary, and then use the bytes to create a vision.Image object.
- Next we call the detect_text() function, passing in the image. This function uses the Vision API to see if there is any text in the image.
    # Use the Vision API to extract text from the image
    detected = detect_text(image)
    if detected:
        translated = translate_text(detected, to_lang)
        if translated["text"] != "":
            return translated["text"]
If text is found, detect_text() returns a Python dictionary containing that text. We then pass this dictionary into the translate_text() function, to translate the text into our chosen language.
Here is the detect_text() function:
def detect_text(image: vision.Image) -> dict | None:
    """Extract the text from the Image object"""
    text_detection_response = vision_client.text_detection(image=image)
    annotations = text_detection_response.text_annotations

    if annotations:
        text = annotations[0].description
    else:
        text = ""
    print(f"Extracted text from image:\n{text}")

    # Returns language identifier in ISO 639-1 format. E.g. en.
    # See https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes
    detect_language_response = translate_client.detect_language(text)
    src_lang = detect_language_response["language"]
    print(f"Detected language: {src_lang}.")

    message = {
        "text": text,
        "src_lang": src_lang,
    }
    return message
This code not only detects any text in the image, but also uses the Google Language API to determine the language of the text.
Next, the translate_text() function:
def translate_text(message: dict, to_lang: str) -> dict:
    """
    Translates the text in the message from the detected source language
    to the requested target language, and returns the result.
    """
    text = message["text"]
    src_lang = message["src_lang"]

    translated = {  # before translating
        "text": text,
        "src_lang": src_lang,
        "to_lang": to_lang,
    }
    if src_lang != to_lang and src_lang != "und":
        print(f"Translating text into {to_lang}.")
        translated_text = translate_client.translate(
            text, target_language=to_lang, source_language=src_lang)
        translated = {
            "text": unescape(translated_text["translatedText"]),
            "src_lang": src_lang,
            "to_lang": to_lang,
        }
    else:
        print("No translation required.")

    return translated
We check that the source language and target language are different, and that the source language is not undefined. If this check passes, we then use the Google Translate API to translate the text.
Testing Locally
First, let’s run the function locally. Run this command from our backend_gcf folder:
# Run the function
functions-framework --target extract_and_translate \
--debug --port $FUNCTIONS_PORT
It should look like this:
I’m going to test with this image:
From a second terminal, let’s POST to the function, using curl:
# You will first need to authenticate and set the environment vars in this terminal
source ./scripts/setup.sh
# now invoke
curl -X POST localhost:$FUNCTIONS_PORT \
-H "Content-Type: multipart/form-data" \
-F "uploaded=@./testing/images/ua_meme.jpg" \
-F "to_lang=en"
And it works!!
Deploy the Cloud Function (to Google Cloud)
Now that we’ve tested it locally, we can deploy it to Google Cloud. Again, we must run this from our backend_gcf folder.
# From the backend-gcf folder
gcloud functions deploy extract-and-translate \
--gen2 --max-instances 1 \
--region $REGION \
--runtime=python312 --source=. \
--trigger-http --entry-point=extract_and_translate \
--no-allow-unauthenticated
# Allow this function to be called by the service account
gcloud functions add-invoker-policy-binding extract-and-translate \
--region=$REGION \
--member="serviceAccount:$SVC_ACCOUNT_EMAIL"
Deployment result:
Here’s a cool thing… With the Cloud Code extension in VS Code, we can now see our deployed Cloud Function in Google Cloud!
Test the Cloud Function
We just need a slightly different curl command:
curl -X POST https://$REGION-$PROJECT_ID.cloudfunctions.net/extract-and-translate \
-H "Authorization: Bearer $(gcloud auth print-identity-token)" \
-H "Content-Type: multipart/form-data" \
-F "uploaded=@./testing/images/ua_meme.jpg" \
-F "to_lang=en"
Let’s test English-to-English with this meme:
Now I’ll try and translate to Ukrainian:
Woop! It works! And now to French:
Hurrah! All the tests appear to be working.
Note: if you try to use the function without passing in authenticated and authorised credentials, you’ll see this:
<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>403 Forbidden</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Forbidden</h1>
<h2>Your client does not have permission to get URL <code>/extract-and-translate</code> from this server.</h2>
<h2></h2>
</body></html>
Deploying a New Version
If we update our code and want to redeploy, we can simply re-run the same deploy command.
Deleting the Function
If you want to delete the Cloud Function, you can just run this:
gcloud functions delete extract-and-translate --region=$REGION
Flask UI
We will create a simple Flask web application and host it on Cloud Run. This application will render the form page, and process the form. Then, after retrieving the form responses (including the uploaded image), it will call our Cloud Function.
Creating the Flask Web Application
This article is not intended to be a Flask tutorial, so my recommendation is to check out the code in the ui_cr folder of the GitHub repo.
But here are some key points:
- I have created a requirements.txt to define the Python packages required by this application.
- I have created a Dockerfile which is responsible for packaging our Flask application into a Docker container. (We’ll need this to deploy to Cloud Run later.) It does so by copying the contents of the ui_cr folder, installing the Python dependencies (as defined in requirements.txt), and then defining the entry point for the application, i.e. to run python app.py.
Our ui_cr folder has this structure:
├── ui_cr/               - Browser UI (Cloud Run)
│   ├── static/          - Static content for frontend
│   ├── templates/       - HTML templates for frontend
│   ├── app.py           - The Flask application
│   ├── requirements.txt - The UI Python requirements
│   ├── Dockerfile       - Dockerfile to build the Flask container
│   └── .dockerignore    - Files to ignore in Dockerfile
The Application Code
I’m not going to go through the code in detail. But I’ll highlight some interesting points and gotchas.
Let’s take a look at app.py. First, the function that instantiates the Flask application:
def create_app():
    """Create and configure the app"""
    flask_app = Flask(__name__, instance_relative_config=True)
    flask_app.config.from_mapping(
        SECRET_KEY='dev',  # override with FLASK_SECRET_KEY env var
    )

    # Load envs starting with FLASK_
    # E.g. FLASK_SECRET_KEY, FLASK_PORT
    flask_app.config.from_prefixed_env()

    client = translate.Client()
    flask_app.languages = {lang['language']: lang['name'] for lang in client.get_languages()}
    flask_app.backend_func = os.environ.get('BACKEND_GCF', 'undefined')
    return flask_app

app = create_app()
- The Flask app requires a secret key in order to manage sessions. We can pass a key to the application using an environment variable, FLASK_SECRET_KEY.
- I’m using the Google Translate API to retrieve a list of available languages to translate to. I’ll use this to populate a drop-down select in my form.
- We need to pass the URL of our target function. Again, I’ll use an environment variable for this: BACKEND_GCF.
Handling Requests to the Flask Home Page
Here we handle requests to /. When our user first visits the page, they send a GET request. But when they submit the form with an image to translate, the request arrives as a POST. So we need to handle both.
@app.route('/', methods=['GET', 'POST'])
def entry():
    """Render the upload form"""
    message = "Upload your image!"
    to_lang = os.environ.get('TO_LANG', 'en')
    encoded_img = ""
    translation = ""

    if request.method == 'POST':  # Form has been posted
        app.logger.debug("Got POST")
        file = request.files.get('file')
        to_lang = request.form.get('to_lang')

        if file is None:
            flash('No file part.')
        elif file.filename == '':
            flash('No file selected for uploading.')
        elif not allowed_file(file.filename):
            filename = secure_filename(file.filename)
            flash(f'{filename} is not a supported image format. '
                  f'Supported formats are: {ALLOWED_EXTENSIONS}')
        else:
            filename = secure_filename(file.filename)
            app.logger.debug("Got %s", filename)
            app.logger.debug("Translating to %s", to_lang)

            # We don't need to save the image. We just want to binary encode it.
            try:
                img = Image.open(file.stream)
                with BytesIO() as buf:
                    if img_format := img.format:  # e.g. JPEG, GIF, PNG
                        img.save(buf, img_format.lower())
                        content_type = f"image/{img_format.lower()}"
                        image_bytes = buf.getvalue()
                        encoded_img = base64.b64encode(image_bytes).decode()
                    else:
                        flash('Unable to determine image format.')
            except UnidentifiedImageError:
                # This will happen if we resubmit the form
                flash('Unable to process image.')

            if encoded_img:
                message = f"Processed <{filename}>. Feel free to upload a new image."
                func_response = make_authorized_post_request(endpoint=app.backend_func,
                                                             image_data=image_bytes, to_lang=to_lang,
                                                             filename=filename, content_type=content_type)
                app.logger.debug("Function response code: %s", func_response.status_code)
                app.logger.debug("Function response text: %s", func_response.text)
                translation = func_response.text

    return render_template('index.html',
                           languages=app.languages,
                           message=message,
                           to_lang=to_lang,
                           img_data=encoded_img,
                           translation=translation), 200
We validate the input, and then — if the image has been uploaded and is a valid image — we make a request to our Cloud Function. This request needs to pass the raw bytes of the uploaded image.
Also, I want to be able to display the uploaded image to the user, in the returned page. I want to avoid saving the uploaded image on disk in the backend, so I’m taking this approach:
- Opening the uploaded image with Pillow, without saving it to disk.
- Saving it into an in-memory BytesIO buffer, in its original format (e.g. JPEG, PNG).
- Retrieving the raw byte data of the image from the buffer. (This is also the raw data I send to the make_authorized_post_request() function.)
- Encoding the binary image data using Base64: a string representation that can be safely embedded in the page returned to the browser.
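The encode step in isolation looks like this (a pure-stdlib sketch, with dummy bytes standing in for the real image data; the helper name is mine):

```python
import base64

def encode_for_browser(image_bytes: bytes) -> str:
    """Base64-encode raw image bytes into a string safe to embed in the rendered page."""
    return base64.b64encode(image_bytes).decode()

# The template can then render it in a data URI, e.g.
# <img src="data:image/jpeg;base64,{{ img_data }}">
```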
Making an Authenticated Call to Our Function
When calling the Cloud Function from our Cloud Run Flask application, we need to include a service account ID token in the request headers. The Google auth libraries will automatically retrieve the credentials from the ADC.
def make_authorized_post_request(endpoint: str,
                                 image_data, to_lang: str,
                                 filename: str, content_type: str):
    """
    Make a POST request to the specified HTTP endpoint by authenticating with the ID token
    obtained from the google-auth client library using the specified audience value.
    Expects the image_data to be a bytes representation of the image.
    """
    if endpoint == "undefined":
        raise ValueError("Unable to retrieve Function endpoint.")

    # Cloud Functions uses your function's URL as the `audience` value
    # For Cloud Functions, `endpoint` and `audience` should be equal
    # ADC requires valid service account credentials
    audience = endpoint
    auth_req = GoogleAuthRequest()

    # Requests an ID token for the service identity
    # from the instance metadata server or with local ADC. E.g.
    # export GOOGLE_APPLICATION_CREDENTIALS=/path/to/svc_account.json
    id_token = google.oauth2.id_token.fetch_id_token(auth_req, audience)

    headers = {
        "Authorization": f"Bearer {id_token}",
        # "Content-Type": "multipart/form-data"  # Let requests library decide on the content-type
    }
    files = {
        "uploaded": (filename, image_data, content_type),
        "to_lang": (None, to_lang)
    }

    # Send the HTTP POST request to the Cloud Function
    response = requests.post(endpoint, headers=headers, files=files, timeout=10)
    return response
Getting User Input
Flask renders content back to the browser using Jinja2 templates. These are HTML files that contain embedded code. When we call render_template() from our app.py, we pass in a number of variables, which are then referenced in the template.
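For illustration, here is a hypothetical fragment of such a template. (The real index.html is in the repo; this sketch just shows how the variables passed to render_template() above might be referenced.)

```html
<!-- Hypothetical snippet: renders the message, the language drop-down, and any translation -->
<p>{{ message }}</p>
<select name="to_lang">
  {% for code, name in languages.items() %}
    <option value="{{ code }}" {% if code == to_lang %}selected{% endif %}>{{ name }}</option>
  {% endfor %}
</select>
{% if translation %}
  <p>{{ translation }}</p>
{% endif %}
```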
Launching and Debugging
There are a few ways we can launch our Flask app locally:
cd app/ui_cr/
source ../../scripts/setup.sh # Initialise vars if we're in a new terminal
# Run the Flask App
python app.py
# Or with the Flask command.
# This will automatically load any environment vars starting FLASK_
# The --debug tells Flask to automatically reload after any changes
# and to set the app.logger to debug.
python -m flask --app app run --debug
Also, if you want to use the VS Code interactive debugger, I’d recommend creating a launch configuration that looks something like this:
{
    "configurations": [
        {
            "name": "Python Debugger: Flask",
            "type": "debugpy",
            "request": "launch",
            "module": "flask",
            "cwd": "${workspaceFolder}/app/ui_cr",
            "env": {
                "FLASK_APP": "app.py",
                "FLASK_DEBUG": "1",
                "FLASK_RUN_PORT": "8080"
            },
            "args": [
                "run",
                "--debug",
                "--no-debugger",
                "--no-reload"
            ],
            "jinja": true,
            "autoStartBrowser": false
        },
        // Other configurations
    ]
}
Testing the Application
Okay, we’re ready to run the application!
Here’s what it looks like in the browser:
Let’s use it to translate our Ukrainian meme:
Hurrah!
Deploying to Google Cloud Run
We’re now ready to deploy our Flask application to Cloud Run. Recall that Cloud Run is a serverless container runtime, so we need our application to be packaged as a container image before we can deploy it.
We’ll run through the following steps:
- Create a Google Artifact Registry (GAR) repo, for storing our Flask app container image.
- Use Cloud Build to build our container image from source and store in GAR.
- Deploy our application from GAR to Cloud Run.
gcloud artifacts repositories create image-text-translator-artifacts \
--repository-format=docker \
--location=$REGION \
--project=$PROJECT_ID
You can check the repo has been created in the Cloud Console:
Now build the Docker image:
export IMAGE_NAME=$REGION-docker.pkg.dev/$PROJECT_ID/image-text-translator-artifacts/image-text-translator-ui
# configure Docker to use the Google Cloud CLI to authenticate requests to Artifact Registry.
gcloud auth configure-docker $REGION-docker.pkg.dev
# Build the image and push it to Artifact Registry
# Run from the ui_cr folder
gcloud builds submit --tag $IMAGE_NAME:v0.1 .
It takes a minute or so. And now, our container image has been pushed to the repo:
Finally, we can deploy to Cloud Run using our image.
# create a random secret key for our Flask application
export RANDOM_SECRET_KEY=$(openssl rand -base64 32)
gcloud run deploy image-text-translator-ui \
--image=$IMAGE_NAME:v0.1 \
--region=$REGION \
--platform=managed \
--allow-unauthenticated \
--max-instances=1 \
--service-account=$SVC_ACCOUNT \
--set-env-vars BACKEND_GCF=$BACKEND_GCF,FLASK_SECRET_KEY=$RANDOM_SECRET_KEY
The output looks like this:
We can verify our service has been deployed in the Google Cloud console:
Redeploying
If we want to deploy a new version of our application, we can do so like this:
# Check our IMAGE_NAME is set
export IMAGE_NAME=$REGION-docker.pkg.dev/$PROJECT_ID/image-text-translator-artifacts/image-text-translator-ui
# Set our new version number
export VERSION=v0.2
# Rebuild the container image and push to the GAR
gcloud builds submit --tag $IMAGE_NAME:$VERSION .
# create a random secret key for our Flask application
export RANDOM_SECRET_KEY=$(openssl rand -base64 32)
# Redeploy
gcloud run deploy image-text-translator-ui \
--image=$IMAGE_NAME:$VERSION \
--region=$REGION \
--platform=managed \
--allow-unauthenticated \
--max-instances=1 \
--service-account=$SVC_ACCOUNT \
--set-env-vars BACKEND_GCF=$BACKEND_GCF,FLASK_SECRET_KEY=$RANDOM_SECRET_KEY
Optionally Setup Custom DNS Mapping
A URL like https://image-text-translator-ui-adisqviovq-ez.a.run.app/ isn’t very memorable! So you might want to map to a custom domain. You can find detailed guidance here.
At this point, I discovered that Cloud Run doesn’t support domain mappings in the region europe-west2. So I ended up redeploying my resources with europe-west4.
Let’s say you’ve created the subdomain image-text-translator.mydomain.com and you want to use this address for the application.
# Verify your domain ownership with Google
gcloud domains verify mydomain.com
# Check it
gcloud domains list-user-verified
# Create a mapping to your domain
gcloud beta run domain-mappings create \
--region $REGION \
--service image-text-translator-ui \
--domain image-text-translator.mydomain.com
Now we need to obtain the DNS records for this domain mapping:
# Obtain the DNS records. We want everything under `resourceRecords`.
gcloud beta run domain-mappings describe \
--region $REGION \
--domain image-text-translator.mydomain.com
Take any DNS records that appear under resourceRecords and create them with your DNS registrar. For me, there was only one CNAME record to add:
It can take quite a bit of time for the DNS records to propagate, and for Google to provision the managed SSL certificate. For me, it took nearly two hours.
Whilst you’re waiting, you can check progress with these tools:
But finally… It’s all working from my domain!
Pricing and Cost Management
At the time of writing…
Google Cloud Functions
Costs are an aggregate of Function invocations, compute used, and network egress.
- The first 2 million function invocations in each month are free.
- The first 400,000 GB-seconds of memory and the first 200,000 GHz-seconds of compute time are also free.
- Also, since Cloud Functions scale to 0 when not in use, this can be very cost effective for sporadic or low-utilisation workloads.
Google Cloud Run
Costs are a combination of CPU and memory used.
- The first 240,000 vCPU-seconds per month are free. After that, compute is charged at $0.00001800 per vCPU-second.
- The first 450,000 GiB-seconds per month are free. After that, memory is charged at $0.00000200 per GiB-second.
- Since Cloud Run scales to 0 when not in use, this can be very cost effective for sporadic or low-utilisation workloads.
Cloud Vision API
The first 1,000 text detection requests in each month are free. After that, text detection is charged at $1.50 per 1,000 images.
Cloud Translate API
The first 0.5 million characters free in each month, for both detect and translate calls. After that, it is charged at $20 per million characters.
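Based on the prices above, here's a quick back-of-envelope helper (my own sketch, not an official calculator) showing how the free tier affects the Translation API bill:

```python
def translation_cost_usd(chars: int, free_chars: int = 500_000,
                         rate_per_million: float = 20.0) -> float:
    """Estimate the monthly Translation API cost for a given character count."""
    billable = max(0, chars - free_chars)
    return billable / 1_000_000 * rate_per_million

# 1.5M characters in a month: the first 0.5M are free, so we pay for 1M.
print(translation_cost_usd(1_500_000))  # 20.0
```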
Some Cost Control Strategies
You should consider implementing these cost control strategies:
- Set up budget alerts in your billing account. If budget thresholds are crossed (e.g. 50%, 75%, 90%, 100%), you will get an email notification. Note: this does not cap your spending. It simply alerts you when thresholds are met.
- Limit autoscaling. Whilst Cloud Run and Cloud Functions are both serverless autoscaling services, I don’t anticipate any significant demand for my little application. Consequently, I’ve set both my Cloud Function and Cloud Run service to have max-instances of 1. This means that each service will never deploy more than one concurrent instance.
- A more sophisticated strategy: when our budget is exceeded, send a notification to Pub/Sub. Use the Pub/Sub event to trigger a Cloud Function which then detaches projects from the billing account, effectively disabling all resources in the project.
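To sketch that last strategy: budget notifications arrive on Pub/Sub as base64-encoded JSON which includes (among other fields) costAmount and budgetAmount. The decision logic might look something like this (a sketch only; the actual billing-detach call via the Cloud Billing API is deliberately omitted):

```python
import base64
import json

def over_budget(pubsub_message: dict) -> bool:
    """Return True if a budget alert message reports spend above the budget."""
    # The budget notification payload is base64-encoded JSON in the "data" field
    payload = json.loads(base64.b64decode(pubsub_message["data"]).decode("utf-8"))
    return payload["costAmount"] > payload["budgetAmount"]

# A Cloud Function subscribed to the budget topic would, when this returns
# True, call the Cloud Billing API (projects.updateBillingInfo with an empty
# billingAccountName) to detach the project from its billing account.
```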
Performance Considerations
One of the drawbacks of a serverless service that scales to 0 is that, when demand is low and sporadic, there will often be no running instance of the service. This isn’t really an issue for Cloud Functions, which are super-lightweight and start up really quickly. But our Cloud Run Flask application is a little larger, and it can take a few seconds to cold start. So our users will find that the application is slow when they first visit the page.
There are a couple of strategies for dealing with this with Cloud Run:
- We can configure the minimum instances to 1. If we do this, our Cloud Run service will never scale to 0. There will always be an instance ready to serve requests. However, this means we’ll be paying for this instance… ALL THE TIME. This is often a great strategy. But for my noddy, low-utilisation application, I don’t really want to do this.
- We can configure Cloud Run startup CPU boost. Here Google Cloud dynamically allocates more CPU to our Cloud Run container during startup, which can dramatically improve our startup time. And, because this extra CPU is only allocated for those infrequent cold starts, it’s generally going to be more cost effective than keeping an instance running all the time.
gcloud beta run services update image-text-translator-ui \
--region=$REGION --cpu-boost
Also, check that your Flask application is not being deployed with debugging enabled. This will definitely hit your startup time!
Some FAQs, General Observations and Tips
Generative AI!
Of course, I need to mention Gen AI again! Although I haven’t used any Gen AI in the actual solution, I made quite a lot of use of it to get advice and answers whilst building the solution. In particular, I had quite a few conversations with Gemini Code Assist (which is included as part of the Cloud Code plugin in VS Code) and ChatGPT-4. I estimate that using these tools for problem solving saved me maybe 40% of my overall time and effort.
Why Did I Use Both Cloud Run and Cloud Functions?
Why not just include the extract and translate API calls from my Flask application? Then I wouldn’t need Cloud Functions at all.
Two reasons:
- I wanted to decouple the extract-and-translate logic from the UI. That way, if I ever wanted to swap out the UI or add another one (like maybe a mobile app), I could still use my Cloud Function without changing it.
- I wanted to build an application that required integration of two serverless components. That way, I can demonstrate how one calls the other, but also demonstrate other necessary elements, such as using service accounts.
Wrap-Up
We’re all done! Let’s recap what we’ve achieved:
- We created a Google Cloud project to host our application.
- We set up a local development project and environment.
- We’ve defined a service account and assigned roles to it.
- We’ve configured Application Default Credentials, so that our code can find credentials in whatever environment it is deployed to.
- We built a Google Cloud Function (in Python) that receives image data, extracts any text from the image and then translates it to any specified language. We’ve done this by calling Google’s pre-made AI APIs.
- We’ve tested the Cloud Function locally, using the Functions Framework.
- We’ve deployed the Cloud Function to GCP and tested it. It can only be called by authenticated and authorised clients.
- We’ve built a Python web user interface application, using Flask.
- We’ve built a frontend, using an HTML Jinja template, a CSS stylesheet and JavaScript.
- We’ve configured the Flask Webapp to be able to make authenticated calls to the Function.
- After testing the Flask application locally, we’ve packaged the application as a container image, using Google Cloud Build.
- We’ve pushed the image to Google Artifact Registry.
- We’ve deployed to Cloud Run from the Artifact Registry.
- We’ve exposed our Cloud Run service using our own domain name.
- We’ve reviewed some cost management best practices and controls.
- We’ve implemented cpu-boost to speed up the cold start of Cloud Run.
This was fun! I hope you enjoyed it too!
What’s Next?
I’ll be following-up with a second installment. In Part 2, I’ll cover:
- Deploying our Google Cloud infrastructure using Terraform.
- Setting up a CI/CD pipeline, to automatically build and deploy changes.
- Some other application enhancements.
Before You Go
- Please share this with anyone that you think will be interested. It might help them, and it really helps me!
- Please give me claps! You know you clap more than once, right?
- Feel free to leave a comment 💬.
- Follow and subscribe, so you don’t miss my content. Go to my Profile Page, and click on these icons:
Useful Links
The Application
Source Code for the Application
Dev Setup
- Gcloud CLI Setup
- Python Setup for Gcloud CLI
- Cloud Code for VS Code
- VS Code: Sample Python Flask Tutorial
Cloud Functions
- Setup and Invoke Cloud Functions using Python — including local dev
- Functions Framework for Python — for local dev
- Cloud Functions — Image Annotation
- Cloud Functions — HTTP Triggers
- OCR and Translation with CF
Cloud Run
- Create a Cloud Run service from a sample application in VS Code
- Developing a Cloud run service locally in Cloud Code for VS Code
- Debugging Cloud Run in VS Code
- Cloud Run — Hello World
- Cloud Run — Image Processing
- Mapping a custom domain to the Cloud Run service
- Cloud Run Pricing
- Cloud Run Startup CPU Boost
AI/ML APIs
- Vision API — Detect Text in Images
- Google Cloud Vision API Pricing
- Translation API — Translating text
- Google Cloud Translation API Pricing
Authentication
- How Application Default Credentials Work
- Set up Application Default Credentials
- Python service-to-service auth
- Managing Access to Functions
- Cloud Functions: Authenticate for Invocation
- Cloud Run: Service Identity
- Cloud Run: Authenticating Service-to-Service — create the service account; fetch a Google-signed ID token, and add the token to the header.