Building a Serverless Image Text Extractor and Translator Using Google Cloud Pre-Trained AI
Introduction
Those who follow my Medium blog will know that I write quite a bit about architecture, strategy, and Google Cloud. And occasionally I post about Python.
But rarely do I actually build useful applications in Google Cloud. And so I thought it was about time I did a blog about an end-to-end development experience on Google Cloud.
In this blog I’ll talk about building a serverless AI application which takes a user-uploaded image, extracts any text it finds, and translates it if necessary. I’ll be making use of:
- Cloud Run to host the UI, in the form of a Python Flask application.
- Cloud Functions, to host the backend logic in response to the user uploading an image.
- Google’s pre-built Image and Translation AI Machine Learning APIs.
- Local development using Visual Studio Code, along with functions-framework for local Cloud Functions development, and Cloud Code for local Cloud Run development.
A Quick Overview of AI and AI Products
Artificial Intelligence
AI is a broad term that describes making use of machine automation to perform a task that normally requires human intelligence. E.g. speech recognition, visual perception, language translation, decision making.
Machine Learning
A specific subfield of AI, concerned with teaching machines to recognise patterns in data, so that they can make predictions and solve problems without explicitly coded solutions.
Generative AI (Gen AI)
A subclass of AI that is able to generate new data that is similar to — but not the same as — the data it was trained on. Gen AI relies on foundation models, such as large language models (LLMs), generative image models, and multimodal models. A multimodal model is a type of model that is able to process multiple types of input data (e.g. text, image and video), and generate multiple types of content.
My solution won’t be using Gen AI. But I’ve mentioned it here since Gen AI is so prevalent, and I wanted to make sure you understand how these models differ from the predictive models I’m using in my solution.
Google’s Pre-Trained Machine Learning APIs
These are ML models that have been pre-built and trained by Google to perform particular tasks. They are classified as predictive models, rather than generative. Examples include:
- Google Cloud Vision API — For tasks such as: classification, facial recognition, and text detection.
- Google Cloud Natural Language API — For understanding the meaning behind text. This includes identifying important elements of text, and also sentiment analysis.
- Google Cloud Translation API — For translation from one language to another.
- Google Cloud Video Intelligence API — For video analysis and annotation.
Motivation for the Application
I’ve been learning Ukrainian for a little while. It’s a beautiful language. I started by listening to the Ukrainian Lessons Podcast, created by Anna Ohoiko. From there, I discovered an active and thriving Ukrainian Learners community on Facebook. Occasionally, community members will post a meme in Ukrainian. Sometimes I’d understand the meme. Often I wouldn’t.
And so I thought… “I’d like an application where I can upload a meme (or any image), extract the text from it, and translate that text into my native language.”
Of course, there are other ways to do this. But I thought it was a great use case for building a new, serverless application in Google Cloud, from scratch.
Application Architecture
It’s a pretty simple application architecture, hosted on Google Cloud serverless services:
The Webapp UI — Cloud Run
Here I run a Flask web application as a container, on Google Cloud Run. The web application does a couple of things:
- It renders the frontend page containing the form used to capture the user input (e.g. language and image upload).
- It handles the request from the user, capturing the image, and sending it to the Cloud Function backend.
I’ve selected Cloud Run because:
- It provides a serverless way to host and run a containerised application. (I.e. our Python Flask web application.)
- It is well-suited for hosting simple stateless web applications, like this one.
- It automatically scales, and scales down to 0 instances, when there is no demand.
The Backend — Cloud Function
The Cloud Function receives the image from the user (via the web application UI) and then calls the respective Google Cloud APIs, in order to extract text from the image, and translate it.
I’ve selected Cloud Functions because:
- It is well-suited to performing short-lived processing, in response to events. Thus, ideal for running our extraction and translation task, in response to the user-uploaded image.
- It automatically scales, and scales down to 0 when there is no demand.
- We can decouple the image processing from the actual user frontend. Thus, if we wanted to use a different frontend, we could easily do so, without changing the code in the Cloud Function.
Text Extraction and Translation
I’m using Google’s pre-built Vision API and Translation API. But why not use a Generative AI model, like Gemini Pro Vision?
- The Vision and Translation APIs are specifically built for the tasks I want to perform.
- Conversely, the Gemini Pro Gen AI multimodal foundation model can achieve the same result, in response to natural language prompts. However, we have no need for natural language interactions here. Why? Because we know exactly what we want the APIs to do in response to an image upload.
- Although Gemini Pro Vision has more versatility as a multimodal foundation model, this power comes with a higher price tag. The Vision API allows 1,000 free invocations per month, and the Translation API provides free translation of the first half million characters per month.
My Dev Environment
WSL
I’m running Windows, with Windows Subsystem for Linux (WSL). For those not familiar with WSL, it is an out-of-the-box environment (included in Windows 10 and later) that lets you run a full Linux environment directly inside Windows. I happen to be using Ubuntu.
The advantage of working inside WSL is that I can write any necessary scripts in bash, which means my code will be more portable. For example, I can run the same scripts inside my own environment as I would inside Google Cloud Shell.
Visual Studio Code
VS Code is my code editor of choice. It is free and open source. It runs on Windows, Linux and Mac, and has many useful plugins, such as Git integration, and Google Cloud Code: a set of AI-assisted plugins (including Gemini Code Assist) for facilitating local development with Google Cloud services.
Dev Project Structure
If you want to check out the git repo, you can find it here.
The overall structure looks like this:
└── image-text-translator
    ├── docs/            - Documentation for the repo
    │
    ├── infra-tf/        - Terraform for installing infra
    │
    ├── scripts/         - For environment setup and helper scripts
    │   └── setup.sh     - Setup helper script
    │
    ├── app/             - The Application
    │   ├── ui_cr/                - Browser UI (Cloud Run)
    │   │   ├── static/           - Static content for frontend
    │   │   ├── templates/        - HTML templates for frontend
    │   │   ├── app.py            - The Flask application
    │   │   ├── requirements.txt  - The UI Python requirements
    │   │   ├── Dockerfile        - Dockerfile to build the Flask container
    │   │   └── .dockerignore     - Files to ignore in Dockerfile
    │   │
    │   └── backend_gcf/          - Backend (Cloud Function)
    │       ├── main.py           - The backend CF application
    │       └── requirements.txt  - The backend CF Python requirements
    │
    ├── testing/
    │   └── images/
    │
    ├── requirements.txt - Python requirements for project local dev
    └── README.md
One-Time Google Project Setup and Permissions
Create the Google Cloud Project
Create a Google Cloud project for your application. This is mine:
Perform the next steps from an account that has sufficient privileges, such as a Project Admin or Org Admin.
Enable APIs
Eventually, we’ll Terraform this configuration. But initially, these are the APIs you’ll need to enable:
# Authenticate to Google Cloud
gcloud auth list
# Check we have the correct project selected
export PROJECT_ID=<enter your project ID>
gcloud config set project $PROJECT_ID
# Enable Cloud Build API
gcloud services enable cloudbuild.googleapis.com
# Enable Cloud Storage API
gcloud services enable storage-api.googleapis.com
# Enable Artifact Registry API
gcloud services enable artifactregistry.googleapis.com
# Enable Eventarc API
gcloud services enable eventarc.googleapis.com
# Enable Cloud Run Admin API
gcloud services enable run.googleapis.com
# Enable Cloud Logging API
gcloud services enable logging.googleapis.com
# Enable Cloud Pub/Sub API
gcloud services enable pubsub.googleapis.com
# Enable Cloud Functions API
gcloud services enable cloudfunctions.googleapis.com
# Enable Cloud Translation API
gcloud services enable translate.googleapis.com
# Enable Cloud Vision API
gcloud services enable vision.googleapis.com
# Enable Service Account Credentials API
gcloud services enable iamcredentials.googleapis.com
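As an aside, `gcloud services enable` accepts multiple services in a single call, so you can batch all of the above into one command. Here is a sketch (shown as a dry-run that just echoes the command; remove the `echo` to actually run it):

```shell
# The full list of APIs to enable, as above
SERVICES="cloudbuild.googleapis.com storage-api.googleapis.com \
artifactregistry.googleapis.com eventarc.googleapis.com run.googleapis.com \
logging.googleapis.com pubsub.googleapis.com cloudfunctions.googleapis.com \
translate.googleapis.com vision.googleapis.com iamcredentials.googleapis.com"

# Enable them all in a single call (echoed here as a dry-run; remove 'echo' to execute)
echo gcloud services enable $SERVICES
```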
Service Account and Roles
Service accounts are the standard approach to managing authentication and authorisation for applications, rather than end users. Our Cloud Run application will need to authenticate to our Cloud Function, and our Cloud Function will need to authenticate to the Cloud Vision and Translation APIs.
So let’s create a service account:
# Make sure your PROJECT_ID variable is set before doing this!
export SVC_ACCOUNT=image-text-translator-sa
export SVC_ACCOUNT_EMAIL=$SVC_ACCOUNT@$PROJECT_ID.iam.gserviceaccount.com
# Attaching a user-managed service account is the preferred way to provide credentials to ADC for production code running on Google Cloud.
gcloud iam service-accounts create $SVC_ACCOUNT
Now we’ll bind a number of roles to our service account:
# Grant roles to the service account
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:$SVC_ACCOUNT_EMAIL" \
--role=roles/run.admin
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:$SVC_ACCOUNT_EMAIL" \
--role=roles/run.invoker
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:$SVC_ACCOUNT_EMAIL" \
--role=roles/cloudfunctions.admin
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:$SVC_ACCOUNT_EMAIL" \
--role=roles/cloudfunctions.invoker
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:$SVC_ACCOUNT_EMAIL" \
--role="roles/cloudtranslate.user"
# Grant the required role to the principal that will attach the service account to other resources.
gcloud iam service-accounts add-iam-policy-binding $SVC_ACCOUNT_EMAIL \
--member="group:gcp-devops@my-org.com" \
--role=roles/iam.serviceAccountUser
# Allow service account impersonation
gcloud iam service-accounts add-iam-policy-binding $SVC_ACCOUNT_EMAIL \
--member="group:gcp-devops@my-org.com" \
--role=roles/iam.serviceAccountTokenCreator
# Ensure your account has access to deploy to Cloud Functions and Cloud Run
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="group:gcp-devops@my-org.com" \
--role roles/run.admin
Local Development Environment Setup
(You will need to follow these steps on any machine where you plan to do development.)
Open a terminal. (I’m opening an Ubuntu Shell from Windows Terminal.) If you haven’t done so already, you’ll want to install Google gcloud CLI and supporting tools into your local environment:
# Install Google Cloud CLI in your local Linux environment.
# See https://cloud.google.com/sdk/docs/install
# Setup Python and pip in Gcloud CLI
# See https://cloud.google.com/python/docs/setup
# Install additional Google Cloud CLI packages for local dev
sudo apt install google-cloud-cli-gke-gcloud-auth-plugin kubectl google-cloud-cli-skaffold google-cloud-cli-minikube
Here (and from now on), we will authenticate with our developer account, rather than an org admin account. Why? I’m following the principle of least privilege.
# Authenticate to Google Cloud
gcloud auth login
When authenticating, click on the first link that is shown. You’ll then be prompted to provide your password.
Next, we’ll setup our application project folder and install some dependencies. If you’re following along and building the application from scratch, these are the next steps:
# This is where I keep my project
cd ~/localdev/gcp/image-text-translator
# Create a Python virtual env. For example...
python3 -m venv .venv
# And now ACTIVATE it
source .venv/bin/activate
# Add Python packages we need...
python3 -m pip install Flask
python3 -m pip install pillow # For image handling
python3 -m pip install functions-framework
python3 -m pip install google-cloud-storage google-cloud-translate google-cloud-vision google-auth
# And create the requirements.txt file
python3 -m pip freeze > requirements.txt
Alternatively, if you want to clone my git repo:
git clone https://github.com/derailed-dash/image-text-translator.git
cd image-text-translator
# Create a Python virtual env. For example...
python3 -m venv .venv
# And now ACTIVATE it. E.g.
source .venv/bin/activate
# Install the Python dependencies now
python3 -m pip install -r requirements.txt
Git Setup
If you’re building everything from scratch (without cloning my repo) you should now setup your Git and GitHub environment. Don’t forget to first create a .gitignore. Check out my repo to get an idea of what it should look like. Then follow these steps:
# Setup git in Cloud Shell, if you haven't done so before
git config --global user.email "bob@wherever.com"
git config --global user.name "Bob"
git config --global core.autocrlf input # really important if you're using WSL!
# Create local git repo.
# Before proceeding, make sure you have created .gitignore file
# to ignore .terraform dirs and local state, plans, etc.
git init
git add .
git commit -m "Initial commit"
# Let's authenticate the GitHub command line tool
# It is already installed on Cloud Shell
gh auth login
# Now let's use gh cli to create a remote repo in GitHub.
# You can make it private, if you prefer
gh repo create image-text-translator --public --source=.
git push -u origin master
Running VS Code
Let’s open VS Code from our project folder:
# From /path/to/your/image-text-translator
code .
VS Code is clever enough to configure any necessary WSL plugins required.
Setup Application Default Credentials (ADC)
ADC is a strategy that allows authentication libraries to automatically find credentials based on the current environment. This is useful because we can leverage ADC both in our local environment (with the Cloud SDK), but also in our target environment on Google Cloud.
ADC can be configured to use our service account credentials. There are two ways to do this:
- We can impersonate the service account using our own user identity.
- We can create a private key for our service account, and point ADC to the location of this key.
I tried to use impersonation initially:
gcloud auth application-default login --impersonate-service-account $SVC_ACCOUNT_EMAIL
Unfortunately, I struggled to make this work when authenticating my Cloud Function call from Cloud Run. So instead, I’ve created a service account key which we can point the ADC to:
gcloud auth application-default login
# If these are not already set...
export SVC_ACCOUNT=image-text-translator-sa
export SVC_ACCOUNT_EMAIL=$SVC_ACCOUNT@$PROJECT_ID.iam.gserviceaccount.com
# Create a service account key for local dev
gcloud iam service-accounts keys create ~/.config/gcloud/$SVC_ACCOUNT.json \
--iam-account=$SVC_ACCOUNT_EMAIL
# Configure the ADC environment variable
# which is automatically detected by client libraries
export GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/$SVC_ACCOUNT.json
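As a quick sanity check before starting a dev session, we can verify that the environment variable points at a real key file. (This is a minimal sketch; the helper name is mine and is not part of the application code.)

```python
import os

def adc_key_ok(env=os.environ) -> bool:
    """Return True if GOOGLE_APPLICATION_CREDENTIALS is set and points at an existing file."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    return bool(path) and os.path.isfile(os.path.expanduser(path))

if __name__ == "__main__":
    print("ADC key found" if adc_key_ok() else "ADC key missing - run the export above")
```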
We actually need to set this GOOGLE_APPLICATION_CREDENTIALS environment variable with each session. Which brings us to…
Setup for Every Session
You’ll need to run these commands with EVERY new terminal session. (Or you can run source <project_dir>/scripts/setup.sh to run them for you.)
export PROJECT_ID=$(gcloud config list --format='value(core.project)')
export REGION=europe-west4
export SVC_ACCOUNT=image-text-translator-sa
export SVC_ACCOUNT_EMAIL=$SVC_ACCOUNT@$PROJECT_ID.iam.gserviceaccount.com
export GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/$SVC_ACCOUNT.json
# Functions
export FUNCTIONS_PORT=8081
export BACKEND_GCF=https://$REGION-$PROJECT_ID.cloudfunctions.net/extract-and-translate
# Flask
export FLASK_SECRET_KEY=some-secret-1234
export FLASK_RUN_PORT=8080
echo "Environment variables configured:"
echo PROJECT_ID="$PROJECT_ID"
echo REGION="$REGION"
echo SVC_ACCOUNT_EMAIL="$SVC_ACCOUNT_EMAIL"
echo BACKEND_GCF="$BACKEND_GCF"
echo FUNCTIONS_PORT="$FUNCTIONS_PORT"
echo FLASK_RUN_PORT="$FLASK_RUN_PORT"
The Cloud Function Backend
Local Development of the Function
I won’t reproduce all the code, since you can check it out in GitHub. I’ll just point out a few key things.
In my backend_gcf folder, I create a requirements.txt. This defines the Python packages that must be installed; Cloud Functions installs them automatically when you deploy the function.
Then I create a main.py. Here is part of the function that acts as the entry point to our Cloud Function: extract_and_translate().
@functions_framework.http
def extract_and_translate(request):
    """Extract and translate the text from an image.
    The image can be POSTed in the request, or it can be a GCS object reference.
    If a POSTed image, enctype should be multipart/form-data and the file should be named 'uploaded'.
    If we're passing a GCS object reference, content-type should be 'application/json',
    with two attributes:
    - bucket: name of GCS bucket in which the file is stored.
    - filename: name of the file to be read.
    """
    # Check if the request method is POST
    if request.method == 'POST':
        # Get the uploaded file from the request
        uploaded = request.files.get('uploaded')  # Assuming the input filename is 'uploaded'
        to_lang = request.form.get('to_lang', "en")
        print(f"{uploaded=}, {to_lang=}")

        if not uploaded:
            return flask.jsonify({"error": "No file uploaded."}), 400

        if uploaded:  # Process the uploaded file
            file_contents = uploaded.read()  # Read the file contents
            image = vision.Image(content=file_contents)
        else:
            return flask.jsonify({"error": "Unable to read uploaded file."}), 400
It’s pretty self-explanatory.
- We check whether the function has received a POST. (Later, we’ll create our Cloud Run application to POST the request.)
- If so, we look in the request for an object called uploaded. (Our Cloud Run application will attach this to the request.)
- If we find this object attached, we read it as binary, and then use the bytes to create a vision.Image object.
- Next we call the detect_text() function, passing in the image. This function uses the Vision API to see if there is any text in the image.
    # Use the Vision API to extract text from the image
    detected = detect_text(image)
    if detected:
        translated = translate_text(detected, to_lang)
        if translated["text"] != "":
            return translated["text"]
If text is found, detect_text() returns a Python dictionary containing that text. We then pass this dictionary into the translate_text() function, to translate the text into our chosen language.
Here is the detect_text() function:
def detect_text(image: vision.Image) -> dict | None:
    """Extract the text from the Image object"""
    text_detection_response = vision_client.text_detection(image=image)
    annotations = text_detection_response.text_annotations

    if annotations:
        text = annotations[0].description
    else:
        text = ""
    print(f"Extracted text from image:\n{text}")

    # Returns language identifier in ISO 639-1 format. E.g. en.
    # See https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes
    detect_language_response = translate_client.detect_language(text)
    src_lang = detect_language_response["language"]
    print(f"Detected language: {src_lang}.")

    message = {
        "text": text,
        "src_lang": src_lang,
    }
    return message
This code not only detects any text in the image, but also uses the Google Language API to determine the language of the text.
Next, the translate_text() function:
def translate_text(message: dict, to_lang: str) -> dict:
    """
    Translates the text in the message from the detected source language
    to the requested target language, and returns the result.
    """
    text = message["text"]
    src_lang = message["src_lang"]

    translated = {  # before translating
        "text": text,
        "src_lang": src_lang,
        "to_lang": to_lang,
    }
    if src_lang != to_lang and src_lang != "und":
        print(f"Translating text into {to_lang}.")
        translated_text = translate_client.translate(
            text, target_language=to_lang, source_language=src_lang)
        translated = {
            "text": unescape(translated_text["translatedText"]),
            "src_lang": src_lang,
            "to_lang": to_lang,
        }
    else:
        print("No translation required.")

    return translated
We check that the source language and target language are different, and that the source language is not undefined. If this check passes, we then use the Google Translate API to translate the text.
Testing Locally
First, let’s run the function locally. Run this command from our backend_gcf folder:
# Run the function
functions-framework --target extract_and_translate \
--debug --port $FUNCTIONS_PORT
It should look like this:
I’m going to test with this image:
From a second terminal, let’s POST to the function, using curl:
# You will first need to authenticate and set the environment vars in this terminal
source ./scripts/setup.sh
# now invoke
curl -X POST localhost:$FUNCTIONS_PORT \
-H "Content-Type: multipart/form-data" \
-F "uploaded=@./testing/images/ua_meme.jpg" \
-F "to_lang=en"
And it works!!
Deploy the Cloud Function (to Google Cloud)
Now that we’ve tested it locally, we can deploy it to Google Cloud. Again, we must run this from our backend_gcf folder.
# From the backend-gcf folder
gcloud functions deploy extract-and-translate \
--gen2 --max-instances 1 \
--region $REGION \
--runtime=python312 --source=. \
--trigger-http --entry-point=extract_and_translate \
--no-allow-unauthenticated
# Allow this function to be called by the service account
gcloud functions add-invoker-policy-binding extract-and-translate \
--region=$REGION \
--member="serviceAccount:$SVC_ACCOUNT_EMAIL"
Deployment result:
Here’s a cool thing… With the Cloud Code extension in VS Code, we can now see our deployed Cloud Function in Google Cloud!
Test the Cloud Function
We just need a slightly different curl command:
curl -X POST https://$REGION-$PROJECT_ID.cloudfunctions.net/extract-and-translate \
-H "Authorization: Bearer $(gcloud auth print-identity-token)" \
-H "Content-Type: multipart/form-data" \
-F "uploaded=@./testing/images/ua_meme.jpg" \
-F "to_lang=en"
Let’s test English-to-English with this meme:
Now I’ll try and translate to Ukrainian:
Woop! It works! And now to French:
Hurrah! All the tests appear to be working.
Note: if you try to use the function without passing in authenticated and authorised credentials, you’ll see this:
<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>403 Forbidden</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Forbidden</h1>
<h2>Your client does not have permission to get URL <code>/extract-and-translate</code> from this server.</h2>
<h2></h2>
</body></html>
Deploying a New Version
If we update our code and want to redeploy, we can simply re-run the same deploy command.
Deleting the Function
If you want to delete the Cloud Function, you can just run this:
gcloud functions delete extract-and-translate --region=$REGION
Flask UI
We will create a simple Flask web application and host it on Cloud Run. This application will render the form page, and process the form. Then, after retrieving the form responses (including the uploaded image), it will call our Cloud Function.
Creating the Flask Web Application
This article is not intended to be a Flask tutorial, so my recommendation is to check out the code in the ui_cr folder of the GitHub repo.
But here are some key points:
- I have created a requirements.txt to define the Python packages required by this application.
- I have created a Dockerfile which is responsible for packaging our Flask application into a Docker container. (We’ll need this to deploy to Cloud Run later.) It does so by copying the contents of the ui_cr folder, installing the Python dependencies (as defined in requirements.txt), and then defining the entry point for the application, i.e. to run python app.py.
Our ui_cr folder has this structure:
├── ui_cr/               - Browser UI (Cloud Run)
│   ├── static/          - Static content for frontend
│   ├── templates/       - HTML templates for frontend
│   ├── app.py           - The Flask application
│   ├── requirements.txt - The UI Python requirements
│   ├── Dockerfile       - Dockerfile to build the Flask container
│   └── .dockerignore    - Files to ignore in Dockerfile
The Application Code
I’m not going to go through the code in detail. But I’ll highlight some interesting points and gotchas.
Let’s take a look at app.py. First, the function that instantiates the Flask application:
def create_app():
    """Create and configure the app"""
    flask_app = Flask(__name__, instance_relative_config=True)
    flask_app.config.from_mapping(
        SECRET_KEY='dev',  # override with FLASK_SECRET_KEY env var
    )

    # Load envs starting with FLASK_
    # E.g. FLASK_SECRET_KEY, FLASK_PORT
    flask_app.config.from_prefixed_env()

    client = translate.Client()
    flask_app.languages = {lang['language']: lang['name'] for lang in client.get_languages()}
    flask_app.backend_func = os.environ.get('BACKEND_GCF', 'undefined')
    return flask_app

app = create_app()
- The Flask app requires a secret key in order to manage sessions. We can pass a key to the application using an environment variable, FLASK_SECRET_KEY.
- I’m using the Google Translate API to retrieve a list of available languages to translate to. I’ll use this to populate a drop-down select in my form.
- We need to pass the URL of our target function. Again, I’ll use an environment variable for this: BACKEND_GCF.
Handling Requests to the Flask Home Page
Here we handle requests to /. When our user first visits the page, they send a GET request. But when they submit the form with an image to translate, the request arrives as a POST. So we need to handle both.
@app.route('/', methods=['GET', 'POST'])
def entry():
    """Render the upload form"""
    message = "Upload your image!"
    to_lang = os.environ.get('TO_LANG', 'en')
    encoded_img = ""
    translation = ""

    if request.method == 'POST':  # Form has been posted
        app.logger.debug("Got POST")
        file = request.files.get('file')
        to_lang = request.form.get('to_lang')

        if file is None:
            flash('No file part.')
        elif file.filename == '':
            flash('No file selected for uploading.')
        elif not allowed_file(file.filename):
            filename = secure_filename(file.filename)
            flash(f'{filename} is not a supported image format. '
                  f'Supported formats are: {ALLOWED_EXTENSIONS}')
        else:
            filename = secure_filename(file.filename)
            app.logger.debug("Got %s", filename)
            app.logger.debug("Translating to %s", to_lang)

            # We don't need to save the image. We just want to binary encode it.
            try:
                img = Image.open(file.stream)
                with BytesIO() as buf:
                    if img_format := img.format:  # e.g. JPEG, GIF, PNG
                        img.save(buf, img_format.lower())
                        content_type = f"image/{img_format.lower()}"
                        image_bytes = buf.getvalue()
                        encoded_img = base64.b64encode(image_bytes).decode()
                    else:
                        flash('Unable to determine image format.')
            except UnidentifiedImageError:
                # This will happen if we resubmit the form
                flash('Unable to process image.')

            if encoded_img:
                message = f"Processed <{filename}>. Feel free to upload a new image."
                func_response = make_authorized_post_request(endpoint=app.backend_func,
                                                             image_data=image_bytes, to_lang=to_lang,
                                                             filename=filename, content_type=content_type)
                app.logger.debug("Function response code: %s", func_response.status_code)
                app.logger.debug("Function response text: %s", func_response.text)
                translation = func_response.text

    return render_template('index.html',
                           languages=app.languages,
                           message=message,
                           to_lang=to_lang,
                           img_data=encoded_img,
                           translation=translation), 200
We validate the input, and then — if the image has been uploaded and is a valid image — we make a request to our Cloud Function. This request needs to pass the raw bytes of the uploaded image.
Also, I want to be able to display the uploaded image to the user, in the returned page. I want to avoid saving the uploaded image on disk in the backend, so I’m taking this approach:
- Opening the uploaded image with Pillow, without saving it to disk.
- Saving it into an in-memory BytesIO buffer, in its original format (e.g. JPEG, PNG).
- Retrieving the raw byte data of the image from the buffer. (This is also the raw data I send to the make_authorized_post_request() function.)
- Encoding the binary image data using Base64: a string representation that can be safely embedded in the page returned to the browser.
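The encode step in isolation looks like this (a pure-stdlib sketch, with dummy bytes standing in for the real image data; the helper name is mine):

```python
import base64

def encode_for_browser(image_bytes: bytes) -> str:
    """Base64-encode raw image bytes into a string safe to embed in the rendered page."""
    return base64.b64encode(image_bytes).decode()

# The template can then render it in a data URI, e.g.
# <img src="data:image/jpeg;base64,{{ img_data }}">
```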
Making an Authenticated Call to Our Function
When calling the Cloud Function from our Cloud Run Flask application, we need to include a service account ID token in the request headers. The Google auth libraries will automatically retrieve the credentials from the ADC.
def make_authorized_post_request(endpoint: str,
                                 image_data, to_lang: str,
                                 filename: str, content_type: str):
    """
    Make a POST request to the specified HTTP endpoint by authenticating with the ID token
    obtained from the google-auth client library using the specified audience value.
    Expects the image_data to be a bytes representation of the image.
    """
    if endpoint == "undefined":
        raise ValueError("Unable to retrieve Function endpoint.")

    # Cloud Functions uses your function's URL as the `audience` value
    # For Cloud Functions, `endpoint` and `audience` should be equal
    # ADC requires valid service account credentials
    audience = endpoint
    auth_req = GoogleAuthRequest()

    # Requests an ID token for the service identity
    # from the instance metadata server or with local ADC. E.g.
    # export GOOGLE_APPLICATION_CREDENTIALS=/path/to/svc_account.json
    id_token = google.oauth2.id_token.fetch_id_token(auth_req, audience)

    headers = {
        "Authorization": f"Bearer {id_token}",
        # "Content-Type": "multipart/form-data"  # Let requests library decide on the content-type
    }
    files = {
        "uploaded": (filename, image_data, content_type),
        "to_lang": (None, to_lang)
    }

    # Send the HTTP POST request to the Cloud Function
    response = requests.post(endpoint, headers=headers, files=files, timeout=10)
    return response
Getting User Input
Flask renders content back to the browser using Jinja2 templates. These are HTML files that contain embedded code. When we call render_template() from our app.py, we pass in a number of variables, which are then referenced in the template.
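For illustration, here is a hypothetical fragment of such a template. (The real index.html is in the repo; this sketch just shows how the variables passed to render_template() above might be referenced.)

```html
<!-- Hypothetical snippet: renders the message, the language drop-down, and any translation -->
<p>{{ message }}</p>
<select name="to_lang">
  {% for code, name in languages.items() %}
    <option value="{{ code }}" {% if code == to_lang %}selected{% endif %}>{{ name }}</option>
  {% endfor %}
</select>
{% if translation %}
  <p>{{ translation }}</p>
{% endif %}
```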
Launching and Debugging
There are a few ways we can launch our Flask app locally:
cd app/ui_cr/
source ../../scripts/setup.sh # Initialise vars if we're in a new terminal
# Run the Flask App
python app.py
# Or with the Flask command.
# This will automatically load any environment vars starting FLASK_
# The --debug tells Flask to automatically reload after any changes
# and to set the app.logger to debug.
python -m flask --app app run --debug
Also, if you want to use the VS Code interactive debugger, I’d recommend creating a launch configuration that looks something like this:
{
    "configurations": [
        {
            "name": "Python Debugger: Flask",
            "type": "debugpy",
            "request": "launch",
            "module": "flask",
            "cwd": "${workspaceFolder}/app/ui_cr",
            "env": {
                "FLASK_APP": "app.py",
                "FLASK_DEBUG": "1",
                "FLASK_RUN_PORT": "8080"
            },
            "args": [
                "run",
                "--debug",
                "--no-debugger",
                "--no-reload"
            ],
            "jinja": true,
            "autoStartBrowser": false
        },
        // Other configurations
    ]
}
Testing the Application
Okay, we’re ready to run the application!
Here’s what it looks like in the browser:
Let’s use it to translate our Ukrainian meme:
Hurrah!
Deploying to Google Cloud Run
We’re now ready to deploy our Flask application to Cloud Run. Recall that Cloud Run is a serverless container runtime, so we need our application to be packaged as a container image before we can deploy it.
We’ll run through the following steps:
- Create a Google Artifact Registry (GAR) repo, for storing our Flask app container image.
- Use Cloud Build to build our container image from source and store in GAR.
- Deploy our application from GAR to Cloud Run.
gcloud artifacts repositories create image-text-translator-artifacts \
--repository-format=docker \
--location=$REGION \
--project=$PROJECT_ID
You can check the repo has been created in the Cloud Console:
Now build the Docker image:
export IMAGE_NAME=$REGION-docker.pkg.dev/$PROJECT_ID/image-text-translator-artifacts/image-text-translator-ui
# configure Docker to use the Google Cloud CLI to authenticate requests to Artifact Registry.
gcloud auth configure-docker $REGION-docker.pkg.dev
# Build the image and push it to Artifact Registry
# Run from the ui_cr folder
gcloud builds submit --tag $IMAGE_NAME:v0.1 .
It takes a minute or so. And now, our container image has been pushed to the repo:
Finally, we can deploy to Cloud Run using our image.
# create a random secret key for our Flask application
export RANDOM_SECRET_KEY=$(openssl rand -base64 32)
gcloud run deploy image-text-translator-ui \
--image=$IMAGE_NAME:v0.1 \
--region=$REGION \
--platform=managed \
--allow-unauthenticated \
--max-instances=1 \
--service-account=$SVC_ACCOUNT \
--set-env-vars BACKEND_GCF=$BACKEND_GCF,FLASK_SECRET_KEY=$RANDOM_SECRET_KEY
The output looks like this:
We can verify our service has been deployed in the Google Cloud console:
Redeploying
If we want to deploy a new version of our application, we can do so like this:
# Check our IMAGE_NAME is set
export IMAGE_NAME=$REGION-docker.pkg.dev/$PROJECT_ID/image-text-translator-artifacts/image-text-translator-ui
# Set our new version number
export VERSION=v0.2
# Rebuild the container image and push to the GAR
gcloud builds submit --tag $IMAGE_NAME:$VERSION .
# create a random secret key for our Flask application
export RANDOM_SECRET_KEY=$(openssl rand -base64 32)
# Redeploy
gcloud run deploy image-text-translator-ui \
--image=$IMAGE_NAME:$VERSION \
--region=$REGION \
--platform=managed \
--allow-unauthenticated \
--max-instances=1 \
--service-account=$SVC_ACCOUNT \
--set-env-vars BACKEND_GCF=$BACKEND_GCF,FLASK_SECRET_KEY=$RANDOM_SECRET_KEY
Optionally Setup Custom DNS Mapping
A URL like https://image-text-translator-ui-adisqviovq-ez.a.run.app/ isn’t very memorable! So you might want to map to a custom domain. You can find detailed guidance here.
At this point, I discovered that Cloud Run doesn’t support domain mappings in the region europe-west2. So I ended up redeploying my resources with europe-west4.
Let’s say you’ve created the subdomain image-text-translator.mydomain.com and you want to use this address for the application.
# Verify your domain ownership with Google
gcloud domains verify mydomain.com
# Check it
gcloud domains list-user-verified
# Create a mapping to your domain
gcloud beta run domain-mappings create \
--region $REGION \
--service image-text-translator-ui \
--domain image-text-translator.mydomain.com
Now we need to obtain the DNS records for this domain mapping:
# Obtain the DNS records. We want everything under `resourceRecords`.
gcloud beta run domain-mappings describe \
--region $REGION \
--domain image-text-translator.mydomain.com
Take any DNS records that appear under resourceRecords and create them with your DNS registrar. For me, there was only one CNAME record to add:
It can take quite a bit of time for the DNS records to propagate, and for Google to provision the managed SSL certificate. For me, it took nearly two hours.
Whilst you’re waiting, you can check progress with these tools:
But finally… It’s all working from my domain!
Pricing and Cost Management
At the time of writing…
Google Cloud Functions
Costs are an aggregate of Function invocations, compute used, and network egress.
- The first 2 million function invocations in each month are free.
- The first 400,000 GB-seconds of memory and the first 200,000 GHz-seconds of compute time are also free.
- Also, since Cloud Functions scale to 0 when not in use, this can be very cost effective for sporadic or low-utilisation workloads.
Google Cloud Run
Costs are a combination of CPU and memory used.
- The first 240,000 vCPU-seconds per month are free. After that, compute is charged at $0.00001800 per vCPU-second.
- The first 450,000 GiB-seconds per month are free. After that, memory is charged at $0.00000200 per GiB-second.
- Since Cloud Run scales to 0 when not in use, this can be very cost effective for sporadic or low-utilisation workloads.
Cloud Vision API
The first 1,000 text detection requests in each month are free. After that, text detection is charged at $1.50 per 1,000 images.
Cloud Translate API
The first 0.5 million characters free in each month, for both detect and translate calls. After that, it is charged at $20 per million characters.
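Based on the prices above, here's a quick back-of-envelope helper (my own sketch, not an official calculator) showing how the free tier affects the Translation API bill:

```python
def translation_cost_usd(chars: int, free_chars: int = 500_000,
                         rate_per_million: float = 20.0) -> float:
    """Estimate the monthly Translation API cost for a given character count."""
    billable = max(0, chars - free_chars)
    return billable / 1_000_000 * rate_per_million

# 1.5M characters in a month: the first 0.5M are free, so we pay for 1M.
print(translation_cost_usd(1_500_000))  # 20.0
```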
Some Cost Control Strategies
You should consider implementing these cost control strategies:
- Set up budget alerts in your billing account. If budget thresholds are crossed (e.g. 50%, 75%, 90%, 100%), you will get an email notification. Note: this does not cap your spending. It simply alerts you when thresholds are met.
- Limit autoscaling. Whilst Cloud Run and Cloud Functions are both serverless autoscaling services, I don’t anticipate any significant demand for my little application. Consequently, I’ve set both my Cloud Function and Cloud Run service to have max-instances of 1. This means that each service will never deploy more than one concurrent instance.
- A more sophisticated strategy: when our budget is exceeded, send a notification to Pub/Sub. Use the Pub/Sub event to trigger a Cloud Function which then detaches projects from the billing account, effectively disabling all resources in the project.
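To sketch that last strategy: budget notifications arrive on Pub/Sub as base64-encoded JSON which includes (among other fields) costAmount and budgetAmount. The decision logic might look something like this (a sketch only; the actual billing-detach call via the Cloud Billing API is deliberately omitted):

```python
import base64
import json

def over_budget(pubsub_message: dict) -> bool:
    """Return True if a budget alert message reports spend above the budget."""
    # The budget notification payload is base64-encoded JSON in the "data" field
    payload = json.loads(base64.b64decode(pubsub_message["data"]).decode("utf-8"))
    return payload["costAmount"] > payload["budgetAmount"]

# A Cloud Function subscribed to the budget topic would, when this returns
# True, call the Cloud Billing API (projects.updateBillingInfo with an empty
# billingAccountName) to detach the project from its billing account.
```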
Performance Considerations
One of the drawbacks of a serverless service that scales to 0 is that, when demand is low and sporadic, there will often be no running instance of the service. This isn’t really an issue for Cloud Functions, which are super-lightweight and start up really quickly. But our Cloud Run Flask application is a little larger, and it can take a few seconds to cold start. So our users will find that the application is slow when they first visit the page.
There are a couple of strategies for dealing with this with Cloud Run:
- We can configure the minimum instances to 1. If we do this, our Cloud Run service will never scale to 0. There will always be an instance ready to serve requests. However, this means we’ll be paying for this instance… ALL THE TIME. This is often a great strategy. But for my noddy, low-utilisation application, I don’t really want to do this.
- We can configure Cloud Run startup CPU boost. Here Google Cloud dynamically allocates more CPU to our Cloud Run container during startup, which can dramatically improve our startup time. And, because this extra CPU is only allocated for those infrequent cold starts, it’s generally going to be more cost effective than keeping an instance running all the time.
gcloud beta run services update image-text-translator-ui \
--region=$REGION --cpu-boost
Also, check that your Flask application is not being deployed with debugging enabled. This will definitely hit your startup time!
Some FAQs, General Observations and Tips
Generative AI!
Of course, I need to mention Gen AI again! Although I haven’t used any Gen AI in the actual solution, I made quite a lot of use of it to get advice and answers whilst building the solution. In particular, I had quite a few conversations with Gemini Code Assist (which is included as part of the Cloud Code plugin in VS Code) and ChatGPT-4. I estimate that using these tools for problem solving saved me maybe 40% of my overall time and effort.
Why Did I Use Both Cloud Run and Cloud Functions?
Why not just include the extract and translate API calls from my Flask application? Then I wouldn’t need Cloud Functions at all.
Two reasons:
- I wanted to decouple the extract-and-translate logic from the UI. That way, if I ever wanted to swap out the UI or add another one (like maybe a mobile app), I could still use my Cloud Function without changing it.
- I wanted to build an application that required integration of two serverless components. That way, I can demonstrate how one calls the other, but also demonstrate other necessary elements, such as using service accounts.
Wrap-Up
We’re all done! Let’s recap what we’ve achieved:
- We created a Google Cloud project to host our application.
- We set up a local development project and environment.
- We’ve defined a service account and assigned roles to it.
- We’ve configured Application Default Credentials, so that our code can find credentials in whatever environment it is deployed to.
- We built a Google Cloud Function (in Python) that receives image data, extracts any text from the image and then translates it to any specified language. We’ve done this by calling Google’s pre-made AI APIs.
- We’ve tested the Cloud Function locally, using the Functions Framework.
- We’ve deployed the Cloud Function to GCP and tested it. It can only be called by authenticated and authorised clients.
- We’ve built a Python web user interface application, using Flask.
- We’ve built a frontend, using an HTML Jinja template, a CSS stylesheet and JavaScript.
- We’ve configured the Flask Webapp to be able to make authenticated calls to the Function.
- After testing the Flask application locally, we’ve packaged the application as a container image, using Google Cloud Build.
- We’ve pushed the image to Google Artifact Registry.
- We’ve deployed to Cloud Run from the Artifact Registry.
- We’ve exposed our Cloud Run service using our own domain name.
- We’ve reviewed some cost management best practices and controls.
- We’ve implemented cpu-boost to speed up the cold start of Cloud Run.
This was fun! I hope you enjoyed it too!
What’s Next?
I’ll be following-up with a second installment. In Part 2, I’ll cover:
- Deploying our Google Cloud infrastructure using Terraform.
- Setting up a CI/CD pipeline, to automatically build and deploy changes.
- Some other application enhancements.
Before You Go
- Please share this with anyone that you think will be interested. It might help them, and it really helps me!
- Please give me claps! You know you clap more than once, right?
- Feel free to leave a comment 💬.
- Follow and subscribe, so you don’t miss my content. Go to my Profile Page, and click on these icons:
Useful Links
The Application
Source Code for the Application
Dev Setup
- Gcloud CLI Setup
- Python Setup for Gcloud CLI
- Cloud Code for VS Code
- VS Code: Sample Python Flask Tutorial
Cloud Functions
- Setup and Invoke Cloud Functions using Python — including local dev
- Functions Framework for Python — for local dev
- Cloud Functions — Image Annotation
- Cloud Functions — HTTP Triggers
- OCR and Translation with CF
Cloud Run
- Create a Cloud Run service from a sample application in VS Code
- Developing a Cloud run service locally in Cloud Code for VS Code
- Debugging Cloud Run in VS Code
- Cloud Run — Hello World
- Cloud Run — Image Processing
- Mapping a custom domain to the Cloud Run service
- Cloud Run Pricing
- Cloud Run Startup CPU Boost
AI/ML APIs
- Vision API — Detect Text in Images
- Google Cloud Vision API Pricing
- Translation API — Translating text
- Google Cloud Translation API Pricing
Authentication
- How Application Default Credentials Work
- Set up Application Default Credentials
- Python service-to-service auth
- Managing Access to Functions
- Cloud Functions: Authenticate for Invocation
- Cloud Run: Service Identity
- Cloud Run: Authenticating Service-to-Service — create the service account; fetch a Google-signed ID token, and add the token to the header.