Deployment of Containerized Data Applications on Google Cloud Run (pt1)

Paul Nwosu
12 min read · Jan 27, 2023

In this project, a Yahoo Finance web scraper is built using Python, Selenium, and Beautiful Soup. An image of the application is built with Docker, pushed to Google Artifact Registry, and then deployed on Google Cloud Run.

This is the 3rd part of a series I started called “Building your First Google Cloud Analytics Project.”

Check out Part 1, Part 2

Link to GitHub Repo

Project Structure

  • Background of the Data Application
  • Basics of Docker
  • Building the Docker Image
  • Running the Docker image locally
  • Introduction to Artifact Registry
  • Introduction to Google Cloud Run Jobs
  • Configuration settings of Cloud Run
  • Setting up a scheduler for the Cloud Run Job

Background of the Data Application

In the last project, we built a web scraper that collected data from Yahoo Finance and loaded it into a Google Sheets spreadsheet connected to a visualization dashboard. The application environment for the scraper and data pipeline was set up and is managed on a Compute Engine instance, which is Google’s primary Infrastructure as a Service (IaaS) solution.

The previous project clearly demonstrated what it means to build using the IaaS compute cloud model. In this project, we will use the same pipeline but deploy it on Google Cloud Run, which uses the Platform as a Service (PaaS) compute model and is serverless. More on that later.

Basics of Docker

Docker is a containerization tool that allows you to create containers on your machine. It helps you package your application and all its dependencies into a single container, which can be built, run, and deployed consistently. Some of the benefits of using Docker include:

  • Containerization: Docker eliminates the need to worry about OS-level dependencies, environment variables, and interference between similar software when developing applications.
  • Virtualization & Isolation: With Docker, you can have multiple versions of the same software installed in separate containers without worrying about them interfering with one another.
  • Reproducibility: A Docker image is like a snapshot of a container, so you can easily deploy your application in any computing environment that supports Docker and expect it to behave the same way.

To use Docker, you pull an image from Docker Hub using the docker pull command. This image can then be used to run a container that hosts the application and all its dependencies. For example, you can pull the postgres:13 Docker image and run multiple Postgres containers simultaneously without them interfering with each other, as long as they are mapped to separate ports.
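For instance, here is a minimal sketch (the container names and host ports are arbitrary choices) of two Postgres 13 containers running side by side on different host ports:

# Pull the postgres:13 image from Docker Hub
docker pull postgres:13

# Start two independent Postgres containers, each mapped to its own host port
docker run -d --name pg_one -e POSTGRES_PASSWORD=secret -p 5432:5432 postgres:13
docker run -d --name pg_two -e POSTGRES_PASSWORD=secret -p 5433:5432 postgres:13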

Let’s take a look at some docker commands to help you get started:

In the first article in this series, I already showed you how to install docker on the remote instance that we will be using.

Docker run: The docker run command is used to start a container from an image. Try the following command:

docker run hello-world

hello-world is the name of the image. When you run the command, Docker pulls the image from Docker Hub if it doesn’t already exist on your machine, creates a container from it, and runs that container.

Docker images: This command is used to check all the images available on your host machine. This generally includes images that you pulled from Docker Hub and those that you built yourself.

docker images

Docker build: The command used to build a Docker image from a Dockerfile is docker build. A Dockerfile is a script that contains instructions for building a Docker image. It is a plain text file that specifies the steps required to create a specific image. Each instruction in the Dockerfile creates a new layer in the image, and these layers are combined to create the final image.

Here is an example of a simple Dockerfile that creates an image for a Python application:

FROM python:3.8

COPY test.py /home/app/test.py

COPY requirements.txt /home/app/requirements.txt

RUN pip install -r /home/app/requirements.txt

WORKDIR /home/app

ENTRYPOINT ["python", "/home/app/test.py"]

A Dockerfile typically starts with a FROM instruction, which specifies the base image to use for the new image. Other instructions are used to configure the image, such as:

  • RUN to run commands
  • COPY to copy files and directories
  • ENV to set environment variables
  • EXPOSE to specify the ports exposed by the image

The above is an example of a simple Dockerfile that uses the Python 3.8 image as the base, copies test.py and requirements.txt from the local directory into the container’s /home/app directory, runs pip install -r on the copied requirements.txt to install the required dependencies, sets /home/app as the working directory, and sets an entrypoint so that the Python script runs when the container is launched.

It’s important to note that the order of the instructions in the Dockerfile is important as they are executed in the order they are written, and each instruction creates a new layer in the image.

Once you have a Dockerfile, you can use the docker build command to build an image from it. The docker build command creates an image from a Dockerfile and a context, which is the set of files in the directory where the Dockerfile is located and its subdirectories.
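As a quick sketch (test-app is just a placeholder name), building the example Dockerfile above from its directory and running the resulting image would look like this:

# Build an image called test-app from the Dockerfile in the current directory
docker build -t test-app .

# Run a container from the image; the ENTRYPOINT launches test.py
docker run test-app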

Building the Docker Image for the Yahoo Finance Scraper

In the last two projects, can you remember the steps we took to set up an environment to run the Yahoo Finance scraper scripts? We are going to replicate those steps and document them in a Dockerfile. Some of those steps involved installing Google Chrome and the Chrome driver, installing the Selenium and Beautiful Soup libraries, and setting up the service account key as an environment variable.

Here is the guide to help you build your docker image for the Yahoo Scraper. To get started, create a Dockerfile using the command touch Dockerfile in the directory that contains the files that are required for your application to run successfully.

FROM python:3.8

# Installation of Google Chrome
# Adding trusting keys to apt for repositories
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
# Adding Google Chrome to the repositories
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
# Updating apt to see and install Google Chrome
RUN apt-get -y update
RUN apt-get install -y google-chrome-stable
# Installing Unzip
RUN apt-get install -yqq unzip

# Download the Chrome driver matching the latest release and unzip it into the PATH
RUN wget -O /tmp/chromedriver.zip \
    http://chromedriver.storage.googleapis.com/$(curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE)/chromedriver_linux64.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/

#Copies the service account key into the docker container
COPY key.json /home/app/key.json

#Sets the service account key.json as an environment variable called key_file
ENV key_file key.json

#Copies the needed libraries
COPY requirements.txt /home/app/requirements.txt

#Installs the needed libraries
RUN pip install -r /home/app/requirements.txt

#Copies the following scripts into the docker container
COPY connect.py /home/app/connect.py
COPY main.py /home/app/main.py
COPY scrape.py /home/app/scrape.py

#Sets the working directory
WORKDIR /home/app

ENTRYPOINT ["python", "/home/app/main.py"]
CMD ["--googlesheet", "test_sheet", "--sheetname", "data4"]

Once this is done, it’s time to build the image using the command below:

docker build -t name_of_image .

The name_of_image refers to the name you want to give the image, and the . means build the image from the current directory. Once you run this command, Docker begins to build the image; this might take some time depending on the steps declared in the Dockerfile. You only need to build the image once, and then rebuild it whenever you change any of the files required for the application to run.
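In this article the image is named runjob and tagged tag1 (you will see that name again shortly), so the concrete command looks like this:

docker build -t runjob:tag1 .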

Running the Docker Image Locally

Now that the building process is over, the image is on your local machine. Run the command docker images to view all the images you have available. You might see some other images as well; these can come from previous docker run commands. For instance, if you had run docker run python:3.8, Docker would have downloaded the python:3.8 image to your local machine and used it to build a container. The next time you run that command, Docker doesn’t need to download the image again, because it already exists locally, so it just builds the container from the existing image. To run the image you just built, use the following:

docker run -it runjob:tag1

The above command uses the image runjob:tag1 to build a container with all of the application dependencies. In our Dockerfile, we specified ENTRYPOINT as python plus the absolute path of the main Python script. This means that once the container starts running, it automatically runs the Python script, which in this case is the scraper you built. Once the script has run successfully, the container exits.

To view the Docker containers that are currently running, use docker ps. This command returns one row per running container; the first column shows the container ID.
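Because our container exits as soon as the script finishes, it will usually not appear in docker ps. Two commands that help in that case (a quick sketch):

# List all containers, including those that have already exited
docker ps -a

# View the output of a container (running or exited) by its ID
docker logs container_id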

You can open a shell inside a running container with docker exec -it container_id bash, and you can also explore the contents of the image itself with the following commands:

docker run -it --entrypoint=bash runjob:tag1

docker run -it runjob:tag1 bash

Note that the second form only works for images that don’t define an ENTRYPOINT; because our image sets python as the entrypoint, bash would be passed to the script as an argument instead of starting a shell, so the first form is the one to use here.

Introduction to Artifact Registry

Our docker image works, so it’s time to deploy it on cloud run, but before doing so, we need to push that image to Google Artifact Registry. Google Artifact Registry is a fully-managed, private artifact storage service that allows you to store, manage, and access your software packages. It provides a central location for your team to store and manage images, containers, and other artifacts. It supports common package management formats, such as Maven and npm, and it integrates with Google Cloud Build and other CI/CD systems. With Google Artifact Registry, you can easily manage access to your packages and artifacts, and you can also use it to build and deploy your applications on Google Cloud Platform.

When you build an image using docker, note that you are building it locally, so it is important to push that image to a remote location where Google Cloud Run can access it easily. In this case, that central location is the Artifact Registry.

Below are the steps needed to push an image to Artifact Registry:

Enable the Artifact Registry API: Go to the Google Cloud Console > Navigation Menu > Artifact Registry > Library. Search for artifact registry api and then enable it.

Create the Repo: Go to the Google Cloud Console > Navigation Menu > Artifact Registry. Create the repo as seen below.
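If you prefer the command line, the repo can also be created with gcloud instead of the console; this is a sketch where my-repo and the description are placeholders:

# Create a Docker-format repository in Artifact Registry in the us-central1 region
gcloud artifacts repositories create my-repo \
    --repository-format=docker \
    --location=us-central1 \
    --description="Images for the Yahoo Finance scraper"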

Configure Docker to authenticate to Artifact Registry: When creating our repo, we chose the us-central1 region, so we need to configure Docker to authenticate to that regional registry using the command below:

gcloud auth configure-docker us-central1-docker.pkg.dev

Build the docker image using the command below:

docker build -t us-central1-docker.pkg.dev/my-project/my-repo/my-image:tag1 .

Set my-project to your current project ID. To get/view your project ID, use the command gcloud config list project.

Set my-repo to the name of the repo you created.
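Alternatively, since the image has already been built locally (runjob:tag1 in my case), you can skip the rebuild and simply retag that image with its Artifact Registry path before pushing:

# Give the existing local image an Artifact Registry name
docker tag runjob:tag1 us-central1-docker.pkg.dev/my-project/my-repo/my-image:tag1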

Push the docker image to Artifact Registry:

docker push us-central1-docker.pkg.dev/my-project/my-repo/my-image:tag1

Check the Artifact Registry on the Google Cloud Console to confirm that the push was successful.
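You can also confirm the push from the command line (substitute your own project ID and repo name):

# List the images stored in the Artifact Registry repo
gcloud artifacts docker images list us-central1-docker.pkg.dev/my-project/my-repo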

Introduction to Google Cloud Run Jobs

Google Cloud Run allows you to easily deploy and run containerized applications on the Google Cloud Platform. It offers two main options for deploying your application: Cloud Run Service and Cloud Run Job.

Cloud Run Service is designed for long-running applications that need to be available continuously. With this option, your application is automatically scaled to handle incoming traffic and will continue running until it is manually stopped.

On the other hand, Cloud Run Job is intended for short-lived, scheduled tasks that need to be completed and then stopped. With this option, you can create a Cloud Run Job and trigger it to run at a specific time or on a schedule. This is useful for tasks such as data processing, backups, and other scheduled jobs that need to be performed on a regular basis.

Both options allow you to easily deploy your containerized applications on the Google Cloud Platform without the need to manage the underlying infrastructure.

For our task, we will be using Cloud Run Jobs. Below are the steps required to help you get started:

Enable the following APIs: Compute Engine API, Cloud Run Admin API, and Cloud Scheduler API.

Deploy the Image on Cloud Run: Once the image is in Artifact Registry, deploy it using the image name as stored in Artifact Registry and select a region where you want the application to run.

gcloud beta run jobs create <name-of-job> --image <image_name:tag> --region <region>

Based on the image I created (built locally as runjob:tag1 and pushed to Artifact Registry) and the us-central1 region that I selected, my command looks like this:

gcloud beta run jobs create scrapeyahoo --image us-central1-docker.pkg.dev/my-project/my-repo/my-image:tag1 --region us-central1
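To confirm that the job was created, you can describe it from the command line (scrapeyahoo is the job name used above):

# Show the configuration of the newly created Cloud Run job
gcloud beta run jobs describe scrapeyahoo --region us-central1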

Configuration Settings on Cloud Run

Once the app has been deployed, you might need to adjust some configuration settings based on the size of the image, which you can see in the repository’s directory in Artifact Registry.

Go to Google Cloud Console > Navigation Menu > Cloud Run. Select the JOBS tab to view the job you just created.

Click on the name of the job you just created and select edit

Change the memory and CPU configurations as seen below; for heavier workloads, you might need to make further adjustments. Save the changes.
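The same limits can also be set with gcloud; this is only a sketch, and the 2Gi of memory and single CPU below are assumptions you should tune to your own workload:

# Update the job's memory and CPU limits
gcloud beta run jobs update scrapeyahoo \
    --memory 2Gi \
    --cpu 1 \
    --region us-central1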

Click on EXECUTE. If it works, you should see a ✅ next to the Execution ID.
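You can also trigger an execution from the command line instead of the console:

# Run the Cloud Run job once
gcloud beta run jobs execute scrapeyahoo --region us-central1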

Setting up a scheduler for the Cloud Run Job

The final step in this project is to set up a trigger for the Cloud Run job to run at specific times using Google Cloud Scheduler.

Go to Google Cloud Console > Navigation Menu > Cloud Scheduler. Click on CREATE JOB.

Define the Schedule: You need to set a name for the job, select a region (this should be the same region you selected for the Cloud Run job), and set the frequency at which you want the job to run (using the same syntax as a cron job). Lastly, select a timezone based on your local time.

Configure the Execution: Follow the configuration settings as seen in the image below.

For the URL, use this template:

#Template to follow
https://<REGION>-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/<PROJECT_ID>/jobs/<JOB_NAME>:run

#This is how it looks for me
https://us-central1-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/gcpprojects-35734/jobs/yahooscrape:run

Configure Optional Settings: Follow the template below

Once this is done, select Create.
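For completeness, here is a gcloud sketch of the same scheduler job; the job name, the cron expression (daily at 6 AM), and the service account email are assumptions, so substitute your own values:

# Create a Cloud Scheduler job that calls the Cloud Run Jobs run endpoint
gcloud scheduler jobs create http scrape-yahoo-daily \
    --location us-central1 \
    --schedule "0 6 * * *" \
    --uri "https://us-central1-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/gcpprojects-35734/jobs/yahooscrape:run" \
    --http-method POST \
    --oauth-service-account-email service-account@my-project.iam.gserviceaccount.com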

You can force a run by clicking the vertical dots and selecting Force a job run, or you can just wait. If the job is successful, you will see a ✅.

You can also pause and delete jobs.

That’s the end of the 2nd project in my GCP series. Here is a summary of everything we did:

  • Set up a Dockerfile to build our application as an image
  • Ran the image on our local machine (or remote instance) to ensure it works
  • Set up the Artifact Registry to store our image
  • Deployed the image on Cloud Run
  • Set up a scheduler to trigger the Cloud Run job

In the next project, we will delve into setting up two types of databases: one using a Docker container and one using Google-managed Cloud SQL. We will explore the benefits and limitations of each approach and how to choose the right one for your use case. Additionally, we will look at how to set up a pipeline to automatically back up data and how to scale and monitor your database. Overall, this series provides a comprehensive guide on how to set up and manage your infrastructure on GCP.

In conclusion, this project was a great opportunity to delve into the world of containerization and serverless computing with GCP and gave us a good understanding of how to use Docker and Cloud Run to build, run, and deploy our application in a consistent and efficient way. I hope you enjoyed reading this article and learned something new.
