Authenticate Google Cloud In Docker Without Getting Your Creds Stolen

Avoid GCP 401 errors — and security concerns — by passing project credentials into your Docker image the right way.

Zach Quinn
Pipeline: Your Data Engineering Resource
5 min read · Nov 11, 2024


Currently job searching? Give yourself an edge by developing a personal project using my free 5-page project ideation guide.

Welcome to another edition of “I couldn’t find this information using StackOverflow/Reddit/documentation, so I need to brain-dump a post about what I learned.” With any luck, generations of engineers will benefit from this sacred knowledge.

In a previous story I wrote (ranted) about what influenced my dual-dev IDE strategy which involves a borderline-impractical combination of Jupyter Notebook (run through Vertex AI) and VS Code. I explained that Jupyter is typically my sandbox where I craft “rough drafts” of code and VS Code is both the testing ground and vehicle for committing the final product to GitHub.

In the past I’ve also shared that, in order to reduce the embarrassment (and credibility hit) associated with a failed deployment, I prefer to create clean virtual environments, usually using Pyenv, to facilitate a testing ground for my Python-based pipelines.

Recently, however, I’ve eschewed Pyenv in favor of a more data engineer (and software engineer)-oriented strategy: Testing code using Docker images.

Photo by Rubaitul Azad on Unsplash

Why Docker?

I prefer using Docker to test because it offers:

  • Portability — I can send a Dockerfile and my project directory to a teammate to run locally
  • Cleanliness of environment — Even with a clean Pyenv virtual env I sometimes end up with remnants of packages that shouldn’t be installed, especially if I forget to uninstall before creating the environment
  • Production-adjacent development — Many of the more processing-intensive pipelines I develop end up using Docker images to fuel a cloud computing resource like a VM or Kubernetes pod

Both Docker’s documentation (humorously called ‘docker docs’) and the legion of beginner-friendly blog posts offer a solid overview of both Docker and containerization, so I won’t cover that in this post.

Specifically, I’d like to address the use case of testing scripts that require authentication with Google Cloud Platform (GCP)’s APIs within Docker images.

Note: I need to establish context and background before delving into the solution. If you’re here to learn syntax, scroll down to the “Implementation” subheading.

What’s A Docker Container And Why Do I Need It?

To understand why this is a problem, I’ll provide the briefest of overviews.

If you’re unfamiliar with containerization, it’s a method (an abstraction, really) that packages an application and its dependencies into one or more “images” that run as isolated containers.

These images are built from the instructions in a single file called a Dockerfile.

In almost any case, when you write a Dockerfile, you’ll need to begin by copying files from your local machine into the image.

This is done with the COPY instruction, as seen in this example.

# Use a slim Python base image
FROM python:3.12-slim

# Set the working directory inside the image
WORKDIR /app

# Copy the local build context into /app
COPY . /app

This means that if you have a simple directory with, say, a Python main file and a config file, you can use its contents within Docker, provided you’re using a compatible Python base image.
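With that Dockerfile saved at the root of your project, building the image referenced later in this post looks something like this (the ex_image tag is just an example):

docker build -t ex_image:latest .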

It’s important to note that a Docker container isn’t preconfigured with anything, meaning that when you create an image, you’re building upon a clean slate.

This is great when you want to avoid common programming errors stemming from dependency conflicts.

It’s not so great when you want to set and persist an environment variable at build time.

If you’re a GCP user you’ll know that, in order to run code against GCP services locally, you need to authenticate with your project credentials.

Typically, these are assigned to an environment variable called GOOGLE_APPLICATION_CREDENTIALS.

When a package using a GCP API looks for authentication, it checks the key file path stored in GOOGLE_APPLICATION_CREDENTIALS, so it’s critical to configure this variable for local testing or you’ll get the dreaded 401 Unauthorized error.
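Outside of Docker, configuring this is a one-liner; the key path below is just a placeholder:

export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/gcp_creds.json"

Once the variable is set, the GCP client libraries pick up the key file automatically.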

If you’re expecting me to say you can’t set an environment variable and include your GCP creds within a Docker image, you’d be wrong, because you absolutely can.

But you absolutely shouldn’t.

From an information security perspective, this is a no-go. Including your creds within an image means that anyone who has access to your image is just a few bash commands away from having access to your project creds.
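To make that concrete: if a key file had been COPY’d into a hypothetical image called leaky_image, anyone who can pull that image can read the key straight back out. No exploit required; the file ships with the image.

docker run --rm leaky_image:latest cat /app/gcp_creds.json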

And this is why, to correctly and securely authenticate GCP with Docker, you should use a Docker concept called a volume.

Implementation

Storing your creds within a volume is safer and more practical because your creds are only accessible at runtime and do not perpetually live within your Docker image.

With volumes, Docker provides an elegant solution to unwanted cred exposure, allowing you to keep your creds on the host machine, outside the image entirely.

The syntax is similar in spirit to the initial COPY instruction: you map a source path to a destination path.

We’re simply running our Docker image but with the added step of “mounting” our credentials.

The general structure is this.

docker run -v /directory_on_your_device

Then specify your target directory within the container, separated by a colon.

docker run -v /directory_on_your_device:/target_directory_in_your_container

Add an environment variable flag (-e).

docker run -v /directory_on_your_device:/target_directory_in_your_container -e

Add the credential variable, set to the path where the creds will live inside the container.

docker run -v /directory_on_your_device:/target_directory_in_your_container -e CRED_VAR=path_mounted_to_container

Then add the image name and tag, just like you’d do with a conventional docker run command.

docker run -v /directory_on_your_device:/target_directory_in_your_container -e CRED_VAR=path_mounted_to_container image:latest

Finally, an example.

docker run -v /zach_quinn/desktop/projects/docker_ex:/app -e GOOGLE_APPLICATION_CREDENTIALS=/app/gcp_creds.json ex_image:latest
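As a quick sanity check (assuming your key file actually sits in the mounted host directory), you can confirm the container sees it before running your pipeline:

docker run --rm -v /zach_quinn/desktop/projects/docker_ex:/app -e GOOGLE_APPLICATION_CREDENTIALS=/app/gcp_creds.json ex_image:latest ls -l /app/gcp_creds.json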

What Have We Learned?

Keep in mind:

  • Your creds need to live inside the host directory you mount into the container
  • If you don’t want to store your creds in a directory in prod, download/delete them on each run
  • This method only works at runtime, not during the build stage, hence “docker run” in the examples (see the build-time sketch below)
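That last point deserves a caveat. If you genuinely need credentials during the build itself, a volume won’t help; Docker BuildKit’s secret mounts are the usual workaround. Here’s a minimal sketch, assuming a key file named gcp_creds.json and a hypothetical fetch_build_assets.py script that needs GCP access at build time:

# syntax=docker/dockerfile:1
# The secret is mounted at /run/secrets/gcp_creds for this RUN step only;
# it never persists in a layer of the final image
RUN --mount=type=secret,id=gcp_creds \
    GOOGLE_APPLICATION_CREDENTIALS=/run/secrets/gcp_creds \
    python fetch_build_assets.py

The matching build command passes the file in from the host:

docker build --secret id=gcp_creds,src=/path/to/gcp_creds.json -t ex_image:latest .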

In a technically dense discipline like data engineering, sometimes you read so much and absorb so much knowledge that you overlook certain processes.

For me, it was this method of authentication, which, incidentally, served as my introduction to Docker volumes.

The most compelling reason to use a volume, especially in the context of authentication, is to make sure your credentials stay protected.

Because the one Docker method you never want to see is this…


docker steal

I need your help. Take a minute to answer a 3-question survey to tell me how I can help you outside this blog. All responses receive a free gift.
