GCP Cloud Run sidecars with shared volume (Part 1) — PoC

Adrian Arba
8 min read · Mar 16, 2024


On one of the projects I worked on, I was tasked with a PoC implementation of an application lift-and-shift from on-premises to GCP Cloud Run. At boot, the application loads both secrets and a configuration file that references those secrets.

Specifically, what was asked of me was to:

  • perform the PoC ASAP: it had to be done “yesterday”, but it surfaced “today”, so it had to be finished by “tomorrow”. In practice this meant mocking the application and the config to prove the solution conceptually works;
  • deploy the application to Cloud Run as is (the end goal), but wrap it in a container; rewriting anything in the application at this point was not possible due to migration time constraints (and the monolithic nature of the application);
  • make sure the application loads, at startup, a config file located in a separate git repo; the config file must stay within that git repo so that changes to it remain reviewable, auditable, and taggable, supporting the application release lifecycle;
  • make sure that the config file, which also contains template references to some secrets, gets rendered before the application starts; of course, secret data must never be versioned in git (everyone should know this by now);
  • make sure that a git update to either the app or the config repository gets reflected in GCP for testing.

That meant that I had to tackle a few challenges:

  • decide if keeping the entire config file in Secrets Manager vs just the secrets and the rest of the config in something like Cloud Storage was the right way to go;
  • mount the config on disk as a volume, which then the application or a script can read as a file;
  • if a script reads the volume, have it load the file and transform it before handing it to the application for ingestion;
  • ensure that any application or config update is automatically reflected in Cloud Run, without application downtime;
  • mock the application due to time constraints and test everything as part of the PoC.

At the time of writing this, March 2024, GCP had released a few interesting Cloud Run beta features that made the implementation look feasible. Still, I found very little support out there in terms of implementations or documentation.

Based on the challenges presented to me, I had to go through several considerations:

Consideration 1 — Should I keep the entire config file in GCP Secret Manager?

As the GCP documentation suggests, Secret Manager is a service for storing anything that could be treated as secret or sensitive: API keys, passwords, certificates, and similar data. It improves security by letting you limit who has access to handle secrets, encrypt them, and replicate them, and it supports secret versioning and rotation (via Pub/Sub and other services).

What it doesn't say anywhere is that you should keep other types of data in there, such as config files.

The config file at hand is 95% nonsensitive data and 5% sensitive data. I don't believe that 5% of sensitive data warrants keeping the other 95% in Secret Manager. On top of that, with a high volume of Secret Manager API calls from applications seeing a lot of user traffic, costs add up quickly; Secret Manager isn't one of the cheapest services in Google's repertoire.

My conclusions:

  • use Secret Manager as it was intended, and store only the secret data, which will be loaded into the Cloud Run container as environment variables (while locking down access to the container environment as much as possible);
  • strip any “on-prem” secret value references from the config file and replace them with shell-style references to environment variables;
  • deploy the config file to another GCP service that allows mounting it as a file in the Cloud Run container, where the application can read it. Cloud Run supports volumes backed by multiple services: Cloud Storage buckets, NFS shares such as Filestore, or in-memory volumes for ephemeral data. I went with Cloud Storage and in-memory volumes.
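As an illustration of the second point (the keys and variable names below are invented, not from the real config), the versioned file keeps everything in plain text except the secret values, which become shell-style environment variable references:

```yaml
# app.conf: illustrative structure only
database:
  host: db.internal.example.com     # nonsensitive, stays versioned in git
  user: app_service
  password: ${DB_PASSWORD}          # rendered at startup from Secret Manager
external_api:
  endpoint: https://api.example.com
  api_key: ${EXTERNAL_API_KEY}      # rendered at startup from Secret Manager
```

The `${…}` placeholders are what the rendering step, discussed next, replaces with the real values at container startup.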

Consideration 2 — How do I parse templated “secret” config values in the config file?

As we know, containers run only one main process throughout their lifespan. If the process dies, the container dies as well.

The Application reads the config file on startup, but it needs the file already rendered with the expected values for its various keys; it has no templating logic of its own. To me, this sounds like a multi-container implementation.

I’ve used init containers in this manner before, in Kubernetes StatefulSet resources, for example. You can define one or more containers that run before your main Application container and prepare the ground so that the Application then has all the details it needs to do its job.

Google’s documentation doesn't mention anything about init containers (and to my knowledge, they are not supported yet), but it does mention sidecar containers: parallel execution of multiple containers. This pattern is great for long-running secondary containers that, for example, manipulate data before it reaches the main container, act as firewalls, or gather telemetry metrics. But they run for as long as the main process container runs.

I only need the secondary container to run first and transform a file before the main Application container starts up and uses that file. Grrr!

Not even ChatGPT thought what I wanted to do was possible, as its training data only extends to April 2023. On this note, I consider ChatGPT a very helpful tool in the DevOps estate: it can quickly give you links to different pieces of documentation when you start assessing the feasibility of a solution, and it helps you explore different components before you even touch a single line of code.

My conclusion: as this was the only viable option, I chose to implement a sidecar container that runs a Python script: it loads the templated config file from a mounted volume, parses it, replaces any template placeholders with the values of environment variables (loaded from Secret Manager), and then copies the rendered config file to another mounted volume shared with the main Application container.
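The templating script itself can be tiny. Here is a minimal sketch of the idea, assuming shell-style `${VAR}` placeholders and hypothetical mount paths `/config-in` and `/config-out` (neither path is from the original PoC); Python's `string.Template` does the substitution:

```python
import os
from pathlib import Path
from string import Template

# Hypothetical mount points (illustrative, not the real PoC paths):
#   /config-in  : Cloud Storage volume holding the raw, templated config file
#   /config-out : in-memory volume shared with the main Application container
SRC = Path(os.environ.get("CONFIG_SRC", "/config-in/app.conf"))
DST = Path(os.environ.get("CONFIG_DST", "/config-out/app.conf"))


def render(text: str, env: dict) -> str:
    """Replace ${VAR} placeholders with values from `env`.

    safe_substitute leaves unknown placeholders untouched instead of
    raising, which makes a half-populated environment easier to debug.
    """
    return Template(text).safe_substitute(env)


if __name__ == "__main__" and SRC.exists():
    # Render the raw config with the container's environment variables
    # (populated from Secret Manager) and drop it on the shared volume.
    DST.write_text(render(SRC.read_text(), dict(os.environ)))
```

In this sketch, the script is the sidecar's entrypoint; once the rendered file lands on the shared volume, the Application container can read it at startup.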

That led me to...

Consideration 3 — Can volumes be shared between sidecar containers in GCP Cloud Run?

Google’s documentation on sidecars and volume sharing is sparse; I found only a single reference that briefly mentions that “You can use in-memory volumes to […] share an in-memory volume between different containers in one Cloud Run instance”.

The only other option would have been a Filestore instance for NFS file storage, which meant provisioning and maintaining another set of Terraform resources for just a few kilobytes of data... No way!

So I set out to test the hypothesis that memory volumes can be shared between sidecar containers in Cloud Run and hey, what do you know, it worked.

A few things to consider here, as per the GCP documentation:

  • when creating an in-memory volume, we recommend specifying a size limit. If the volume reaches its size limit, further writes will fail with an out-of-memory error. Your instance can handle this error and keep running.
  • if you deploy multiple containers, the memory used by each write to the volume counts as memory usage for the container that wrote the data.

So, really important: set a volume size limit that is both adequate and accounted for when setting container memory resources. For my purpose, I set a limit of 50 MiB for the volume.
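In Cloud Run's Knative-style service YAML, such a setup can be sketched roughly as follows (the service name, image paths, and mount paths are placeholders, not the real PoC values): the shared volume is an `emptyDir` with `medium: Memory` and a `sizeLimit`, mounted into both containers, and only the Application container exposes a port:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: app-poc                        # placeholder
  annotations:
    run.googleapis.com/launch-stage: BETA
spec:
  template:
    spec:
      containers:
      - name: config-renderer          # sidecar running the templating script
        image: europe-docker.pkg.dev/my-project/poc/renderer:latest  # placeholder
        volumeMounts:
        - name: shared-config
          mountPath: /config-out
      - name: app                      # main Application container, serves traffic
        image: europe-docker.pkg.dev/my-project/poc/app:latest       # placeholder
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: shared-config
          mountPath: /config
      volumes:
      - name: shared-config
        emptyDir:
          medium: Memory               # in-memory volume shared between containers
          sizeLimit: 50Mi              # writes past the limit fail instead of eating RAM
```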

Consideration 4 — How do I ensure that on either an Application or a Config change, the Cloud Run end user receives the updated result?

The team discussed how the Application should be updated on either an Application code update or a config update.

One clear thing about Cloud Run containers, just like with any other container service, is that containers are immutable. Updates should be treated as a new version of the end container.

And since the config file is read only when the Application starts up, updating it dynamically would have no effect on the Application.

Three options were discussed:

  • Have a process poll the config file (e.g., via a SHA hash) and, if it sees a change, signal the App process to die, forcing Cloud Run to spin up another instance that reloads the config file and serves the updated version; not ideal, as this would affect all running instances and cause Application unavailability during container boot-up (a few seconds);
  • Rewrite the Application to continuously poll the config file — not useful as the config file also contains feature flags and secrets:
    - if a new feature is added but the Application is not redeployed, the flag would be useless;
    - if a “secret” value is added and the Application container has not been restarted to read the new Secret Manager secret, the config update would be useless.
    There was also not enough time to do this;
  • Redeploy to Cloud Run as a new Revision regardless of whether the update is an Application update or a Config update. This is the option I favored the most, as it respects the container immutability principle; as a bonus, Cloud Run only switches to a new Revision if it passes all the startup probes and can serve traffic, otherwise it keeps the old Revision running. This means no downtime in case of any issues with the update.

The conclusion was to have our (Tekton) CI/CD deploy pipeline listen to triggers from both the Application and the Config repositories and, regardless of which one has an update, redeploy to Cloud Run as a new Revision.

Consideration 5 — What service should I use to load the config file into and run my template script?

As I mentioned in Consideration 1, there are multiple services you could use to provide files as volumes to Cloud Run.

My conclusion was to use Google Cloud Storage: upload the raw config files to a bucket (and version them) from the git repository, then mount them in my “init” sidecar container for the templating script to render with actual values. It’s cheap and does the job; I only read from it, never write to it, and I only need the file in one step.
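As a sketch (bucket, volume, and mount names are placeholders), the bucket is declared with Cloud Run's Cloud Storage FUSE driver and mounted read-only into the rendering sidecar, alongside the in-memory volume from Consideration 3:

```yaml
# fragment of spec.template.spec; all names are placeholders
containers:
- name: config-renderer
  image: europe-docker.pkg.dev/my-project/poc/renderer:latest
  volumeMounts:
  - name: raw-config        # read the raw templated file from here...
    mountPath: /config-in
  - name: shared-config     # ...and write the rendered file here
    mountPath: /config-out
volumes:
- name: raw-config
  csi:
    driver: gcsfuse.run.googleapis.com   # Cloud Storage volume mount
    readOnly: true
    volumeAttributes:
      bucketName: my-config-bucket       # placeholder bucket name
- name: shared-config
  emptyDir:
    medium: Memory
    sizeLimit: 50Mi
```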

In Part 2, I’m going to go through the code I used to do the PoC. It’s going to include:

  • writing 2 Python scripts: one to render the templates in the config file and one to mock a Flask application, so I can see the rendered config file when I access the Cloud Run instance URL;
  • building 2 Docker container images (one for the template rendering script and one for the mocked application);
  • pushing the container images to Google Artifact Registry;
  • writing the Terraform code for the Cloud Run instance, including the latest Terraform Cloud Run provider google_cloud_run_v2_service, the BETA launch stage, bucket provisioning, and permissions;
  • mounting the Cloud Storage saved config file in the Application container and sharing an in-memory Volume between the “init” rendering container and the Application container.
