3 Ways to Optimize Cloud Run Response Times
Season of Scale
“Season of Scale” is a blog and video series to help enterprises and developers build scale and resilience into their design patterns. In this series, we walk through patterns and practices for creating apps that are resilient and scalable, two essential goals of many modern architecture exercises.
In Season 2, we’re covering how to optimize your applications to improve instance startup time! If you haven’t seen Season 1, check it out here.
- How to improve Compute Engine startup times
- How to improve App Engine startup times
- How to improve Cloud Run startup times (this article)
Critter Junction has built a fairly diverse compute infrastructure across Compute Engine, App Engine, and Cloud Run. We learned in Season 1 that they decided to go with Cloud Run for their Layout App. To refresh: the Layout App is a key part of the game, letting you share house layouts with other players. Now they’re looking to optimize Cloud Run for scalability.
Check out the video
Review
Since containerizing their Node.js application, they’ve decided to run it on Cloud Run for its portability, statelessness, and autoscaling, even to zero.
Unlike their online site running on App Engine, they haven’t needed to write warmup wrappers in their code, because Cloud Run may keep some idle instances around to handle spikes in traffic.
Cold starts on Cloud Run
The thing is, Cloud Run terminates unused container instances after some time, which means a cold start can still occur.
After looking at recent deployments of the Layout App, we noticed a few things that could be improved to minimize cold start latency.
- First, they were using a dynamic language with dependent libraries, such as imported modules in Node.js.
- They weren’t using global variables.
- And their container base images were about 700 megabytes in size.
Overall, this meant longer load times at container startup, or extra computation before the server could start listening for requests. Instead, they want to optimize their service startup speed to minimize that latency.
Let’s dive into each of these.
#1 Create a leaner service
For starters, on Cloud Run, the size of your container image does not affect cold start or request processing time.
Large container images, however, mean slower build times, and slower deployment times.
You want to be extra careful when it comes to applications written in dynamic languages. For example, if you’re using Node.js or Python, module loading that happens on process startup will add latency during a cold start.
Also be aware of some modules that run initialization code upon importing.
To build a leaner service you can:
- Minimize the number and size of dependencies if you’re using a dynamic language.
- Instead of computing things upon startup, compute them lazily.
- Shorten your initializations and speed up time to start your HTTP server.
- And use code-loading optimizations like PHP’s composer autoloader optimization.
#2 Use global variables
In Cloud Run, you can’t assume that service state is preserved between requests. But, Cloud Run does reuse individual container instances to serve ongoing traffic.
That means you can declare a variable in global scope, and any subsequent request served by the same container instance can reuse its value.
You can also cache objects in memory. Moving this from the request logic to global scope means better performance.
Now this doesn’t exactly help cold start times, but once the container is initialized, cached objects can help reduce latency during subsequent ongoing requests.
For example, moving per-request logic to global scope leaves a cold start taking approximately the same amount of time (and if you add caching logic that a warm request wouldn’t need, it can slightly increase cold start time), but every subsequent request served by that warm instance sees improved latency.
```javascript
// Global (instance-wide) scope
// This computation runs at instance cold-start
const instanceVar = heavyComputation();

/**
 * HTTP function that declares a variable.
 *
 * @param {Object} req request context.
 * @param {Object} res response context.
 */
exports.scopeDemo = (req, res) => {
  // Per-function scope
  // This computation runs every time this function is called
  const functionVar = lightComputation();
  res.send(`Per instance: ${instanceVar}, per function: ${functionVar}`);
};
```
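If the globally cached value isn’t needed on every request, you can combine both ideas and initialize the global lazily, so cold starts that never touch it skip the computation entirely. A minimal sketch (the names and the stand-in computation are assumptions, not Critter Junction’s actual code):

```javascript
// A lazy variant of the global-variable pattern: the global is left
// undefined at startup, computed on first use, and then reused by every
// later request served by the same container instance.
let instanceVar; // global (instance-wide) scope, not computed at startup

function heavyComputation() {
  // Stand-in for an expensive one-time setup (sum of 1..1000).
  return Array.from({ length: 1000 }, (_, i) => i + 1).reduce((a, b) => a + b);
}

function getInstanceVar() {
  if (instanceVar === undefined) {
    instanceVar = heavyComputation(); // only the first request pays this cost
  }
  return instanceVar; // warm requests reuse the cached value
}

console.log(getInstanceVar()); // 500500
```

The trade-off mirrors the eager version above: the cold start itself gets cheaper, but the first request that needs the value gets slower.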
A lot of this boils down to creating a leaner service.
#3 Use a smaller base image
You want to build a minimal container by working off a lean base image like: alpine, distroless, or scratch.
These images reduced Critter Junction’s image size from 700 MB to 65 MB! They also made sure to install only what was strictly needed inside the image.
In other words, don’t install extra packages that you don’t need.
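One common way to get there is a multi-stage Docker build: install and build in a full-featured image, then copy only the result onto a slim base. This is a hypothetical sketch, not Critter Junction’s actual Dockerfile; the image tags and entry point are assumptions:

```dockerfile
# Hypothetical multi-stage build: do npm install in a full Node image,
# then ship only the app and its production dependencies on Alpine.
FROM node:18 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev   # production dependencies only
COPY . .

FROM node:18-alpine
WORKDIR /app
COPY --from=build /app .
ENV PORT=8080
CMD ["node", "index.js"]
```

Because only the final stage ships, build tools and dev dependencies never make it into the deployed image, which is what keeps builds and deployments fast.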
Once Critter Junction removed unneeded dependencies, used global variables, swapped to a leaner base image, and removed extra packages, they were able to reduce the latency of their Cloud Run cold starts.
That’s a wrap for Season 2 of Season of Scale! There are a bunch of other best practices for Cloud Run scalability and performance, so be sure to check out the links below.
And remember — always be architecting.
Next steps and references:
- Follow this blog series on Google Cloud Platform Medium.
- Reference: Cloud Run General Development Tips.
- Follow the Season of Scale video series and subscribe to the Google Cloud Platform YouTube channel.
- Want more stories? Give me a shout on Medium and Twitter.
- Enjoy the ride with us through this miniseries and learn more about scalable GCP best practices.