Optimize your Cloud Run functions

George Mao
Google Cloud - Community
5 min read · Nov 18, 2024


In my Intro to Cloud Run functions post I covered the basics that all Cloud Run functions developers should know. In this post, I'll show you how costs are calculated and which strategies you can use to optimize your functions.

Pricing Deep Dive

The primary driver of cost is the amount of resources you configure and how long you consume them. Here is an easy way to remember the pricing formula:

Resources configured at deploy time (vCPU + Memory) * Duration of execution

All components of the formula have a direct, linear effect on cost. So if you double the configuration, you double the cost. Or if the function runs twice as long, you double the cost. If you cut the configuration in half or the function runs half as long, you save 50%. Here is a simple example:

  • You configure your function to use 2 vCPU / 4GB Memory and it runs for 1000ms on average
  • If this function is invoked 1M times a month, it’ll cost you $58/month
  • If you cut this configuration in half (1 vCPU and 2GB memory) and the function continues to run for 1000 ms, you’ll reduce costs by 50%, or spend $29/month
  • If the reduced resources cause the function to run slower and take 2000ms on average, you're back to $58/month

In general, your goal is to find the configuration “sweet spot” where performance and cost are optimal.
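To make the arithmetic concrete, here is a sketch of how the numbers above fall out of the formula. The per-second rates below are illustrative (roughly in line with tier-1 list prices at the time of writing); they vary by region and tier, so check the current Cloud Run pricing page rather than relying on them.

```python
# Monthly cost ≈ (vCPU rate * vCPU + memory rate * GiB) * duration * invocations
VCPU_RATE = 0.000024   # $ per vCPU-second (illustrative, verify current pricing)
MEM_RATE = 0.0000025   # $ per GiB-second (illustrative, verify current pricing)

def monthly_cost(vcpu, gib, duration_s, invocations):
    per_invoke = (VCPU_RATE * vcpu + MEM_RATE * gib) * duration_s
    return per_invoke * invocations

print(monthly_cost(2, 4, 1.0, 1_000_000))  # full config, 1000 ms: ~$58/month
print(monthly_cost(1, 2, 1.0, 1_000_000))  # half config, 1000 ms: ~$29/month
print(monthly_cost(1, 2, 2.0, 1_000_000))  # half config, 2000 ms: ~$58/month
```

Note how the last two lines show the trade-off: halving the configuration only saves money if the function doesn't slow down proportionally.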

Optimizations

There are two categories of optimizations you can make. The first is configuration changes to your function — this is easier but requires a deep dive into metrics. The second is coding changes. Let’s start with configuration changes.

Allocate resources to match workload needs

The best way to begin is to determine the primary resource your workload consumes: is it CPU or memory? Focus on tuning that resource properly. Tuning is generally a three-step process:

  1. Start with a baseline configuration
  2. Run tests to benchmark resource usage & review results via metrics
  3. Make changes and repeat
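Step 2 can be as simple as timing the workload repeatedly and averaging. Here's a minimal local sketch of such a harness; the lambda passed in is a stand-in for whatever your function actually does.

```python
import time

def benchmark(workload, runs=5):
    """Time a workload several times and return the average duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)

# Example: a stand-in CPU-bound workload
avg = benchmark(lambda: sum(i * i for i in range(100_000)))
print(f"Average duration: {avg * 1000:.1f} ms")
```

In practice you'd run this against the deployed function (or rely on the Duration metric in Cloud Monitoring), but a local harness is a quick way to compare code changes before deploying.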

Here's a snippet of code that represents a CPU-intensive workload with very low memory requirements — it computes SHA512 hashes in a sequential loop. Our goal is to determine where the CPU configuration gives us the best cost/performance value. Full source code available on Github.

import hashlib
import random
import string
from datetime import datetime

import functions_framework

N = 51200

@functions_framework.http
def hello_http(request):
    # HTTP Cloud Function that simulates CPU load
    m = hashlib.sha512()
    # Generate a random long string
    rstr = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N))

    t1 = datetime.now()
    print("Start time: " + str(t1))

    # Repeatedly compute the digest
    for i in range(10000):
        m.update(rstr.encode())
        m.digest()

    t2 = datetime.now()
    print("End time: " + str(t2))
    print("Difference: " + str(t2 - t1))

    difference = str(t2 - t1)

    return 'Performance {}!'.format(difference)

  • I started with the default 256MB memory configuration and set vCPU = 0.167. This resulted in ~6.8 seconds to complete the execution.
  • Next, I double vCPU to 0.333. This doubles performance, reducing duration to ~3.3 seconds.
  • Next, I triple vCPU to 1. This triples performance, further reducing duration to ~1 second.
  • Finally, I double vCPU to 2 and then again to 4. Neither of these provides any meaningful performance gain.

Just to verify, I check the Memory Utilization metric and confirm that I am well under the 256MB configuration.

If this were a real workload, I would probably set my vCPU = 1. Anything more is simply a waste of money.
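The measurements above also make the "sweet spot" visible in the numbers: since compute cost scales with vCPU × duration, you can compare the vCPU-seconds consumed per invocation at each configuration. The durations below are my measured averages; yours will differ.

```python
# (vCPU, measured average duration in seconds) from the test runs above
measurements = [(0.167, 6.8), (0.333, 3.3), (1, 1.0), (2, 1.0), (4, 1.0)]

for vcpu, duration in measurements:
    # Relative compute cost per invocation is proportional to vCPU-seconds
    print(f"{vcpu} vCPU: {vcpu * duration:.2f} vCPU-seconds per invoke")

best = min(measurements, key=lambda m: m[0] * m[1])
print(f"Cheapest configuration: {best[0]} vCPU")  # 1 vCPU wins
```

The 2 and 4 vCPU settings burn two and four times the vCPU-seconds for the same 1-second duration, which is why anything above 1 vCPU is wasted money for this workload.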

Use Concurrent Requests Per Instance

This is the next configuration based optimization that you should look into. By default, the Cloud Run functions service will only allow a single request to be served concurrently by each instance of your function. With my vCPU set to 1, I investigate the Container CPU utilization metric. I see a peak of ~12.97% CPU. This leads me to believe my container’s CPU is underutilized.

Based on my hypothesis, I increase the Max Concurrent Requests per Instance from 1 → 2 and execute two simultaneous invokes of my function.

As I expected, this results in about double the CPU utilization, or ~25.98%. This makes sense as the same container is doing double the work now.

This configuration allows a single container to service twice as many requests during the same billable window. I can check the Billable Container Instance Time metric to confirm.

Note: Always check the Duration or Latency metric to confirm that higher CPU utilization does not degrade performance.
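The effect on billable instance time is straightforward to reason about: with a concurrency of 2, two overlapping requests share one instance's billable window instead of occupying two. Here's a rough sketch that assumes requests arrive in perfectly overlapping batches, which is a best case; real traffic overlaps less neatly.

```python
import math

def billable_instance_seconds(requests, duration_s, concurrency):
    """Rough lower bound on billable instance time, assuming requests
    arrive in perfectly overlapping batches of `concurrency`."""
    batches = math.ceil(requests / concurrency)
    return batches * duration_s

# 1M one-second requests
print(billable_instance_seconds(1_000_000, 1.0, 1))  # concurrency 1: 1,000,000 s billed
print(billable_instance_seconds(1_000_000, 1.0, 2))  # concurrency 2: 500,000 s billed
```

This is why the Billable Container Instance Time metric is the one to watch after raising concurrency: if CPU headroom exists, the billed time drops while request count stays the same.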

Use global state correctly

The final optimization technique involves moving heavy compute tasks into global execution state. This allows the CPU Boost feature to provide extra CPU power (about 2x) during cold starts. Cloud Run functions can run CPU-intensive tasks for up to 10 seconds with boosted CPU.

All code in global state is executed only once for the life of the function's instance, and its results can be reused during future warm invokes.

Below, I've refactored the SHA computation code to run globally instead of inside the hello_http function. It will run once during a cold start and benefit from boosted CPU. See updated code on Github.

import hashlib
import random
import string
from datetime import datetime

import functions_framework

N = 51200

# This function is called during cold start only
def computeHashes():
    m = hashlib.sha512()
    # Generate a random long string
    rstr = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N))

    t1 = datetime.now()
    print("Start time: " + str(t1))

    # Repeatedly compute the digest
    for i in range(10000):
        m.update(rstr.encode())
        m.digest()

    t2 = datetime.now()
    print("End time: " + str(t2))
    print("Difference: " + str(t2 - t1))

    difference = str(t2 - t1)
    return difference

# Save results in global state
perf = computeHashes()

@functions_framework.http
def hello_http(request):
    # HTTP function that simply returns the precomputed result
    return 'Performance {}!'.format(perf)

Time to test! I execute the function 3 times in a row. The first invoke runs the hash compute code in ~1 second, while the following two invokes don’t execute the hash code at all and simply return in 3 ms.

1 cold invoke followed by two warm invokes

Summary

In my experience, tuning serverless services always requires actual testing and analysis of results. It's a bit of trial and error, and Cloud Run functions are no different. In my next post, I'll cover a great way to perform large-scale load testing of Cloud Run functions.

In the meantime, be sure to review the Cloud Run functions Tips & Tricks.


Published in Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.


Written by George Mao

Head of Specialist Architects @ Google Cloud. I lead a team of experts responsible for helping customers solve their toughest challenges and adopt GCP at scale
