Go Serverless with Google Cloud Run Functions
Serverless development is my favorite way to build modern applications. With Google Cloud Run Functions, you bring the code and Google handles all of the heavy lifting of load balancing, scaling, availability, and infrastructure/OS management, so developers can focus on building awesome apps. Here is the first part of a three-part guide on Cloud Run Functions.
Cloud Run Functions is the only serverless FaaS product on the market that can scale up from and back down to zero, attach L4 GPUs on demand, and deploy to multiple regions with a single command.
The Benefits
- You only pay for the resources you consume. You don’t need to worry about idle VM instances or optimizing your cluster sizes. You’re billed on a combination of invocation count (the first 2M per month are free) plus the resources allocated for the duration of execution, measured in vCPU-seconds and GiB-seconds. For example, a function configured with 1 vCPU and 256 MiB of memory that runs for 500 ms consumes roughly 0.5 vCPU-seconds and 0.125 GiB-seconds.
- There is no maintenance! Developers don’t need to debate instance types or configure VMs, and there is no operating system to patch.
- You can run nearly any workload, including deploying LLMs or running graphics-intensive workloads, using the new L4 GPU support.
The Basics
Functions respond to triggers, and there are two types of triggers:
- Event-based triggers are events generated by a GCP service such as Eventarc, Pub/Sub, Cloud Storage, or Firestore. These services generate an event that results in an asynchronous invocation of your Cloud Run Function.
- The second type is a synchronous HTTP trigger. Functions are automatically assigned a function URL that serves as the HTTP trigger; you make an HTTP POST request to that URL to invoke the function. By default, the URL is publicly reachable but requires authentication. In addition, you can restrict or disable external traffic. It's always in this format:
https://REGION-PROJECT_ID.cloudfunctions.net/FUNCTION_NAME
Pro Tip: You can invoke your function for testing with a curl command:
curl -m 70 -X POST https://[region]-[project_id].cloudfunctions.net/[func_name] \
-H "Authorization: bearer $(gcloud auth print-identity-token)" \
-H "Content-Type: application/json" \
-d '{
"name": "Hello World"
}'
Security
Every function must be assigned a Service Account (SA). The attached SA provides an identity to all code running in the function. By default, functions use the default Compute Engine service account, which has broad permissions. It looks like this:
PROJECT_NUMBER-compute@developer.gserviceaccount.com
You should always create your own SA and grant it appropriately scoped-down permissions on only the specific resources your function needs to access.
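As a rough sketch (the service account name my-func-sa, the project, region, and runtime values are illustrative assumptions, not values from this guide), you could create a dedicated SA and attach it when deploying:
# Create a dedicated, narrowly scoped service account
gcloud iam service-accounts create my-func-sa \
  --display-name="Runtime SA for my function"
# Attach it to the function at deploy time
gcloud functions deploy helloGET \
  --gen2 \
  --runtime=nodejs20 \
  --region=us-central1 \
  --source=. \
  --trigger-http \
  --service-account=my-func-sa@PROJECT_ID.iam.gserviceaccount.com
You would then grant that SA only the roles it needs on the specific resources it touches.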
Function Resources
You configure the vCPU and memory resources allocated to the function. Choose settings that match your workload needs, since Google bills you based on resources allocated, not consumed.
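Resource settings can be passed as flags on the deploy command. A sketch (the values below are arbitrary examples, not recommendations):
# Allocate 512 MiB of memory and 1 vCPU to each instance
gcloud functions deploy helloGET \
  --gen2 \
  --memory=512Mi \
  --cpu=1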
Let’s write an HTTP Function
Here’s the basic required implementation for an HTTP function (Node.js). Other language examples are available here. At minimum you need to register with the Functions Framework using the functions.http() method and then return a valid HTTP response. The req parameter contains all of the information delivered as part of the HTTP request.
const functions = require('@google-cloud/functions-framework');
// Register an HTTP function with the Functions Framework that will be executed
// when you make an HTTP request to the deployed function's endpoint.
functions.http('helloGET', async (req, res) => {
  // Retrieve information from the request
  const name = req.query.name;
  // Return a valid HTTP response
  res.send(`Hello ${name || 'World'}!`);
  // Optionally, specify the status code and response type
  // res.status(200).json(retVal);
});
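Before deploying, you can exercise this locally with the Functions Framework (a sketch; the query string is just an example, and the framework listens on port 8080 by default):
# Start the Functions Framework locally, pointing at the registered function
npx @google-cloud/functions-framework --target=helloGET
# In another terminal, send a test request
curl "http://localhost:8080/?name=Cloud"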
Pro Tip: Change the function syntax to async so you can use top-level await.
Let’s write an Event Driven Function
You still have to register with the Functions Framework, but this time use the functions.cloudEvent() method. Notice that you do not return any values, since event triggers are asynchronous and do not expect a response.
The cloudEvent parameter contains the entire payload for the event. Access the cloudEvent.data field to extract details specific to the event that occurred. The payload format for the data field differs depending on the service that generated the event; for example, some services base64-encode this field, but most simply write a JSON object. See the Google-Events repo for all possible events and their formats.
const functions = require('@google-cloud/functions-framework');
// Register a CloudEvent callback with the Functions Framework
functions.cloudEvent('helloEvent', (cloudEvent) => {
  // Handle the event here. The event payload is in the cloudEvent.data field
  const eventPayload = cloudEvent.data;
  // If Cloud Storage, the payload is a JSON object
  // const {name, bucket} = eventPayload;
  // If Pub/Sub, the data payload is base64 encoded
  // const base64Event = eventPayload.message.data;
  // const pubsubEvent = Buffer.from(base64Event, 'base64').toString();
  console.log('Hello!');
});
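As a sketch of wiring this up (the topic name my-topic and the runtime/region values are illustrative assumptions), a Pub/Sub-triggered deployment might look like:
# Deploy the function and subscribe it to a Pub/Sub topic
gcloud functions deploy helloEvent \
  --gen2 \
  --runtime=nodejs20 \
  --region=us-central1 \
  --source=. \
  --trigger-topic=my-topic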
What is Concurrency?
There are two concepts you should be aware of when you think about function scaling. They are configurable and operate independently of each other.
Concurrent instances per function
Cloud Run Functions automatically scales the number of active containers behind the scenes to handle incoming work. This is the number of concurrent instances of your function that are active at any given time.
Functions transition through 4 main lifecycle stages: Starting → Active → Idle → Shutting down
- When a trigger invokes a function, Google will attempt to use an instance that is warm and available to serve the request. You can check whether any are available by looking at the idle count in the Instance count metric.
- If no idle instances exist, Google will spin up a new instance to serve the invocation. This is called a cold start because there is overhead in going from zero to active. The instance will remain active until all processing is complete.
- After a function completes processing, it transitions to idle and waits for another invocation. Functions remain in the idle state for up to 15 minutes, after which the instance is removed and transitions to Shutdown.
- Here is an example of the Instance count metric. At the highlighted time, there are 35 active instances processing workloads and 5 idle.
You can mitigate cold starts by setting the autoscaling configuration's minimum instances. You may also want to set maximum instances to control costs or to protect downstream services from being overwhelmed.
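These settings map to deploy flags, for example (a sketch; the numbers are arbitrary, not recommendations):
# Keep one instance warm to reduce cold starts, and cap scale-out at 100 instances
gcloud functions deploy helloGET \
  --gen2 \
  --min-instances=1 \
  --max-instances=100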
Max Concurrent requests per function instance
Functions default to serving a single request at a time. However, you can increase this setting (up to 1000) to allow a single instance to serve multiple concurrent requests. This is referred to as concurrent requests per function instance.
- Serving multiple requests concurrently generally provides significant cost savings since the majority of cost comes from instance uptime.
- With this feature, concurrent requests share the instance's global memory and CPU. You need to ensure your code can safely operate this way.
- You can view the Max Concurrent Requests metric to understand how your functions are serving requests.
Pro Tip: By default this metric shows a distribution over 1 minute, which isn't very useful. You can modify it to group by count, which is more useful.
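To raise the per-instance concurrency, a sketch (80 is an arbitrary example value, and higher concurrency generally needs at least 1 vCPU allocated):
# Allow each instance to handle up to 80 requests at the same time
gcloud functions deploy helloGET \
  --gen2 \
  --cpu=1 \
  --concurrency=80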
Summary & TL;DR
When you transition from VM-based scaling to serverless scaling, concurrency becomes the main scaling factor. Cloud Run has two layers of concurrency that are critical to understand.
There are also other important Cloud Run Functions features that I’ll cover in a future post, but you can read through the official docs now.
Let me know if I can help you in your Serverless journey on Google Cloud!