Cloud Functions Best Practices (4/4): Monitor and log the executions

Beranger Natanelic
Google Cloud - Community
8 min read · Feb 20, 2023


Follow carefully what is happening

This article is part of a four-article series in which I share advice about Google Cloud Functions development. It is the result of two years of daily practice, deployment and monitoring. Some of these best practices come straight from the official documentation; others come from my own experience and from what proved most effective. If you have a different point of view, feel free to comment on this (free) article. Thanks!

<<< Cloud Functions Best Practices (1/4) : Get the environment ready

<<< Cloud Functions Best Practices (2/4) : Optimize the Cloud Functions

<<< Cloud Functions Best Practices (3/4) : Secure the Cloud Functions

Monitor and log the executions

Understand what’s going on — Photo by Markus Spiske on Unsplash

We now have optimised and secured Google Cloud Functions running well.

We can deploy them and go for a long, deep and restful sleep…

But when the morning comes, we would love to know how it went!

Before everything else, we want to know whether there were any crashes or important events overnight.

Our daily routine would look like that:

  • Click the Cloud Function
  • Go to the Logs tab
  • Scroll through allll the printed logs and check for errors
  • Do that for every Cloud Function
  • Every
  • Morning

This process does work (well enough), but it is not optimised! Let’s see how to monitor and log our Google Cloud Functions!

Log. But log smart.

Every company has its own logging standards, but with Cloud Functions, those standards and best practices might need to change a bit.

In Python, Node.js, Java… there are log levels.

That means you can print an info log, an error log or a warning log depending on the event occurring during the execution.

And that’s good news! Google Cloud Functions uses these levels to filter logs: directly inside the Cloud Function’s logs, in Stackdriver and in Monitoring (we will see that later).

See this example:

In the example above, we see different kinds of logs depending on what we chose to print.

You can easily filter logs by criticality: is it a failure? Is it a rare case? Is it an unexpected case? Is it something you should be notified about?

To use the correct level, follow these rules:

debug: For development, execution logs for debugging

info: When something interesting, but expected, happens (the function starts, it reaches a specific part of the code, an API call completes…). From this level down, you probably don’t need to be notified

warning: This is not an error, but the function took an unexpected or unusual path

error: Things went wrong, we should definitely flag this situation

critical: Wake up, the function was unusable!

This hierarchy has many advantages for tracking, monitoring and notifications; we will see that in the next parts. For now, let’s see how to set up this logging in Python:

First, create the basic structure for Cloud Functions. If you don’t know about the recommended structure for Cloud Functions, have a look here.

Inside your project folder, create a file requirements.txt.

Inside requirements.txt, paste this line:

google-cloud-logging==3.5.0

And add these lines into main.py:

from flask import Response
import google.cloud.logging
import logging

client = google.cloud.logging.Client(project="project")
client.setup_logging()
# logging.basicConfig(format='%(asctime)s %(message)s')

def main(request):
    try:
        logging.debug("I am a debug log!")
        logging.info("I am an info log!")
        logging.warning("I am a warning log!")
        logging.error("I am an error log!")
        logging.critical("I am a critical log!")
        return Response(response='ok', status=200)

    except Exception as e:
        logging.error("I am an error log! %s", e)
        return Response(response='AN ERROR OCCURRED, see logs', status=400)

Once we have that, we can get a cup of tea, well done.

We can test it using the Functions Framework (see best practices). When running locally, we need to comment out lines 5 and 6, falling back on Python’s built-in logging library.

For deployment, uncomment these lines so logging works with Google Cloud.

Deploy the function:

gcloud functions deploy test_logging --region=europe-west2 --trigger-http --entry-point main --runtime python310 --max-instances 3 --allow-unauthenticated

⚠️ Don’t forget to delete this function after the project, since we set --allow-unauthenticated; see the security best practices for Cloud Functions.

Call the URL printed by the deploy command, then go to the Logs tab of the Cloud Function in the UI.

You can now filter by severity!

The best thing about this trick is not so visible here. But imagine having thousands of info lines when you only want to see the warning/error/critical logs: simply select “Warning” under Severity, and you will see all logs with a level higher than or equal to Warning.

It is life-changing to know at a glance whether everything went right over the last few days. 💪
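The “higher or equal” filter maps directly onto the numeric values Python’s logging module assigns to each level, which is a quick way to see why selecting “Warning” keeps errors and criticals too:

```python
import logging

# Python's logging levels are plain integers (DEBUG=10 ... CRITICAL=50),
# which is exactly why "show Warning and above" works as a simple comparison.
levels = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]
numeric = {name: getattr(logging, name) for name in levels}
print(numeric)

# Selecting "Warning" in the Severity dropdown keeps everything >= WARNING:
at_least_warning = [name for name in levels if numeric[name] >= logging.WARNING]
print(at_least_warning)  # ['WARNING', 'ERROR', 'CRITICAL']
```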

To know a bit more about this feature, you can check the article from Daniel Sanche, introducing the new library version.

Unleash the power of Stackdriver

Stackdriver is cool, when correctly used.

Stackdriver has many filters that help a lot when looking for specific executions or specific log lines.

Let’s have a look at the UI. Go here: https://console.cloud.google.com/logs/query

(This place is wonderful: it tells you everything happening inside your GCP project: servers, Cloud Functions, Cloud Run, even deletions from Cloud Storage…)

Right in the middle there is a kind of terminal (the query box) where we can type cool commands.

Filter severity

We can write:

severity>=WARNING

And click “Run query” on the right, or press CMD+Enter.

We see every log with severity higher than or equal to Warning. Of course, we can also match only Warning, or only Error; the syntax is very flexible.

Get Cloud Functions logs

We are here for Cloud Functions logs, type:

resource.type="cloud_function"

Here we see logs from all our running Cloud Functions. To see logs from one specific function, type:

resource.type="cloud_function"
resource.labels.function_name="test_logging"
resource.labels.region="europe-west2"

We now see our Function’s logs.

Well… there is a faster way to get logs for a specific Cloud Function. Go to that function in the Cloud Functions dashboard and click the Logs tab. On the right, there is an arrow-in-a-square icon that leads to Stackdriver with an autocompleted query ;)

Now it’s time to have fun!

Get a specific execution

Imagine we have millions of logs and a function crashed; we want to know what happened.

First, get the crash log:

resource.type="cloud_function"
resource.labels.function_name="test_logging"
resource.labels.region="europe-west2"
severity>=ERROR

You now see every time the given function crashed.

Have a look at one line. Every line has a Severity column, then a Timestamp, then a Summary. In the summary, we have the function name and the execution id. We need this execution id!

Click it, click “show matching entries” and comment the severity line.

resource.type="cloud_function"
resource.labels.function_name="test_logging"
resource.labels.region="europe-west2"
--severity>=ERROR
labels.execution_id="jzjvhz5cid4i"

We can now read all the steps that led up to the crash and debug from there. This execution id is particularly useful for understanding what caused a crash!
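The same filter can also be used programmatically: the google-cloud-logging client’s `list_entries` accepts the same filter syntax as the UI. `build_execution_filter` is a little helper of my own, and the trailing API call is sketched under the assumption that you have credentials for the project (hence left commented):

```python
def build_execution_filter(function_name: str, region: str, execution_id: str) -> str:
    """Build the same Stackdriver filter we typed in the UI, one clause per line."""
    return "\n".join([
        'resource.type="cloud_function"',
        f'resource.labels.function_name="{function_name}"',
        f'resource.labels.region="{region}"',
        f'labels.execution_id="{execution_id}"',
    ])

log_filter = build_execution_filter("test_logging", "europe-west2", "jzjvhz5cid4i")
print(log_filter)

# With credentials in place, the client accepts the same filter string:
# import google.cloud.logging
# client = google.cloud.logging.Client(project="project")
# for entry in client.list_entries(filter_=log_filter):
#     print(entry.timestamp, entry.payload)
```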

Cloud Run doesn’t have this; it is one of the drawbacks of Cloud Run. Follow me to get the next article comparing Cloud Run and Cloud Functions ;)

Get a specific log

It’s possible to find a log in particular. Let’s say you are printing the input phone number of your user (yeah… poor example sorry):

print("INPUT PHONE NUMBER ", phone_number)

You can easily get all phone numbers from all executions:

resource.type="cloud_function"
resource.labels.function_name="test_logging"
resource.labels.region="europe-west2"
textPayload =~ "INPUT PHONE NUMBER"

You won’t get solid stats from that, but you can use it to check inputs, values, API call responses…

Get a specific log using regex

Yes, you can use regex!

This way:

resource.type = "cloud_function"
resource.labels.function_name = "test_logging"
resource.labels.region = "europe-west2"
textPayload=~"[1-9]\\d{8,11}"

This will return every phone number appearing in the execution logs.
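Cloud Logging’s `=~` operator uses RE2 syntax, which is close enough to Python’s `re` module that you can sanity-check a pattern locally before pasting it into the query box. A quick sketch on a made-up log line:

```python
import re

# The pattern from the query above (a single backslash in Python source).
pattern = re.compile(r"[1-9]\d{8,11}")

# A fake textPayload line, just for checking the pattern locally.
sample = "INPUT PHONE NUMBER 33612345678"
match = pattern.search(sample)
print(match.group())  # 33612345678
```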

Stackdriver is a superb tool for getting logs from all GCP services; learn how to use it and save a lot of time reading logs!

Now that we are used to severity and log filtering, it’s time to get metrics!

Metrics and notifications

Ah… we now know how to write logs and how to read them. Cool!

But we still need to read them… Every. Day.

Our only way out is to get metrics from what happened last night.

We want to know, straight from the horse’s mouth, how many X we had, where X is invocations, crashes, warnings, logs, API calls… All of these things are called metrics, and there is a place where we can create them…

Monitoring

Let’s take a basic business case: we want to know whenever an error occurs, whether from our logging.error() calls or from a crash/timeout…

Go to Monitoring > Alerting.

Click Create a Policy.

Before creating a Policy, we need to Select a Metric.

If it’s the first time you’ve come here, you must be thinking “oh wow, we can do that 😲”: we can count everything!

Select Cloud Functions as a resource.

Here we can track various metrics: execution time, number of executions, memory usage.

But what interests us the most is the “Logs-based metric” part.

Select “Log Entries”.

Here you can add a filter:

  • To target a specific Cloud Function
  • To target a group of Cloud Functions (for a specific project)
  • To target every Cloud Function except some of them

In my case, I am selecting the test_logging function.

After finding your function(s), choose what to monitor.

In our case, we want to filter severity “one_of” ”ERROR” and “CRITICAL”.
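For reference, the UI selections above boil down to a Logging filter along these lines (sketched from the choices described here, not copied from the console):

```
resource.type="cloud_function"
resource.labels.function_name="test_logging"
severity=(ERROR OR CRITICAL)
```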

On the right, we have a preview of this metric:

Select a Rolling window of 5 min and a Rolling window function of “count”.

Then, click next (and not Create Policy).

Select a Threshold > 0.

The metric we just created tells us how many times there was an error during an execution, which is already a huge time saver.

To get notified, just click Next, select “Use notification channels”, and configure your favorite notification channel (Slack, email, webhooks, SMS, Pub/Sub).

Frankly speaking, this article is the last of the series, and maybe the least interesting… I guess no one will read these lines, so I won’t go into details for this part. Leave a comment if you read this and have an issue ;) (or clap 7 times, I will get the message 😀)

To learn more about monitoring, check the official documentation:

Bye

I hope this article brings your Cloud Functions development to the next level.

Previous parts:

<<< Cloud Functions Best Practices (1/4) : Get the environment ready

<<< Cloud Functions Best Practices (2/4) : Optimize the Cloud Functions

<<< Cloud Functions Best Practices (3/4) : Secure the Cloud Functions

Thanks for the thousands of claps!

I am now going to write about Cloud Run, follow me to learn more ;)


Beranger Natanelic
Google Cloud - Community

Daily Google Cloud Platform user. I am sharing learnings from my tries, struggles and successes.