Monitoring and alerting on bytes sent from Google Cloud Storage buckets

Dom Zippilli
Google Cloud - Community
6 min read · Dec 20, 2019

Disclaimer

I am a Googler, and I work in Google Cloud specifically. All opinions stated here are my own, not those of Google, LLC.

Introduction

I often discuss with customers how to keep an eye on their egress from Google Cloud Storage buckets. There are a few reasons you might care about this metric:

  • Your bucket is publicly available, and you want to monitor the rate of data transfer out of it.
  • Your bucket is the origin for a CDN, where you'd incur costs for cache fills, whether charged by the CDN or simply as internet egress.
  • Your bucket’s default and/or predominant storage class is Nearline or Coldline, and you want to monitor reads to keep retrieval costs in check.

Fortunately, this is very easy to monitor using Stackdriver, built into GCP. It is expressed as the metric storage.googleapis.com/network/sent_bytes_count. In this article, I’ll show you how to set up monitoring and alerting for this metric in the Google Cloud web UI.

How to set up the Stackdriver metric

The first thing you’ll want is to capture and visualize the metric. First, go to Stackdriver Monitoring. You can find this in the hamburger menu on the GCP console. The icon looks like this:

What the Stackdriver Monitoring button looks like.

From within the Stackdriver Monitoring page, click on the Metrics explorer.

Now you should have an option to “Find resource type and metric.” Type storage.googleapis.com/network in there, and you should see metrics for bytes both sent and received.

It can be confusing at first to know which one you want. The easy thing to remember is the metric is from the bucket’s perspective, so sent bytes are object reads, and received bytes are object writes.

Click on the sent_bytes_count metric to select it. Then you’ll be able to set filters and other settings.

For filter, think about what is a useful scope to look at. Are you concerned with just a single bucket? All buckets in a location? All in a project? Choose whatever suits your use case. Keep in mind that after filtering, you can perform grouping, so if you want to see all buckets in a project individually, you can filter by project_id, and then group by bucket_name.
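To make the filter-then-group behavior concrete, here is a small, self-contained Python sketch. The project and bucket names and byte counts are all invented; the point is only to show how filtering by project_id and then grouping by bucket_name yields one series per bucket:

```python
from collections import defaultdict

# Each hypothetical sample: (project_id, bucket_name, bytes sent that minute)
samples = [
    ("my-project", "bucket-a", 50_000_000),
    ("my-project", "bucket-b", 20_000_000),
    ("other-project", "bucket-c", 99_000_000),
]

# Step 1: filter by project_id.
filtered = [s for s in samples if s[0] == "my-project"]

# Step 2: group by bucket_name, yielding one series per bucket.
per_bucket = defaultdict(int)
for _, bucket, sent in filtered:
    per_bucket[bucket] += sent

print(dict(per_bucket))  # {'bucket-a': 50000000, 'bucket-b': 20000000}
```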

Keep in mind, also, that you can put wildcards in filters. For example, to filter to a bucket name prefix, you can add a filter like =~"prefix-.*".
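The =~ operator takes a regular expression. Assuming it matches against the whole label value, Python's re.fullmatch illustrates the same behavior (the bucket names here are made up):

```python
import re

# A prefix pattern like the one in the filter example above.
pattern = re.compile(r"prefix-.*")
buckets = ["prefix-logs", "prefix-assets", "other-bucket"]

# fullmatch requires the whole name to match, like a label filter would.
matched = [b for b in buckets if pattern.fullmatch(b)]
print(matched)  # ['prefix-logs', 'prefix-assets']
```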

Finally, make sure you use the sum aggregator. The sent_bytes_count is a “delta” metric, meaning it is the count of new bytes sent since the last sample. For example, if you send 1MB in minute 0, and then 1MB in minute 1, both samples will read 1MB (as opposed to minute 1 reading 2MB, for bytes sent up to that point). Consequently, if you aggregate samples over, say, a five-minute window, you want to sum them to get an accurate count of bytes sent in those five minutes.
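The point about summing delta samples can be shown in a few lines of Python (the values are invented):

```python
# Hypothetical delta samples: bytes sent during each one-minute window.
per_minute = [1_000_000, 1_000_000, 0, 2_000_000, 1_000_000]

# Summing the deltas over a five-minute window gives the true total...
total = sum(per_minute)
print(total)  # 5000000 bytes sent in five minutes

# ...whereas the last sample alone would badly understate it.
print(per_minute[-1])  # 1000000
```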

Now, you should have a nice graph of bytes sent from your bucket(s), like this:

A typical GCS bucket egress chart.

Go ahead and click Save Chart, and put the chart on a dashboard for easy viewing.

Note: Network throughput is commonly discussed in bits per second. This graph shows neither bits nor a per-second rate; it shows bytes per minute. Make sure you do the appropriate conversion between this and any network throughput quantities relevant to your work (e.g., 1 GB/min ≈ 133 Mbit/s).
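Assuming decimal gigabytes, the conversion arithmetic works out like this:

```python
def gb_per_min_to_mbit_per_s(gb_per_min: float) -> float:
    """Convert decimal GB per minute to megabits per second."""
    bytes_per_second = gb_per_min * 1e9 / 60
    return bytes_per_second * 8 / 1e6  # bytes -> bits, then to megabits

print(round(gb_per_min_to_mbit_per_s(1.0), 1))  # 133.3
```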

How to set up an alert on the metric

Unless you really like watching graphs, the next thing you’ll want to do is set up an alert on the metric so you’ll know when it exceeds a specified threshold.

It’s worth mentioning that in my experience, this metric is accurate but slightly delayed. The sample size is 60 seconds, so even if the samples were delivered instantly, you’d always be looking about 30 seconds in the past. In practice, I’ve found my data lags the current time by a few minutes. So I would expect an alert to actually reach me about 5 minutes, give or take, after the threshold was reached. I think this is plenty fast enough, but it’s worth understanding.
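A rough back-of-the-envelope sketch of where that ~5 minutes comes from. All of these numbers are assumptions drawn from the observations above, not documented guarantees:

```python
sample_window_s = 60                      # 60-second samples...
avg_sampling_lag_s = sample_window_s / 2  # ...so ~30 s behind on average
ingestion_lag_s = 180                     # "a few minutes" of metric delay
notification_s = 60                       # alert evaluation plus delivery

total_min = (avg_sampling_lag_s + ingestion_lag_s + notification_s) / 60
print(total_min)  # 4.5 minutes, give or take
```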

To begin, go to the Alerting section of the Stackdriver UI. Look for the CREATE POLICY button and click it.

Give your alerting policy a relevant name, like “Bandwidth alert.”

Next, you need to add a condition. Adding a condition is a lot like creating a graph. Select the storage.googleapis.com/network/sent_bytes_count metric, and adjust the filters and grouping as necessary for your case. Then, you’re going to add a Configuration to specify the alerting condition.

An example alert configuration, where more than 1GiB is read from a bucket in one minute.

The alert configuration shown above would work well for alerting on egress in the time series you get from your Target settings. Note that the way you set up the Group By values can have a big impact on how this alert works. For example:

  • If you filtered on project…
  • And grouped by bucket name…
  • This alert would only be triggered if a single bucket exceeded 1GiB per minute.

On the other hand…

  • If you filtered on multiple projects, like project_id =~ "project1|project2"…
  • This alert would be triggered if, in aggregate, buckets across those projects exceeded 1GiB per minute.
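A toy Python sketch of the difference (the bucket names and byte counts are invented):

```python
# Invented per-minute delta samples, one per (project, bucket).
samples = {
    ("project1", "bucket-a"): 600_000_000,
    ("project1", "bucket-b"): 500_000_000,
    ("project2", "bucket-c"): 400_000_000,
}
THRESHOLD = 1 * 1024**3  # 1 GiB per minute

# Grouped by bucket: each bucket is compared on its own -> no alert.
per_bucket_breach = any(v > THRESHOLD for v in samples.values())

# No grouping across projects: samples are summed -> the alert fires.
aggregate_breach = sum(samples.values()) > THRESHOLD

print(per_bucket_breach, aggregate_breach)  # False True
```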

Next, you’ll want to add a notification channel. There are lots of ways to configure these, well explained in the documentation.

Speaking of documentation, the last option you’ll have is to provide some documentation for the alert in notifications. It’s a good idea for this to explain the alert’s significance in terms an on-call engineer or teammate can easily understand.

Here’s how my alert is configured:

An alert configuration for a 1GB/min egress from any of my project’s GCS buckets.

I strongly recommend setting up several conditions with graduated alert thresholds that approach whatever limit concerns you. In other words, if you think 1TB/min is a threshold where human attention is needed, set a condition for 750GB/min, and another for 875GB/min. That way, a human may be able to take a look before the true target is exceeded, despite the delays introduced by metric latency, notification delay, reaction time, etc.
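Computing the graduated thresholds is simple arithmetic; here is a small hypothetical helper using the 75% and 87.5% steps from the example above:

```python
def graduated_thresholds(limit: float, fractions=(0.75, 0.875)):
    """Early-warning thresholds approaching a hard limit (hypothetical helper)."""
    return [limit * f for f in fractions]

ONE_TB_PER_MIN = 1e12  # decimal terabyte per minute
thresholds = graduated_thresholds(ONE_TB_PER_MIN)
print([t / 1e9 for t in thresholds])  # [750.0, 875.0] GB/min
```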

I hope this article helps you, and have a great day in the cloud!
