Downsampling and Exporting Stackdriver Monitoring Data

Stackdriver Monitoring contains a wealth of information about cloud resource usage, both for Google Cloud Platform (GCP) and other sources. This post explains how to use the Stackdriver Monitoring API to read, downsample, and export data from Stackdriver to BigQuery. Pub/Sub metrics are used to demonstrate this.

You may want to export time series data for a number of reasons, including:

  • Ad-hoc queries of data, say for shifting and comparing time series or for analyzing data in a dimension other than time. This can be especially useful for identifying waste, like virtual machines with low CPU usage, wasted disk space, or data that is never accessed. Or you might have made a change to your application and want to compare efficiency and performance before and after.
  • To keep metrics data for longer than the standard Stackdriver retention period. The post will also explain how to downsample the time series data to reduce the volume of older data, from the default 1 minute intervals that Stackdriver uses to 1 hour intervals. Downsampling reduces data storage costs. Data over the extended period can be used for long range forecasting or for analyzing long term trends. You may want to overlay external data, such as economic conditions or seasonal events.
  • To save data from a special event, like a performance test or a marketing launch.

Colab is a hosted variant of the IPython (Jupyter) notebook provided by Google. It is a good fit for our goal because it allows scripting with the Python client API to experiment with and explore data. Colab also has client libraries for many Google APIs built in and supports executing some shell commands. Once the export is working in Colab, you might want to move it to a regular job driven by cron or Google Cloud Scheduler.

To get started, install the monitoring_v3 client library in Colab with the command

!pip install --upgrade google-cloud-monitoring

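The command uses the Colab ! directive to execute a shell command. Note that the code in this post uses the monitoring_v3.enums and monitoring_v3.types modules, which were removed in release 2.0 of the library, so the calls need adapting if a newer release is installed.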

Authenticate to GCP with the Python statements

from google.colab import auth
auth.authenticate_user()

This will open a new browser window for authentication.

Import the monitoring API and create a client object with the following statements

from google.cloud import monitoring_v3
client = monitoring_v3.MetricServiceClient()

Let’s set up some variables to hold input values:

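import datetime
target_project = '[Your project]'
project_resource = 'projects/{}'.format(target_project)  # resource name format the API expects
start = datetime.datetime(2019, 3, 5, 0, 0, 0, 0)
end = datetime.datetime(2019, 3, 6, 0, 0, 0, 0)
topic_bytes = 'pubsub.googleapis.com/topic/byte_cost'
# Placeholders for values used later in this post
home_project_id = '[Your home project]'
dataset = '[Your dataset]'
bucket = '[Your bucket]'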

We are going to collect data from the target project for the one-day period beginning on 2019-03-05 and ending on 2019-03-06. If we have many projects, we can change the value of target_project and export the data to a home project. That enables collection of metrics data from many projects into a central location.
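As a minimal sketch of the multi-project idea, the loop below builds the resource name and an export file name for each source project. The project IDs and the file naming scheme are hypothetical, chosen only for illustration; the 'projects/<id>' form is the resource name that the Monitoring API expects.

```python
def project_resource_name(project_id):
    """Return the 'projects/<id>' resource name used by the Monitoring API."""
    return 'projects/{}'.format(project_id)


# Hypothetical project IDs, used only for illustration.
source_projects = ['team-a-project', 'team-b-project']

for project_id in source_projects:
    resource = project_resource_name(project_id)
    # Prefix each file with the project ID (an assumed naming scheme) so that
    # exports from different projects do not overwrite each other in one bucket.
    filename = '{}_topic_bytes.csv'.format(project_id)
    print(resource, filename)
```

Each resource name could then be passed as project_resource to the functions defined later in the post.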

One of the Pub/Sub metrics that we will export is topic/byte_cost. There are many more GCP metrics to choose from, as well as metrics from other clouds and from open source software.

We can find out about the metrics with the function below

def list_metric_descriptors(client, project_resource, metric):
    resource_descriptors = client.list_metric_descriptors(
        project_resource,
        'metric.type="{}"'.format(metric))
    for descriptor in resource_descriptors:
        print(descriptor)

This will give details on the metric kind (DELTA in this case), value type (INT64), description, and other properties. If we look at the metric in a Stackdriver dashboard we will see something like the chart below for a low traffic application:

Topic Bytes Delta metric Type Chart

From this chart we can see that approximately 4.88 KB is sent over the Pub/Sub service every 5 minutes.

We can downsample the timeseries with the function below

def to_csv_delta_metric(
        client,
        project_resource,
        filter,
        start,
        end,
        frequency,
        colname):
    # Time window to read
    interval = monitoring_v3.types.TimeInterval()
    interval.start_time.seconds = int(start.timestamp())
    interval.end_time.seconds = int(end.timestamp())
    # Align each series into periods of `frequency` seconds, then sum across series
    aggregation = monitoring_v3.types.Aggregation()
    aggregation.alignment_period.seconds = frequency
    aggregation.per_series_aligner = (
        monitoring_v3.enums.Aggregation.Aligner.ALIGN_DELTA)
    aggregation.cross_series_reducer = (
        monitoring_v3.enums.Aggregation.Reducer.REDUCE_SUM)
    results = client.list_time_series(
        project_resource,
        filter,
        interval,
        monitoring_v3.enums.ListTimeSeriesRequest.TimeSeriesView.FULL,
        aggregation)
    csv = '{0},{1}\n'.format('time', colname)
    ts_array = []
    for ts in results:
        ts_array.append(ts)
    if len(ts_array) > 0:
        # With REDUCE_SUM and no grouping we expect a single combined series
        ts = ts_array[0]
        # The API lists points in reverse time order, so sort by end time and
        # label each row with the point's own end time
        for p in sorted(ts.points, key=lambda p: p.interval.end_time.seconds):
            t = p.interval.end_time.seconds
            v = p.value.int64_value
            csv += '{0},{1}\n'.format(t, v)
        return csv
    else:
        print('Did not get any results back')
        return ''

A filter is applied to select the relevant metrics, so more than one time series could be retrieved. This downsampling function is appropriate for ‘delta’ type metrics. See Stackdriver Kinds of Metrics for details on delta and other metric kinds. Each series is aligned to the chosen period with the delta aligner (ALIGN_DELTA). Delta type metrics are typically summed, which is what the REDUCE_SUM reducer in the Aggregation object above does across series. The results are put in a string buffer in comma-separated form that we will load into Google Cloud Storage (GCS) below.
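To make the two aggregation steps concrete, here is a toy sketch in plain Python that mimics them on made-up data: first each series is summed into fixed periods (the role ALIGN_DELTA plays), then the aligned series are summed together (the role REDUCE_SUM plays). The function names, the sample values, and the bucket boundary handling are assumptions for illustration; the real service does this work on the server side and may treat interval edges differently.

```python
from collections import defaultdict


def align_delta(points, t0, period):
    """Sum (timestamp, value) delta points into fixed-width periods.

    Returns a dict mapping each period's end time to the summed value.
    """
    buckets = defaultdict(int)
    for ts, value in points:
        index = (ts - t0) // period
        buckets[t0 + (index + 1) * period] += value
    return dict(buckets)


def reduce_sum(aligned_series):
    """Add up several aligned series, period by period."""
    totals = defaultdict(int)
    for series in aligned_series:
        for period_end, value in series.items():
            totals[period_end] += value
    return dict(sorted(totals.items()))


# Two made-up series with one sample per minute for two hours.
t0 = 0
topic_a = [(60 * i, 10) for i in range(120)]
topic_b = [(60 * i, 5) for i in range(120)]

hourly = reduce_sum([align_delta(s, t0, 3600) for s in (topic_a, topic_b)])
print(hourly)  # {3600: 900, 7200: 900}
```

Each hour holds 60 samples per series, so the hourly totals are 60 * 10 + 60 * 5 = 900.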

The buffer can be uploaded to GCS with the function

def upload_gcs(bucket, buf, filename):
    fname = '/tmp/{}'.format(filename)
    with open(fname, 'w') as f:
        f.write(buf)
    print('head {}:'.format(fname))
    !gsutil cp {fname} gs://{bucket}/

These functions can be called with the code below

metric_names = ['topic/byte_cost',
                'subscription/byte_cost',
                'topic/send_request_count',
                'topic/message_sizes']
colnames = ['topic_bytes',
            'sub_bytes',
            'send_request_count',
            'message_sizes']
frequency = 3600  # 1 hour
for i in range(len(metric_names)):
    filter = 'metric.type="pubsub.googleapis.com/{0}"'.format(metric_names[i])
    filename = '{0}.csv'.format(colnames[i])
    csv_buffer = to_csv_delta_metric(client,
                                     project_resource,
                                     filter,
                                     start,
                                     end,
                                     frequency,
                                     colnames[i])
    upload_gcs(bucket, csv_buffer, filename)

This will export time series for the four Pub/Sub metrics listed, downsampled to one hour intervals. To find out more about extracting metrics using the monitoring_v3 API, see Reading Metric Data.
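Before uploading, it can be worth checking that a buffer holds the number of rows the time window implies. The sketch below is one possible check using only the standard library; the helper names and the sample buffer are hypothetical and are not part of the original code.

```python
import csv
import io


def count_csv_rows(buf):
    """Return the number of data rows in a CSV buffer, excluding the header."""
    rows = list(csv.reader(io.StringIO(buf)))
    return max(len(rows) - 1, 0)


def expected_points(start_seconds, end_seconds, frequency):
    """Number of whole alignment periods that fit in the time window."""
    return (end_seconds - start_seconds) // frequency


# Made-up buffer in the same 'time,<colname>' layout as the export.
sample = 'time,topic_bytes\n3600,100\n7200,120\n'
print(count_csv_rows(sample), expected_points(0, 7200, 3600))
```

For the one-day window and 3600 second frequency used in this post, expected_points gives 24. A lower row count is not necessarily an error, since periods with no data may be missing from the results.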

The data can be loaded into BigQuery with the statements below

for i in range(len(colnames)):
    filename = '{0}.csv'.format(colnames[i])
    tablename = colnames[i]
    !bq --project_id={home_project_id} \
        --location=US load \
        --autodetect \
        --source_format=CSV {dataset}.{tablename} \
        gs://{bucket}/{filename}

Once the data is loaded into BigQuery we can query it. We could have queried it right after reading the time series from Stackdriver above but, in general, we want to load the data into BigQuery on a regular basis and then come back some time later to run ad-hoc queries on it.

Let’s do a simple query to verify that the downsampling is correct. The data can be queried from BigQuery in Colab with the BigQuery client API.

from google.cloud import bigquery
bq_client = bigquery.Client(project=home_project_id)
df = bq_client.query('''
    SELECT
      time, topic_bytes
    FROM `{0}.topic_bytes`'''.format(dataset)).to_dataframe()
print(df)

The data from the query is read into the pandas DataFrame df.
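The time column written by the export holds epoch seconds, which are hard to read at a glance. As a small sketch, the helper below (a hypothetical name, not part of the original code) converts one such value to an ISO 8601 timestamp in UTC using only the standard library.

```python
from datetime import datetime, timezone


def epoch_to_iso(seconds):
    """Convert epoch seconds to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


# 1551744000 is the start of the export window used in this post.
print(epoch_to_iso(1551744000))  # 2019-03-05T00:00:00+00:00
```

For a whole DataFrame column, pandas offers pd.to_datetime(df['time'], unit='s'), which performs the same conversion in one call.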

The data can be viewed using a Python graphics utility, such as matplotlib. Pandas can also be used to create a simple chart with the statement below

df.plot.bar(y='topic_bytes',
            title='Topic Byte Cost',
            color=['darkgrey'])

This results in the chart

Downsampled data (x axis is time, y is bytes)

Let’s check our math. Notice that we downsampled to a frequency of one hour over a total interval of one day. There are 24 bars in the chart above, so the time adds up to the original period. Eyeballing the chart, the average hourly value is slightly below 60,000 bytes. From the Stackdriver chart above, we had 4.88 KB in a 5 minute interval. 12 * 4.88 = 58.56 KB per hour, which is close to the expected value (that is a relief).
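The same check can be written out as a few lines of Python. The constant names are made up for this sketch; the figures are the ones read from the charts above.

```python
SAMPLE_KB = 4.88        # volume per sample, read from the Stackdriver chart
SAMPLE_MINUTES = 5      # sampling interval in that chart
PERIOD_SECONDS = 3600   # downsampling frequency used for the export
WINDOW_SECONDS = 24 * 3600  # one-day export window

samples_per_hour = 60 // SAMPLE_MINUTES
hourly_kb = samples_per_hour * SAMPLE_KB
bars = WINDOW_SECONDS // PERIOD_SECONDS

print(samples_per_hour, round(hourly_kb, 2), bars)  # 12 58.56 24
```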