Stackdriver Monitoring Automation Part 1: Stackdriver Groups

Charles
Google Cloud - Community
Sep 28, 2018

Many developers and organizations recognize that automation is critical to the long-term management of cloud infrastructure and applications. Developers, Site Reliability Engineers and Operations teams just getting started with Stackdriver Monitoring may wonder what components can be used with automation. I recently went through the exercise of automating the creation of Stackdriver Monitoring components for every new project created within an organization.

In this series, I have included the steps that I took and the methods that I used to create the automation. You can use these steps to automate the deployment of Stackdriver Monitoring resources in your environment. This post covers Stackdriver Groups while part 2 and part 3 cover Alerting Policies and Uptime Checks, respectively.

What’s available for automation in Stackdriver Monitoring?

The following components are exposed through the API and can therefore be automated.

  • Stackdriver Groups
  • Alerting Policies
  • Uptime Checks

The Stackdriver Monitoring APIs are available via REST and gRPC. The same REST APIs can be used from Google Cloud Deployment Manager or even the gcloud command line. Using Deployment Manager makes it easy to treat the configuration as code and further automate the management of the Stackdriver Monitoring components. I used Deployment Manager to implement the automation.

1. Prerequisites

To get started, I set up a Stackdriver Workspace with at least one GCP project associated with it. Workspaces were called Stackdriver Accounts until 9/12/2018. See this blog post for more details.

I used a GCP environment that had many different components already in use, including Cloud Storage, BigQuery, Compute Engine, Kubernetes Engine, Cloud Functions and Cloud ML Engine. Using an environment with many components in use made it easier for me to test the monitoring components.

2. Stackdriver Groups

Stackdriver Groups lets you define and monitor logical groups of resources in Stackdriver Monitoring. These include resources such as VM instances and containers. A Group can then be used to monitor a set of resources as a single entity. Groups can also include subgroups, which can be used to build a logical group hierarchy.

As an example, I created a group to monitor all of the resources related to a set of Apache instances running on GCE. When I deployed the GCE resources, I added the labels app:website and env:prod to each of the GCE instances. You can see the labels attached to the instance below.

$ gcloud compute instances describe wwwhttp11 --zone us-west1-b
canIpForward: false
cpuPlatform: Intel Broadwell
creationTimestamp: '2018-09-24T13:16:33.497-07:00'
deletionProtection: false
...
id: '1488468442473625840'
kind: compute#instance
labelFingerprint: FrEu9_zPopQ=
labels:
  app: website
  env: prod

Then, when I created the Stackdriver Group, I could filter on the labels to include those specific resources in the group. This also highlights the dynamic nature of Groups: any resource that matches the filter becomes a member. I could have filtered on a name or resource type instead. I also broke the group into subgroups based on the env label, which allowed me to segregate the Apache instances by environment.
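
The label filters used for the Groups in this post follow a simple pattern. As a rough sketch, a small helper could build them; the function name `label_filter` and its `exact` parameter are my own assumptions for illustration, while the filter strings it produces match the ones used later in this post.

```python
def label_filter(key, value, exact=True):
    """Build a Stackdriver Group filter string on a user label.

    exact=True  -> metadata.user_labels.<key>="<value>"
    exact=False -> metadata.user_labels.<key>=has_substring("<value>")
    """
    # This sketch does not attempt to escape quotes inside label values.
    if '"' in value:
        raise ValueError("label values containing double quotes are not handled here")
    if exact:
        return f'metadata.user_labels.{key}="{value}"'
    return f'metadata.user_labels.{key}=has_substring("{value}")'


if __name__ == "__main__":
    print(label_filter("app", "website", exact=False))
    print(label_filter("env", "prod"))
```

Keeping the filter construction in one place makes it harder to mistype a label key when the same pattern is repeated for several environments.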

The projects.groups.create API lists the following values that are required to create the Group.

{
  "name": string,
  "displayName": string,
  "parentName": string,
  "filter": string,
  "isCluster": boolean
}

I used the “Try this API” sidebar in the projects.groups.create docs to test out the API and ensure that I had the right values. This is an easy way to sanity check configuration values before adding them to Deployment Manager or to code.

I used the following values to test creating a simple Group:

{
  "displayName": "Apache test",
  "parentName": "",
  "filter": "metadata.user_labels.app=has_substring(\"website\")",
  "isCluster": false
}
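
For readers who prefer to see the request outside of the “Try this API” sidebar, here is a minimal sketch that assembles the URL and JSON body for a projects.groups.create call without sending anything. The helper name `build_group_request` and the project ID `my-project` are placeholders I made up; authentication and the actual HTTP call are deliberately left out.

```python
import json


def build_group_request(project_id, display_name, group_filter,
                        parent_name="", is_cluster=False):
    """Return (url, body) for a projects.groups.create call; nothing is sent."""
    url = f"https://monitoring.googleapis.com/v3/projects/{project_id}/groups"
    body = {
        "displayName": display_name,
        "parentName": parent_name,
        "filter": group_filter,
        "isCluster": is_cluster,
    }
    return url, body


if __name__ == "__main__":
    # "my-project" is a placeholder project ID, not one from this post.
    url, body = build_group_request(
        "my-project",
        "Apache test",
        'metadata.user_labels.app=has_substring("website")',
    )
    print("POST", url)
    # json.dumps takes care of escaping the quotes inside the filter.
    print(json.dumps(body, indent=2))
```

Letting `json.dumps` do the escaping avoids hand-writing the `\"` sequences that are easy to get wrong in the filter value.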

The “Try this API” process works well for testing the API, but I wanted a repeatable process that I could automate. This is where Deployment Manager comes in. I created Deployment Manager configuration files to supply the values to the projects.groups.create API. I split the configuration into a jinja template and a yaml file so that I could reuse the jinja template for any other Groups.

stackdriver_groups.jinja

resources:
- name: {{ env["name"] }}
  type: gcp-types/monitoring-v3:projects.groups
  properties:
    name: "projects/{{ env["project"] }}"
    displayName: {{ properties["group_display_name"] }}
    parentName: "{{ properties["group_parent_name"] }}"
    filter: {{ properties["group_filter"] }}
    isCluster: {{ properties["group_is_cluster"] }}
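
Deployment Manager also accepts templates written in Python. The sketch below is my own guess at how the jinja template above might look in that form; it is not taken from the repo. The property names mirror the jinja template, and the `SimpleNamespace` object is only a stand-in so the function can be checked locally without Deployment Manager.

```python
from types import SimpleNamespace


def generate_config(context):
    """Deployment Manager entry point: return the resource for one Group."""
    props = context.properties
    return {
        "resources": [{
            "name": context.env["name"],
            "type": "gcp-types/monitoring-v3:projects.groups",
            "properties": {
                "name": "projects/" + context.env["project"],
                "displayName": props["group_display_name"],
                "parentName": props["group_parent_name"],
                "filter": props["group_filter"],
                "isCluster": props["group_is_cluster"],
            },
        }]
    }


if __name__ == "__main__":
    # Stand-in for the context object that Deployment Manager would pass in.
    # "my-project" is a placeholder project ID.
    fake_context = SimpleNamespace(
        env={"name": "create-apache-group", "project": "my-project"},
        properties={
            "group_display_name": "Apache",
            "group_parent_name": "",
            "group_filter": 'metadata.user_labels.app=has_substring("website")',
            "group_is_cluster": False,
        },
    )
    print(generate_config(fake_context))
```

A Python template returns plain data structures, so quoting of the filter string is not a concern in the way it can be with text substitution in jinja.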

Notice that I created three separate groups in the yaml file: Apache, prod and qa, based on the labels. All Apache instances were included in the Apache Group based on the app=website label. Only the instances labeled env=qa and env=prod were included in the qa and prod Groups, respectively. The qa and prod Groups specify Apache as the parent, which tells Stackdriver Monitoring that these are subgroups.

stackdriver_groups.yaml

imports:
- path: stackdriver_groups.jinja
resources:
- name: create-apache-group
  type: stackdriver_groups.jinja
  properties:
    group_display_name: "Apache"
    group_parent_name: ""
    group_filter: "metadata.user_labels.app=has_substring(\"website\")"
    group_is_cluster: false
- name: create-apache-prod-group
  type: stackdriver_groups.jinja
  properties:
    group_display_name: "prod"
    group_parent_name: $(ref.create-apache-group.name)
    group_filter: "metadata.user_labels.env=\"prod\""
    group_is_cluster: false
  metadata:
    dependsOn:
    - create-apache-group
- name: create-apache-qa-group
  type: stackdriver_groups.jinja
  properties:
    group_display_name: "qa"
    group_parent_name: $(ref.create-apache-group.name)
    group_filter: "metadata.user_labels.env=\"qa\""
    group_is_cluster: false
  metadata:
    dependsOn:
    - create-apache-group
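
The prod and qa entries in the yaml above differ only in the environment name, so the same structure could be generated for any list of environments. The following is a sketch under that assumption; `group_resource` and `build_config` are hypothetical helper names, and the result is printed as JSON purely for inspection rather than written out as the yaml file.

```python
import json

PARENT = "create-apache-group"


def group_resource(name, display_name, group_filter, parent=None):
    """Build one resource entry that uses the stackdriver_groups.jinja template."""
    resource = {
        "name": name,
        "type": "stackdriver_groups.jinja",
        "properties": {
            "group_display_name": display_name,
            # Subgroups reference the parent Group's generated name.
            "group_parent_name": f"$(ref.{parent}.name)" if parent else "",
            "group_filter": group_filter,
            "group_is_cluster": False,
        },
    }
    if parent:
        resource["metadata"] = {"dependsOn": [parent]}
    return resource


def build_config(environments):
    """Return the parent Apache Group plus one subgroup per environment."""
    resources = [group_resource(
        PARENT, "Apache", 'metadata.user_labels.app=has_substring("website")')]
    for env in environments:
        resources.append(group_resource(
            f"create-apache-{env}-group", env,
            f'metadata.user_labels.env="{env}"', parent=PARENT))
    return {"imports": [{"path": "stackdriver_groups.jinja"}],
            "resources": resources}


if __name__ == "__main__":
    print(json.dumps(build_config(["prod", "qa"]), indent=2))
```

Adding a third environment would then be a one-word change to the list passed to `build_config`.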

You can find the jinja and yaml files in the GitHub repo.

The last step was to use the gcloud command line below to actually create the Stackdriver Group via Deployment Manager. I used the --preview command line argument first to make sure that I got the deployment configuration right.

$ gcloud deployment-manager deployments create apachegroup --config stackdriver_groups.yaml --preview
The fingerprint of the deployment is nm2-shmY2Oj2tDQfCc4__g==
Waiting for create [operation-1538010937030-576d0138ff573-f3a4b3a2-a3ed8f5c]...done.
Create operation operation-1538010937030-576d0138ff573-f3a4b3a2-a3ed8f5c completed successfully.
NAME                      TYPE                                     STATE       ERRORS  INTENT
create-apache-group       gcp-types/monitoring-v3:projects.groups  IN_PREVIEW  []      CREATE_OR_ACQUIRE
create-apache-prod-group  gcp-types/monitoring-v3:projects.groups  IN_PREVIEW  []      CREATE_OR_ACQUIRE
create-apache-qa-group    gcp-types/monitoring-v3:projects.groups  IN_PREVIEW  []      CREATE_OR_ACQUIRE

The config did not generate any errors, so I ran the command to create the deployment.

$ gcloud deployment-manager deployments create apachegroup --config stackdriver_groups.yaml
Create operation operation-1537807543251-576a0b8593038-2a7c079e-38f54092 completed successfully.
NAME                      TYPE                                     STATE      ERRORS  INTENT
create-apache-group       gcp-types/monitoring-v3:projects.groups  COMPLETED  []
create-apache-prod-group  gcp-types/monitoring-v3:projects.groups  COMPLETED  []
create-apache-qa-group    gcp-types/monitoring-v3:projects.groups  COMPLETED  []

Once the Groups were created, I used the Stackdriver Monitoring console to verify that the Apache, qa and prod Groups had been successfully created. Notice that the prod and qa subgroups appear under the Apache group.
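
The same check could in principle be scripted against a list of Group resources instead of being done in the console. The sketch below only illustrates the parent and child relationship: the sample data is made up, with placeholder project and group IDs, and `render_hierarchy` is a hypothetical helper. It assumes each Group is a dict with `name`, `displayName` and an optional `parentName`, as in the Group resource shown earlier.

```python
from collections import defaultdict


def render_hierarchy(groups):
    """Return display names indented by depth, parents before their subgroups."""
    children = defaultdict(list)
    for group in groups:
        # A missing or empty parentName marks a top-level Group.
        children[group.get("parentName", "")].append(group)

    lines = []

    def walk(parent_name, depth):
        for group in sorted(children.get(parent_name, []),
                            key=lambda g: g["displayName"]):
            lines.append("  " * depth + group["displayName"])
            walk(group["name"], depth + 1)

    walk("", 0)
    return lines


if __name__ == "__main__":
    # Made-up sample data; the project and group IDs are placeholders.
    sample = [
        {"name": "projects/my-project/groups/111", "displayName": "Apache"},
        {"name": "projects/my-project/groups/222", "displayName": "prod",
         "parentName": "projects/my-project/groups/111"},
        {"name": "projects/my-project/groups/333", "displayName": "qa",
         "parentName": "projects/my-project/groups/111"},
    ]
    print("\n".join(render_hierarchy(sample)))
```

With the sample data, prod and qa are listed indented beneath Apache, which is the same shape the console shows.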

This concludes part 1 of the series. Read more about Stackdriver Monitoring Automation in the other posts in the series.