Stackdriver Monitoring Automation Part 3: Uptime Checks

Charles
Google Cloud - Community
4 min read · Sep 28, 2018

This post is part 3 in the Stackdriver Automation series. In part 1 and part 2, I covered automating the management of Stackdriver Groups and Alerting Policies, respectively. In this post, I will walk through the steps that you can use to automate the management of Uptime Checks. Head over to part 1 for the background and prerequisites.

Uptime Checks

Uptime Checks let you verify the availability of your service via HTTP, HTTPS, or TCP checks. Stackdriver provides this service by probing your application’s frontend from locations around the world and reporting results such as latency and response codes. You can use this service to understand whether your users can reach the service from various global locations.

As an example, I created an Uptime Check based on the Apache infrastructure that I described in parts 1 and 2. The application itself is a simple website hosted by Apache servers behind a load balancer. I created a simple HTTP Uptime Check to understand whether my users could access the service.

The projects.uptimeCheckConfigs.create API accepts a request body with the following fields (only a subset is needed for a basic check).

{
  "name": string,
  "displayName": string,
  "period": string,
  "timeout": string,
  "contentMatchers": [
    {
      object(ContentMatcher)
    }
  ],
  "selectedRegions": [
    enum(UptimeCheckRegion)
  ],
  "isInternal": boolean,
  "internalCheckers": [
    {
      object(InternalChecker)
    }
  ],

  // Union field resource can be only one of the following:
  "monitoredResource": {
    object(MonitoredResource)
  },
  "resourceGroup": {
    object(ResourceGroup)
  }
  // End of list of possible types for union field resource.

  // Union field check_request_type can be only one of the following:
  "httpCheck": {
    object(HttpCheck)
  },
  "tcpCheck": {
    object(TcpCheck)
  }
  // End of list of possible types for union field check_request_type.
}
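For orientation, a minimal request body for an HTTP check against a public IP could look like the following (the project ID and host are placeholders, and I omit the output-only name field, which the API fills in on create):

{
  "displayName": "Website uptime check",
  "period": "60s",
  "timeout": "10s",
  "monitoredResource": {
    "type": "uptime_url",
    "labels": {
      "project_id": "YOUR_PROJECT_ID",
      "host": "35.241.47.194"
    }
  },
  "httpCheck": {
    "path": "/",
    "port": 80
  }
}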

I used the “Try this API” sidebar in the projects.uptimeCheckConfigs.create docs to test out the values. I separated the configuration into Jinja templates and YAML files so that I could reuse the Jinja templates for any other Uptime Checks.
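As an alternative to the sidebar, you can exercise the same endpoint directly with curl and a gcloud access token — a quick sketch, assuming the request body above is saved as uptimecheck.json:

$ curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d @uptimecheck.json \
    "https://monitoring.googleapis.com/v3/projects/YOUR_PROJECT_ID/uptimeCheckConfigs"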

stackdriver_uptimecheckconfigs.jinja

{% set PREFIX = env["deployment"] %}
{% set UPTIME_CHECKS = properties["uptimechecks"] %}
{% set PROJECT = env["project"] %}
resources:
{% for uptimecheck in UPTIME_CHECKS %}
- name: {{ PREFIX }}-uptimecheck-{{ loop.index }}
  type: gcp-types/monitoring-v3:monitoring.projects.uptimeCheckConfigs.create
  properties:
    parent: projects/{{ PROJECT }}
    displayName: {{ uptimecheck.name }}
    period: {{ uptimecheck.period }}
    timeout: {{ uptimecheck.timeout }}
    monitoredResource: {{ uptimecheck.monitoredResource }}
    httpCheck: {{ uptimecheck.httpCheck }}
{% endfor %}
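The template only passes through the fields I needed for this check. If you also want content matching or pinned check regions, the loop body could be extended with optional keys — a sketch, assuming contentMatchers and selectedRegions entries in the YAML properties:

{% if uptimecheck.contentMatchers %}
    contentMatchers: {{ uptimecheck.contentMatchers }}
{% endif %}
{% if uptimecheck.selectedRegions %}
    selectedRegions: {{ uptimecheck.selectedRegions }}
{% endif %}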

Notice that the YAML below is structured so that multiple Uptime Checks can be created under the uptimechecks block. I created a standard httpCheck against the URL of my load balancer, which is represented as a monitoredResource object in the API.

stackdriver_uptimecheckconfigs.yaml

Replace the YOUR_PROJECT_ID value with your GCP project ID, and set the host label to the IP address or hostname of your own service.

imports:
- path: stackdriver_uptimecheckconfigs.jinja
resources:
- name: create_uptimechecks
  type: stackdriver_uptimecheckconfigs.jinja
  properties:
    uptimechecks:
    - name: "1 - Website uptime check [global]"
      period: "60s"
      timeout: "10s"
      monitoredResource:
        type: "uptime_url"
        labels:
          project_id: YOUR_PROJECT_ID
          host: "35.241.47.194"
      httpCheck:
        path: "/"
        port: 80
You can find the Jinja and YAML files in the GitHub repo.

The last step was to use the gcloud command below to actually create the Stackdriver Uptime Check.

$ gcloud deployment-manager deployments create website-uptimecheck --config stackdriver_uptimecheckconfigs.yaml
Create operation operation-1537814541833-576a2597f1329-1faf3a15-f226e553 completed successfully.
NAME                               TYPE                                                                    STATE      ERRORS  INTENT
website-uptimecheck-uptimecheck-1  gcp-types/monitoring-v3:monitoring.projects.uptimeCheckConfigs.create  COMPLETED  []

Once the uptime checks were created, I used the Stackdriver Monitoring console to verify that they had been created successfully. Keep in mind that when you first create an uptime check, it can take up to 25 minutes for results to start appearing, according to the docs.
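If you prefer to verify from the command line, listing the project’s configs through the same v3 API should return the new check:

$ curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://monitoring.googleapis.com/v3/projects/YOUR_PROJECT_ID/uptimeCheckConfigs"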

Now the uptime checks are active and can be used in conjunction with alerting policies to notify you when a check fails or when latency falls outside your tolerance.
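As a sketch of that combination, a condition in the alerting policy format from part 2 can key off the uptime check’s check_passed metric. The values below follow the general pattern for alerting on uptime failures, but treat them as a starting point; YOUR_CHECK_ID is a placeholder for the ID Stackdriver assigns when the check is created:

conditions:
- displayName: "Uptime check failure"
  conditionThreshold:
    filter: >
      metric.type="monitoring.googleapis.com/uptime_check/check_passed"
      AND resource.type="uptime_url"
      AND metric.label.check_id="YOUR_CHECK_ID"
    comparison: COMPARISON_GT
    thresholdValue: 1
    duration: 300s
    aggregations:
    - alignmentPeriod: 1200s
      perSeriesAligner: ALIGN_NEXT_OLDER
      crossSeriesReducer: REDUCE_COUNT_FALSE
      groupByFields:
      - resource.*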

Conclusion: The Stackdriver Monitoring Automation Series

This concludes the Stackdriver Monitoring Automation series. In this series, I have included the steps that I took and the methods that I used to create Stackdriver Monitoring components via Google Cloud Deployment Manager. In the first post, I created a Stackdriver Group used to group resources to monitor as a single entity. In the second post, I created Stackdriver Alerting Policies to define when to send an alert and what alert to send. In this third post, I created Stackdriver Uptime Checks to provide a basic picture of end-user experience. You can use these steps and config files as templates to automate the deployment of Stackdriver Monitoring resources in your environment.

Read more about Stackdriver Monitoring Automation in the other posts in the series.
