Creating SLOs with Terraform

Yuri Grinshteyn
Aug 31, 2020 · 3 min read

Some time ago, I looked at using the Service Monitoring API to create basic SLOs against “out of the box” services like App Engine. This functionality has seen a lot of updates since then, and there’s now Terraform support for creating custom services and SLOs. I wanted to have a go at this myself to see how it works.

Creating the service

SLO Monitoring does a great job of identifying services for you if you’re using things like Istio, App Engine, or Cloud Endpoints. But what if your service is on GCE, for example? In this case, you need to define it as a custom service, which will then allow you to define SLOs against it.

Here’s how to define a “monitoring service” in Terraform:
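A minimal sketch of such a definition, using the Google provider's `google_monitoring_custom_service` resource. The resource name `custom_service`, the service ID, and the display name are placeholders chosen for illustration, not values from my own configuration:

```hcl
# Custom service for a workload that SLO Monitoring cannot detect on its own
# (for example, one running on GCE).
# "my-gce-service" and "My GCE Service" are placeholder values.
resource "google_monitoring_custom_service" "custom_service" {
  service_id   = "my-gce-service" # must be unique within the project
  display_name = "My GCE Service" # name shown in the Console
}
```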

The service definition is very simple: you just provide a service ID unique to your project and a display name. Once you run `terraform apply`, the service is visible in the Console.

From there, you can use the UI to create an SLO against it:

[Image: creating an SLO against the service in the UI]

Note that you have to use “Other” as the metric, because custom services don’t have an “out of the box” understanding of availability and latency. So, you need to have a good SLI for your service. You could use something like a log-based metric, a metric emitted by the Google Cloud Load Balancer if you’re using that, or a custom metric written by the service itself. Let’s take a look at defining an SLO using the last of these, a custom metric.

Defining the SLO

Here’s how to define an SLO using Terraform:
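As a rough sketch, a request-based availability SLO could look like the following. It assumes a `google_monitoring_custom_service` resource named `custom_service` is declared in the same configuration. The SLO ID, display name, and both metric types are hypothetical placeholders; substitute the metrics your service actually emits.

```hcl
# Request-based SLO: 99% of requests succeed over a rolling 1-day period.
# Assumes google_monitoring_custom_service.custom_service exists elsewhere
# in this configuration (hypothetical resource name).
resource "google_monitoring_slo" "availability" {
  service      = google_monitoring_custom_service.custom_service.service_id
  slo_id       = "availability-slo" # placeholder ID
  display_name = "99% of requests succeed over a rolling day"

  goal                = 0.99
  rolling_period_days = 1

  request_based_sli {
    good_total_ratio {
      # Counts all requests (placeholder metric type).
      total_service_filter = join(" AND ", [
        "metric.type=\"custom.googleapis.com/my_service/request_count\"",
        "resource.type=\"gce_instance\"",
      ])
      # Counts failed requests (placeholder metric type).
      bad_service_filter = join(" AND ", [
        "metric.type=\"custom.googleapis.com/my_service/error_count\"",
        "resource.type=\"gce_instance\"",
      ])
    }
  }
}
```

Because the service emits a total count and an error count, the sketch sets `total_service_filter` and `bad_service_filter`; the good-request count is then derived as the difference between the two.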

There are 3 main things to consider here:

  • The basics — the resource ID, the SLO ID, the service you’re defining the SLO against, and the SLO display name.
  • The SLI — are you going to be using a request- or windows-based SLI? If request-based — how will you count total requests and differentiate between good and bad requests?

In my example, I’m using a service that’s been instrumented to emit two separate metrics — one to count all requests and another to count errors. This makes things quite simple.

  • The goal — what’s the actual target for your SLO?

In this example, the goal is that 99% of requests are successful over a rolling 1-day period.

Creating and validating

At this point, run `terraform plan` to make sure everything is correct:

[Image: output of `terraform plan`]

If everything looks correct, run `terraform apply` to create the service and the SLO(s):

[Image: output of `terraform apply`]

Note that my file has two SLOs: a request-based one for availability and a windows-based one for latency. Together with the service itself, that’s why 3 resources are being created.
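For illustration, a windows-based latency SLO could be shaped roughly like this. It is a sketch, not a copy of my file: the metric type, the 300-second window, the 500 threshold, and the 95% goal are all assumed values, and it again assumes a `google_monitoring_custom_service` resource named `custom_service` in the same configuration.

```hcl
# Windows-based SLO: a window counts as "good" when the mean of the latency
# metric over that window stays at or below the configured maximum.
# All IDs, metric types, and thresholds below are placeholder assumptions.
resource "google_monitoring_slo" "latency" {
  service      = google_monitoring_custom_service.custom_service.service_id
  slo_id       = "latency-slo"
  display_name = "95% of 5-minute windows have acceptable mean latency"

  goal                = 0.95
  rolling_period_days = 1

  windows_based_sli {
    window_period = "300s" # must divide a day evenly and be at least 60s

    metric_mean_in_range {
      # Gauge metric reporting latency (placeholder metric type).
      time_series = join(" AND ", [
        "metric.type=\"custom.googleapis.com/my_service/latency_ms\"",
        "resource.type=\"gce_instance\"",
      ])

      range {
        max = 500 # assumed threshold, in the metric's own unit
      }
    }
  }
}
```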

At this point, you can go back to the console and check your new service:

[Image: the new service in the console]

I’ve clearly not set my availability target correctly (or my service is having some serious issues) — I should absolutely revisit this before I take the next step to set up an error budget burn alert on this.

Summary

I’m really excited to see service and SLO support come to Terraform, and I hope lots of folks will take advantage of this to extend their automation capabilities. At this point, all of the major monitoring primitives can be created automatically once a project is up — this is great news! Thanks for reading, and let me know what you think!

Google Cloud - Community

Google Cloud community articles and blogs

Yuri Grinshteyn

Written by

CRE at Google Cloud. I write about observability in Google Cloud, especially as it relates to SRE practices.

Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.