Service Level Objectives in Practice

Defining Objectives

  • 99% (averaged over 1 minute) of Get RPC calls will complete in less than 100 ms (measured across all the backend servers).
  • 99% of Get RPC calls will complete in less than 100 ms.
  • 90% of Get RPC calls will complete in less than 1 ms.
  • 99% of Get RPC calls will complete in less than 10 ms.
  • 99.9% of Get RPC calls will complete in less than 100 ms.
  • The server itself: because that server might have measurement glitches due to pauses in its own processing, such as a delay accepting the TCP connection or a delay writing data to the network.
  • The client on the internet: because your latency numbers will vary wildly depending on the client’s internet connection. You don’t want to wake someone up in the middle of the night because your app is suddenly popular in Australia!
  • 95% of throughput clients’ Set RPC calls will complete in < 1 s.
  • 99% of latency clients’ Set RPC calls with payloads < 1 kB will complete in < 10 ms.
  • 99.99% of Set RPC calls sent by our prober will complete in < 10 ms.
  • 99% of Set RPC calls sent by customers will complete in < 10 ms.
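
The layered latency objectives listed above (90% of Get calls under 1 ms, 99% under 10 ms, 99.9% under 100 ms) can be checked against a batch of measurements with a few lines of code. This is a minimal sketch rather than a real monitoring setup: the names `LATENCY_SLOS`, `fraction_under` and `check_latency_slos`, and the sample data, are made up for illustration.

```python
from typing import Dict, List, Tuple

# (target fraction, threshold in ms), copied from the example objectives above.
LATENCY_SLOS: List[Tuple[float, float]] = [
    (0.90, 1.0),
    (0.99, 10.0),
    (0.999, 100.0),
]


def fraction_under(latencies_ms: List[float], threshold_ms: float) -> float:
    """Return the fraction of calls that completed in less than threshold_ms."""
    if not latencies_ms:
        raise ValueError("no measurements")
    fast = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return fast / len(latencies_ms)


def check_latency_slos(
    latencies_ms: List[float],
    slos: List[Tuple[float, float]] = LATENCY_SLOS,
) -> Dict[float, bool]:
    """Map each latency threshold to True if its objective is met."""
    return {
        threshold: fraction_under(latencies_ms, threshold) >= target
        for target, threshold in slos
    }


if __name__ == "__main__":
    # Hypothetical sample: 1000 calls, mostly fast, with a slow tail.
    sample = [0.5] * 950 + [5.0] * 45 + [50.0] * 3 + [500.0] * 2
    print(check_latency_slos(sample))
```

With this sample, the 1 ms and 10 ms objectives hold, but two calls out of a thousand took longer than 100 ms, so the 99.9% objective is missed. That is the kind of tail problem a single "99% under 100 ms" objective would not reveal.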

Choosing Targets

Don’t pick a target based on current performance

Keep it simple

Avoid absolutes

Have as few SLOs as possible

  • Static asset serving.
  • Simple Get operations.
  • Complex Get operations (Searches?).
  • Set operations.
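
One way to keep the number of SLOs small is to map every request type onto one of a handful of classes like the four above, and attach a single objective to each class. The sketch below assumes this approach. The method names (`GetAsset`, `Search`, `Delete`) and every number in it are placeholders for illustration, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencySLO:
    target: float        # fraction of requests that must be fast enough
    threshold_ms: float  # latency bound in milliseconds


# One objective per request class. All numbers are placeholders.
SLO_BY_CLASS = {
    "static": LatencySLO(target=0.99, threshold_ms=100.0),
    "simple_get": LatencySLO(target=0.99, threshold_ms=10.0),
    "complex_get": LatencySLO(target=0.99, threshold_ms=1000.0),
    "set": LatencySLO(target=0.99, threshold_ms=100.0),
}

# Hypothetical mapping from RPC method name to request class.
CLASS_BY_METHOD = {
    "GetAsset": "static",
    "Get": "simple_get",
    "Search": "complex_get",
    "Set": "set",
    "Delete": "set",
}


def slo_for(method: str) -> LatencySLO:
    """Look up the objective that applies to a given RPC method."""
    return SLO_BY_CLASS[CLASS_BY_METHOD[method]]
```

Adding a new method then means choosing an existing class for it, not writing a new objective.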

Perfection can wait

Control Measures

  1. Monitor and measure the system’s SLIs.
  2. Compare the SLIs to the SLOs, and decide whether or not action is needed.
  3. If action is needed, figure out what needs to happen in order to meet the target.
  4. Take that action.
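
The four steps above form a simple control loop. As a rough sketch, one pass of that loop could look like the code below. The callables `measure`, `plan` and `act` are hypothetical hooks standing in for whatever monitoring and tooling a team already has, and the SLI names in the usage example are invented.

```python
from typing import Callable, Dict, List


def find_violations(slis: Dict[str, float], slos: Dict[str, float]) -> List[str]:
    """Step 2: return the names of SLOs whose measured SLI is below target.

    An SLI with no measurement is treated as 0.0, so it counts as a violation.
    """
    return sorted(
        name for name, target in slos.items() if slis.get(name, 0.0) < target
    )


def control_step(
    measure: Callable[[], Dict[str, float]],
    slos: Dict[str, float],
    plan: Callable[[str, float, float], str],
    act: Callable[[str], None],
) -> List[str]:
    """Run one pass of the loop and return the SLOs that needed action."""
    slis = measure()                              # 1. monitor and measure
    violations = find_violations(slis, slos)      # 2. compare SLIs to SLOs
    for name in violations:
        measured = slis.get(name, 0.0)
        action = plan(name, measured, slos[name])  # 3. decide what to do
        act(action)                                # 4. do it
    return violations


if __name__ == "__main__":
    slos = {"get_fast": 0.99, "set_fast": 0.99}
    control_step(
        measure=lambda: {"get_fast": 0.995, "set_fast": 0.97},
        slos=slos,
        plan=lambda name, got, want: f"{name}: {got:.3f} is below {want:.3f}",
        act=print,
    )
```

In practice steps 3 and 4 are often carried out by people, not code. The sketch only makes the shape of the loop explicit.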

SLOs Set Expectations

Keep a safety margin

Don’t overachieve

Stephen is a Staff Site Reliability Engineer at Google, where he works on the Google Cloud Platform.

Stephen Thorne
