Principles of Site Reliability Engineering at Google

  1. Hire great coders, and let them leave if they want to.
  2. Hire your SREs and your developers from the same staffing pool, and treat them all as developers.
  3. Send about 5 percent of the ops work to the dev team, plus all overflow.
  4. Cap the SRE operational load at 50 percent (in practice, usually 30 percent).
  5. Staff each on-call team with a minimum of 8 engineers for one location (or 6 engineers in each of two locations).
  6. Keep postmortems blameless and focused on process and technology.
  7. Write a Service Level Objective (SLO) for each service and measure performance against it.
  8. Use SLO budgets as your launch criteria.
  9. Practice, and make it fun.
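
Principles 7 and 8 can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a request-based availability SLO (for example, 99.9 percent of requests succeed over some window). The function names and the simple "launch only while budget remains" rule are illustrative assumptions, not something the list above prescribes.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent.

    slo_target is the required success ratio, e.g. 0.999 (assumed to be
    strictly between 0 and 1). The result is 1.0 when nothing has failed,
    0.0 when the budget is exactly used up, and negative when overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # The SLO tolerates this many failed requests in the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def launch_allowed(slo_target: float, total_requests: int, failed_requests: int) -> bool:
    """Hypothetical launch gate: permit launches only while budget remains."""
    return error_budget_remaining(slo_target, total_requests, failed_requests) > 0.0
```

With a 99.9 percent target and 1,000,000 requests in the window, roughly 1,000 failures are tolerated. At 400 failures about 60 percent of the budget is left and a launch passes this gate; at 1,500 failures the budget is overspent and it does not.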
