Finding your path with Site Reliability Engineering (SRE)

Mark Rendell
Nationwide Technology
2 min read · Aug 17, 2021

In this post I am excited to share a simple decision tree tool that I’ve developed with teams who are getting started with Site Reliability Engineering (SRE). A bit of context is worth having first, but if you prefer, feel free to dive right in here.


SRE is a set of principles and practices that can help organisations with the perpetual challenge of balancing changes to IT systems with the reliability, resilience, and operability of the corresponding production services. The concepts build upon things that we have believed and practiced for a long time, but over the last two years many of our teams have taken fresh inspiration from adapting and applying ideas from SRE.

The SRE body of knowledge is broad and can support teams in considering topics as diverse as culture, major incident management, team topologies, observability, and architecture. SRE can be useful as a team construct, a role, or simply as inspiration to develop processes that any team can adopt. Some teams find this breadth of content accessible and just what they need. Many teams, however, just want to know:

“How do we actually get started with SRE? What should we do today?” (many people)

In the past I tried distilling the parts of SRE that I’ve found most distinctive into this self-test script. I advocated the use of SRE to improve measurement and feedback loops for two important priorities:

  • Quality of service: within the bounds of the functional capability, is the service that the users/customers receive good enough?
  • Service operability: are we happy with the impact that supporting this service has on our organisation and our colleagues?

The decision tree follows the same two priorities and provides step-by-step suggestions on what to do next within your context. I also briefly tackle the financial motivations.

I hope you are now intrigued enough to click here and discover whether the tool helps you with your SRE journey. Please let me know if you find it useful or, even better, send a pull request on GitHub!
