The Continuously Evolving Nature of SRE

  • Production Engineering as a natural evolution of Site Reliability Engineering (SRE version 2.0)
  • Six core beliefs I have about Production Engineering aka The PE Principles
  • Production Engineering constantly evolves and will be exactly what it needs to be for the phase of growth that a company is in
  • Centralized Team: Generally where you start. A single team provides infrastructure and operations support for the entire organization. Not much automation exists yet because you’re still figuring out the business model and pivoting constantly, so you remain highly flexible.
  • Distributed Teams: You move to a model with teams of PEs (7–10 members each) that partner with specific critical products or areas of the business to provide support. This gives you time to provide consulting services and solve common problems across the board, and also to engage in process-automation-driven development, where you automate many of the organization’s operational processes.
  • Embedded: This is the model most people are familiar with. PEs (SREs) are embedded directly into a product team, usually a couple of them reporting to that team’s manager, and they are usually the ones tasked with any reliability and operational work. This model is tough because small to medium-sized orgs (anyone that is not Google) usually do not have enough people to spread across teams and still have an outsized impact.
  • Consultative: Think Reliability Consultants. This is the model you use when you want to come in for a few months, solve some very obvious problems, help a partner team standardize on platforms and infrastructure and follow best practices, then get the hell out. Generally, someone will scream that they need help, and you drop in these consultants.
  • Special Projects: Sometimes PE Teams identify a common set of problems across an organization, but there’s no one on the platform, infrastructure, or common services+library side actually working on fixing them. So you assemble a tiger team, solve the problem yourself, then hand off the resulting work to another team to maintain long-term and get back to your day jobs. PE Teams should be happy to create things and then give them away.
