Rebuilding SRE, from Memory
At the new gig, there is a desire for SRE. We have the book, and the new book. And now a third, even. But what are we missing?
Processes, forms, checklists.
Norms, tools, ways of thinking.
And so I find that I’m writing a lot of documents. At last count, I’m at about 150, but honestly a lot of those are meeting notes. Maybe even some postmortems :)
I think, though, it would be helpful to adapt a few of these for public consumption. So, here is a list of some things I’ve written, am writing, or intend to write Real Soon Now.
Notably, these aren’t really technical. They’re people-stuff.
But they’re SRE people-stuff. I suppose that’s not a big surprise, if you know me :)
In no particular order:
- Design Doc Template
- Postmortem Template
- Interviewing Template, Grading Rubric
- Meeting Notes Template
- SRE Org Structure Guidelines, Options
- A Common Understanding of Production — “Problem explanation, discovery, recurrence, validation, prioritization, prevention”
- Bug Response SLO
- Postmortem Review Process
- Release Guidelines
- Blackbox Monitoring (Theory, Implementation)
- Risk Analysis Template (h/t @xleem)
- Production Reliability Principles (hey, already done!)
- A Service Maturity Matrix
- SLOs: Theory, Practice
- Oncall: compensation, norms, principles
- Affecting Production While Avoiding Doom: or “using math for risk”
- Review Boards (Product, Engineering, Production)
- Launch Checklist (“Am I ready for traffic?”)
- Monitoring Theory, Practice
- Operational Norms
- Cloud Observation Requirements
- Debugging Distributed Systems
- Understanding CAP, ACID, BASE
- Asynchronous Jobs
- Technical Debt and how to Make Progress
- Escalating to Management: why it is a Good Idea
- How to OKR and Why
- Intro to Kubernetes (k8s)
- Intro to Istio
- Basic Cloud Topology and its Consequences
- Intro to Cloud Load Balancing
- Service Ownership: Beyond the NOC
- Basic Capacity Planning for Cloudy Services
- Documentation Requirements: Collaborating about Production
I hope, in time, to be able to publish (and improve) these so they might be helpful to the broader community.
If anything looks particularly appetizing, please let me know so I have a place to start.