Why SLOs Are Useful: Scaling an Organization from First Principles
This talk was given on Wednesday, August 21, 2019.
Outages happen. Products break. Every time a failure occurs, it’s an opportunity to learn and improve. Web-based products are incredibly complex. By understanding and managing their complexity, carefully investigating incidents, and improving responses, we can build more reliable products and more resilient systems.
Tristan Slominski, site reliability engineering manager, constantly strives to answer the question “Why are we doing this?” His talk distills models that seem to explain why certain well-known practices work. We understand that “two pizza” teams are about the right size. We know APIs are “good.” We adopt SLOs as “good.” But did you know that the effectiveness of all three practices can be explained by a single equation known as the Universal Scalability Law? They are all solutions to the problem of managing complexity at different organizational scales.
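The Universal Scalability Law is named above but not stated. In Neil Gunther's formulation, the relative capacity of N workers is C(N) = N / (1 + α(N − 1) + βN(N − 1)), where α is a contention penalty (work that must be serialized) and β is a coherency penalty (the cost of pairwise coordination). The minimal Python sketch below evaluates it. The coefficients are hypothetical, and reading N as team members or services is an illustrative assumption, not the talk's stated mapping.

```python
import math


def usl_capacity(n: int, alpha: float, beta: float) -> float:
    """Relative capacity C(N) of N workers under the Universal Scalability Law.

    alpha: contention penalty (fraction of work that must be serialized).
    beta:  coherency penalty (cost of pairwise coordination between workers).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak(alpha: float, beta: float) -> float:
    """Size N at which capacity peaks: sqrt((1 - alpha) / beta), for beta > 0."""
    if beta <= 0:
        raise ValueError("beta must be positive for a finite peak")
    return math.sqrt((1 - alpha) / beta)


# Hypothetical coefficients, chosen for illustration only.
ALPHA, BETA = 0.05, 0.02

if __name__ == "__main__":
    print(f"capacity peaks at N ~ {usl_peak(ALPHA, BETA):.1f}")
    for n in (1, 3, 7, 10, 20, 40):
        print(n, round(usl_capacity(n, ALPHA, BETA), 2))
```

With these made-up values, capacity peaks near N ≈ 7 and then falls as workers are added, because the β term grows with N(N − 1), the number of pairwise interactions. Past that point, adding people or components reduces total throughput rather than merely flattening it.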
Tristan Slominski is passionate about the design, development, and operation of self-directed teams and decentralized, distributed systems. His past roles include staff software engineer, chief technology officer, and head of product development. Tristan is a former Army aviator who served combat tours in Afghanistan and Iraq.
Cross-posted on the Indeed Engineering Blog.