…our deployment pipelines, and are graphs didn’t highlight the failure until a customer reported it. A COE is designed to present a cogent summary of what happened, how it went down (the timeline), why it happened, and most importantly, what’s next. COEs are written and presented liberally across many different scenarios. They are not a mark of sha…
rga…stood for “Cessation of Employment.” He quickly clarified that it just meant “Correction of Error.” To this day, I believe that this is one of those processes at Amazon that just yields so much value. Processes often complicate things with tedium and COEs are sometimes no exception; however, well-written COEs aims to ensure that the same issue never happens again. COEs are also presented before senior leadership to share lessons learned across different organizations as well. That’s it. They exist as a learning mechanism for “failures.”
Related to this is using data for operations work. Every week, our team’s on-calls has a meeting reviewing all the graphs containing hundreds of thousands of data points from our service. We discuss every spike or dip to ensure there is no issue. Again, data drives our thinking. Merely …