The state of incident management: survey results

Annika Garbers
Published in SignifAI
3 min read · Jul 23, 2018

Last month, SignifAI conducted a survey of DevOps and SRE teams to learn about the incident management tools and processes they’re currently using, common delights and gripes with each tool, and the ways teams triage incidents and measure their effectiveness. The 40 participants in the survey spanned a variety of industries and job titles, lending a well-rounded view of the state of incident management tools. Here are some insights from the results of the survey…

High-level overview

Likes and dislikes

Common issues

Triaging and measuring the incident management process

Survey demographics

Takeaways

Some major learnings from the results of this survey:

  • Incident management platforms are doing a great job on the integration process. With many tools offering one-click integration or a simple copy/paste of an API key or webhook, integrating your tools has never been easier.
  • The platforms may be too pricey for the features they offer. Many participants wanted to see “smarter” capabilities for their dollars, like automatic correlations among incoming events.
  • Measuring the performance of the incident management team/process is tricky and often overlooked.
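For teams that want a starting point on the measurement side, here is a minimal sketch of one common approach: averaging how long incidents take to be acknowledged and resolved. The survey did not prescribe any particular metric, so treat the metric choice, the field names (`opened`, `acknowledged`, `resolved`), and the sample timestamps as illustrative assumptions rather than survey data.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records. The field names and timestamps are
# made up for illustration and do not come from the survey.
INCIDENTS = [
    {"opened": "2018-06-01T10:00", "acknowledged": "2018-06-01T10:05", "resolved": "2018-06-01T11:00"},
    {"opened": "2018-06-02T14:00", "acknowledged": "2018-06-02T14:15", "resolved": "2018-06-02T14:30"},
]


def minutes_between(start, end):
    """Return the number of minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


def mean_minutes(incidents, start_field, end_field):
    """Average the elapsed minutes between two fields across all incidents."""
    return mean(minutes_between(i[start_field], i[end_field]) for i in incidents)


# Mean time to acknowledge and mean time to resolve, both measured from open.
mtta = mean_minutes(INCIDENTS, "opened", "acknowledged")
mttr = mean_minutes(INCIDENTS, "opened", "resolved")

print(f"Mean time to acknowledge: {mtta:.1f} min")
print(f"Mean time to resolve: {mttr:.1f} min")
```

Averages like these hide outliers, so a team relying on them would probably want to look at the slowest incidents as well before drawing conclusions.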

How SignifAI can help

Duplicate incidents, lots of distracting alert noise, missing context, and the lack of automatic correlations across your production stack — these are all problems that SignifAI’s AI and machine learning-based platform eats for breakfast! Busy ops teams need a smart solution that does more than incident management. Learn more about SignifAI and the Decision engine that powers automatic correlations here.

Originally published at blog.signifai.io on July 23, 2018.
