SlackOps for PagerDuty

RigD.io Collaborative Automation for Incident Management

What’s the status on that Incident?

SlackOps for PagerDuty Part 4

Justin Griffin
RigD.io
6 min read · Sep 19, 2019


SlackOps for PagerDuty Part 1 — How to Open PagerDuty Incidents from

SlackOps for PagerDuty Part 2 — Find PagerDuty On Call from Slack in Under 10 Seconds

SlackOps for PagerDuty Part 3 — Automate Incident Channels in Slack

No matter how effective you are at resolving incidents, poor incident update practices lead to headaches all around. If you don’t provide an update, you can be sure someone will interrupt you to ask “what’s the status on this incident?” When you have customer-impacting issues, not providing timely updates can significantly hurt customer satisfaction or lead to the outright loss of customers. A less obvious consequence is duplicated or wasted effort. If others don’t know what’s going on, they are likely to expend energy chasing down the same data or the wrong leads. This is especially true when multiple, possibly related incidents are going on, or with complex incidents that involve multiple teams.

There is really no good reason not to make regular incident updates, yet it happens all the time. It’s easy to forget to make an update when you are working the incident. Even if you are fortunate enough to have an organizational structure that allows for a comms owner for every incident, there are still often cases where updates are made internally but forgotten externally, or vice versa.

So let’s see how to speed up those updates and help ensure one never gets missed.

Step 1 Make your updates quickly right from Slack.

With PagerDuty you can post a note to an incident or you can provide a status update. There are benefits to both and both can be done with RigD in Slack. Start by typing

add pagerduty note

Then provide the incident number and your note.

Add a PagerDuty Incident Note

Similarly for a PagerDuty incident status update start with

update pagerduty status

And again add the incident number and your status update.

Update PagerDuty Incident Status from Slack
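For readers curious what these two actions look like underneath, here is a minimal sketch of the equivalent requests against the public PagerDuty REST API v2. It is not a description of how RigD itself is implemented, which this article does not cover. The function name `build_incident_request`, the token, the email address, and the incident ID are all hypothetical placeholders. Note that the REST API addresses an incident by its ID rather than the incident number you type in Slack.

```python
import json
import urllib.request

API_BASE = "https://api.pagerduty.com"


def build_incident_request(incident_id, kind, text, api_token, from_email):
    """Build (but do not send) a request that adds a note or a status update.

    Hypothetical helper for illustration only.
    """
    if kind == "note":
        # Notes are posted as {"note": {"content": ...}}
        path, body = "notes", {"note": {"content": text}}
    elif kind == "status_update":
        # Status updates are posted as {"message": ...}
        path, body = "status_updates", {"message": text}
    else:
        raise ValueError(f"unknown kind: {kind!r}")

    return urllib.request.Request(
        url=f"{API_BASE}/incidents/{incident_id}/{path}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token token={api_token}",
            "Accept": "application/vnd.pagerduty+json;version=2",
            "Content-Type": "application/json",
            # Email of the PagerDuty user making the change
            "From": from_email,
        },
    )


# Placeholder values; nothing is sent until urllib.request.urlopen(req) is called.
req = build_incident_request(
    "PABC123", "note", "Rolling back the last deploy", "YOUR_API_TOKEN", "you@example.com"
)
print(req.get_method(), req.full_url)
```

Compared with typing a command in Slack, this is the amount of ceremony each manual update would otherwise involve.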

One additional manual convenience we provide is our incident activities menu, which appears with every PagerDuty incident feed notification and whenever you get incident details in Slack.

Take Any PagerDuty Incident Action from Slack

Step 2 Use RigD Automation to Remind You to Make Incident Updates

Making those updates ad hoc in Slack will definitely add a measure of convenience, but it won’t combat forgetfulness. To do that we need to set up automated update reminders. We will again use a RigD flow, one that both makes an incident status update and adds a note, thus ensuring no one misses the latest update. We have another helpful guide to speed up the setup. Start from the PagerDuty help by typing

help with pagerduty

Then choose the Automate Incident Updates button.

Your first update should always go out after a set amount of time; it’s the one update that most often gets missed or delayed while you try to validate the problem and assess the impact. Choose a time to make that first update. We recommend no more than 10 minutes.

How Long before you make your first update

Next you need to decide how to handle subsequent updates. Given that most major incidents last for hours, you will be making many follow-on updates, so you want to strike a balance in timing. You can also skip this input and choose the interval between updates manually after each update. This can be helpful in managing the balance between over- and under-communication, but don’t forget to set it each time!

Decide how frequently to make subsequent Incident Updates
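To get a feel for how the two timing choices play out, here is a small sketch that lists when reminders would fire for a given first-update delay and interval. The function name `update_schedule` is hypothetical and this is not RigD’s scheduler; the example values (first update at 5 minutes, then every 30 minutes, over a 300-minute incident) are the ones used in the cost estimate later in this post.

```python
def update_schedule(first_after_min, interval_min, incident_duration_min):
    """Return the minutes from incident start at which an update is due.

    The last entry is the final resolution update at the end of the incident.
    """
    if interval_min <= 0:
        raise ValueError("interval_min must be positive")

    times = []
    t = first_after_min
    # Recurring updates while the incident is still open
    while t < incident_duration_min:
        times.append(t)
        t += interval_min
    # Final resolution update
    times.append(incident_duration_min)
    return times


schedule = update_schedule(first_after_min=5, interval_min=30, incident_duration_min=300)
print(len(schedule), schedule)
# prints: 11 [5, 35, 65, 95, 125, 155, 185, 215, 245, 275, 300]
```

A shorter interval quickly multiplies the number of updates, which is the over- versus under-communication trade-off described above.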

Finally, choose some text for the RigD alias trigger to make it easy to initiate the update automation during an incident.

Choose Text for your RigD Alias

You now have everything you need to never again forget to make an incident update. Let’s see how it works in practice by typing our alias text

!p1 updates

This update sequence will kick off in a Slack thread. Why do we use threads? Using a thread for this allows you to keep it in the forefront in Slack while you engage in discussion and coordination in your primary incident channel space. This helps reduce the potential to miss making an update and also prevents your update activity from distracting others in the main channel discussion.

Incident Updates in Slack Threads
Automated Incident Updates Reduce Problems

Automated reminders reliably drive those incident updates, and making them right in Slack keeps them fast and simple.

Now, when it comes to making customer-facing updates, we love Atlassian StatusPage and so do a lot of our customers. So we often see this flow modified to include both an internal update to the PagerDuty incident and an external update to a corresponding StatusPage incident. This is easy to do with RigD, and we are always available to help.

As with our previous parts, let’s take a look at the time savings and financial impact of this Slack-based approach. Assuming a relatively simple and well-understood update, posting it manually in the PagerDuty UI takes about 26 seconds. The average duration of a major incident is 300 minutes. Let’s assume we make an update at 5 minutes, then every 30 minutes, and a final resolution update. That’s a total of 11 updates. If we make both a status update and a note for completeness, that is 22 postings and a total update time of 9 minutes 32 seconds. Using an automated RigD update flow, you are looking at 3 seconds to start the flow and about 6 seconds per update, for a total of just 69 seconds. Using our benchmark of 7 major incidents a month and a $5,600 cost per minute, RigD reduces the per-incident costs related to updates by an amazing $46,949. Incorporating this into our running total, monthly major incident costs related to the activities discussed come to $428,585 without RigD vs. just $56,186 with it.

You might be thinking that’s crazy; do companies really lose that much money? Consider that Amazon lost an estimated $90m in about 75 minutes according to this TechCrunch article. That’s $1.2 million per minute. It makes losing 428 thousand dollars in a month seem minor. Sure, none of us are Amazon’s size, but every dollar and minute lost matters regardless of company size.
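The update-time arithmetic above can be reproduced in a few lines. This is only a sketch using the figures quoted in this post; the constant names are my own. Computed this way, the per-incident saving comes out to roughly $46,947, within a few dollars of the figure quoted above (presumably a rounding difference).

```python
# Figures quoted in the article
MANUAL_SECONDS_PER_POST = 26   # one manual post in the PagerDuty UI
UPDATES_PER_INCIDENT = 11      # at 5 min, every 30 min, plus resolution
POSTS_PER_UPDATE = 2           # a status update and a note
RIGD_START_SECONDS = 3         # kicking off the flow
RIGD_SECONDS_PER_UPDATE = 6
COST_PER_MINUTE = 5600         # benchmark cost of a major incident, in dollars

manual_seconds = UPDATES_PER_INCIDENT * POSTS_PER_UPDATE * MANUAL_SECONDS_PER_POST
rigd_seconds = RIGD_START_SECONDS + UPDATES_PER_INCIDENT * RIGD_SECONDS_PER_UPDATE
saved_seconds = manual_seconds - rigd_seconds
saved_dollars = saved_seconds * COST_PER_MINUTE / 60

minutes, seconds = divmod(manual_seconds, 60)
print(f"Manual: {minutes} min {seconds} s")              # Manual: 9 min 32 s
print(f"RigD flow: {rigd_seconds} s")                    # RigD flow: 69 s
print(f"Saved per incident: ${round(saved_dollars):,}")  # Saved per incident: $46,947
```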

Our final part in the series will be out in no time, so be on the lookout!

Learn more about RigD here, and give our Slack App a try.

Justin Griffin
RigD.io

Father of two amazing boys and founder of https://rigd.io. DevOps, Golf, and Sailing enthusiast.