Incident Management made easier with Shared Terminal support in Epiphani Playbooks

Shilpa Moghe
Published in epiphani
Oct 14, 2020

Imagine you get paged about an incident. Service is disrupted and the whole organization is waiting for you to fix it, so the pressure is on. You remember that a similar issue was fixed a few weeks ago, but you do not know what was done to restore the service. Or say you want to report how many incidents occurred over the last month, how fast they were resolved, and how many people were involved. Does this sound familiar? Most engineering organizations deal with these situations on a daily basis. Most have a roughly defined incident management process, but almost none have a single go-to point from which everything can be managed: creating an incident, gathering information, resolving the issue, communicating with teams, and creating incident reports. To fix this exact issue and help on-call engineers, Epiphani has augmented its Playbook Engine with shared terminal support.

When an on-call engineer is informed of an incident, the most important job is to restore the service so that downtime for customers is minimized. To mitigate the issue, the engineer has to do many things: check logs and stats, log in to different systems to gather this information, and run scripts to determine the cause of the incident. Any mistake made at this time can prove expensive.

The Epiphani playbook engine with shared terminal support can help on-call engineers troubleshoot issues effectively and reliably. The engineer can start by creating an incident in the Epiphani app and immediately start collaborating within and across teams. All the playbooks and scripts within the Epiphani app are immediately available for the engineer to use. The engineer can look through the record of earlier incidents and their snapshots to quickly figure out how the problem can be fixed and resolve the incident faster. Sharing and collaboration become much easier since all the actions taken by the engineer are recorded within the Epiphani playbook app.

You can host your own instance of Epiphani Playbooks in your private AWS cloud as specified in this blog. Just look for AMI ID “ami-0eb435a881f309f08” on Amazon, or search for “epiphani” and use the AMI “epiphani_playbooks_with_shared_terminal_v1.0.14-b267642d-8383-41e8-91be-f927582fc3ca-ami-025024d22d2d33d02.4”.
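If you prefer to confirm the AMI from a script rather than the console, the lookup can be sketched roughly as follows. This is a minimal sketch, not part of Epiphani's documented setup: it assumes the AWS CLI is installed with credentials configured, and the region is a placeholder you must supply yourself, since AMI IDs are region-specific and the article does not name one. The function names are hypothetical.

```python
import json
import subprocess

AMI_ID = "ami-0eb435a881f309f08"  # AMI ID quoted in the article


def describe_image_command(ami_id: str, region: str) -> list[str]:
    """Build the AWS CLI call that looks up a single AMI by its ID."""
    return [
        "aws", "ec2", "describe-images",
        "--image-ids", ami_id,
        "--region", region,  # assumption: you know which region hosts the AMI
        "--output", "json",
    ]


def lookup_ami(ami_id: str, region: str) -> list[dict]:
    """Run the AWS CLI and return the matching image records.

    Needs the AWS CLI on PATH and valid credentials; raises
    CalledProcessError if the CLI exits with an error.
    """
    result = subprocess.run(
        describe_image_command(ami_id, region),
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)["Images"]
```

Calling `lookup_ami(AMI_ID, "<your-region>")` should return a list with one image record if the AMI is visible to your account in that region, and an error otherwise.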

Epiphani Incident Management Workflow

Epiphani Investigation/Incident Workflow

The diagram shows how the Epiphani app provides a single point of access for everything involved in responding to an event. The prep work is to create a device inventory, configure SSH keys, and create playbooks or use existing scripts as playbooks from within the app. Once that is done, an engineer can just log in to the Epiphani app, create an investigation or join an existing one, and start participating. The engineer no longer has to open multiple windows, type in passwords, or provide keys every time. It can all be done from the Epiphani Investigation board. This process is explained later in the article.

Configuration

You configure your device inventory, such as the servers and routers you frequently access, along with your SSH keys in the Epiphani app. This is a one-time configuration that can be done in the Epiphani app as shown in the video below.
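To make the idea of a device inventory concrete, here is a small illustrative sketch of what such an inventory captures and why it removes the need to retype hosts and keys. It is not Epiphani's storage format or API: the `Device` fields, the `ssh_command` helper, and all hosts, users, and key paths are hypothetical placeholders. The only real interface used is the standard OpenSSH client with its `-i` (identity file) and `-p` (port) options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """One inventory entry. Field names are illustrative, not Epiphani's schema."""
    name: str
    host: str
    user: str
    key_path: str
    port: int = 22


def ssh_command(device: Device) -> list[str]:
    """Build the OpenSSH client invocation for a device in the inventory."""
    return [
        "ssh",
        "-i", device.key_path,
        "-p", str(device.port),
        f"{device.user}@{device.host}",
    ]


# Hypothetical inventory; every value here is a placeholder.
INVENTORY = {
    d.name: d
    for d in [
        Device("web-1", "10.0.0.11", "ops", "/home/ops/.ssh/web_key"),
        Device("edge-router", "10.0.0.1", "admin", "/home/admin/.ssh/router_key", port=2222),
    ]
}
```

With the inventory defined once, `ssh_command(INVENTORY["web-1"])` yields the full connection command by name, which is the kind of lookup a shared terminal can do on the engineer's behalf.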

Similarly, your frequently used scripts and playbooks can be saved within the Epiphani app as playbooks, or you can create new playbooks using the connectors provided by Epiphani. This is also a one-time activity. Once created, a playbook can be reused by all users who have access to your Epiphani app. You just log in to the app and execute any playbooks and scripts from within. The owner of the playbook or script does not need to be available to do this. You can check how to create playbooks in the Epiphani app here.

Investigation Board

Now let’s look at the steps to take when an incident is reported via, say, PagerDuty or any other incident reporting system. If you have the Epiphani app installed in your AWS private cloud, you can log in to it and create an incident/investigation as shown below.

Incident Reporting-Create Investigation

The investigation dashboard becomes your go-to place, and from there you can do pretty much everything that is needed. Here is what an investigation board looks like.

Epiphani investigation board

Most importantly, all your actions are recorded, so you do not have to worry about forgetting anything or noting down everything you are doing. You can collaborate with your team using the Scroll or Timeline. The video below explains in detail all the components of the investigation dashboard.

To summarize, with shared terminal support in Epiphani playbooks, engineers can create an incident management pipeline as mentioned in this blog. You can record the actions taken by an engineer to restore the service while responding to an unplanned event. This recording can help mitigate future events and can act as a collection of tribal knowledge that can be shared across teams.
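As an illustration of why recorded actions are useful, the sketch below shows how a log of incidents and actions could answer the reporting questions raised at the start of this article: how many incidents occurred, how fast they were resolved, and how many people were involved. The `Action` and `Incident` types, the `summarize` function, and the sample data are all hypothetical and do not reflect how Epiphani stores or exports its recordings.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Action:
    """A single recorded step: who ran what, and when."""
    at: datetime
    engineer: str
    command: str


@dataclass
class Incident:
    title: str
    opened: datetime
    resolved: datetime
    actions: list[Action] = field(default_factory=list)


def summarize(incidents: list[Incident]) -> dict:
    """Return incident count, mean time to resolve, and distinct engineers."""
    if not incidents:
        return {"count": 0, "mean_time_to_resolve": timedelta(0), "engineers": 0}
    total = sum((i.resolved - i.opened for i in incidents), timedelta(0))
    people = {a.engineer for i in incidents for a in i.actions}
    return {
        "count": len(incidents),
        "mean_time_to_resolve": total / len(incidents),
        "engineers": len(people),
    }


# Made-up sample data for illustration only.
INCIDENTS = [
    Incident(
        "API latency spike",
        datetime(2020, 10, 1, 9, 0),
        datetime(2020, 10, 1, 9, 40),
        [
            Action(datetime(2020, 10, 1, 9, 5), "alice", "tail -n 200 /var/log/api.log"),
            Action(datetime(2020, 10, 1, 9, 30), "bob", "systemctl restart api"),
        ],
    ),
    Incident(
        "Disk full on web-1",
        datetime(2020, 10, 5, 22, 0),
        datetime(2020, 10, 5, 23, 0),
        [Action(datetime(2020, 10, 5, 22, 10), "alice", "df -h")],
    ),
]

report = summarize(INCIDENTS)
```

The same per-action records that feed this summary are what let a later responder replay how a similar incident was resolved.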

With Epiphani Shared Terminal you also get the Epiphani Playbooks Engine. To learn more about how to use the Epiphani Playbooks engine, please refer to our other articles.

We would love to hear from you. You can contact us at feedback@epiphani.ai.
