In this post I will show you how you can use Slack to handle AWS CloudWatch alarms. Why should you care about seeing alarms in Slack? One can argue that chat rooms are not for incident handling; they are, however, the perfect place to discuss incidents. Seeing an actual alarm in a Slack channel makes it super easy to discuss it with your team, because all team members know exactly what is being discussed and can chip in.
Another advantage is that getting alarms into Slack makes it more likely that they get fixed. An alarm that stays in a monitoring system can be neglected; once it shows up in Slack, somebody will notice it, and somebody may well fix it for good.
Opsidian has a neat feature for showing CloudWatch alarms, and, even better, you can tell Opsidian to notify you when an alarm fires. Let’s set it up and see how usable it is. To configure alarms, execute:
/ops alarms configure
To get alerted about new alarms, we need to set up an SNS topic and connect it to Opsidian. Let’s create a new SNS topic called opsidian-alarms and add the subscription link that Opsidian showed in step 4 of the output of the previous command.
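If you prefer scripting this step over clicking through the SNS console, the sketch below does the same two things through the boto3 SNS client: create the opsidian-alarms topic and subscribe Opsidian’s link to it. It is a minimal sketch under assumptions: the client is passed in (boto3.client("sns") in real use, or a stand-in for testing), the link from step 4 is assumed to be a plain http(s) URL, and the helper name connect_topic_to_opsidian is hypothetical, not part of Opsidian or boto3.

```python
from typing import Any, Tuple

TOPIC_NAME = "opsidian-alarms"


def connect_topic_to_opsidian(sns: Any, subscription_url: str,
                              topic_name: str = TOPIC_NAME) -> Tuple[str, str]:
    """Create the SNS topic and subscribe the link Opsidian showed to it.

    Hypothetical helper. `sns` is anything with the boto3 SNS client's
    create_topic/subscribe methods, e.g. boto3.client("sns").
    """
    scheme = subscription_url.split("://", 1)[0].lower()
    if scheme not in ("http", "https"):
        raise ValueError("SNS web subscriptions need an http(s) URL")

    # create_topic is idempotent: for an existing topic name owned by the
    # caller it returns that topic's ARN instead of creating a new one.
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]

    # SNS will POST a SubscriptionConfirmation to the endpoint; the
    # subscription stays pending until the endpoint confirms it.
    response = sns.subscribe(
        TopicArn=topic_arn,
        Protocol=scheme,  # "http" or "https"
        Endpoint=subscription_url,
        ReturnSubscriptionArn=True,  # return the ARN even while pending
    )
    return topic_arn, response["SubscriptionArn"]
```

Taking the client as a parameter keeps the helper testable without AWS; in real use you would pass boto3.client("sns") (with AWS credentials configured) and the exact link Opsidian displayed.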
Now, when you click Create subscription, you will get a notification that Opsidian is ready to handle alarms. Importantly, the alarms will be posted to the channel where you executed /ops alarms configure. In my example I just used #general, but for real-life scenarios I would recommend having a channel per client or app, e.g. #myapp-alarms. This way, people who are not interested in particular alarms can leave the Slack channel or just mute it.
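Behind the Create subscription click, SNS performs a handshake with HTTP(S) subscribers: it first POSTs a message of type SubscriptionConfirmation that contains a SubscribeURL, and only once that URL has been visited does it deliver Notification messages. For CloudWatch alarms, the Message field of a notification is itself a JSON document with fields such as AlarmName and NewStateValue. The sketch below shows how any subscriber could route these message types. It illustrates the SNS protocol only and is not Opsidian’s code; the function name handle_sns_message is hypothetical, and a real endpoint should also verify the message signature, which is omitted here.

```python
import json
from typing import Any, Dict


def handle_sns_message(body: str) -> Dict[str, Any]:
    """Route one SNS HTTP(S) delivery (the raw POST body) by its Type.

    Hypothetical helper illustrating the SNS protocol; signature
    verification (Signature, SigningCertURL) is deliberately omitted.
    """
    envelope = json.loads(body)
    kind = envelope.get("Type")

    if kind == "SubscriptionConfirmation":
        # The subscriber confirms by issuing a GET request on SubscribeURL.
        return {"action": "confirm", "url": envelope["SubscribeURL"]}

    if kind == "Notification":
        # CloudWatch puts the alarm details into Message as a JSON string.
        alarm = json.loads(envelope["Message"])
        return {
            "action": "post_to_slack",
            "topic": envelope["TopicArn"],
            "alarm": alarm["AlarmName"],
            "state": alarm["NewStateValue"],
            "reason": alarm.get("NewStateReason", ""),
        }

    # e.g. UnsubscribeConfirmation: nothing to post.
    return {"action": "ignore", "type": kind}
```

Splitting on Type first mirrors the order in which SNS talks to a new subscriber: one confirmation, then a stream of notifications.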
You can subscribe to as many topics as you like. To list the subscriptions for a channel, use:
/ops alarms list topics
OK, let’s create a CloudWatch alarm and see if we can make Opsidian notify us when it fires. To do this, I will simulate high CPU usage on a t2.micro Amazon Linux instance. Go to the instance’s Monitoring tab and hit Create Alarm. I set up my alarm to fire when the CPU Utilization metric goes over 40%; in real-life scenarios you would want a higher threshold.
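The same alarm can also be created through the CloudWatch PutMetricAlarm API. The sketch below builds the keyword arguments for boto3’s put_metric_alarm with the 40% CPU threshold used here. Assumptions, all labeled: 5-minute periods and a single evaluation period, the instance ID that appears later in this post, and an alarm action pointing at the opsidian-alarms topic (the alarm is assumed to reach Opsidian by publishing to that topic). cpu_alarm_params is a hypothetical helper, and the account ID in the test ARN is a placeholder.

```python
from typing import Any, Dict


def cpu_alarm_params(instance_id: str, topic_arn: str,
                     threshold: float = 40.0,
                     period_seconds: int = 300,
                     evaluation_periods: int = 1) -> Dict[str, Any]:
    """Keyword arguments for CloudWatch put_metric_alarm (hypothetical helper).

    The alarm goes to ALARM when the instance's average CPUUtilization is
    above `threshold` percent for `evaluation_periods` consecutive periods.
    """
    return {
        # Same naming pattern as the console-generated alarm in this post.
        "AlarmName": f"awsec2-{instance_id}-CPU-Utilization",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # Actions run on the transition into ALARM: publish to the SNS
        # topic that Opsidian is subscribed to (assumed opsidian-alarms).
        "AlarmActions": [topic_arn],
    }
```

Keeping the parameters in a plain dict makes them easy to review before sending anything; with AWS credentials configured you would then call boto3.client("cloudwatch").put_metric_alarm(**params) using the real topic ARN.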
Let’s verify that the alarm was successfully created with:
/ops alarms list alarms
In the picture above there are three alarms; the one we just created is the CPU one.
Let’s now make the instance consume some CPU. To do that, I will SSH into the instance, then install and run the stress tool.
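$ ssh email@example.com        # log in to the instance
$ sudo yum install stress -y   # install the stress load generator
$ stress --cpu 2               # start 2 CPU-bound workers; runs until you press Ctrl+C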
After some time, you should receive an alarm just like the one I got. Opsidian also attaches a plot of the metric, and you can zoom in by clicking the image.
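Why only "after some time"? With basic monitoring, EC2 publishes CPUUtilization as 5-minute datapoints, and an alarm changes state only once enough datapoints breach its threshold. The sketch below is a deliberately simplified model of that evaluation for a GreaterThanThreshold alarm: it looks at the last N datapoints and requires all of them to be above the threshold. It ignores CloudWatch’s missing-data handling and "M out of N" settings, and alarm_state is a hypothetical function, not a CloudWatch API.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int = 1) -> str:
    """Simplified model of a CloudWatch GreaterThanThreshold alarm.

    `datapoints` are per-period averages, oldest first. Missing-data
    handling and "M out of N" datapoints-to-alarm are ignored here.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    # Every evaluated period must be strictly above the threshold.
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"
```

For example, alarm_state([3.0, 4.5, 97.0], 40.0) returns "ALARM", because the single evaluated datapoint (97.0) is above 40, while the same data with evaluation_periods=2 returns "OK", because 4.5 is not. In both cases the alarm cannot fire before a period that includes the stress load has been averaged and published.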
So now we have been notified that there is a problem, and somebody should fix it. But how? A great feature of Opsidian is that you can add simple notes to alarms! This way you can pass on information about how to fix a particular alarm, or provide a link to an FAQ, wiki, etc. where you have documented the most common solution. Let’s simulate this:
/ops alarms notes alarm awsec2-i-31bc50a7-CPU-Utilization add "You should kill the stress tool"
Now, when the alarm fires, all of its notes will be shown at the bottom of the Slack message.
In this post I have shown you how Opsidian handles CloudWatch alarms. I think this feature of Opsidian is especially cool for startups and small teams, which can configure alarm notifications quickly and easily. Even more importantly, the notifications come to the place where the team spends 80% of its time: Slack. So if you do not need 24/7 reaction times and do not have money for massive alarm escalation services, Opsidian alarms may be just the right choice for you. Besides, I know that integrations with external services are coming soon.
PS: If you would like to try Opsidian, feel free to install it. Also, if you have any questions, join Opsidian’s Slack room and talk to the creators: http://opsidian.ai/slack.