Troubleshooting a Server in 5 Minutes, the ChatOps Way

porokh.sergey
DevOops World … and the Universe
May 16, 2016

Back in 2013, Vincent Viallet published a simple but quite useful article, “First 5 Minutes Troubleshooting A Server”. Although the information wasn’t new to Linux administrators, it helped various support teams improve their operations. After publishing “StackStorm: DevOps to ChatOps”, I received a couple of questions about ChatOps use cases for support teams. So this short article walks through some basic server troubleshooting, based on Vincent Viallet’s article, using StackStorm as the ChatOps platform.

Just imagine a support engineer who works with a helpdesk ticket system and receives a notification that some services have stopped working. What information could they get from the bot, following the troubleshooting guide? Let me show you:

Who’s there and what was previously done?

To check this, I used the core pack with the remote_cmd and remote_sudo actions. Server 10.0.1.37 is my local web server.
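As a rough sketch of what such a request boils down to outside the chat window, the snippet below builds the equivalent `st2 run` command lines for this step. It assumes that the remote_cmd and remote_sudo actions named above correspond to StackStorm's `core.remote` and `core.remote_sudo` actions with `hosts` and `cmd` parameters. The helper name `st2_remote` is hypothetical, `w` and `last` are taken from Vincent Viallet's checklist, and reading root's `.bash_history` is only an assumed stand-in for the interactive `history` builtin.

```python
import shlex


def st2_remote(host, cmd, sudo=False):
    """Build the argv for an `st2 run` call (hypothetical helper).

    Assumption: remote_cmd / remote_sudo map to core.remote / core.remote_sudo.
    """
    action = "core.remote_sudo" if sudo else "core.remote"
    return ["st2", "run", action, f"hosts={host}", f"cmd={cmd}"]


HOST = "10.0.1.37"  # the local web server from the article

# "Who's there and what was previously done?"
for argv in (
    st2_remote(HOST, "w"),     # who is logged in right now
    st2_remote(HOST, "last"),  # recent logins
    # Assumed path: `history` is a shell builtin and has no output over a
    # non-interactive session, so the history file is read instead.
    st2_remote(HOST, "tail -n 20 /root/.bash_history", sudo=True),
):
    print(shlex.join(argv))
```

The snippet only prints the command lines, so it can be tried without a StackStorm installation; in a chat setup the bot would trigger the same actions through an alias.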

What is running?

Same as before: I used the core pack with the remote_cmd action.

Listening services?

For netstat I used the linux pack with its netstat action. It is also possible to get this information via the core pack using remote_cmd.

CPU and RAM

I used the core pack with the remote_cmd action, but in some cases you may prefer the linux pack with its check_loadavg or vmstat actions.
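To make the load-average numbers easier to interpret once the bot returns them, here is a minimal sketch that parses text in the `/proc/loadavg` format and applies a common rule of thumb: a 1-minute load above the number of CPUs suggests the box is saturated. The function names, the threshold and the sample values are illustrative assumptions. This is not how the linux pack's check_loadavg action is implemented.

```python
def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def is_overloaded(loadavg, cpu_count, factor=1.0):
    """True when the 1-minute load exceeds `factor` runnable tasks per CPU."""
    return loadavg[0] > cpu_count * factor


# Illustrative sample in /proc/loadavg format; the values are made up.
sample = "3.10 1.20 0.80 2/310 4567"
load = parse_loadavg(sample)
print("load averages:", load)
print("overloaded on 2 CPUs:", is_overloaded(load, cpu_count=2))
```

A falling sequence such as 3.10, 1.20, 0.80 also tells the support engineer that the load spike is recent, which is useful context when escalating.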

Hardware

Same as before: I ran remote_cmd to get some hardware information.

IO Performance

To get information about IO performance, you may use the linux pack with its vmstat action, or remote_cmd to run iostat if it's installed on your system.

Mount points and filesystems

Using remote_cmd or remote_cmd_sudo will do the trick.

Service status

The best way to restart a service, or simply check its status, is to use the linux pack with the linux.service action. You may use remote_cmd_sudo as well, but linux.service is preferable. Here’s an example of starting the cron service (in case it has stopped for some reason):
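For readers without the chat transcript at hand, the following sketch shows roughly what a status check followed by a start could look like as `st2 run` command lines. It assumes the linux.service action accepts `hosts`, `service` and `act` parameters. Treat those names, the helper `st2_service` and the list of allowed verbs as assumptions to verify against your installed pack.

```python
import shlex

# Verbs the helper lets through; an assumed whitelist, adjust to your pack.
ALLOWED_ACTS = {"start", "stop", "restart", "status"}


def st2_service(host, service, act):
    """Build an `st2 run linux.service` argv (hypothetical helper).

    Assumption: the action takes `hosts`, `service` and `act` parameters.
    """
    if act not in ALLOWED_ACTS:
        raise ValueError(f"unsupported service action: {act!r}")
    return [
        "st2", "run", "linux.service",
        f"hosts={host}", f"service={service}", f"act={act}",
    ]


# Check cron first, then start it if the status shows it is down.
print(shlex.join(st2_service("10.0.1.37", "cron", "status")))
print(shlex.join(st2_service("10.0.1.37", "cron", "start")))
```

Restricting the verb to a fixed set is one reason a dedicated action is easier to hand to a support team than a free-form sudo command.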

Summary

ChatOps is capable of everything we used to do in DevOps with the usual tools. Strictly speaking, if you have the remote_cmd and remote_cmd_sudo actions in StackStorm, that is more than enough to perform any operation on any server. The catch is that commands which need an interactive mode with a TTY attached can’t be run in a simple way. But it’s possible to work around this with “batch mode”, as I did for the Linux “top” command.
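The batch-mode workaround can be sketched as follows. On Linux, `top -b` switches to batch mode and `-n 1` stops after a single iteration, so the output can be captured without a TTY. The function names are hypothetical, and the flags are those of the procps version of top; other platforms may use different options.

```python
import subprocess


def top_batch_argv(iterations=1):
    """argv for a non-interactive top: -b is batch mode, -n limits iterations."""
    return ["top", "-b", "-n", str(iterations)]


def run_top(lines=15):
    """Run top with no TTY attached and return the first `lines` lines."""
    result = subprocess.run(
        top_batch_argv(),
        stdin=subprocess.DEVNULL,  # nothing interactive to read from
        capture_output=True,
        text=True,
        timeout=30,
        check=True,
    )
    return "\n".join(result.stdout.splitlines()[:lines])


# The command a remote action would receive as its `cmd` value.
print(" ".join(top_batch_argv()))
```

Calling `run_top()` on a Linux host returns the summary header and the busiest processes as plain text, which is the kind of output a bot can paste back into the chat.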

As you can see, the support team can gather plenty of information to escalate to the responsible person if they can’t fix the issue themselves. Moreover, the support team doesn’t need direct access to the server: they work only through the bot to get all the information.
