Gotta monitor'em all! — Part 2
The 2nd part of an adventure about monitoring with Zabbix
The first part of this series can be found here.
At the end of the first part, I showed my plan. It was basically:
- Create a Host Group for every application or system;
- Update the Zabbix Agent using SaltStack;
- Verify what services to monitor on that specific server;
- And create Templates, Triggers and Actions for that specific application/system.
As I’ve mentioned before, the Zabbix documentation is a must-read for newbies or anyone interested in implementing it, and the whole process is very straightforward. As soon as you understand what a Template is, what it does, and how you can work with Triggers and Actions, it becomes very simple to get along. That’s why this post is not a tutorial.
OK Bruno, so how did you do it?
To help me map what to monitor, I used the folder structure we created on vCenter, for example:
[test] <- Folder on vCenter / Env
java-app <- Host Group
|-- server-1 <- Host
web <- Host Group
|-- web-server-1 <- Host
After that planning, it was time to update the Zabbix Agent, or install it on the servers that didn’t have it yet. I did it with SaltStack, and you can find the shell script and the salt state I created on my GitHub page. They were made to run on CentOS/RHEL servers, so feel free to clone the repo and adjust it to your needs.
If you’d like to use the shell script, you can run this command on your salt master:
salt -N <server_group> cmd.run 'curl -s http://web01/zabbix-update/update.sh | bash'
And what now?
With the host groups and hosts created and running the latest agent version, I started creating some templates, and that's when I found out about Low-Level Discovery! As the Zabbix documentation states:
Low-level discovery provides a way to automatically create items, triggers, and graphs for different entities on a computer. For instance, Zabbix can automatically start monitoring file systems or network interfaces on your machine, without the need to create items for each file system or network interface manually. Additionally it is possible to configure Zabbix to remove unneeded entities automatically based on actual results of periodically performed discovery.
And I've used it A LOT. Really.
To give a brief example: the data/analytics team uses an application that runs on Windows and has brackets in its service name. Yeah…
As Zabbix only supports the characters 0-9a-zA-Z_. I had to figure out a way to monitor it, and LLD made that quite easy, because it supports service discovery for Windows hosts. To monitor these goddamn services, it's necessary to create a Regular Expression and use it as a Filter in the Discovery Rule.
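To make the filter idea a bit more concrete, here is a minimal sketch of what such a regular expression does. It runs outside Zabbix, using grep, and both the service names and the pattern are made up for illustration; the real names come from the discovery rule on your own hosts. The point is only that the brackets must be escaped so they match literally.

```shell
#!/usr/bin/env bash
# Hypothetical service names, standing in for what a Windows
# service discovery rule might return (not taken from a real host).
services='Spooler
Analytics Engine [PROD]
Analytics Engine [TEST]
W32Time'

# Brackets are regex metacharacters, so they are escaped here
# to match the literal "[" and "]" in the service name.
pattern='^Analytics Engine \[(PROD|TEST)\]$'

# Keep only the names the filter would let through.
matched=$(printf '%s\n' "$services" | grep -E "$pattern")
printf '%s\n' "$matched"
```

The same kind of pattern is what you would save as the Regular Expression and reference in the Filter of the Discovery Rule; test it against your real service names first, since the regex flavor depends on your Zabbix version.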
Any more perks?
Yes! For example, it's possible to create Triggers based on the response time of a Web Page or of the whole login process of a Web App. Depending on the time it took to execute or the response code it returned, you could use an Action to run a remote command on the host. So you can have a script on your web servers to restart your application pool or something like that.
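In Zabbix this logic lives in the Trigger and the Action, not in a script, but the decision itself can be sketched in a few lines of shell. This is only an illustration under my own assumptions: the function name should_restart, the 2-second threshold and the sample values are all hypothetical.

```shell
#!/usr/bin/env bash
# should_restart CODE ELAPSED MAX
# Returns 0 (true) when the hypothetical restart action should fire:
# either the response code is not 200, or the request was too slow.
should_restart() {
  local code=$1 elapsed=$2 max=$3
  # Any non-200 response fires the action.
  [ "$code" != "200" ] && return 0
  # awk handles the floating-point comparison; exits 0 when elapsed > max.
  awk -v t="$elapsed" -v m="$max" 'BEGIN { exit !(t > m) }'
}

# Sample values only; on a real host they could come from something like:
#   curl -s -o /dev/null -w '%{http_code} %{time_total}' "$url"
if should_restart 200 3.5 2; then
  echo "would run the restart script"
else
  echo "all good"
fi
```

A slow but successful response and a fast error response both end up firing the action, which is the behavior described above.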
And does it come with any Jedi mind tricks?
Unfortunately, it does not. Zabbix is a great tool for companies (of any size) that don't have a reliable monitoring tool, but if you're already on some cloud-based infrastructure, or migrating to OpenStack, AWS, Azure or GCP, I believe the best approach is to use DataDog or NewRelic.
At my current job, I've integrated Zabbix with Grafana, which is an awesome tool for creating dashboards, and much simpler and prettier than Zabbix Screens.
To be continued…
This post is part of a series called Gotta monitor’em all!. If you liked this post, hit the little ❤️ or the follow button to know when the next post is up.