Monitor Multi-vendor cloud VMs using NAGIOS

Chalbi Mohamed Amine
Nov 3 · 8 min read

Being able to monitor your assets is a must in the world of IT. It can mean the difference between a good client experience and a service outage, or between a secure working infrastructure and losing sensitive information to attackers. Monitoring and supervision can be implemented using different approaches, tools, and frameworks. My personal preference is the combination of Prometheus and Grafana, but today I will be talking about Nagios, which is also a cool and powerful tool to master.

What is Nagios?

Nagios is free and open-source software for system and network monitoring. It can collect a wide range of information and metrics about different systems, ranging from web servers to routers and switches.

Today we will keep our use of Nagios simple: one web server collects data and sends it to a centralized server that we will use for the monitoring. In the next post, we will be expanding our use of Nagios to monitor network equipment.

Our architecture

This is the simplest useful architecture that we can start with. We want to monitor our web server at all times and receive alerts whenever things get out of hand. For example, if the CPU utilization gets above a certain threshold, we might need to add another server to load-balance the traffic and make sure that our website doesn’t suffer an outage.
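To make the idea of a threshold concrete, here is a minimal sketch of how a reading can be mapped to a state. The function name classify_reading is hypothetical and is not part of Nagios; the only thing borrowed from Nagios is the plugin exit-code convention (0 for OK, 1 for WARNING, 2 for CRITICAL). It only handles whole numbers.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Nagios: map an integer reading to a
# Nagios-style state name and exit code (0 = OK, 1 = WARNING, 2 = CRITICAL).
classify_reading() {
    local value="$1" warn="$2" crit="$3"
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING"
        return 1
    fi
    echo "OK"
    return 0
}

# Example: classify_reading 85 80 90   (CPU at 85%, warn at 80, critical at 90)
```

Real Nagios plugins are more elaborate, but they report their result to the server in the same way: a line of text plus an exit code.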

Time to get our hands dirty

I will be following the official install guide provided by Nagios here and using Ubuntu as my OS. You can use any Linux distro that is supported by Nagios.

Prerequisites: VMs

First of all, you need to set up and configure two VMs: one running on AWS to serve as our centralized monitoring station, and the other on MS Azure to run our web server and serve as our target. If you are completely new to the world of cloud computing, you might want to start by looking here to learn how to create a virtual machine on AWS. As for Azure, it’s not that different, but I would recommend taking a look at the following documentation.

If you set everything up and continue with the installation of Nagios, chances are that Nagios won’t work as expected. That is because you need to configure the networking on both Azure and AWS to allow the traffic generated by the machines to flow in both directions:

  • Make sure that your machines have public IP addresses and internet connectivity.
  • Make sure that you allow inbound traffic on ports 80, 22, 161–162 (SNMP), and 5666 (the port used by NRPE), and finally ICMP echo traffic to allow ping testing.
  • To make sure that you are not exposing yourself to external attacks, it’s a good idea to bind these rules to a specific IP address: on the Azure side only allow traffic from your AWS server, and vice versa.
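Once the rules are in place, a quick way to confirm them from the other machine is to probe the TCP ports. The following is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available; port_open and report_ports are hypothetical helper names, and the address in the example is a placeholder. It only covers TCP, so it says nothing about SNMP (UDP) or ICMP.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened
# within 3 seconds. Uses bash's /dev/tcp redirection inside a child bash.
port_open() {
    local host="$1" port="$2"
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print one status line per port for the given host.
report_ports() {
    local host="$1"
    shift
    local port
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "${host}:${port} open"
        else
            echo "${host}:${port} closed or filtered"
        fi
    done
}

# Example (placeholder address): report_ports 203.0.113.10 22 80 5666
```

Keep in mind that a port also shows as closed when nothing is listening on it yet, so 5666 will only show as open after NRPE is installed and running on the host.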

If you encounter any problems while configuring your virtual machines feel free to ask me.

Nagios installation

First, we will need to install the prerequisites:

sudo apt-get update
sudo apt-get install -y autoconf gcc libc6 make wget unzip apache2 php libapache2-mod-php7.2 libgd-dev

Now we need to download the source files and extract them:

cd /tmp
wget -O nagioscore.tar.gz https://github.com/NagiosEnterprises/nagioscore/archive/nagios-4.4.5.tar.gz
tar xzf nagioscore.tar.gz

After the extraction is done we will need to run the configuration script that comes with Nagios:

cd /tmp/nagioscore-nagios-4.4.5/
sudo ./configure --with-httpd-conf=/etc/apache2/sites-enabled

You should see something similar to this:

Now we will compile the source code using the make command:

make all

When the compiler finishes its work you should see the following message:

Great! The next step is to create a nagios user and group, and to add the Apache user (www-data) to that group. These commands require root privileges:

sudo make install-groups-users
sudo usermod -a -G nagios www-data
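If you want to double-check that the group change took effect, a small sketch like the one below can help. user_in_group is a hypothetical helper name; it only relies on the standard id command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given user belongs to the given group.
# id -nG prints the user's group names separated by spaces.
user_in_group() {
    local user="$1" group="$2"
    id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -Fqx "$group"
}

# Example: user_in_group www-data nagios && echo "www-data is in nagios"
```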

Now we will install the binaries:

sudo make install

Install the services and daemons:

sudo make install-daemoninit

Now we will configure the external commands mode:

sudo make install-commandmode

To start Nagios we will need a minimal configuration, so we will be using a sample configuration:

sudo make install-config

Next, we will need to configure Apache2:

sudo make install-webconf
sudo a2enmod rewrite
sudo a2enmod cgi

Since Apache listens on TCP port 80, we need to add a rule to our firewall to allow that traffic:

sudo ufw allow Apache
sudo ufw reload

Now we need to create a user name and password for logging in to the dashboard. When you execute the following command, you will be asked to provide a new password and retype it:

sudo htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

Finally, to make sure that all the configuration changes are taken into account, we need to restart the Apache2 service by running the following command:

sudo systemctl restart apache2.service

After that, we will need to start the Nagios core service:

sudo systemctl start nagios.service

Now we can finally test if everything is working by opening the IP address of our EC2 instance followed by “/nagios” in a browser:

http://IP/nagios

You should see the login page. Enter your credentials and hit Enter, and the welcome page will be displayed:

Now we have our server ready and need to add a host to monitor. I will be adding one host, which is the web server running in Azure.

Host configuration

Now we will be configuring our host to communicate with Nagios and send the metrics that we are interested in.

Run the following command to make sure that everything is up-to-date:

sudo apt-get update

Now we can install NRPE (short for Nagios Remote Plugin Executor), which allows us to execute Nagios plugins on remote hosts:

sudo apt-get install nagios-nrpe-server nagios-plugins

After the installation is complete, we need to add the IP address of our Nagios server to the allowed_hosts line of the configuration file:

sudo nano /etc/nagios/nrpe.cfg
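If you prefer to script this edit instead of using an editor, here is a minimal sketch. It assumes GNU sed and that the file already contains an uncommented allowed_hosts= line; allow_nagios_server is a hypothetical helper name, and the IP address in the example is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append an IP address to the allowed_hosts line of an
# NRPE configuration file, unless that address is already listed.
allow_nagios_server() {
    local cfg="$1" server_ip="$2" current
    # Read the current comma-separated value of allowed_hosts.
    current="$(sed -n 's/^allowed_hosts=//p' "$cfg" | head -n 1)"
    case ",${current}," in
        *",${server_ip},"*) return 0 ;;   # already allowed, nothing to do
    esac
    # Rewrite the line in place with the new address appended.
    sed -i "s|^allowed_hosts=.*|allowed_hosts=${current},${server_ip}|" "$cfg"
}

# Example (placeholder address, needs root for the real file):
#   sudo bash -c "$(declare -f allow_nagios_server); allow_nagios_server /etc/nagios/nrpe.cfg 198.51.100.7"
```

Trying it on a copy of the file first is a cheap way to make sure the result looks the way you expect.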

Now we need to restart the service so the changes can take effect:

sudo service nagios-nrpe-server restart

Now let’s go back to our server machine and modify the configuration to take our new host into account.

sudo mkdir /usr/local/nagios/etc/servers/
sudo nano /usr/local/nagios/etc/servers/ubuntu_host.cfg

Now paste the following configuration and replace the address with your host IP address:

# Ubuntu Host configuration file
define host {
use linux-server
host_name ubuntu_host
alias Ubuntu Host
address Your-host-ip-address-here
register 1
}
define service {
host_name ubuntu_host
service_description PING
check_command check_ping!100.0,20%!500.0,60%
max_check_attempts 2
check_interval 2
retry_interval 2
check_period 24x7
check_freshness 1
contact_groups admins
notification_interval 2
notification_period 24x7
notifications_enabled 1
register 1
}
define service {
host_name ubuntu_host
service_description Check Users
check_command check_local_users!20!50
max_check_attempts 2
check_interval 2
retry_interval 2
check_period 24x7
check_freshness 1
contact_groups admins
notification_interval 2
notification_period 24x7
notifications_enabled 1
register 1
}
define service {
host_name ubuntu_host
service_description Local Disk
check_command check_local_disk!20%!10%!/
max_check_attempts 2
check_interval 2
retry_interval 2
check_period 24x7
check_freshness 1
contact_groups admins
notification_interval 2
notification_period 24x7
notifications_enabled 1
register 1
}
define service {
host_name ubuntu_host
service_description Check SSH
check_command check_ssh
max_check_attempts 2
check_interval 2
retry_interval 2
check_period 24x7
check_freshness 1
contact_groups admins
notification_interval 2
notification_period 24x7
notifications_enabled 1
register 1
}
define service {
host_name ubuntu_host
service_description Total Process
check_command check_local_procs!250!400!RSZDT
max_check_attempts 2
check_interval 2
retry_interval 2
check_period 24x7
check_freshness 1
contact_groups admins
notification_interval 2
notification_period 24x7
notifications_enabled 1
register 1
}
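A note on the check_command values above: the “!” characters separate the command name from its arguments, which Nagios exposes to the command definition as $ARG1$, $ARG2$, and so on. As far as I know, the sample commands.cfg passes the two check_ping arguments as the warning and critical thresholds, so here a round trip above 100 ms or 20% packet loss is a warning, and above 500 ms or 60% is critical. The sketch below only illustrates the splitting; explain_check_command is a hypothetical helper and not something Nagios provides.

```shell
#!/usr/bin/env bash
# Illustrative only: split a check_command value on "!" to show which part is
# the command name and which parts become $ARG1$, $ARG2$, ...
explain_check_command() {
    local value="$1" name rest i=1
    name="${value%%!*}"
    echo "command: ${name}"
    # No "!" in the value means the command takes no arguments.
    [ "$name" = "$value" ] && return 0
    rest="${value#*!}"
    while :; do
        echo "\$ARG${i}\$: ${rest%%!*}"
        case "$rest" in
            *'!'*) rest="${rest#*!}" ;;
            *) break ;;
        esac
        i=$((i + 1))
    done
}

# Example: explain_check_command 'check_ping!100.0,20%!500.0,60%'
```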

Now we can check the configuration by using the following command:

sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

The output should report 2 hosts, but at this point it only shows the local server. Nagios hasn’t taken our servers configuration file into account yet; for that to happen, we need to modify the Nagios configuration file to specify the path of the servers folder. Use the following command to edit the configuration file:

sudo nano /usr/local/nagios/etc/nagios.cfg

Look for the commented-out cfg_dir line for the servers directory, uncomment it, and make sure it points to the path we created earlier (/usr/local/nagios/etc/servers).
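The same change can be scripted. The sketch below assumes GNU sed and that the line in question looks like #cfg_dir=/usr/local/nagios/etc/servers, which is how I remember it from the sample nagios.cfg; if the line is not there, the helper simply appends it. enable_servers_dir is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure nagios.cfg contains an active cfg_dir line
# for the servers directory. Safe to run more than once.
enable_servers_dir() {
    local cfg="$1" dir="${2:-/usr/local/nagios/etc/servers}"
    if grep -Fqx "cfg_dir=${dir}" "$cfg"; then
        return 0                                   # already enabled
    fi
    if grep -Fqx "#cfg_dir=${dir}" "$cfg"; then
        # Uncomment the existing line in place.
        sed -i "s|^#cfg_dir=${dir}\$|cfg_dir=${dir}|" "$cfg"
    else
        # Line not found: append it at the end of the file.
        echo "cfg_dir=${dir}" >> "$cfg"
    fi
}

# Example (needs root for the real file):
#   sudo bash -c "$(declare -f enable_servers_dir); enable_servers_dir /usr/local/nagios/etc/nagios.cfg"
```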

Now let’s run the above command once again to verify our configuration.

Great! We have both hosts detected.

Let’s restart all services on both the server and the host.

On the host machine, we will restart the NRPE service:
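If you would rather check the host count from a script than read it off the screen, here is a minimal sketch. It assumes the verification output contains a line of the form “Checked N hosts.”, which is what I see with Nagios Core 4; count_checked_hosts is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the output of "nagios -v" on stdin and print the
# number from the "Checked N hosts." line (prints nothing if it is absent).
count_checked_hosts() {
    sed -n 's/^[[:space:]]*Checked \([0-9][0-9]*\) hosts\.$/\1/p'
}

# Example:
#   sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | count_checked_hosts
```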

sudo service nagios-nrpe-server restart

On the server machine, we will restart the Nagios and Apache services:

sudo service apache2 restart
sudo service nagios restart

When you log in to the Nagios dashboard, you will see that your hosts are all down! That’s because we haven’t installed the Nagios plugins yet. So let’s do that.

Nagios plugins

First, we will install the required packages:

sudo apt-get install -y autoconf gcc libc6 libmcrypt-dev make libssl-dev wget bc gawk dc build-essential snmp libnet-snmp-perl gettext

Next, we will download the source code and extract the files:

cd /tmp
wget --no-check-certificate -O nagios-plugins.tar.gz https://github.com/nagios-plugins/nagios-plugins/archive/release-2.2.1.tar.gz
tar zxf nagios-plugins.tar.gz

Now we can compile and install Nagios plugins:

cd /tmp/nagios-plugins-release-2.2.1/
sudo ./tools/setup
sudo ./configure
sudo make
sudo make install

Finally, we need to restart all services and log in again to our Nagios server to check if we have any errors or problems.

We have both our Nagios central server and our Azure web server showing and being labeled as working.

If you click on “ubuntu_host” you can get more detailed information and statistics:

We have reached the end of the road for this post and we have a working monitoring system in place. The next step would be to set up contact information and alerts so we can have automated alerts sent to our email in case of any anomalies or errors.

Chalbi Mohamed Amine

Written by

An Ex-Medical student turned computer science & engineering student with a passion for all things complicated and weird !