First of all, I need to say that this article shows only one of many ways to get web server metrics. The main goal of this approach is quick deployment and configuration, using well-known open source projects like Grafana and Telegraf. It gives you a really quick start at collecting the maximum of available web server metrics, even including some application performance information. Of course, this approach is not a replacement for the ELK stack, but it can give you a huge amount of analytics in a short time.

After we're done, you will have Grafana with a few dashboards:

1. System metrics dashboard (CPU, memory, disk usage and so on).

2. Nginx metrics dashboard, with server activity statistics, the most important HTTP codes, request times and more.

3. Dashboard with Nginx log information and nicely visualized GEO statistics for the incoming requests.

4. SystemD services status dashboard, where you can check the state of your important services.

You may create an alerting dashboard as well, based on the most important metrics.

This example comes from my weekend experience of building Nginx monitoring for my friends' web project. You can look at it as the first brick in the monitoring wall for your web server.

First you need to create a Grafana+InfluxDB bundle for storing metrics and drawing our dashboards; use some external VPS for it. Also, you can use my previous guide for a quick setup of the Grafana+InfluxDB stack with Docker.
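If you just want something quick to experiment with, a minimal sketch with plain Docker commands could look like this (stock images and default ports; the image tags are my assumption, and my guide describes a proper persistent setup):

docker run -d --name influxdb -p 8086:8086 influxdb:1.8
docker run -d --name grafana -p 3000:3000 grafana/grafana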

After InfluxDB+Grafana is configured, we're ready to continue. Next, install Telegraf on your web server, together with a few of my Python scripts. We need these scripts for additional metrics, like SystemD service statuses and GEO data for the incoming requests.

You can install Telegraf as a package, or compile the latest version and just copy it to the remote server.
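For example, on a Debian-based system you can take the package from the official InfluxData repository (the "stretch" codename here is just an assumption, substitute your own release):

curl -sL https://repos.influxdata.com/influxdb.key | sudo apt-key add -
echo "deb https://repos.influxdata.com/debian stretch stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
sudo apt-get update && sudo apt-get install telegraf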

OK, at this point we should have Grafana with InfluxDB installed on the monitoring server and Telegraf on our source Nginx server, so let's move on.

As you may know, Nginx provides a built-in “status module” for getting metrics about web server activity. It's not as powerful as the one in Nginx Plus, but we can add more information by parsing the Nginx log files with a Telegraf plugin and external scripts.

First, make sure that your Nginx is compiled with status module support:

nginx -V 2>&1 | grep -o with-http_stub_status_module

Then create a new configuration file to enable a locally accessible statistics page with metrics from the status module:

vi /etc/nginx/conf.d/status.conf

Then put the following configuration in there:

server {
    listen 127.0.0.1:9090;

    location /nginx_status {
        stub_status on;
        access_log off;
        allow 127.0.0.1;
        deny all;
    }
}
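Before reloading, it's worth checking that the new configuration is valid:

nginx -t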

Reload Nginx:

nginx -s reload

If everything is OK, you can check the local metrics by making a curl request to 127.0.0.1:9090/nginx_status:

curl 127.0.0.1:9090/nginx_status 

Active connections: 9
server accepts handled requests
105060 105060 412116
Reading: 0 Writing: 1 Waiting: 8

In addition to the web server status metrics, we will also change the standard Nginx log format to get more interesting details, like $request_time (the total time Nginx took to process a request and send the response) and $upstream_connect_time (the time spent establishing a connection with an upstream server). Also, we'll use the Nginx http_geoip_module to get GEO data about incoming IPs into the log file.

Check that your Nginx is also compiled with the GEOIP module:

nginx -V 2>&1 | grep -o with-http_geoip_module

If everything is OK, change the logging section in nginx.conf and add the new custom log format:

##
# Logging Settings
##
# Enabling request time and GEO codes
log_format custom '$remote_addr - $remote_user [$time_local]'
                  '"$request" $status $body_bytes_sent'
                  '"$http_referer" "$http_user_agent"'
                  '"$request_time" "$upstream_connect_time"'
                  '"$geoip_city" "$geoip_city_country_code"';
access_log /var/log/nginx/access.log custom;
error_log /var/log/nginx/error.log;

Download the latest GeoIP.dat and GeoLiteCity.dat files from MaxMind and put them into /etc/nginx/geoip, for example:

mkdir /etc/nginx/geoip
cd /etc/nginx/geoip
wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz
gunzip GeoIP.dat.gz
wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
gunzip GeoLiteCity.dat.gz

After that, you need to add a few lines to the http {} section of nginx.conf to enable the GEO data:

# Add GEO IP support 
geoip_country /etc/nginx/geoip/GeoIP.dat; # the country IP data
geoip_city /etc/nginx/geoip/GeoLiteCity.dat; # the city IP data

Reload Nginx again and you’ll get log records like this:

37.xx.xxx.215 - - [19/Oct/2018:00:25:08 +0300]"GET / HTTP/1.1" 200 777"https://www.google.com/" "Mozilla/5.0 (Linux; Android 6.0; M5c Build/MRA58K; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/44.0.2403.146 Mobile Safari/537.36""0.003" "0.000""Vinnitsa" "UA"
54.xx.xxx.137 - - [19/Oct/2018:00:26:00 +0300]"GET /robots.txt HTTP/1.1" 404 14260"-" "Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)""0.003" "0.000""-" "FR"

You can see the four additional fields ($request_time, $upstream_connect_time, $geoip_city, $geoip_city_country_code) that we specified previously. Mind that if you need any additional logging parameters, you can refer to the Nginx documentation and add them.

We're done with Nginx; next we'll configure Telegraf and install a few Python scripts.

If you want to get SystemD service statuses with their uptime (Nginx, Redis and others), you can clone and install my small Python script from the repository:

This script is run by the Telegraf exec plugin and sends data about the state of the specified SystemD services in JSON format to InfluxDB, so you can build service status dashboards on top of it.
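To give an idea of how this works, here is a minimal sketch of such a script (this is NOT the actual “srvstatus” code; the service list and field names are only illustrative):

#!/usr/bin/env python3
# Minimal sketch: ask SystemD about a few units and print JSON that the
# Telegraf exec plugin can read with data_format = "json" and
# tag_keys = ["service"].
import json
import subprocess

SERVICES = ["nginx", "redis-server"]  # hypothetical list, use your own units

def is_active(name):
    # "systemctl is-active" prints "active" for running units
    result = subprocess.run(["systemctl", "is-active", name],
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout.strip() == "active"

metrics = []
for svc in SERVICES:
    metrics.append({
        "service": svc,                         # becomes an InfluxDB tag
        "running": 1 if is_active(svc) else 0,  # numeric field for panels
    })

print(json.dumps(metrics))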

In Telegraf we need to add the Nginx input plugin, specify the right regexp for the logparser plugin to match the new web server log format, and also configure an exec plugin to run my “srvstatus” script.

Add this to your telegraf config:

[[inputs.nginx]]
  urls = ["http://127.0.0.1:9090/nginx_status"]
  response_timeout = "5s"

[[inputs.logparser]]
  files = ["/var/log/nginx/access.log"]
  from_beginning = true
  name_override = "nginx_access_log"

  [inputs.logparser.grok]
    patterns = ["%{CUSTOM_LOG_FORMAT}"]
    custom_patterns = '''
CUSTOM_LOG_FORMAT %{CLIENT:client_ip} %{NOTSPACE:ident} %{NOTSPACE:auth} \[%{HTTPDATE:ts:ts-httpd}\]"(?:%{WORD:verb:tag} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http_version:float})?|%{DATA})" %{NUMBER:resp_code:tag} (?:%{NUMBER:resp_bytes:int}|-)%{QS:referrer} %{QS:agent}%{QS:request_time} %{QS:upstream_connect_time}%{QS:geoip_city} %{QS:country_code}
'''

[[inputs.exec]]
  commands = [
    "/opt/srvstatus/venv/bin/python /opt/srvstatus/service.py"
  ]
  timeout = "5s"
  name_override = "services_stats"
  data_format = "json"
  tag_keys = [
    "service"
  ]

Pay attention to the spaces in the logparser custom pattern. I spent some time figuring out why this plugin wouldn't parse the new Nginx log format before I found a few missing spaces.

Also, I recommend enabling the [[inputs.net]] and [[inputs.netstat]] plugins in the Telegraf config as well, to get networking metrics on the system dashboard.
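Both plugins work fine with their default settings, so it's enough to add the bare sections:

[[inputs.net]]
[[inputs.netstat]]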

You can check that Telegraf is OK after the reconfiguration by running this command:

telegraf --test
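If the full output is too noisy, you can limit the test run to a single plugin:

telegraf --test --input-filter nginx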

Don’t forget to restart it:

systemctl restart telegraf
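After the restart, you can check that the service came back up:

systemctl status telegraf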

For creating nice dashboards with GEO statistics about incoming IPs, you can also use my “GeoStat” script. Full information on how to install and configure it can be found on GitHub as well.

At this point we are sending all the needed metrics, and it's time for the final step: log in to Grafana and create new dashboards to visualize them.

For system metrics you can use this nice dashboard:

Also, you can use the dashboard templates written by me for the Nginx metrics and the Nginx log information with a GEO map. You can find them on the Grafana website as well:

Or you can create dashboards yourself, if you wish.
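If you go the do-it-yourself way, the panel queries are plain InfluxQL. For example, a graph of active Nginx connections could use something like this (assuming Telegraf writes into its default "telegraf" database):

SELECT mean("active") FROM "nginx" WHERE $timeFilter GROUP BY time($__interval) fill(null)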

After finishing, you will have nice dashboards with Nginx and system metrics, parsed web server logs and GEO data. This will help you get a much deeper understanding of your web project.

Here are some examples of how this can look in Grafana.

That's all for now; I hope it will be useful.

Good luck.
