Docker Container and Host Monitoring, Logging (& Alerting) in a Box

Wilhelm Uschtrin
Jul 31, 2016

Update (11.09.2016): This suite now also includes tooling for sending out alerts - scroll all the way down to read more about that.

A while ago I figured that if we wanted to be serious about setting up a high-performing and stable backend for our new shop, Autorenwelt would also need a proper monitoring and logging suite. It became clear pretty quickly that we would need to get some light into that black box. At the same time it had to run on Docker, because… well, because. Right?

And so, over the course of two weeks of tinkering, I put together the following monitoring/logging stack: uschtwill/docker_monitoring_logging_alerting. The GitHub repository also contains a step-by-step guide on how to set it up.

For monitoring, the stack employs cAdvisor and node_exporter for collection, Prometheus for storage, and Grafana for visualisation. Monitoring as in knowing how your hosts and containers are doing: metrics for CPU usage, memory consumption, disk I/O, system load and so on.

And for logging it uses Filebeat for log collection and forwarding, Logstash for aggregation and processing, Elasticsearch as the datastore/backend, and Kibana as the frontend. Logging as in reports and logs of what crashed when, where and why.

As for alerting: elastalert, a drop-in replacement for Elastic.co’s Watcher, covers alerts triggered by container or host log events, and Prometheus’ Alertmanager covers alerts on metrics.

How does it work?

Dockerhost-Logging: Running a Filebeat container on a host suffices to forward all host logs to a centralized Logstash instance, which processes them and forwards them on to Elasticsearch. Just copy the Filebeat service from the docker-compose.yml.
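If you just want to see the shape of it, a minimal sketch of such a Filebeat service could look roughly like this (the image, the mounted paths and the command are assumptions for illustration; the repo’s own docker-compose.yml is the authoritative version):

    filebeat:
      image: your/filebeat-image                     # placeholder, use the image the repo builds/uses
      volumes:
        - /var/log:/var/log:ro                       # host logs, mounted read-only
        - ./filebeat/filebeat.yml:/filebeat.yml:ro   # assumed config location
      command: filebeat -e -c /filebeat.yml          # run in the foreground with that config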

Container-Logging: Setting gelf as the logging driver for a container, together with the Logstash IP, is enough for everything that usually goes to stdout and stderr to be forwarded to Logstash. This is already in place for the containers of the stack themselves, so you can just check the docker-compose.yml and use the same lines for any other container.
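The relevant lines look roughly like this (syntax for compose file format v2 and later; 12201 is the standard GELF port, adjust the address to wherever your Logstash gelf input listens):

    some_service:
      image: nginx:alpine                       # any container you want to log
      labels:
        container_group: frontend              # see the section on labels below
      logging:
        driver: gelf
        options:
          gelf-address: udp://localhost:12201  # Logstash's gelf input
          labels: container_group              # forward this label with every log line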

Dockerhost-Monitoring: Just like running a Filebeat container for logging, simply running a node_exporter container on the host is enough to expose all host metrics for scraping by Prometheus.
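A hedged sketch of such a service (the mounts let node_exporter read host metrics from inside a container; flag names differ between node_exporter versions, so check the version you actually run):

    nodeexporter:
      image: prom/node-exporter
      volumes:
        - /proc:/host/proc:ro            # host process and CPU metrics
        - /sys:/host/sys:ro              # host system metrics
      command:
        - '--path.procfs=/host/proc'     # older versions use single-dash -collector.procfs
        - '--path.sysfs=/host/sys'       # older versions use single-dash -collector.sysfs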

Container-Monitoring: cAdvisor aggregates metrics from all containers running on the host and exposes them for Prometheus in the same way node_exporter does for the host.
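On the Prometheus side, both exporters then just need to be listed as scrape targets. A sketch of the corresponding prometheus.yml (job names are made up; 9100 and 8080 are the exporters’ default ports, the repo may use different ones):

    scrape_configs:
      - job_name: 'node'
        static_configs:
          - targets: ['nodeexporter:9100']   # node_exporter
      - job_name: 'containers'
        static_configs:
          - targets: ['cadvisor:8080']       # cAdvisor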

Labels, Dashboards, and Queries

Labels: As you can see in the options for the logging driver in the docker-compose.yml, the driver also forwards the container_group label, which is VERY handy for slicing and dicing your metrics and logs. Just label your containers into different groups like backend, frontend, maintenance, infrastructure or whatnot, and you’ll be able to grasp much more quickly what is going on with your stack. Of course you can also define your own labels and forward them via the log driver. As for metrics, cAdvisor picks up and forwards any labels automatically.
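Labelling is just a matter of adding a label to the service (as in the logging example above). On the metrics side, cAdvisor exposes custom container labels with a container_label_ prefix, so the group ends up as a regular Prometheus label (service and image names below are hypothetical):

    shop:                               # hypothetical service
      image: example/shop:latest        # hypothetical image
      labels:
        container_group: backend        # shows up in Prometheus as
                                        # container_label_container_group="backend"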

Dashboards: The GitHub repository comes with 4 dashboards, 2 for Kibana (logs) and 2 for Grafana (metrics): for each tool, one for monitoring and one for exploring. Just import them as explained in the README.md and you’re good to go. You’ll notice that some of the dashboards already make use of the aforementioned labels.

Queries: Because building queries with the Grafana/Prometheus combo is not totally straightforward, I included some resources here.
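To give a flavour, a query of the kind you would put into a Grafana graph panel might look like this (built on cAdvisor’s container_cpu_usage_seconds_total metric and the group label from above; treat it as an illustrative example rather than something copied from the dashboards):

    # per-container CPU usage of the 'backend' group, as a 1-minute rate
    sum(rate(container_cpu_usage_seconds_total{container_label_container_group="backend"}[1m])) by (name)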

Anyway, I found it a bit daunting in the beginning to get all the different pieces working together. So it would really make me happy if this repository allowed others to hit the ground running and gave them a working example of how the different components integrate and work with each other.

On the other hand, this should really just be the beginning. Personally I am looking to add Watcher and Alertmanager to the stack soon. Other ideas are trying out InfluxDB as a metric store, or even going all in with Elastic, monitoring and metrics included, by giving Topbeat a go. But yeah, I think even the current stack should make a pretty good starting point for the field of “Docker Container and Host Monitoring & Logging”.

Try it out and have some fun with it! And let me know if you have any questions, comments or ideas for improvement, either in the issues on GitHub or down there in the comments.

Cheers
Will

Update (11.09.2016):

Alerting and Annotations in Grafana

I just added elastalert and Alertmanager for alerting. Rules for logging alerts (elastalert) go into ./elastalert/rules/ and rules for monitoring alerts (Alertmanager) go into ./prometheus/rules/. Alertmanager only takes care of the communications part of the monitoring alerts; the rules themselves are defined “in” Prometheus.
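For monitoring alerts that means a rule file in ./prometheus/rules/ along these lines (a hypothetical example in the rule syntax of Prometheus 1.x, which was current at the time; Alertmanager only routes and delivers what Prometheus fires):

    ALERT InstanceDown
      IF up == 0
      FOR 1m
      LABELS { severity = "warning" }
      ANNOTATIONS {
        summary = "Instance {{ $labels.instance }} is down",
        description = "{{ $labels.instance }} has been unreachable for more than a minute."
      }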

Both Alertmanager and elastalert can be configured to send their alerts to various outputs. In this suite, Logstash and Slack are set up. The integration with Logstash works out of the box; for Slack you will need to insert your webhook URL.
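On the Alertmanager side the webhook goes into a Slack receiver roughly like this (a hedged sketch, the file layout in the repo may differ); elastalert has an analogous slack_webhook_url setting in its configuration:

    route:
      receiver: 'slack'
    receivers:
      - name: 'slack'
        slack_configs:
          - api_url: 'https://hooks.slack.com/services/...'   # your webhook URL
            channel: '#alerts'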

The alerts that are sent to Logstash can be checked by looking at the ‘logstash-alerts’ index in Kibana. Apart from functioning as a first output, sending the alerts to Elasticsearch via Logstash and storing them there is also neat because it allows us to query them from Grafana and pull them into its dashboards as annotations.

The monitoring alert rules, which are stored in the Prometheus directory, contain a fake alert that should be firing from the beginning and demonstrates the concept. Find it and comment it out to have some peace. Logging alerts should be coming in soon as well: this suite by itself already consists of 10 containers, and something is always complaining. Of course you can also force things by breaking stuff yourself; the blanket_log-level_catch.yaml rule that’s already set up should catch it.
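For reference, a blanket catch of that kind can be expressed as an elastalert rule roughly like this (field names and thresholds are assumptions for illustration; the repo’s blanket_log-level_catch.yaml is the real thing):

    name: blanket log-level catch
    type: frequency                 # fire as soon as enough matching events occur
    index: logstash-*               # the Logstash indices in Elasticsearch
    num_events: 1
    timeframe:
      minutes: 1
    filter:
      - query:
          query_string:
            query: "log_level:(ERROR OR CRITICAL)"
    alert:
      - slack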

If you’re annoyed by non-events repeatedly triggering alerts, throw them into ./logstash/config/31-non-events.conf so Logstash silences them by overwriting their log_level upon import.
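A hedged example of what such a filter could look like (the pattern and the replacement value are made up):

    filter {
      if [message] =~ /some harmless message you do not care about/ {
        mutate {
          replace => { "log_level" => "NON-EVENT" }   # no longer matches the alerting rules
        }
      }
    }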

A word of caution: The standard configuration of the suite is for testing purposes only. As it is simply forwarding ports at the moment, if your box is accessible publicly, all your logs and metrics will be out in the open. Switch off port forwarding before using this in an “online” environment.
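One way to keep the web UIs reachable only from the box itself while testing is to bind the published ports to the loopback interface in the docker-compose.yml (Grafana’s default port 3000 shown as an example; this is a suggestion, not what the repo ships):

    grafana:
      ports:
        - "127.0.0.1:3000:3000"   # only reachable from the host itself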
