Docker Daemon Metrics in Prometheus

Note: This is based on an experimental feature, so the naming and functionality could change. In the future Prometheus will probably provide a native way to integrate with a swarm cluster, and Docker will add more metrics.

Starting with Docker 1.13 there’s a new endpoint that exposes the daemon’s internal metrics in Prometheus format.

It’s just an initial release; in the future it should offer many metrics related to containers and services (in swarm mode), but at the moment it only offers daemon-related metrics.

Activating metrics endpoint

At the time of writing, the metrics endpoint is not enabled by default and is marked as experimental, so you’ll need to run the Docker daemon in experimental mode and add this configuration value:

metrics-addr=0.0.0.0:4999

This publishes the metrics endpoint on port 4999 of the host. If you are using docker-machine to create servers, you’ll need to add these flags:

--engine-opt experimental --engine-opt metrics-addr=0.0.0.0:4999
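If you manage the daemon directly instead of through docker-machine, the same two settings can go in the daemon configuration file (on Linux usually /etc/docker/daemon.json) as the `experimental` and `metrics-addr` keys. Here’s a minimal Python sketch that merges them into an existing configuration; the helper names are my own, and the daemon still has to be restarted afterwards for the change to apply.

```python
import json
import os


def enable_metrics(config, addr="0.0.0.0:4999"):
    """Return a copy of a daemon.json dict with the metrics endpoint enabled."""
    updated = dict(config)  # keep any settings that are already there
    updated["experimental"] = True  # the endpoint is experimental in 1.13
    updated["metrics-addr"] = addr  # address where the daemon serves /metrics
    return updated


def update_daemon_json(path, addr="0.0.0.0:4999"):
    """Rewrite the config file at `path`; restart the daemon afterwards."""
    config = {}
    if os.path.exists(path):
        with open(path) as f:
            config = json.load(f)
    with open(path, "w") as f:
        json.dump(enable_metrics(config, addr), f, indent=2)
```

Running `update_daemon_json("/etc/docker/daemon.json")` as root and restarting dockerd should have the same effect as the two `--engine-opt` flags.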

As at the time of writing 1.13 is still a release candidate, if you want to create a server with this version you’ll need to specify it in the docker-machine command. A working example would be:

docker-machine create \
--driver virtualbox \
--engine-opt experimental \
--engine-opt metrics-addr=0.0.0.0:4999 \
test-metrics

Now you can get the metrics from your shell:

curl http://$(docker-machine ip test-metrics):4999/metrics
# HELP engine_daemon_container_actions_seconds The number of seconds it takes to process each container action
# TYPE engine_daemon_container_actions_seconds histogram
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.005"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.01"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.025"} 1
...

Cool, isn’t it? Mmm, but maybe it’s not very useful to have this data just to read it from the command line.
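Still, the format itself is worth a quick look. It is the Prometheus text exposition format: `# HELP` and `# TYPE` comment lines, then one sample per line made of a metric name, optional labels in braces and a value. Here’s a minimal Python sketch of my own that parses lines like the ones above; it is a simplification, and for real use the official Python client ships a complete parser (`prometheus_client.parser.text_string_to_metric_families`).

```python
import re

# metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples from text-format metrics output.

    Simplified: skips # HELP / # TYPE lines, ignores timestamps, and does
    not unescape label values or handle a '}' inside them.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```

You could feed it the body returned by `urllib.request.urlopen("http://<node-ip>:4999/metrics")`, but as we’ll see, Prometheus does all of this for us.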

Creating a swarm cluster for collecting daemon metrics

As you have seen, it’s easy to activate the metrics, but to make them usable we’ll need to make Prometheus aware of them.

Another point is that if you have a cluster, you’ll probably want to get the metrics of all the hosts that compose it. If you have a swarm cluster, there’s a basic technique you can use here. Have you ever heard about “global services” in a swarm cluster?

Global services are services that have one running task on each cluster node, and if you add a new node to the cluster the swarm scheduler will take care of launching a new task of each global service on it.
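To make the idea concrete, here’s a toy Python model of that reconciliation: the desired state is one task per node, so tasks on nodes that left are dropped and every node without a task gets one. It only illustrates the behavior described above; it is not swarm’s scheduler code, and the `<service>.<node>` task names are hypothetical.

```python
def reconcile_global(service, nodes, tasks):
    """Toy model of a global service: exactly one task per node.

    nodes: node IDs currently in the cluster
    tasks: dict mapping node ID -> task name for the tasks already running
    Returns the new node -> task mapping.
    """
    desired = set(nodes)
    # Keep only the tasks whose node is still part of the cluster
    current = {node: task for node, task in tasks.items() if node in desired}
    # Start a task on every node that doesn't have one yet
    for node in sorted(desired - set(current)):
        current[node] = f"{service}.{node}"  # hypothetical task naming
    return current
```

Calling it again after adding a node leaves the existing tasks alone and adds exactly one task for the newcomer, which is what makes global services a good fit for per-host exporters.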

On the other hand, swarm provides a very simple but powerful service discovery mechanism. It’s DNS based: when you launch a service in an overlay network, each of its tasks can be discovered by any other service attached to the same network. We’ll see it working soon, but let’s recap before going on.

I’ve said “cluster” and “global services”, but we have created just one host to show the metrics from there. Let’s create two more hosts and add a global service to expose the docker metrics!

docker-machine create \
--driver virtualbox \
--engine-opt experimental \
--engine-opt metrics-addr=0.0.0.0:4999 \
test-metrics-2 &

docker-machine create \
--driver virtualbox \
--engine-opt experimental \
--engine-opt metrics-addr=0.0.0.0:4999 \
test-metrics-3 &

wait

This creates two more servers in parallel.
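The `&` plus `wait` pattern simply starts both commands in the background and waits for them. If you prefer scripting it, here’s a hedged Python sketch of the same fan-out using a thread pool and `subprocess.run`; the helper names are mine, the flags are exactly the ones used above, and the `run` parameter exists only so the command runner can be swapped out.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def create_cmd(name, metrics_addr="0.0.0.0:4999"):
    """Same docker-machine invocation as the shell commands above."""
    return [
        "docker-machine", "create",
        "--driver", "virtualbox",
        "--engine-opt", "experimental",
        "--engine-opt", f"metrics-addr={metrics_addr}",
        name,
    ]


def create_all(names, run=subprocess.run):
    """Create all machines concurrently; raises if any creation fails."""
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        futures = [pool.submit(run, create_cmd(n), check=True) for n in names]
        # Waiting on every future plays the role of the shell's `wait`
        return [f.result() for f in futures]
```

`create_all(["test-metrics-2", "test-metrics-3"])` would create the same two machines. Unlike a bare `wait`, `check=True` makes a failed creation raise an exception.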

Once finished we have 3 nodes. The next step is to make them work as a swarm cluster. To do it, I’ll use the first node as the manager (and seed) of the cluster, and the other two nodes will work as workers.

eval $(docker-machine env test-metrics)
docker swarm init --advertise-addr $(docker-machine ip test-metrics):2377

Swarm initialized: current node (90z5bvvt0di221bxsfgec4nao) is now a manager.

To add a worker to this swarm, run the following command:

docker swarm join \
--token SWMTKN-1-42wp8coliqwvxfe5vxkveq85yiif2968he89qjfnn8cyx7isog-56gd1hw4fvrx005gdon16pzmt \
192.168.99.100:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

In your case you’ll need to execute the same commands but with your own generated token; don’t use the token from my example, it won’t work!
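On the manager, `docker swarm join-token -q worker` prints just the worker token, which is handy for scripts. If you already have the `docker swarm init` output, a small Python sketch like the following can pull the token and the manager address out of it; the regular expression is my own and assumes the output layout shown above (with or without the `\` line continuations).

```python
import re

# `--token <token>` followed (possibly after a "\" continuation) by host:port
JOIN_RE = re.compile(r"--token\s+(SWMTKN-\S+)\s*(?:\\\s*)?(\S+:\d+)")


def parse_join_command(init_output):
    """Extract (token, manager address) from `docker swarm init` output."""
    match = JOIN_RE.search(init_output)
    if match is None:
        raise ValueError("no 'docker swarm join' command found")
    return match.group(1), match.group(2)


def join_argv(token, manager_addr):
    """Argument list for the command each worker has to run."""
    return ["docker", "swarm", "join", "--token", token, manager_addr]
```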

eval $(docker-machine env test-metrics-2)
docker swarm join \
--token SWMTKN-1-42wp8coliqwvxfe5vxkveq85yiif2968he89qjfnn8cyx7isog-56gd1hw4fvrx005gdon16pzmt \
192.168.99.100:2377

This node joined a swarm as a worker.

eval $(docker-machine env test-metrics-3)
docker swarm join \
--token SWMTKN-1-42wp8coliqwvxfe5vxkveq85yiif2968he89qjfnn8cyx7isog-56gd1hw4fvrx005gdon16pzmt \
192.168.99.100:2377

This node joined a swarm as a worker.

eval $(docker-machine env test-metrics)

We have created a swarm cluster composed of one manager and two workers. This isn’t recommended for production (a single manager gives no fault tolerance if it fails), but it’s enough for this example.

Where are the global service and Prometheus?

By now we have 3 nodes exposing the metrics endpoint, all of them part of a swarm cluster.

Thanks to swarm global services we can expose and collect the metrics of all the nodes. But how? I’ve created a very simple image that does it for you: it just forwards the host’s metrics endpoint to the cluster, which is very easy to do with the “socat” command.
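In this role socat is a plain TCP relay: it listens on a port inside the container and pipes every connection to the daemon’s endpoint on the host, roughly what `socat TCP-LISTEN:4999,fork TCP:172.18.0.1:4999` does (I’m assuming the image runs something along those lines). The Python asyncio sketch below implements the same idea just to show how little is involved; it is not what the basi/socat image runs.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # the other side went away; just stop forwarding
    finally:
        writer.close()


async def relay(listen_host, listen_port, target_host, target_port):
    """Start a server that forwards each connection to target_host:target_port."""

    async def handle(client_reader, client_writer):
        try:
            up_reader, up_writer = await asyncio.open_connection(target_host, target_port)
        except OSError:
            client_writer.close()  # target unreachable: drop the client
            return
        # Pump both directions concurrently, one connection per client
        await asyncio.gather(pipe(client_reader, up_writer), pipe(up_reader, client_writer))

    return await asyncio.start_server(handle, listen_host, listen_port)
```

Inside a container you would run something like `server = await relay("0.0.0.0", 4999, "172.18.0.1", 4999)` followed by `await server.serve_forever()`.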

First we create an overlay network; as I said before, this is what lets tasks of different services discover each other.

docker \
network create --driver overlay monitoring

Now let’s create the service which exposes the metrics.

docker \
service create \
--mode global \
--name docker-exporter \
--network monitoring \
--publish 4999 \
-e IN=172.18.0.1:4999 \
basi/socat:v0.1.0

The naming here is important because it’s how Prometheus will find the service. By default the image looks for the source at 172.18.0.1:4999 (usually the host’s address on the docker_gwbridge network); if that’s not your case, set the IN variable to your own value.

Now let’s launch the Prometheus service that will discover the Docker metrics. I’ve prepared an image for this purpose; it does more things, but I won’t cover them here because they’re out of the scope of this post.

docker \
service create \
--name prometheus \
--network monitoring \
--publish 9090:9090 \
basi/prometheus-swarm:v0.4.0

Once the service is launched, if you open port 9090 on any cluster host you’ll see Prometheus collecting the daemon metrics! Voilà!

open http://$(docker-machine ip test-metrics):9090
Targets are automatically discovered

Thanks to DNS auto-discovery it starts collecting the metrics. This is defined in the Prometheus configuration file with this block:

- job_name: 'docker-exporter'
  dns_sd_configs:
  - names:
    - 'tasks.docker-exporter'
    type: 'A'
    port: 4999

As you can see, the DNS name it looks up is “tasks.docker-exporter”, which returns to Prometheus the IP of each task of the service. You can check this with a DNS request from inside a container attached to the same network; in my example:

docker exec -it 4b987cc63d7a /bin/sh -c "nslookup tasks.docker-exporter"
Server: 127.0.0.11
Address 1: 127.0.0.11

Name: tasks.docker-exporter
Address 1: 10.0.0.3 docker-exporter.k5y6olk84zyfgfncdot6i3vpx.x6o8979903vcdysro429fmi4y.monitoring
Address 2: 10.0.0.4 docker-exporter.90z5bvvt0di221bxsfgec4nao.k4mgelr9ogeuo8wm2007byfvs.monitoring
Address 3: 10.0.0.5 docker-exporter.p971krglwxc4gv5xda88ev2ib.alefkdedhuy5d2ga606z6mven.monitoring
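What the `type: 'A'` DNS discovery does with that answer is simple: one scrape target per returned address, each combined with the configured port. The Python sketch below mimics that step (my own approximation, not Prometheus code). With the default resolver it only works from a container on the monitoring network, where swarm’s embedded DNS at 127.0.0.11 answers `tasks.<service>` queries; the `resolve` parameter is there so it can be exercised elsewhere.

```python
import ipaddress
import socket


def discover_targets(service, port, resolve=socket.gethostbyname_ex):
    """Approximate an A-record dns_sd_config: one host:port target per task IP."""
    # Swarm's embedded DNS answers tasks.<service> with one A record per running task
    _, _, addresses = resolve(f"tasks.{service}")
    # Sort numerically so the target list is stable between lookups
    return [f"{ip}:{port}" for ip in sorted(addresses, key=ipaddress.ip_address)]
```

With the lookup above, `discover_targets("docker-exporter", 4999)` would give 10.0.0.3:4999, 10.0.0.4:4999 and 10.0.0.5:4999. Prometheus repeats the lookup periodically, which is why new nodes show up as targets without touching the configuration.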

Exploring some data

Now that we have the metrics available in Prometheus let’s see what we can do.

Data obtained by Prometheus thanks to the docker-exporter

If you launch new services in the cluster or scale existing ones, you’ll see how these values change, and the basic graphing capabilities of Prometheus are useful for a quick look at how they evolve.

Data collected and graphed in Prometheus
Another example of what we can do with Prometheus

I still need to learn what all the provided metrics mean; I haven’t found documentation about them, but they are somewhat self-descriptive. As you can see, Prometheus is very good at scraping and storing data, and it provides an amazing query language, but it’s probably not the best tool for complex graphs. Fortunately, it can be used as a data source for other tools like PromDash or Grafana.
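One example of that query language applied to the daemon metrics: `histogram_quantile(0.9, rate(engine_daemon_container_actions_seconds_bucket[5m]))` estimates, per action, the time under which 90% of container actions complete. To show what it computes, here’s a Python sketch of the bucket interpolation, written from my reading of Prometheus’s implementation; it is a simplification, not Prometheus code. It finds the bucket where the requested rank falls and assumes observations are spread uniformly inside it.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper bound,
    the last one being the +Inf bucket, like the *_bucket series.
    """
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    rank = q * buckets[-1][1]  # target position among all observations
    # First finite bucket whose cumulative count reaches the rank
    b = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank), len(buckets) - 1)
    if b == len(buckets) - 1:
        return buckets[-2][0]  # only the +Inf bucket qualifies
    if b == 0 and buckets[0][0] <= 0:
        return buckets[0][0]
    start, end, count = 0.0, buckets[b][0], buckets[b][1]
    if b > 0:
        start = buckets[b - 1][0]
        count -= buckets[b - 1][1]
        rank -= buckets[b - 1][1]
    if count == 0:
        return math.nan
    # Linear interpolation inside the bucket
    return start + (end - start) * (rank / count)
```

For buckets `[(1.0, 10), (2.0, 30), (math.inf, 40)]` the median lands halfway through the second bucket, so the estimate is 1.5.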

As I’ve used Grafana before and like it very much for these tasks, I’ve created a service based on it. If you remember, we attached all the services to the “monitoring” network; I’ve used the same network for the Grafana service, which lets me connect it to Prometheus using its service name.

docker \
service create \
--name grafana \
--network monitoring \
--publish 3000:3000 \
-e "GF_SECURITY_ADMIN_PASSWORD=admin" \
grafana/grafana:4.1.1

After a few seconds you can open Grafana:

open http://$(docker-machine ip test-metrics):3000

If you know a bit about Grafana you won’t have any problem creating the Prometheus data source; since both services share the “monitoring” network, its URL is simply http://prometheus:9090.

Adding the Prometheus service as data source.
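If you’d rather script this step than click through the UI, Grafana’s HTTP API accepts a POST to `/api/datasources`. The Python sketch below only builds that request with `urllib`; it assumes the admin/admin credentials set through GF_SECURITY_ADMIN_PASSWORD above, and the data source name “prometheus” is my own choice. With `"access": "proxy"` it’s the Grafana server, not your browser, that queries Prometheus, which is why the service name works as the URL.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url, user="admin", password="admin"):
    """Build (without sending) the API call that registers Prometheus in Grafana."""
    payload = {
        "name": "prometheus",  # display name, arbitrary
        "type": "prometheus",
        # Resolves through swarm DNS because both services share the overlay network
        "url": "http://prometheus:9090",
        "access": "proxy",
        "isDefault": True,
    }
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {credentials}"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen`, with the address of any node on port 3000, should create the data source.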

To save some time, I’ve created an initial version of a Grafana dashboard that you can import to see some basic graphs: https://grafana.net/dashboards/1229

Some metrics
More metrics included in the dashboard

Take it as an initial approach to the Docker metrics; there’s a roadmap for the future defined in this task.