Consul-Prometheus Monitoring Service Discovery

oguzozan
Published in Trendyol Tech
May 6, 2020

Prometheus is one of the most popular monitoring tools. It stores the metrics it collects in a time-series database. At Trendyol, we use Prometheus to monitor many services, such as Kubernetes, Elasticsearch, Couchbase, PostgreSQL, Kafka, and RabbitMQ.

Exporters

Most services expose a “/metrics” endpoint for Prometheus scrape jobs. For services that do not, there are third-party exporters that collect the metrics and expose them. This way we can monitor our services end to end with Prometheus. The Prometheus documentation lists the available exporters.
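
To make the “/metrics” idea concrete, here is a minimal sketch of the kind of plain-text body a scrape receives. The function name `render_metrics` and the sample values are assumptions made for illustration; real exporters also emit HELP and TYPE lines and escape label values, which this sketch skips.

```python
def render_metrics(samples):
    """Render samples as Prometheus text exposition lines.

    `samples` is a list of (metric_name, labels_dict, value) tuples.
    """
    lines = []
    for name, labels, value in samples:
        if labels:
            # Labels are rendered as key="value" pairs inside braces.
            rendered = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{rendered}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical sample values, for illustration only.
body = render_metrics([
    ("node_load1", {}, 0.42),
    ("node_filesystem_avail_bytes", {"mountpoint": "/"}, 1024),
])
print(body)
```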

Configuration

On the Prometheus server, we need to specify the targets (metric URLs) in the config file “/etc/prometheus/prometheus.yaml”.
If your architecture consists of thousands of servers and services and is constantly changing, keeping that file up to date becomes a huge configuration management challenge. At this point, we need dynamically changing configuration and automation.

After a lot of research and comparison, we decided to use Consul service discovery to automate our monitoring systems.

The detailed configuration options are described in the Prometheus configuration documentation.

For example, to monitor the hosts of an Elasticsearch service, we install node-exporter, which serves metrics on port 9100, and add a job like this to the Prometheus config file:

- job_name: 'node'
  metrics_path: /metrics
  static_configs:
    - targets: ['10.0.0.1:9100']
      labels:
        _app: 'elasticsearch'
        _service: 'node-exporter'
        _hostname: 'Vm01'
        _environment: 'dev'
        _cluster: 'monitor-cls1'
        _es_role: 'master'
        _team: 'storefront'

With this approach, we have to update the targets every time an Elasticsearch node is added or removed. In fact, with a label structure like the one in the example, you may need to define a separate job (or at least a separate target group) for every Elasticsearch node, since each “_hostname” label must be different.
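
The growth of the static configuration can be sketched in a few lines. This is only an illustration: the helper name `static_configs` and the hosts Vm02 and Vm03 are hypothetical, and the dictionaries simply mirror the shape of the YAML above.

```python
def static_configs(nodes, common_labels):
    """Build one static target group per node, because _hostname differs per node."""
    groups = []
    for hostname, address in nodes:
        # Copy the shared labels and add the per-node hostname.
        labels = dict(common_labels, _hostname=hostname)
        groups.append({"targets": [f"{address}:9100"], "labels": labels})
    return groups


# Vm02 and Vm03 are made-up hosts used only for this example.
nodes = [("Vm01", "10.0.0.1"), ("Vm02", "10.0.0.2"), ("Vm03", "10.0.0.3")]
groups = static_configs(nodes, {"_app": "elasticsearch", "_team": "storefront"})
print(len(groups))
```

Every node adds another entry that has to be written and later removed by hand, which is the maintenance cost that service discovery is meant to eliminate.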

Let’s do the same by registering a node-exporter service in Consul.

Consul Service Creation

You can register and deregister services on your Consul server either through the Consul agent running on your VMs or through the HTTP API that Consul provides.

The details are described in the Consul documentation on registering services.

If there is a Consul agent on your servers, it is enough to save a JSON file like the one below under the “/etc/consul/consul.d/” path and reload the consul-agent service.

{
  "Service": {
    "Address": "10.0.0.1",
    "ID": "Vm01_NodeExporter",
    "Name": "node-exporter",
    "Port": 9100,
    "Tags": [
      "_app=elasticsearch",
      "_service=node-exporter",
      "_hostname=Vm01",
      "_environment=dev",
      "_cluster=monitor-cls1",
      "_es_role=master",
      "_team=storefront"
    ]
  }
}

Alternatively, you can remove the “Service” wrapper, save the remaining object to your local computer as a JSON file, and create the service through Consul's HTTP API:

curl -X PUT --data-binary @node-exporter.json http://<consul-server-ip>:8500/v1/agent/service/register
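
When registration is automated, the same call can be prepared from code. The following is a sketch using only the Python standard library; the helper names `service_payload` and `register_request` and the address 127.0.0.1:8500 are assumptions for illustration. The request is only built here, not sent.

```python
import json
import urllib.request


def service_payload(name, service_id, address, port, labels):
    """Build the body for the agent register endpoint (no "Service" wrapper)."""
    return {
        "ID": service_id,
        "Name": name,
        "Address": address,
        "Port": port,
        # Each label becomes a "key=value" tag, as in the JSON file above.
        "Tags": [f"{key}={value}" for key, value in labels.items()],
    }


def register_request(consul_addr, payload):
    """Prepare (but do not send) the PUT request that the curl command issues."""
    return urllib.request.Request(
        url=f"http://{consul_addr}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
    )


payload = service_payload(
    "node-exporter", "Vm01_NodeExporter", "10.0.0.1", 9100,
    {"_app": "elasticsearch", "_hostname": "Vm01", "_team": "storefront"},
)
# The Consul address below is a placeholder, not a value from the article.
request = register_request("127.0.0.1:8500", payload)
# urllib.request.urlopen(request) would send it to a reachable Consul agent.
```

Keeping the labels in a dictionary and deriving the tags from it makes it harder for the tag format to drift away from what the Prometheus relabel rules expect.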

While creating the Consul service, we should define the Consul tags according to our needs, because we will export these tags to Prometheus as labels. Preparing Grafana dashboards with these labels gives us a lot of flexibility. The Prometheus job that discovers the service and turns its tags into labels looks like this:

- job_name: node-exporter
  scrape_interval: 15s
  honor_labels: true
  consul_sd_configs:
    - server: <consulserver-ip>:8500
      services: [node-exporter]
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: ^.*,_team=storefront,.*$
      action: keep
    - source_labels: [__meta_consul_tags]
      regex: .*,_app=([^,]+),.*
      replacement: ${1}
      target_label: _app
    - source_labels: [__meta_consul_tags]
      regex: .*,_service=([^,]+),.*
      replacement: ${1}
      target_label: _service
    - source_labels: [__meta_consul_tags]
      regex: .*,_hostname=([^,]+),.*
      replacement: ${1}
      target_label: _hostname
    - source_labels: [__meta_consul_tags]
      regex: .*,_environment=([^,]+),.*
      replacement: ${1}
      target_label: _environment
    - source_labels: [__meta_consul_tags]
      regex: .*,_cluster=([^,]+),.*
      replacement: ${1}
      target_label: _cluster
    - source_labels: [__meta_consul_tags]
      regex: .*,_es_role=([^,]+),.*
      replacement: ${1}
      target_label: _es_role
    - source_labels: [__meta_consul_tags]
      regex: .*,_team=([^,]+),.*
      replacement: ${1}
      target_label: _team
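
To see why the regexes are written with a comma on both sides of each tag, it helps to simulate the relabeling. Prometheus joins the Consul tags with the tag separator (a comma by default) and also puts the separator at the start and end of the value, and it anchors each relabel regex at both ends. The sketch below imitates that behaviour with Python's `re` module; the helper names are hypothetical and Python's regex engine stands in for the one Prometheus uses.

```python
import re


def consul_tags_label(tags, separator=","):
    """Join tags like __meta_consul_tags: separator between and around the tags."""
    return separator + separator.join(tags) + separator


def relabel_replace(source_value, regex):
    """Mimic a 'replace' rule with replacement ${1}; the regex must match fully."""
    match = re.fullmatch(regex, source_value)
    return match.group(1) if match else None


tags = [
    "_app=elasticsearch", "_service=node-exporter", "_hostname=Vm01",
    "_environment=dev", "_cluster=monitor-cls1", "_es_role=master",
    "_team=storefront",
]
meta = consul_tags_label(tags)

labels = {}
for name in ("_app", "_service", "_hostname", "_environment",
             "_cluster", "_es_role", "_team"):
    value = relabel_replace(meta, rf".*,{name}=([^,]+),.*")
    if value is not None:
        labels[name] = value
print(labels)
```

Because the separator surrounds the whole list, the same pattern works whether a tag is first, last, or in the middle.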

One of the biggest advantages of using Consul is that we can filter by tag in Prometheus. Assume there is a “node-exporter” service as in our example, and that it includes hundreds of instances from different teams. The “keep” rule in the job above lets us distribute a Consul service with hundreds of instances across different Prometheus servers according to a tag such as “_team”.
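
The effect of the “keep” rule can be sketched the same way. This is an illustration rather than Prometheus code: `keep_for_team` is a hypothetical helper, and the Vm02 and Vm03 instances and the “search” team are made up for the example.

```python
import re


def keep_for_team(instances, team):
    """Mimic the 'keep' relabel action: drop instances whose tags do not match."""
    pattern = re.compile(rf".*,_team={re.escape(team)},.*")
    kept = []
    for instance in instances:
        # Same joined form as __meta_consul_tags: separators around every tag.
        meta_tags = "," + ",".join(instance["tags"]) + ","
        if pattern.fullmatch(meta_tags):
            kept.append(instance["id"])
    return kept


instances = [
    {"id": "Vm01_NodeExporter", "tags": ["_app=elasticsearch", "_team=storefront"]},
    {"id": "Vm02_NodeExporter", "tags": ["_app=kafka", "_team=search"]},
    {"id": "Vm03_NodeExporter", "tags": ["_team=storefront", "_app=couchbase"]},
]
print(keep_for_team(instances, "storefront"))
```

A Prometheus server dedicated to another team would use the same job with a different value in the keep regex, so each server scrapes only its own share of the instances.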

Conclusion

As a result, with this structure, Prometheus can discover a service that we create in Consul, and the scrape endpoints become dynamic. When you create a VM, you can register it in Consul according to the services it runs, and it is automatically included in your monitoring system. Likewise, when you recycle the VM, it is removed from the Prometheus endpoints because it is deregistered from Consul.
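
For the recycling step, a teardown hook could deregister the service explicitly. The sketch below only prepares the request against the Consul agent's deregister endpoint; the helper name `deregister_request` and the address 127.0.0.1:8500 are assumptions, and how the hook is wired into VM teardown is left open.

```python
import urllib.request


def deregister_request(consul_addr, service_id):
    """Prepare (but do not send) a PUT to the agent's service deregister endpoint."""
    return urllib.request.Request(
        url=f"http://{consul_addr}/v1/agent/service/deregister/{service_id}",
        method="PUT",
    )


# The Consul address is a placeholder; the service ID matches the example above.
request = deregister_request("127.0.0.1:8500", "Vm01_NodeExporter")
# urllib.request.urlopen(request) would send it to a reachable Consul agent.
```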
