Consul Service Discovery. Part 2

In the first part we looked in depth at which problems and tasks a distributed application architecture can solve. We identified the tools that can be used to solve these problems and stressed the importance of implementing discovery at the project setup stage. We also chose Consul as the base application for implementing the discovery service.

In this second part we will review how Consul works with the DNS protocol, describe the main HTTP API requests, clarify which types of health checks can be used and, of course, find out why the K/V storage matters. More importantly, we will try some of these features in practice.

DNS interface

If the catalogue contains several services with the same name but different IP addresses, Consul randomly shuffles the addresses in each response, which gives primitive DNS load balancing.

You can either resolve a domain name directly within the cluster or perform a lookup. Lookups are available for both services and nodes.

The domain name format for DNS requests within a Consul cluster is strictly defined and cannot be changed.

Cluster node

Let’s review the node name format for a DNS request:

<node>.node[.datacenter].<domain>
  • <node>: mandatory part, the node’s name;
  • .node: indicates that this is a node lookup;
  • [.datacenter]: optional part, the datacenter name. Out of the box, Consul can provide discovery for several datacenters within one cluster. The default name is “dc1”. If the datacenter name is omitted, the current datacenter is used, that is, the datacenter of the agent that receives the request;
  • <domain>: mandatory part, the private Consul top-level domain, .consul by default.

So the domain name for a node (for example, one named nodeservice) will look like this:

nodeservice.node.consul.

As we can see, the datacenter name is omitted here, but the name can also be written with it:

nodeservice.node.dc1.consul.

Several nodes with the same name are not allowed within one datacenter.
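The naming rule above is easy to capture in a few lines. The following Python sketch assembles a node name from its parts. The function name node_dns_name and its parameters are illustrative only and are not part of Consul.

```python
def node_dns_name(node, datacenter=None, domain="consul"):
    """Build a node lookup name: <node>.node[.datacenter].<domain>."""
    labels = [node, "node"]
    if datacenter:
        # The datacenter label is optional; without it the agent
        # answers for its own datacenter.
        labels.append(datacenter)
    labels.append(domain)
    # The trailing dot makes the name fully qualified.
    return ".".join(labels) + "."


print(node_dns_name("nodeservice"))                    # nodeservice.node.consul.
print(node_dns_name("nodeservice", datacenter="dc1"))  # nodeservice.node.dc1.consul.
```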

Service

This is what a typical lookup of the nodes running a service (named rls) looks like:

root@511cdc9dd19b:~# dig @127.0.0.1 -p 8600 rls.service.consul.
; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> @127.0.0.1 -p 8600 rls.service.consul.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26143
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;rls.service.consul. IN A
;; ANSWER SECTION:
rls.service.consul. 0 IN A 172.17.0.2
rls.service.consul. 0 IN A 172.17.0.3
;; Query time: 4 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Thu Feb 18 07:23:00 UTC 2016
;; MSG SIZE rcvd: 104

From this response we can see that the cluster has two nodes running the rls service and that the Consul DNS interface returns the IP addresses of all of them. If we repeat the request several times, we will see the records swapping places, so the first position is not reserved for the first server found. This is the simple DNS load balancing mentioned above.
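To see why shuffling the records spreads the load, consider a naive client that always connects to the first address in the answer. The Python sketch below is only a simulation under that assumption: shuffled_answer stands in for the DNS response and does not query Consul, and all names in it are hypothetical.

```python
import random
from collections import Counter


def shuffled_answer(addresses, rng):
    """Return the addresses in random order, imitating a shuffled DNS answer."""
    records = list(addresses)
    rng.shuffle(records)
    return records


def simulate_first_picks(addresses, requests, seed=0):
    """Count how often each address ends up first across many simulated answers."""
    rng = random.Random(seed)
    return Counter(shuffled_answer(addresses, rng)[0] for _ in range(requests))


picks = simulate_first_picks(["172.17.0.2", "172.17.0.3"], requests=1000)
for address, count in sorted(picks.items()):
    print(address, count)
```

Each address should come first in roughly half of the answers, so clients that take the first record end up split between the two nodes.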

If we request an SRV record, the response will be the following:

root@511cdc9dd19b:/# dig @127.0.0.1 -p 8600 rls.service.consul. SRV
; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> @127.0.0.1 -p 8600 rls.service.consul. SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8371
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;rls.service.consul. IN SRV
;; ANSWER SECTION:
rls.service.consul. 0 IN SRV 1 1 80 agent-two.node.dc1.consul.
rls.service.consul. 0 IN SRV 1 1 80 agent-one.node.dc1.consul.
;; ADDITIONAL SECTION:
agent-two.node.dc1.consul. 0 IN A 172.17.0.3
agent-one.node.dc1.consul. 0 IN A 172.17.0.2
;; Query time: 5 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Thu Feb 18 07:39:22 UTC 2016
;; MSG SIZE rcvd: 244

The ANSWER SECTION contains the domain names of the nodes in the format Consul requires (note that these are nodes, not services!) together with the ports on which the requested service is running. The nodes’ IP addresses are listed in the ADDITIONAL SECTION of the response.
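A client usually wants address and port pairs, so the two sections have to be joined. The sketch below does this for record lines in the textual form dig prints above. The function parse_srv_endpoints is a hypothetical helper written for this illustration; it assumes whitespace-separated fields in exactly that layout.

```python
def parse_srv_endpoints(answer_lines, additional_lines):
    """Join SRV records with their A records and return (ip, port) pairs."""
    # Map each node name from the ADDITIONAL SECTION to its IP address.
    addresses = {}
    for line in additional_lines:
        fields = line.split()
        if len(fields) == 5 and fields[3] == "A":
            addresses[fields[0]] = fields[4]

    # SRV layout: name ttl class SRV priority weight port target
    endpoints = []
    for line in answer_lines:
        fields = line.split()
        if len(fields) == 8 and fields[3] == "SRV":
            port, target = int(fields[6]), fields[7]
            endpoints.append((addresses[target], port))
    return endpoints


answer = [
    "rls.service.consul. 0 IN SRV 1 1 80 agent-two.node.dc1.consul.",
    "rls.service.consul. 0 IN SRV 1 1 80 agent-one.node.dc1.consul.",
]
additional = [
    "agent-two.node.dc1.consul. 0 IN A 172.17.0.3",
    "agent-one.node.dc1.consul. 0 IN A 172.17.0.2",
]
print(parse_srv_endpoints(answer, additional))
# [('172.17.0.3', 80), ('172.17.0.2', 80)]
```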

The service name format for a DNS request looks like this:

[tag.]<service>.service[.datacenter].<domain>
  • [tag.]: optional part, used to filter services by tag. If several services share a name but have different tags, adding the tag name filters the response;
  • <service>: mandatory part, the service name;
  • .service: indicates that this is a service lookup;
  • [.datacenter]: optional part, the datacenter name;
  • <domain>: mandatory part, the private Consul top-level domain.

So a service named nginx with the tag “web” can be represented by the domain name:

web.nginx.service.consul

SRV-requests for services lookup according to RFC-2782

Besides the “usual” way of building a domain name, we can build it according to the stricter rules of RFC 2782 (https://www.ietf.org/rfc/rfc2782.txt) when requesting an SRV record. The name format looks like this:

_<service>._<tag>.service[.datacenter].<domain>

The service name and the tag are prefixed with an underscore (_), which helps to prevent collisions with ordinary labels when making a request. (In the original RFC the protocol name stands in place of the tag.) In RFC 2782 format, a service named nginx with the tag “web” looks like this:

_nginx._web.service.consul
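Both ways of writing a service name can be sketched side by side. In the Python example below the function names and parameters are illustrative, not part of Consul. Note that in the RFC 2782 form the service label comes first and the tag second, following the format line above.

```python
def service_dns_name(service, tag=None, datacenter=None, domain="consul"):
    """Build the usual form: [tag.]<service>.service[.datacenter].<domain>."""
    labels = ([tag] if tag else []) + [service, "service"]
    if datacenter:
        labels.append(datacenter)
    labels.append(domain)
    return ".".join(labels) + "."


def rfc2782_dns_name(service, tag, datacenter=None, domain="consul"):
    """Build the RFC 2782 form: _<service>._<tag>.service[.datacenter].<domain>."""
    labels = [f"_{service}", f"_{tag}", "service"]
    if datacenter:
        labels.append(datacenter)
    labels.append(domain)
    return ".".join(labels) + "."


print(service_dns_name("nginx", tag="web"))  # web.nginx.service.consul.
print(rfc2782_dns_name("rls", "rails"))      # _rls._rails.service.consul.
```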

The response is the same as for a “simple” request, shown here for the rls service with the tag “rails”:

root@511cdc9dd19b:/# dig @127.0.0.1 -p 8600 _rls._rails.service.consul. SRV
; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> @127.0.0.1 -p 8600 _rls._rails.service.consul. SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26932
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;_rls._rails.service.consul. IN SRV
;; ANSWER SECTION:
_rls._rails.service.consul. 0 IN SRV 1 1 80 agent-one.node.dc1.consul.
_rls._rails.service.consul. 0 IN SRV 1 1 80 agent-two.node.dc1.consul.
;; ADDITIONAL SECTION:
agent-one.node.dc1.consul. 0 IN A 172.17.0.2
agent-two.node.dc1.consul. 0 IN A 172.17.0.3
;; Query time: 6 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Thu Feb 18 07:52:59 UTC 2016
;; MSG SIZE rcvd: 268

By default, all domain names served by Consul have TTL = 0, meaning they are not cached at all. This is important to remember.

HTTP API

The HTTP API endpoints are documented at https://www.consul.io/docs/agent/http.html; here is a brief description of them to give an idea of what the API can do:

  • acl — access control;
  • agent — Consul agent management;
  • catalog — cluster nodes and services management;
  • coordinate — network coordinates;
  • event — custom events;
  • health — availability check;
  • kv — Key/Value storage;
  • query — prepared requests;
  • session — sessions;
  • status — system status.

acl

agent

catalog

coordinate

event

health

kv

/v1/kv/

Processing differs only by request method: GET returns the value stored under a key, PUT saves a new value or overwrites the old one, and DELETE removes the record.
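The three methods can be tried from Python with nothing but the standard library. This is a minimal sketch, not an official client: the helper names are made up for this example, the agent address http://127.0.0.1:8500 is an assumption (a local agent on the default HTTP port), and the key app/color is hypothetical. It relies on the fact that a GET response is a JSON list whose Value fields are base64-encoded.

```python
import base64
import json
import urllib.request

CONSUL_ADDR = "http://127.0.0.1:8500"  # assumption: local agent, default HTTP port


def kv_request(method, key, value=None):
    """Build (but do not send) a GET, PUT or DELETE request for /v1/kv/<key>."""
    data = value.encode("utf-8") if value is not None else None
    return urllib.request.Request(f"{CONSUL_ADDR}/v1/kv/{key}", data=data, method=method)


def decode_kv_body(body):
    """Decode the JSON body of a GET response into a {key: value} dict."""
    return {
        entry["Key"]: base64.b64decode(entry["Value"] or "").decode("utf-8")
        for entry in json.loads(body)
    }


def kv_send(method, key, value=None):
    """Send the request to a running agent and return the raw response body."""
    with urllib.request.urlopen(kv_request(method, key, value)) as response:
        return response.read()


# With an agent running, a round trip could look like this:
#   kv_send("PUT", "app/color", "blue")
#   decode_kv_body(kv_send("GET", "app/color"))
#   kv_send("DELETE", "app/color")
```

Keeping the request construction separate from sending it makes the method-to-operation mapping easy to inspect without a running cluster. A GET for a missing key is answered with 404, which urlopen raises as urllib.error.HTTPError.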

query

session

status

Health Checks

  • Script check: runs a given script on a given node at predetermined intervals. The exit code decides how the node or service is marked (0 means the check passes, 1 produces a warning, any other code means the check fails).
  • HTTP check: requests the stated URL at a predetermined interval and marks the checked object according to the response code (any 2xx is OK, 429 Too Many Requests produces a warning, any other code reports an error).
  • TCP check: tries to establish a TCP connection to a given address and port at a predetermined interval. A connection failure means the check fails.
  • TTL check: a passive check whose state the service itself must update periodically through the HTTP API. If no report arrives within the predetermined interval, the check is considered failed and the service is marked as not working.
  • Docker check: a check for services running in Docker containers. Using the Docker Exec API, Consul runs a script located inside the container, and the result depends on its exit code (any non-zero code means the check does not pass).
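Two of these rules are simple enough to model directly. The Python sketch below mirrors the HTTP check's mapping from response code to state and the passive nature of a TTL check. It illustrates the logic only and is not how Consul implements it: the names http_check_status and TTLCheck are hypothetical, and treating a check that has never reported as critical is an assumption of this sketch.

```python
import time


def http_check_status(code):
    """Map an HTTP response code to a check state, as described in the list above."""
    if 200 <= code <= 299:
        return "passing"
    if code == 429:  # Too Many Requests
        return "warning"
    return "critical"


class TTLCheck:
    """A passive check: the service must report in before the TTL runs out."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.last_report = None     # no report received yet

    def report(self):
        """Called by the service to say it is still alive."""
        self.last_report = self.clock()

    def status(self):
        """Return 'passing' only if a report arrived within the TTL."""
        if self.last_report is None:
            return "critical"
        if self.clock() - self.last_report > self.ttl:
            return "critical"
        return "passing"


print(http_check_status(204), http_check_status(429), http_check_status(500))
# passing warning critical
```

The clock parameter exists only so that the expiry logic can be exercised without waiting for real time to pass.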

K/V storage

Conclusion
