Docker Swarm, SpringBoot, Consul… Everything

@dusansusic
3 min read · Aug 25, 2016


So, here is my setup; the point is a High Availability system running in the cloud:

I provisioned 3 Consul servers and put them to work in cluster mode.
Addresses are:

consul1 172.100.0.11
consul2 172.100.0.12
consul3 172.100.0.13

docker run -d --net=host \
-p 8300:8300 \
-p 8301:8301 \
-p 8302:8302 \
-p 8400:8400 \
-p 8500:8500 \
-p 53:8600/udp \
--name consul \
-e 'CONSUL_LOCAL_CONFIG={"skip_leave_on_interrupt": true}' \
consul agent -server \
-client=0.0.0.0 \
-bind=172.100.0.11 \
-ui -bootstrap-expect=3 \
-retry-join=172.100.0.11 -retry-join=172.100.0.12 -retry-join=172.100.0.13

The same approach is used for the other Consul instances; just the addresses are changed to 172.100.0.12 and 172.100.0.13, respectively.
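
A quick sanity check that the three servers actually formed a cluster is to ask any of the agents for its member list; all three should show up with type "server":

docker exec consul consul members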

Then, I provisioned 3 Docker Swarm managers and an additional 3 Docker Swarm workers.
Each of them runs a Consul agent that points to the Consul cluster.

Managers:

swman1 172.100.0.21
swman2 172.100.0.22
swman3 172.100.0.23

Each Docker manager instance is running Consul in agent mode:

docker run -d --net=host \
-p 8300:8300 \
-p 8301:8301 \
-p 8302:8302 \
-p 8400:8400 \
-p 8500:8500 \
-p 53:8600/udp \
--name consul \
-e 'CONSUL_LOCAL_CONFIG={"skip_leave_on_interrupt": true}' \
consul agent \
-client=0.0.0.0 \
-bind=172.100.0.21 \
-ui \
-retry-join=172.100.0.11 -retry-join=172.100.0.12 -retry-join=172.100.0.13

just -bind is changed to 172.100.0.22 and 172.100.0.23, respectively.

Docker Swarm is started on each machine after the Consul agent:

docker run -d --name swarm -p 4000:4000 swarm manage -H :4000 --replication --advertise 172.100.0.21:4000 consul://172.100.0.21:8500

docker run -d --name swarm -p 4000:4000 swarm manage -H :4000 --replication --advertise 172.100.0.22:4000 consul://172.100.0.22:8500

docker run -d --name swarm -p 4000:4000 swarm manage -H :4000 --replication --advertise 172.100.0.23:4000 consul://172.100.0.23:8500
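
To verify that replication elected a primary, pointing the Docker client at any of the managers should (if I read the legacy Swarm output right) report its role:

docker -H tcp://172.100.0.21:4000 info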

Workers:
Same approach as for the managers, no difference on the Consul side.

swnode1 172.100.0.31
swnode2 172.100.0.32
swnode3 172.100.0.33

docker run -d --name swarm swarm join --advertise=172.100.0.31:2375 consul://172.100.0.31:8500

docker run -d --name swarm swarm join --advertise=172.100.0.32:2375 consul://172.100.0.32:8500

docker run -d --name swarm swarm join --advertise=172.100.0.33:2375 consul://172.100.0.33:8500
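
If the joins worked, the workers should also be visible in the Consul KV store that the Swarm discovery backend writes to; listing all keys is a quick check (the exact prefix, docker/swarm/nodes if I remember correctly, depends on the discovery URL):

curl 'http://172.100.0.11:8500/v1/kv/?keys'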


I initialized Docker swarm with:

docker swarm init

and I got tokens to join managers and workers.
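
For reference, the join commands can be re-printed at any time; roughly like this (the token is a placeholder, not my real one):

docker swarm join-token manager
docker swarm join-token worker
# then, on each node to be added, something like:
docker swarm join --token <token-from-above> 172.100.0.21:2377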

First thing I don't understand: why the hell are the managers also workers? When I execute the docker info command, I get:

Managers: 3
Nodes: 6

I thought that the services I execute would be scheduled only on the workers, not on the managers, too.
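
From what I can tell, swarm mode managers act as workers by default; if I wanted to keep service tasks off them, draining should do it (<node-name> being whatever docker node ls reports):

docker node update --availability drain <node-name>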

Then I created an overlay network:

docker network create --driver overlay --subnet 10.0.50.0/24 lol

The microservices I am running are Spring Boot apps. In the configuration I have the Consul dependencies, and the applications are annotated for Consul discovery because my microservices communicate with each other.

The entry point to the whole system is a Zuul gateway with path mappings to the other services, so I need to specify the Consul agent endpoint in bootstrap.yml, and this is where my problem begins.

Since I am already running a Consul agent on each host, I thought to use it for service discovery, so in bootstrap.yml I put the docker0 address, 172.17.0.1.
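
Roughly, the relevant part of my bootstrap.yml looks like this (the service name is just illustrative):

spring:
  application:
    name: gateway
  cloud:
    consul:
      host: 172.17.0.1
      port: 8500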

Then I build the service, build the Docker image, push it to the registry, pull it from the registry, and start it with:

docker service create --name gateway --replicas=6 xxxxxxxx.dkr.ecr.eu-west-1.amazonaws.com/gateway:latest

After that, I see this:

ubuntu@manager1:~$ docker service ls
ID NAME REPLICAS IMAGE COMMAND
a0rnz3cryqk8 gateway 6/6 xxxxxxxx.dkr.ecr.eu-west-1.amazonaws.com/gateway

So all services are up and running! Nice!
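
docker service ps additionally shows which node each replica landed on:

docker service ps gateway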

Actually, I have a problem: health checks are failing on all running nodes, and all of them report a similar error:

Get http://0c0d3b6141b3:8080/health: dial tcp: lookup 0c0d3b6141b3 on 172.100.0.2:53: no such host

When I execute telnet 172.17.0.1 8500 from any container, I get no response; I cannot even reach port 8500. How is my service appearing in the Consul list at all, then?
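
One way to narrow this down: if the agent is really bound to 0.0.0.0, its HTTP API should answer on the docker0 address from the host itself. If the command below works on the host but not from a container, the problem is purely routing between the overlay and docker0:

curl http://172.17.0.1:8500/v1/agent/self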

So I know the problem is that the Docker container name is not resolved, but I don't know how to get rid of this. I tried dnsmasq and DNS forwarding without success (I used an article from this site, too).
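
One knob that looks relevant in the Spring Cloud Consul docs is registering by IP instead of by hostname, which would at least avoid the unresolvable container ID (though the registered overlay IP would still have to be reachable from wherever the health check runs):

spring:
  cloud:
    consul:
      discovery:
        prefer-ip-address: true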

So, how can a microservice packed in a container running on an overlay network communicate with a Consul agent running on top of the host network? Is that possible at all?

Is there any better solution to accomplish this? It's very important that the service is replicated across multiple hosts and discovered by Consul.

Thanks to all.
