[Project 12] Docker Swarm — Docker Swarm is like conducting an orchestra!
Prologue
Previously, we demonstrated how a single Docker container is controlled using the docker run command. But this command manages only one container at a time. In reality, numerous clients may attempt to access the same service provided by a group of containers. Managing many containers one at a time with docker run is inefficient and increases the chance of human error. Orchestrating multiple Docker containers automatically is therefore necessary, and Docker Swarm fulfills that demand!
Docker Swarm
Docker Swarm is a container orchestration tool for running Docker applications. To manage and control a group of containers, multiple virtual machines join a cluster. The machines in the cluster are called nodes: instances of the Docker engine participating in the swarm.
There are two roles among the nodes: some nodes act as managers and the rest serve as workers. The managers are to the workers in a swarm as the conductor is to the musicians in an orchestra! Imagine a conductor guiding the orchestra (a group of musicians playing their instruments): the right hand signals when and how fast or slow to play, and the left hand signals how loud or soft, as shown below.
Similarly, the manager nodes participate in the Raft database. The database maintains a consistent internal state of the entire swarm and all the services running on it. When clients request a service, the manager nodes create tasks and decide which worker nodes will be assigned the tasks. Once the selected worker nodes receive the tasks, they create and run the requested number of containers to fulfill the swarm service requested by the clients.
During this process, one or more manager and worker nodes, each running a Docker engine, are configured to join together as a cluster as shown below; this is Docker Swarm!
Due to the nature of the swarm structure, it offers high availability and fault tolerance; even if a few nodes become unavailable, external clients can still access the swarm service through the remaining available nodes.
In Docker Swarm, we deal with a group of containers by managing a swarm service rather than each single container. Accordingly, we use the ‘docker service create’ command in place of the ‘docker run’ command, as sketched below.
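As a minimal sketch (the image and option values here are for illustration only), the contrast looks like this:

# Single container on one Docker host (what we used previously)
docker run -d --name web -p 80:80 httpd:alpine3.18

# Swarm service: the manager schedules the requested replicas across the cluster
docker service create -d --name web-server --replicas 10 -p 80:80 httpd:alpine3.18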
Client’s service request
BASIC
Using AWS, create a Docker Swarm that consists of one manager and three worker nodes.
Verify the cluster is working by deploying the following tiered architecture:
· a service based on the Redis docker image with 4 replicas
· a service based on the Apache docker image with 10 replicas
· a service based on the Postgres docker image with 1 replica
ADVANCED
Create a Docker Stack using the Basic project requirements. Ensure no stacks run on the Manager (administrative) node.
COMPLEX
Build a Kubernetes cluster. Modify the Docker Stack so it runs on this cluster with no errors.
Let’s first tackle the basic-level requests so that your team can immediately take advantage of swarm services!
Preparation
· An IAM user with full Administrative Access in a Free-Tier AWS account
· Visual Studio Code
Walkthrough
Let’s set up 4 virtual machines and spin up 4 AWS EC2 instances!
In EC2 dashboard, click Instances, Launch Templates and Create launch template.
Choose Amazon Linux 2023 as a base AMI image.
Select t2.micro as instance type.
Choose a key pair name from the list. Also, place the private .pem key file in the directory from which you will run the SSH command on your Linux system. Select Free tier eligible options wherever you can.
Let’s edit network settings below.
· Create security group
· Security group name: dockerswarm-sg
· Description: docker swarm
· VPC: select an available default VPC
In Advanced network configuration section, Enable Auto-assign public IP.
For Inbound security group rules, click the following button to add more security group rules.
Add security group rules for the following ports, listed with their protocol and function:
· 22: TCP, Secure Shell (SSH)
· 443: TCP, HTTPS
· 2377: TCP, manager communication in Swarm
· 7946: TCP/UDP, network node discovery in Swarm
· 4789: UDP, overlay network traffic in Swarm
· 80: HTTP, Apache webserver
· 6379: TCP, Redis database access
· 5432: TCP, PostgreSQL database
Reference: https://docs.docker.com/engine/swarm/swarm-tutorial/#open-protocols-and-ports-between-the-hosts
Designate the source type as “Anywhere” and the type as either “Custom TCP” or “Custom UDP” accordingly. There is no need to choose the source type for SSH, HTTP and HTTPS, as those were already set by default!
Leave all other options and fields at their defaults.
In Advanced details, go to User data section.
Copy and paste the following shell script into the User data box. It will automatically install and configure Docker when a new virtual machine boots. Don’t forget the first line specifying the shell interpreter, or the commands in the script will not be executed when a new EC2 instance launches.
#!/bin/bash
# Update packages and install Docker
yum update -y
yum install -y docker
# Allow ec2-user to run docker commands without sudo
usermod -a -G docker ec2-user
# Start Docker now and enable it on every boot
systemctl start docker.service
systemctl enable docker.service
# Record the Docker status, version and launch time for later review
systemctl status docker.service > ~/ec2launch.report
echo "Docker version: $(docker version)" >> ~/ec2launch.report
echo "Date of launching Virtual Machine: $(date)" >> ~/ec2launch.report
Click Create launch template.
In the EC2 dashboard, go to the Network & Security section and check all the security groups used for the new EC2 launch template (see below).
Click Instances, then Launch Templates, click the arrow next to Actions and choose Launch instance from template.
In the Summary section, set the number of instances to 4, because we need 1 instance for the manager node and 3 instances for the worker nodes. Then click Launch instance.
Check on Instances in EC2 dashboard.
Then give each instance an appropriate name for the 1 manager node and 3 worker nodes, as shown below.
Docker was successfully installed and available. However, every virtual machine landed in the same availability zone. This undermines the high availability and fault tolerance we are trying to achieve with Docker Swarm.
Let’s use Auto Scaling group to spin up 4 nodes (1 manager and 3 workers) and spread them over 4 different availability zones.
In the EC2 dashboard, click Launch Templates. Then choose Create Auto Scaling group from the Actions menu (see below).
Name the Auto Scaling group, choose version 1 and click Next.
Select any 4 Availability Zones and their subnets to achieve high availability and fault tolerance.
Then click Next.
Click Next to skip Configure advanced options section.
For Group size, set all capacity to 4. Then click Next.
Click Next to skip Add notifications and Add tags sections.
In Review page, click Create Auto Scaling group.
Let’s check the EC2 instances and label them as manager and worker nodes.
Each instance is assigned to a different availability zone, which is well suited to the purpose of using Docker Swarm: high availability and fault tolerance for deployed services.
Let’s connect to each node by SSH and establish manager and worker nodes in Docker Swarm.
Place a check mark on the manager node and click Connect to connect to the node by SSH.
Place the private key .pem file in the directory where you run the ssh command on your local Linux machine.
Then change the file permissions to read-only for the owner and run the following SSH command as suggested by the AWS SSH client tab below:
ssh -i "dockerswarm.pem" root@54.89.126.175
Interestingly, the suggested login user name was root rather than ec2-user. However, the SSH connection as root was not granted.
Logging in as ec2-user succeeded, as shown below.
Let’s connect to the 3 worker nodes through SSH as well. The following screenshot shows all 4 nodes’ SSH connections:
On the manager node, execute the docker swarm init command. The node then becomes a manager node in the Swarm.
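A minimal sketch of the commands involved (the advertise address and join token here are placeholders, not the actual values for this cluster):

# On the manager node: initialize the swarm, advertising the node's private IP
docker swarm init --advertise-addr <manager-private-ip>

# The init output prints a join command similar to the following,
# which is then executed on each worker node:
docker swarm join --token <worker-join-token> <manager-private-ip>:2377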
The docker swarm join command (see the screenshot below) was executed on the remaining 3 nodes, and they became worker nodes in the Swarm.
· Upper left: worker node 1
· Upper right: worker node 2
· Lower left: worker node 3
· Lower right: manager node
Check each node’s public IP address below:
Let’s check whether Docker Swarm mode was set up correctly.
On the manager node, let’s execute the ‘docker service ls’ command. Since no service has been created yet, nothing shows up. Next, check whether all 4 nodes are detected by the manager node.
The ‘docker node ls’ command shows the 4 established nodes. The manager status indicates Leader, meaning this node is the leader of the manager group. This is correct, as it is the only manager node in this Swarm.
Unfortunately, Docker Swarm mode did not function properly when executing docker service create. It produced the error message below:
Error response from daemon:
pull access denied for service, repository does not exist or
may require 'docker login': denied: requested access to the resource is denied
I executed the ‘docker login’ command on the manager node, but it did not resolve the docker service command issue.
It is possible that the Docker daemon was not functioning properly for unknown reasons. I decided to start over and re-prepare all 4 nodes as demonstrated up to this point.
First of all, leave the Docker Swarm by running the ‘docker swarm leave --force’ command on all 4 nodes.
Next, eliminate any recurrence of the above potential issue with the following steps:
· Log into Docker Hub again on all 4 nodes to refresh the credential file stored on each node (docker login)
· Then establish a new Swarm with a manager node and 3 worker nodes.
The list of all nodes, viewed from the manager node, is shown below. Everything looks fine.
I inspect the status of the manager node with the following command.
docker node inspect --pretty self   # --pretty: human-readable output; self: the manager node itself
The Manager Status clearly shows that the current node is reachable in the Raft consensus group and that it is the leader of the manager group. Everything is correct!
Next, I choose a worker node by its node ID, the string value “squfn4d8okxfo1knj5uyx5af6”, and execute the following command:
docker node inspect --pretty squfn4d8okxfo1knj5uyx5af6
The worker node is available and ready, as shown below. The two other worker nodes are also fine upon testing.
Thus, the new 4-node Swarm cluster looks fine.
Now it is time to deliver the swarm services requested by our client!
Let’s create 3 services: a Redis database, an Apache HTTP web server and a Postgres database. The -d option means we do not have to wait at the command prompt until the service creation is complete.
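The general shape of the command is sketched below (a placeholder pattern, not the exact commands captured in the screenshots that follow):

# Create a service in detached mode; --replicas defaults to 1 when omitted
docker service create -d --name <service-name> --replicas <N> <image>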
Executing the 3 docker service create commands generates the Docker service ID strings shown below.
The Docker service IDs also appear in the docker service ls output.
Both the redis-db and web-server services are running, shown as Replicas 1/1, and they look good. But the postgres-db service shows Replicas 0/1, meaning no task is running. Per the task status of the postgres-db service, the current state is “assigned”.
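The task state can be checked with docker service ps (a sketch; --no-trunc is handy when a task keeps failing):

# List the tasks of the service and their current state (assigned, running, ...)
docker service ps postgres-db
# Show full error messages for failed tasks
docker service ps --no-trunc postgres-db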
Let’s look further into the Docker service object. The “Init” key value is false, suggesting that the postgres-db service had not initialized properly.
To look further into the Postgres image, I referred to the official Docker Hub image documentation (see below).
I realized that I had used the wrong environment variable name, “POSTGRES_PASSWORD_FILE”, when setting up this database; that variable expects a path to a file containing the password, which I had not provided. The correct variable for my case is “POSTGRES_PASSWORD”.
This means I need to remove the wrong environment variable and add the correct one.
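One way to confirm which environment variables the service spec currently carries is a Go-template filter on docker service inspect (a sketch; I believe this template path is correct for current Docker versions):

# Print only the environment variables defined in the service spec
docker service inspect --format '{{.Spec.TaskTemplate.ContainerSpec.Env}}' postgres-db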
Let’s look into the syntax of docker service update. The following screenshot lists its useful options.
We run the following command to correct the postgres-db service:
docker service update --env-rm POSTGRES_PASSWORD_FILE \
--env-add POSTGRES_PASSWORD=passwd \
--publish-add 5432:5432 \
--replicas 1 postgres-db
The attempt to update the database seems to be successful.
Then I check again whether the postgres-db service is running. Replicas now indicate 1/1, so the service is running.
Next, I update the Redis database service (redis-db) to 4 replicas and publish port 6379. First, though, I want to test whether the Docker daemon recognizes when the same port number is requested again.
docker service update --publish-add 5432:5432 --replicas 4 redis-db
Indeed, the Docker daemon shows an error message that the port is already in use by the postgres-db service as an ingress port. The Docker daemon is doing its job correctly!
As soon as I run the correct command, publishing port 6379 with 4 replicas, a total of 4 tasks are assigned to the redis-db service successfully (see the screenshot above).
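Reconstructed from that description (the exact command appears in the screenshot, so treat this as a sketch):

# Publish the Redis port and scale the service to 4 replicas
docker service update --publish-add 6379:6379 --replicas 4 redis-db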
Let’s check the status of the 2 services updated so far. Both postgres-db and redis-db are up and running with their respective replica counts and open ports.
Next, let’s update “web-server” service with the following command:
docker service update --publish-add 80:80 --replicas 10 web-server
As soon as the command runs, 10 tasks are prepared. This screenshot was taken in the middle of the process: some tasks are already running while others are just starting (see below).
In a few seconds, all 10 tasks are in the running state, as shown below.
Lastly, let’s check whether each service functions correctly.
Let’s access the HTML content served by the web-server service.
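From any machine, the page can be fetched with curl (a sketch; the IP is a placeholder for any node's public address):

# The swarm routing mesh serves the page from every node's public IP on port 80
curl http://<node-public-ip>/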
On the worker2 node, list the containers and find the one for the PostgreSQL service. Its name starts with “postgres-”.
Then run a bash shell to get into the container. The Postgres program exists in the container!
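A sketch of those two steps (the container name is a placeholder; the real one carries the task name and task ID):

# On worker2: find the container backing the postgres-db task
docker container ls --filter name=postgres-db
# Open a shell inside it; running psql --version inside confirms the client is present
docker exec -it <postgres-container-name> bash
psql --version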
Next, let’s get into the container for the Redis database through the manager node.
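Similarly, on the manager node (again a sketch with a placeholder container name):

# Find the redis-db container scheduled on this node and ping the server inside it
docker container ls --filter name=redis-db
docker exec -it <redis-container-name> redis-cli ping   # expected reply: PONG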
You can connect to a service internally by accessing its container only if that container is present on your current node. For example, you cannot reach the PostgreSQL container from any node other than worker2.
So far, we have tested the services by accessing their containers on the 4 nodes.
Let’s connect from our local computer to the PostgreSQL database in the Docker Swarm. We should be able to access the database through any of the 4 public IP addresses in this Docker Swarm.
We found that the program version is 14 and installed the matching client. Then we successfully connected to the PostgreSQL database. Use the password from the “POSTGRES_PASSWORD” environment variable that we set in the docker service create or update command (see below).
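A sketch of the connection from the local machine (the client install command depends on your distribution, and the IP is a placeholder):

# Install the PostgreSQL client first, e.g. 'yum install postgresql' or 'apt install postgresql-client'
# Connect through any node's public IP; the routing mesh forwards port 5432
psql -h <node-public-ip> -p 5432 -U postgres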
The PostgreSQL database in the Swarm was accessible through any of the 4 nodes, as shown below:
Next, let’s connect to Redis database from my local Linux server!
Install the Redis client on the local computer and check that the program works.
Let’s connect to a node.
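A sketch of the remote check (placeholder IP; redis-cli ships with the redis or redis-tools package on most distributions):

# Connect to the published Redis port through any node's public IP
redis-cli -h <node-public-ip> -p 6379 ping   # expected reply: PONG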
Access to the Redis database was also successful from all 4 nodes remotely. Great!
I also observed that each of the 4 nodes’ IP addresses pointed to the same web page content, indicating that the Apache web server service was available through all 4 IP addresses. This confirms that all 4 nodes talk to each other (see below).
Next, check whether the containers of all 3 services are spread over all 4 nodes, since Docker Swarm aims for high availability and fault tolerance of the services offered to clients.
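From the manager node, the NODE column of docker service ps shows where each task landed (a sketch):

# List the tasks of all three services; the NODE column reveals their placement
docker service ps redis-db web-server postgres-db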
All 15 containers belonging to the 3 services are well distributed over the 4 nodes (1 manager and 3 workers), as shown above.
Additional tasks
I put my computer to sleep and went home, so all the previous SSH connections were cut off. However, I did not terminate the AWS Auto Scaling group for the 4 EC2 instances, so their public IP addresses remained the same.
I reconnected to all 4 instances by SSH and found that all 4 nodes (EC2 instances) still participated in the old Swarm. As soon as all 4 nodes left the old Swarm, their containers were all gone; container storage is ephemeral unless it is mounted to a local host file system. This was a great revisit of project 11 (week 16), which used a bind mount to attach a Docker container to a designated file and directory on the local host Linux system.
Let’s execute the following 3 commands to establish 3 services:
docker service create -d --name redis-db --replicas 4 -p 6379:6379 redis:latest
docker service create --name web-server --replicas 10 -p 80:80 httpd:alpine3.18
docker service create --name postgres-db --replicas 1 -p 5432:5432 -e POSTGRES_PASSWORD=passwd postgres
Look at Worker3 node (Public IP address: 3.91.39.156).
Then look into a list of containers inside the Worker3 node.
Then let’s investigate further one container ID (a530a39ff4cc) by executing the following command. Please note that the container name has the following format: <task_name>.<task_ID>
docker container inspect redis-db.2.l92529lz1iwe7o9wdoxbkw85e
Under the “HostConfig” key section, the “Binds” value is null. This means no bind mount is set up, and the contents of this container are not stored in the local host file system. As a result, the container’s contents will be deleted when all 4 nodes leave the Docker Swarm.
Let’s look further down at the “Labels” key section, which lists information about the node, service and task.
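Both keys can also be pulled out directly with a format filter (a sketch using the same container name as above):

# Print only the bind mounts (null here) and the swarm-related labels
docker container inspect --format '{{json .HostConfig.Binds}}' redis-db.2.l92529lz1iwe7o9wdoxbkw85e
docker container inspect --format '{{json .Config.Labels}}' redis-db.2.l92529lz1iwe7o9wdoxbkw85e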
Interestingly, the above information can also be retrieved using docker service commands on the manager node, by searching step by step for the following components (a command sketch follows the list):
(1) Node ID: let’s look at the list of all the 4 nodes
(2) Service ID and name: After creating 3 services, look at their details.
(3) Task ID and name: Let’s look at all the available tasks under the redis-db service.
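A sketch of the corresponding commands, run from the manager node:

docker node ls                 # (1) node IDs of the 4 cluster members
docker service ls              # (2) service IDs and names
docker service ps redis-db     # (3) task IDs and names under the redis-db service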
Although a worker node is not allowed to use the docker service command, it can still list its actively running containers using the docker container command below:
Please note that each container name is built from its originating task name and ID in the following format: <task_name>.<task_ID>
Everything looks great, but there is one problem: I could not ping one node from another among the 4 nodes. After a couple of hours of searching, I found that the security group for the 4 nodes (EC2 instances) needed to be updated to accept ICMP echo requests.
Place a check mark on one of the 4 nodes and click the security group link highlighted in blue below.
Next, click Edit inbound rules.
Click Add rule and customize the rule with the following:
· Type: Custom ICMP - IPv4
· Protocol: Echo Request
· Source: Anywhere - IPv4
Then click Save rules.
Review the inbound rules after editing, shown below: Echo Request should be included under Custom ICMP - IPv4.
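For reference, the same inbound rule can be added from the AWS CLI (a sketch; the security group ID is a placeholder, and opening ICMP to 0.0.0.0/0 mirrors the console choice above):

# Allow ICMP echo request (type 8, any code) from anywhere
aws ec2 authorize-security-group-ingress \
  --group-id <security-group-id> \
  --ip-permissions 'IpProtocol=icmp,FromPort=8,ToPort=-1,IpRanges=[{CidrIp=0.0.0.0/0}]'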
For the network access control list (ACL) in the VPC, all traffic was allowed by the default settings, so there was no need to make changes.
Now, ping commands were successfully executed among the 4 nodes.
(1) From Manager node: ping worker1, worker2 and worker3 nodes in turn.
(2) From worker1 node: ping manager, worker2 and worker3 nodes in turn.
(3) From worker2 node: ping manager, worker1 and worker3 nodes in turn.
(4) From worker3 node: ping manager, worker1 and worker2 nodes in turn.
To wrap up, leave the Docker Swarm and log out from Docker Hub (docker logout). Then terminating the Auto Scaling group used for the 4 EC2 instances (nodes) will clean up everything. Thank you for walking through all the detailed steps. I hope you find this demonstration helpful. Feel free to contact me with any questions by email or through the LinkedIn link shown on the right side of my blog.
I will see you in my next article!
Summary
Docker Swarm focuses on managing the cluster rather than individual containers. Once a manager node creates a service, it also assigns a number of tasks to the service, and each task is scheduled to create a container.
Epilogue
Note-taking during the project helped me catch the errors shown below:
· “docker create service” is wrong. “docker service create” is correct.
· differences between “docker service create” and “docker service update”
· docker service ls : get a list of services.
· docker service ps <service-name> : get a list of tasks under the service.
· The docker container logs command helps find issues in each container.
· A task is an instance of a service.
· A container (object) is an instance of a Docker image.