Automating & scaling application infrastructure with Docker Swarm and AWS

Aalok Trivedi
10 min read · Mar 9, 2023

Intro

In my last article, we discussed how quick and efficient it is to create flexible application environments with Docker Compose. But what if we wanted to scale our containerized application architecture, add high availability, and provide fault tolerance without the headache and bloat of management overhead? In comes Docker Swarm!

Swarm is one of the leading tools in cluster orchestration & management and makes scaling applications and infrastructure a breeze. By taking advantage of decentralized roles and built-in load balancing, Swarm is able to leverage high availability and provide ultra-fast deployment with minimal overhead.

Too good to be true? Well, let's build something and find out!

What we’re building

  1. A Docker Swarm environment with one (1) manager node and three (3) worker nodes.
  2. A Docker Stack that will deploy a 3-tier application architecture with three (3) Docker services (Web/Apache, Node.js, and Postgres).

Prerequisites

  1. An AWS account with IAM user access.
  2. A Docker Hub account.
  3. Familiarity with basic Docker concepts, Docker Compose, and CLI commands.
  4. Familiarity with Amazon EC2 instances and the AWS Management Console.
  5. Familiarity with Linux file systems and commands
  6. Access to a command line tool.
  7. An IDE, such as VS Code.

Key concepts

Node
A node is a physical or virtual machine that hosts our application and can take on different roles, such as a manager or worker. A manager interprets the services and distributes tasks to all the worker nodes. A worker runs the tasks, given by a manager node.

Task
A task is simply a Docker container and the set of instructions or commands to be run on that container.
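To make these relationships concrete, here's an illustrative one-off example (not part of this build; the service name and image are arbitrary). Asking Swarm for a service with three replicas makes the manager create three tasks and schedule one container per task across the available nodes:

sudo docker service create --name web --replicas 3 nginx

# lists the three tasks and the node each one runs on
sudo docker service ps web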

Let’s get started!

Step 1: Groundwork

Before we dive into the nitty gritty of Docker Swarm, we need to lay some runway and set up the servers/machines that will host our Swarm nodes. We'll be using Amazon EC2 instances, but virtual or local machines would work just as well.

What we need:

  1. One (1) EC2 instance with a node role of manager.
  2. Three (3) EC2 instances with a node role of worker.

Security groups

First, let’s work a little backward and create our security groups for both the manager and worker nodes.

To ensure all the nodes can communicate with each other, Docker Swarm needs the following ports open between nodes:

  • TCP | Port 2377 | Cluster management traffic (workers joining the manager)
  • TCP & UDP | Port 7946 | Node-to-node communication (gossip)
  • UDP | Port 4789 | Overlay network (VXLAN) traffic

For more details, visit: https://www.bretfisher.com/docker-swarm-firewall-ports/

So, let’s create two security groups: One for the manager (swarm_app_mgr_sg), and one for the workers (swarm_app_wkr_sg).

Create both groups first, then go back and add the inbound rules. Several of the rules below use the other node role's security group as their source, and AWS won't let you reference a security group that doesn't exist yet.

Manager security group inbound rules:

  • SSH | TCP | Port: 22 | Source: My IP
  • HTTP | TCP | Port: 80 | Source: IPv4 anywhere
  • HTTPS | TCP | Port: 443 | Source: IPv4 anywhere
  • Custom TCP | TCP | Port: 8080 | Source: IPv4 anywhere
  • Custom TCP | TCP | Port: 2377 | Source: swarm_app_wkr_sg
  • Custom TCP | TCP | Port: 7946 | Source: swarm_app_wkr_sg
  • Custom UDP | UDP | Port: 7946 | Source: swarm_app_wkr_sg
  • Custom UDP | UDP | Port: 4789 | Source: swarm_app_wkr_sg

Worker security group inbound rules:

  • SSH | TCP | Port: 22 | Source: My IP
  • HTTP | TCP | Port: 80 | Source: IPv4 anywhere
  • HTTPS | TCP | Port: 443 | Source: IPv4 anywhere
  • Custom TCP | TCP | Port: 7946 | Source: swarm_app_mgr_sg
  • Custom UDP | UDP | Port: 7946 | Source: swarm_app_mgr_sg
  • Custom UDP | UDP | Port: 4789 | Source: swarm_app_mgr_sg
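If you'd rather script the groups than click through the console, here's a rough AWS CLI sketch of the swarm-specific rules (the default VPC is assumed, the <mgr_sg_id>/<wkr_sg_id> placeholders are the group IDs returned by the first two commands, and the SSH/HTTP/HTTPS rules are omitted for brevity):

# create the two groups
aws ec2 create-security-group --group-name swarm_app_mgr_sg --description "Swarm manager nodes"
aws ec2 create-security-group --group-name swarm_app_wkr_sg --description "Swarm worker nodes"

# manager: allow swarm traffic from the worker group
aws ec2 authorize-security-group-ingress --group-id <mgr_sg_id> --protocol tcp --port 2377 --source-group <wkr_sg_id>
aws ec2 authorize-security-group-ingress --group-id <mgr_sg_id> --protocol tcp --port 7946 --source-group <wkr_sg_id>
aws ec2 authorize-security-group-ingress --group-id <mgr_sg_id> --protocol udp --port 7946 --source-group <wkr_sg_id>
aws ec2 authorize-security-group-ingress --group-id <mgr_sg_id> --protocol udp --port 4789 --source-group <wkr_sg_id>

# worker: allow swarm traffic from the manager group
aws ec2 authorize-security-group-ingress --group-id <wkr_sg_id> --protocol tcp --port 7946 --source-group <mgr_sg_id>
aws ec2 authorize-security-group-ingress --group-id <wkr_sg_id> --protocol udp --port 7946 --source-group <mgr_sg_id>
aws ec2 authorize-security-group-ingress --group-id <wkr_sg_id> --protocol udp --port 4789 --source-group <mgr_sg_id>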

Create the manager node server

If Docker Swarm is the orchestra, the manager nodes are the conductors and administrators. It’s the manager’s job to control/distribute services and tasks to the worker nodes. A manager can also add/remove and promote/demote nodes based on need. Think of it as the administrative control center of your application environment.

It’s generally good practice to have multiple managers for high availability, but for now, one will do.

To save ourselves some repetition, we’ll create a new EC2 Launch Template (swarm_node_template) to easily apply the same settings to each server/node we need to spin up.

Launch template settings:

  • AMI: Amazon Linux 2.
  • Type: t2.micro (Free tier, but feel free to choose whatever fits your application requirements).
  • Key pair: Choose/create a key pair.
  • Network: We can set these when we launch our instances.
  • Advanced details > User data: This is where we add a bash script to make sure Docker and Docker Compose are installed, enabled, and started when each instance launches. Copy and paste this script into the User data field.
#!/bin/bash
# update packages and install Docker
sudo yum update -y
sudo yum install -y docker

# install Docker Compose
sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

# enable and start the Docker daemon
sudo systemctl enable docker
sudo systemctl start docker

Done! Let’s launch an instance based on this template and name it swarm_app_mgr. Remember to attach the swarm_app_mgr_sg security group! We can use the default VPC and subnet.

Create the worker node servers

We can use the same launch template for the worker machines to launch three instances with the swarm_app_wkr_sg security group and name them swarm_app_wkr1, swarm_app_wkr2, and swarm_app_wkr3.
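As an aside, both launches can also be scripted with the AWS CLI. Here's a rough sketch for one worker (template version 1 assumed, security-group ID as a placeholder); the manager is the same command with swarm_app_mgr_sg and the swarm_app_mgr name:

aws ec2 run-instances \
  --launch-template LaunchTemplateName=swarm_node_template,Version=1 \
  --security-group-ids <wkr_sg_id> \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=swarm_app_wkr1}]'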

Perfect! We’re ready to Swarm!

Step 2: Create the Swarm

It’s time to create our Swarm environment and assign each EC2 instance its node role.

Assign the manager

First, let’s SSH into the instance we designated as our manager node and make sure Docker and Docker Compose have been successfully installed by running docker --version and docker-compose --version.

Note: Log in to Docker Hub with sudo docker login and enter your Docker Hub username and password when prompted.

To create a new swarm and assign this server as the manager, run this command:

sudo docker swarm init

We now have a manager node and directions to create new manager/worker nodes!

Notice that the IP address is the instance’s private IP address. This is important because a manager node needs a static IP for the worker nodes to connect to. Copy that first command.

In this case, Swarm automatically detected the right IP, but sometimes it might direct you to ‘advertise’ an IP. If that happens you'll need to run the init command again with an additional --advertise-addr <YOUR_IP_ADDRESS> option.
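For example, re-running init and explicitly advertising the manager's private IP would look like this:

sudo docker swarm init --advertise-addr <MANAGER_PRIVATE_IP>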

Assign the workers

With the command copied, let’s open three more terminal windows and SSH into each instance we designated as the worker nodes. Paste the command we were given from the manager. We should get confirmation of each node’s addition to the swarm.

sudo docker swarm join --token <TOKEN> <MANAGER_IP_ADDRESS>:2377

In the Manager terminal, we can run sudo docker node ls to see all of our nodes and their statuses. (the * indicates the node we’re currently on).

If we try this same command in a Worker we’ll get an error message:
Error response from daemon: This node is not a swarm manager. Worker nodes can’t be used to view or modify cluster state. Please run this command on a manager node or promote the current node to a manager.

This is by design. All of our admin and orchestration should be done only from the Manager nodes.

Optional bonus: I decided to change the hostnames for each node to make it easier to recognize what-is-what. It’s not necessary, but here's how to do it, if you’re interested.
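One way to do it on Amazon Linux 2 (the hostname here is just an example; run the equivalent on each node, then log out and SSH back in to see it in your prompt):

sudo hostnamectl set-hostname swarm-mgr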

We won’t really need the other worker terminals anymore, so we can close or minimize them.

And, with that, our swarm nodes are ready for action!

Step 3: Create the Swarm stack

Our base environment is set, but we must now define and deploy the application architecture. Our application will be set up in three tiers:

  1. Frontend/web tier: powered by Apache web servers
    (Replicas: 10).
  2. Backend/Application tier: powered by Node.js
    (Replicas: 4).
  3. Backend/Database tier: powered by Postgres
    (Replicas: 1).

Note: We won’t have any actual application source code to deploy, but the process would be the same.

The beauty of Swarm is that it’s a declarative process, meaning all we have to do is define our resources and the end result. Swarm takes care of the rest and figures out what it needs to do to get to that result.

We define all the nodes, services, images, volumes, and deployment options; and Swarm takes care of load balancing and distributing the services/containers amongst the available nodes.

If a task (a service instance/container) fails, Swarm will automatically shut it down and create a new one to replace it to fulfill the total number of replicas we declared.

If a node fails or becomes unavailable, Swarm will redistribute the required services and containers amongst the available nodes. This is what makes Swarm so highly available and fault-tolerant!

Docker Compose

Of course, we can individually create services and tasks using Docker CLI commands, but dear god, that sounds tedious and time-consuming. If you’re familiar with Docker Compose, we can create a Swarm Stack to orchestrate our entire architecture with one file and one command!

Our stack will be built from a docker-compose.yml file. Everything is the same as Docker Compose, such as defining services, ports, volumes, and networks; however, a stack additionally allows us to define deployment options. We can dictate how many replicas each service needs, placement constraints, restart policies, rollback policies, and much more. For a complete list of options, visit the official documentation.
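As a small taste, a deploy block can combine several of these options. The values below are arbitrary examples, not the ones this build uses:

deploy:
  replicas: 4
  restart_policy:
    condition: on-failure
  update_config:
    parallelism: 2
    delay: 10s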

Enough preamble. Let’s build!

While still SSH'd into your manager node, create a new docker-compose.yml file or clone a GitHub repo that contains the Compose file (this is what I'll be doing).

As a reminder, this is the basic structure of the compose file. We have four services* and two bridge networks for the frontend and backend.

version: "3.8"

services:
  web:

  node:

  db:

  visualizer:

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge

volumes:
*: I’ve included a fourth service called visualizer, which is a really great dev tool that visualizes the landscape and health of services across all the nodes. You’ll see!
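If you'd like to add it to your own file, the service definition is roughly this (a sketch based on the public dockersamples/visualizer image; it needs the Docker socket mounted and is pinned to a manager node so it can read the cluster state):

  visualizer:
    image: dockersamples/visualizer
    ports:
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      placement:
        constraints:
          - node.role == manager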

Since we’re only focused on the architecture, and not the actual application code, I’ve left my Compose file fairly simple. Feel free to add to your file and make it as robust as you want for your application needs.

Under deploy, replicas tells Swarm how many copies of each service to keep running. We’ve also declared placement constraints on each service so tasks only get distributed to worker nodes; without that constraint, tasks would also land on the manager node.
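Your service definitions will differ with your application, but as a sketch, the web tier might look something like this (the stock httpd image stands in for real frontend code, and the network name assumes the structure above):

  web:
    image: httpd:latest
    ports:
      - "80:80"
    networks:
      - frontend
    deploy:
      replicas: 10
      placement:
        constraints:
          - node.role == worker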

Our Compose file is ready!

Step 4: Deploy the Stack

Ready to see how easy it is to deploy a Swarm Stack?

In the manager node shell, we can deploy by using the docker stack deploy command with -c to point at a Compose file, followed by the name of your stack (I named it swarm_app).

sudo docker stack deploy -c docker-compose.yml stack_name
Networks and services are being created!

And just like that our application is deployed!… well, for development, at least. Cool, right?!

For an overview of services in our stack, we can run:

sudo docker stack services stack_name
Woo! All of our replicas are running!

For a more detailed view of all of our tasks/containers, we can run:

sudo docker stack ps stack_name

In the highlighted section, we can see Swarm at work, live! It looks like some of the node.js service tasks shut down (probably because of the ‘sleep’ command), but were immediately replaced by a running task!
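If you want to poke at this reconciliation loop yourself, you can rescale any service and watch Swarm add or remove tasks to match. Stack services are named <stack>_<service>, so with the swarm_app stack the Node.js service is swarm_app_node:

sudo docker service scale swarm_app_node=6
sudo docker service ls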

If we go to any of our swarm nodes' public IP addresses, we should see our Apache website live. (Thanks to Swarm's ingress routing mesh, every node answers on port 80, no matter which nodes the web tasks actually landed on.)

Visualizer

Now for the really cool part! Remember that surprise fourth visualizer service we deployed on our manager node and set to port 8080? Let’s go to it! manager_node_public_IP:8080.

Beautiful! I come from a UX/UI background, so this dashboard is super helpful in understanding what’s happening. The visualizer gives us a live overview of our containers spread throughout the worker nodes.

If we want, we can simulate a node failure. In the AWS Management Console, stop one of the worker node servers and head back to the visualizer.

Because one of the nodes became unavailable, Swarm redistributed the tasks to the other available nodes while still maintaining the same number of replicas for each service!
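Stopping the instance works, but a gentler way to simulate the same thing (and the way you'd take a node down for maintenance) is to drain it from the manager, which tells Swarm to reschedule its tasks onto the other nodes:

sudo docker node update --availability drain <NODE_HOSTNAME_OR_ID>

Set it back with --availability active when you're done experimenting.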

Congrats!

We’ve successfully orchestrated and deployed a highly available, fault-tolerant web application architecture with Docker Swarm!

Thank you

Thank you for following me on my cloud engineering journey. I hope this article was helpful and informative. Please give me a like & follow as I continue my journey, and I will share more articles like this!
