Orchestrating a Swarm: Setting up and getting started with Docker Swarm: Part I
Scenario
You work for a company that is new to using containers and has been having issues with containerized workloads failing on its global shipping application. Worried about the potential for additional data loss, the company brings you in to update the infrastructure, leveraging Docker Swarm so that containers that fall out of a healthy state are automatically restored. Because the team is unfamiliar with Docker Swarm, they need a step-by-step guide to setting up a swarm for their containerized workloads.
Docker Swarm is what is known as a container orchestration tool. It allows you to coordinate many Docker containers and services from a central Docker manager node. Swarm mode provides load balancing, cluster management, rolling updates, desired state reconciliation, and many other features. You declare the desired state of the service and Docker Swarm handles the rest. Swarm has been part of the Docker ecosystem since 2014, and swarm mode has been built into the Docker Engine since version 1.12 in 2016; it remains a popular orchestration choice for production environments.
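To make the "declare the desired state" idea concrete, here is a small sketch of the kind of command we'll be running in Part II (the service name web and the nginx image are just placeholders for this illustration):
sudo docker service create --name web --replicas 3 --publish 80:80 nginx
sudo docker service ls   # Swarm continuously works to keep 3 replicas of "web" running
If a container backing one of those replicas dies, Swarm notices the gap between the desired state (3 replicas) and the actual state and starts a replacement automatically.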
Glossary
- Node: In Docker, a node is a single instance of the Docker engine within the swarm. A single server can contain multiple nodes, but in typical production environments nodes are distributed across multiple physical servers.
- Manager Node: The manager nodes dispatch tasks to worker nodes, and perform orchestration, as well as cluster management to maintain the desired state of the swarm. They can also act as worker nodes in that they can execute tasks.
- Worker Node: Worker nodes receive and execute tasks dispatched by manager nodes. They cannot dispatch the tasks or perform cluster management or orchestration.
- Task: A task carries a single container and the commands to run inside it; it is the unit of work that the swarm schedules onto a node. The number of replicas configured for a given service determines the number of tasks.
- Service: A service is used to deploy application images. Often, a service is the image of a microservice within a larger application context. For example, a service might be a webserver or a database. A service can contain any number of tasks, all of which perform the same action on different nodes.
- Stack: A collection of services all performing different duties but acting as a single application is called a stack. A swarm stack is an example of a microservice architecture.
- In-memory: In-memory storage is short term, ephemeral storage. It is used for applications that use the memory immediately and have no need to store data long term.
Prerequisites
- An AWS account and a working knowledge of creating services in AWS
- Knowledge of basic Linux commands
- Visual Studio Code installed on your computer, as well as some basic working knowledge of the IDE.
Objectives
- Set up a Docker Swarm on a fleet of 3 EC2 instances with one manager node and two worker nodes.
- Launch a swarm service and then scale up to 3 replicas
- Deploy a stack to the newly created Swarm
For this tutorial I will be using the terminal in Visual Studio Code. My previous article covers setting up VS Code in more detail here. If you need help setting up VS Code please consult that article. Stop where it discusses installing Docker, as we will be doing that here as well.
We have a lot to cover, so let’s get started.
Step 1.)
First we’re going to launch our EC2 instances on AWS:
- Name it Docker for now
- Choose an Ubuntu 22.04 t2.micro instance
- Choose a key pair or create a new one
- Place the instances in any subnet of your default VPC
- Choose a security group that allows SSH from your IP, inbound HTTP traffic from the internet (0.0.0.0/0), and all traffic from your VPC CIDR. The ingress rule allowing HTTP traffic from the internet will open up port 80, which we will need for part of this demonstration.
- Create 3 instances
Once the instances have been created, rename them something like this: Docker_manager, Docker_worker1, Docker_worker2 (or Docker_node1, Docker_node2, Docker_node3). Just differentiate them so you know which instance will be your manager node and which will be your worker nodes.
Step 2.)
Now you need to get the public IP address of your manager node and copy it. We're going to be pasting it into an ssh config file. In my previous Docker article I detailed this a little more thoroughly, but basically you go into your .ssh directory, which should be located in your home directory, and create a file called config. No extension, just config. You will edit the file to look like this:
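In text form, here is roughly what mine looks like; the host name docker-manager and the key path are placeholders you should swap for your own, and the ForwardAgent line is explained a little further down:
Host docker-manager
HostName <public_IP_of_manager_node>
User ubuntu
IdentityFile <path_to_your_key_pair>
ForwardAgent yes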
Host is the name you want to call your manager node. Next to HostName (make sure the capitalization is exactly the way it is above), you paste in that public IP that you just copied.
*** Note that each time you stop your EC2 instances and re-start them, a new public IP will be generated, and you'll have to update this file. At the end of this article I have a special bonus for you: a Python boto3 script that automates this process.
For User it will be ubuntu, unless you are on a different image; use whatever the default user name is for your instance. Next to IdentityFile you will place the path to the key pair file stored on your computer.
A short tangent about SSH agent forwarding
The next line in the config file requires a little bit of explanation. So, what I’m going to do is to first give a little background and explain what SSH agent forwarding is, and why you would want to use it.
We are going to ssh into our host server, and then ssh from there into another server. When we ssh into the first server, the key is on our computer, so we reference the file where the key lives and voila! However, once we are on that remote server, the key is not present, so if you simply try to ssh into the next server from there, you won't be able to authenticate.
There are generally two ways of solving this. The first way is easy, but less secure: you simply place a copy of your private key, in a file with the same name as the one on your personal computer, onto the remote server. This is the quick and dirty way, but like I said, not as secure, for obvious reasons: now you have a copy of your key sitting on a remote server. I read an article that compared it to leaving your keys in the door to your office, or your home. I honestly think that analogy is a little inflated, but you get the point nonetheless.
The second way to solve this is through something called ssh agent forwarding. The best analogy I've heard is that it's kind of like typing your password (out of sight) into your own computer so that your friend can use it. Essentially, the authentication request is forwarded all the way back to your local machine, where the ssh agent holds the key. So there's never a point when a key is left on a remote server.
Fortunately for me, I’m on a mac, which makes this process extremely painless. If you’re a Windows user, you’re not so lucky. My condolences. I believe it works the same on Linux as it does on Mac, but since most of you probably aren’t using Linux on your personal machine I won’t go into that.
For these next several steps that involve ssh keys, I'll explain how I'm doing it (Mac), how to do it the quick, dirty and less secure way, and then point you to some references if you are a Windows user; honestly, that's a rabbit hole I started down, and it quickly became more than I needed to deal with (at least for now).
For now, if you’re on a mac, include the ForwardAgent yes part in the config file. If you’re on Windows, don’t.
*** By the way, for the sake of this project, copying the file into your remote server, aka the quick and dirty way, is totally fine. But my goal is to practice for production environments in which security would be a high priority.
Here are some links for further reading:
This article actually talks about how it’s not good practice to set up agent forwarding in the config file (It suggests using the -A option every time you ssh, so that agent forwarding is not enabled by default). For my case it makes sense to configure agent forwarding in the config file, since for this project we need it, and it’s the only time I will be using these servers.
The best Windows solution seems to be OpenSSH. Here is some documentation on that.
And an article detailing some features regarding agent forwarding on Mac
Ok, let’s get back to the task at hand.
Step 3.)
If you’re not using agent forwarding (or you’re not on a Mac) you can skip this step.
Add your key to be used with agent forwarding. First cd into the folder where your key is stored. Then run the command:
ssh-add <key_pair_name>
If you then perform ssh-add -l, you will see that your key has been added to the ssh agent.
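Here's a quick sketch of that sequence end to end; the folder is a placeholder for wherever your key actually lives:
cd ~/Downloads              # or wherever your key pair is stored
ssh-add <key_pair_name>     # hands the private key to your local ssh agent
ssh-add -l                  # lists the fingerprints of the keys the agent is holding
Keep in mind that keys added with ssh-add generally don't survive a reboot of your machine, so if agent forwarding mysteriously stops working later on, re-running ssh-add is the first thing to try.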
Step 4.)
Next we’ll connect to our manager node via SSH. We’ll do this through VS Code’s “Connect to host” feature. On the far bottom left you’ll see a blue icon. Click it. A menu should come up allowing you to “Connect to host”.
Then just select the name you gave to your remote host in the config file from step 2.
You should now be in your remote server.
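As a side note, that config file also works outside of VS Code; assuming you used a host name like docker-manager in step 2, any plain terminal can connect with:
ssh docker-manager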
Step 5.)
If you are using agent forwarding, then you can skip this next step. If you are having to copy your key to your server, here’s where we’ll be doing that.
First you need to switch to root user. Use the command sudo -s.
Then cd into /root/.ssh.
Next, create a file with the exact same name as your key pair, and copy and paste the contents of your private key file from your computer into it. Use the command nano <your_key_pair_name> to open the file with nano.
Notice that if you perform an ls you can see another file named "authorized_keys" in this same directory. This file holds the public keys that are allowed to SSH into the server; AWS placed the public half of your key pair in it when the instance launched. It never contains your private key, so nothing sensitive is stored on the server.
Agent forwarding doesn't touch this file at all. When forwarding is enabled, either with ForwardAgent yes in your config file or with the -A option on your ssh command, the remote server simply passes authentication requests back to the agent on your computer, where the private key actually lives. If you remove agent forwarding from the config file and reconnect to your remote server, that channel goes away, but the "authorized_keys" file will exist regardless.
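If you want to convince yourself of this, you can peek at the file (treat the output as illustrative; the exact contents will vary):
cat authorized_keys   # only public keys live here, never your private key
If forwarding is enabled, you can also run ssh-add -l from your regular ubuntu session on the server: it lists the key held by the agent back on your computer, even though no key file exists on the server.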
After that, we’re going to create a config file in this same directory, just like we did on our computer. You’ll need the private IP addresses of your worker nodes to do so. Then, create and open the config file and paste in the following:
Host node2
HostName <private_IP_of_node2>
User ubuntu
IdentityFile /root/.ssh/<name_of_your_key>
Host node3
HostName <private_IP_of_node3>
User ubuntu
IdentityFile /root/.ssh/<name_of_your_key>
You can call the hosts whatever you want. I actually called mine “worker1” and “worker2”, but most people seem to prefer the “node1”, “node2”, “node3” naming convention. Once that information is filled out, save and close the file and then go back to your ubuntu user with ctrl + d.
Step 6.)
If you are not using agent forwarding, and just performed the tasks in the above step, you can skip this step.
If you are using agent forwarding we don’t need a config file, but I’m going to show you a step that will make your life a little bit easier.
Open up the file ".bashrc" (make sure you include the dot) in your home directory with nano. In this file we are going to create persistent environment variables for the worker nodes, so that you don't have to remember their private IP addresses every time you ssh into them. Scroll to the bottom of that file and type out the following:
node2='ubuntu@<private_IP_of_node_2>'
node3='ubuntu@<private_IP_of_node_3>'
Or instead of “node2” and “node3” you can call the variables “worker1” and “worker2”, it really doesn’t matter as long as it makes sense to you. You’ll need to grab the private IPs of the appropriate instances from the AWS console.
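One small gotcha: .bashrc is only read when a shell starts, so after saving the file either reconnect to the server or reload it by hand. A quick sketch (the IP shown is a placeholder):
source ~/.bashrc
echo $node2     # should print something like ubuntu@<private_IP_of_node_2>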
Step 7.)
Ok, this step is for both groups. Now we’ll actually SSH into our worker nodes.
First, in the VS Code terminal, click the little caret to split the terminal. Select "bash".
Do this twice so you have three terminal screens.
In the center and far right screens we’ll SSH into the worker nodes. If you are using agent forwarding, all you need to do is type ssh + $<whatever you called your variable>, like so:
ssh $node2
If you copied and pasted your key onto the server, all you need to do is type ssh + <whatever you called your host in config>, like so:
ssh node2
Then do the same for node3 in the next terminal.
So now you should be connected to three different servers in three different terminal windows. Notice how all the IP addresses are different?
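If you ever lose track of which terminal is pointed at which server, a quick check is to print the host name; on a fresh EC2 Ubuntu instance it encodes the private IP (we'll give the servers friendlier names in step 9):
hostname    # prints something like ip-172-31-xx-xx, where the numbers are the node's private IP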
Step 8.)
Next we are going to install Docker on all three servers. Run the following commands in all three terminals.
First update packages:
sudo apt update
Then download the Docker installer:
curl -fsSL https://get.docker.com -o get-docker.sh
Then run the Docker install script:
sh get-docker.sh
You can then run the command sudo docker version to verify Docker has been installed.
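At this point each server is just a standalone Docker engine. If you're curious, you can confirm that none of them belong to a swarm yet (we'll change that in step 10):
sudo docker info | grep -i swarm    # should report: Swarm: inactive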
Step 9.)
Next we are going to customize the host names on our servers. This one’s really easy. In each terminal type:
sudo hostnamectl set-hostname <Name of Host>
Then reboot to see the name take effect
sudo reboot
The name of the host can be anything you want. I would recommend something similar to what you called your EC2 instances. Either node1, node2, node3, or manager, worker1, worker2. As long as you can discern which is the manager node and which two are the worker nodes.
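To make that concrete, here's the mapping I'd use; the names are only suggestions, so substitute your own:
# on the manager node's terminal
sudo hostnamectl set-hostname manager
sudo reboot
# on the first worker's terminal
sudo hostnamectl set-hostname worker1
sudo reboot
# on the second worker's terminal
sudo hostnamectl set-hostname worker2
sudo reboot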
After you’ve rebooted you may need to re-connect to each of your instances again. When you are finished your three servers should each have unique host names that identify them.
So we have our three servers. We’ve named them, installed Docker onto them, and we’re able to SSH into the worker nodes from the manager node, but we haven’t actually done anything to make them a swarm yet. Let’s do that next.
Step 10.)
First we need the private IP address of the manager node. Run ip addr on your manager node. A lot of information is returned, but you can see from the picture where to find the private IP. Don't worry about the trailing forward slash and number (that's just the subnet prefix in CIDR notation). Copy the IP itself. We'll need it for the next command.
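If that output feels like a lot to dig through, here's a narrower sketch; eth0 is the usual interface name on t2 instances, but yours may be called something else, such as ens5:
ip addr show eth0 | grep "inet "    # the address before the slash is the private IP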
And to initialize the Docker Swarm, again on your manager node, run:
sudo docker swarm init --advertise-addr <private_IP>
You can see that the command outputs a join command containing a token. You simply need to paste this command into your first worker node, prefaced by sudo.
And do the same for the other worker node.
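For reference, the join command from the init output will look roughly like this; the token and IP below are placeholders:
sudo docker swarm join --token <your_join_token> <manager_private_IP>:2377
If you ever lose that output, you can print the worker join command again from the manager node with sudo docker swarm join-token worker.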
Now, if you run sudo docker node ls on the manager node, you can see all of the nodes of the swarm and their status. Notice how it also displays which node is “Leader”, our manager node.
Congratulations! You’ve just set up a Docker Swarm. Click here for Part II, where I will walk you through actually setting up some services, as well as stacks.
As a bonus for making it this far, here is that script I talked about at the beginning of the article. It will simultaneously start all three of your EC2 instances and update your "~/.ssh/config" file with the new public IP of your manager node. You simply need to put your instance IDs in lines 46–48, the absolute path to your "config" file in line 49, and the line number where your HostName <IP_address> entry sits in that file.
You’ll also need Python and boto3 installed on your computer for it to work.
And here’s a script to stop your instances. For this one you simply need to input your instance IDs.