Creating Highly Available Nodes on ICON — Stage 2: HA Cluster with Nginx and P-Rep Node

2infiniti
Oct 27, 2019 · 7 min read

This guide is based on Ubuntu 18.04 on AWS

In the last tutorial, Creating Highly Available Nodes on ICON — Stage 1: Active/Passive Failover with Pacemaker and Corosync, we covered how to create an active/passive failover configuration using an AWS Elastic IP. We used Corosync and Pacemaker for resource management and network pulse checks, so that in case of node failure a backup server automatically takes over node operations. In this tutorial we extend the previous setup into a highly available cluster using the same technology stack, this time with two Docker containers on each host: nginx and the P-Rep node. Nginx's main job is to throttle (rate limit) requests and to filter requests against a whitelist of IPs, which creates a layer of protection in front of the real P-Rep node. For more information on nginx usage on the ICON network, please consult this document: How to use nginx to prevent DDoS attacks
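The ICON nginx image generates its configuration from environment variables, so you won't write this by hand, but conceptually the throttle and whitelist boil down to something like the following nginx snippet (zone name, rate, port and the whitelisted IP are hypothetical, for illustration only):

# Illustration only; the real config is generated by the nginx Docker image.
# Limit each client IP to a fixed request rate on the JSON-RPC endpoint.
limit_req_zone $binary_remote_addr zone=rpc_limit:10m rate=200r/s;

server {
    listen 9000;

    location /api/v3 {
        allow 52.0.0.1;   # example whitelisted peer IP
        deny  all;        # everything else is rejected

        limit_req zone=rpc_limit burst=50 nodelay;
        proxy_pass http://prep-node:9000;   # forward allowed traffic to the P-Rep node
    }
}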

Preparation

AWS Instance

Create two instances on AWS in two availability zones: instance 1 (az-a) in us-east-2a and instance 2 (az-b) in us-east-2b. Follow this previous tutorial on node installation using Docker: ICON Node Installation — Docker + AWS. Ignore its docker-compose.yml, as we'll be creating a new one in this tutorial using newer P-Rep images plus an nginx Docker image. This is the architecture we're aiming to create:

Cluster under two availability zones
Each host will run two docker containers, with nginx in front of p-rep node

Let’s rename the hostnames for better legibility
(both servers)

change hostname
$ sudo hostnamectl set-hostname az-a
# similarly for az-b, then reboot
$ sudo systemctl reboot

edit hosts file
(both servers)

/etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.31.10.168 az-a-hb
172.31.16.135 az-b-hb

Ping the other server to check that the hostname resolves
(both servers)

$ ping az-b-hb
64 bytes from az-b-hb (172.31.16.135): icmp_seq=1 ttl=64 time=0.305 ms
64 bytes from az-b-hb (172.31.16.135): icmp_seq=2 ttl=64 time=0.286 ms
64 bytes from az-b-hb (172.31.16.135): icmp_seq=3 ttl=64 time=0.276 ms

Docker Compose

(both servers)

In this tutorial we're adding a new Docker service, nginx, to handle traffic in front of our P-Rep node. Nginx will rate limit incoming requests to prevent DDoS attacks, and we'll also use it to accept requests only from a whitelisted IP list (we'll follow up with an automated script to update this list). Nginx is connected to our P-Rep node via NODE_CONTAINER_NAME: "prep-node". For a full list of nginx environment variables, please consult this documentation: https://github.com/JINWOO-J/nginx_docker
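The compose file itself was shown as an image in the original post; here is a minimal sketch of what it might look like. The image tags are the ones referenced later in this tutorial, but the repository names, ports and volume path are assumptions you should adapt to the official P-Rep setup:

docker-compose.yml
version: '3'
services:
  prep-node:
    image: iconloop/prep-node:1910211829xc2286d    # tag from this tutorial; repository name assumed
    container_name: prep-node
    restart: always
    volumes:
      - ./data:/data                               # assumed data mount used by cluster.sh below
  nginx:
    image: iconloop/nginx:1.17.1                   # tag from this tutorial; repository name assumed
    container_name: nginx
    restart: always
    depends_on:
      - prep-node
    environment:
      NODE_CONTAINER_NAME: "prep-node"             # tells nginx which container to forward to
    ports:
      - "7100:7100"                                # gRPC (peer) traffic
      - "9000:9000"                                # JSON-RPC traffic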

Spin up the containers per usual
(both servers)

$ sudo docker-compose up -d

Next, check the status (it will take a while for the P-Rep node to sync all the blocks, after which it will show a healthy status):

$ sudo docker ps -a

You should have two containers, prep-node:1910211829xc2286d and nginx:1.17.1.

We'll also need to create a second compose file; call it docker-compose.backup.yml. It is essentially the same as the docker-compose.yml above, and it is the file the backup peer service (created below) will bring up.
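If only the file name needs to differ in your setup, one simple way to create it is to copy the main compose file and adjust it as needed:

$ cp docker-compose.yml docker-compose.backup.yml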

Pacemaker and Corosync

You can follow the previous tutorial for cluster setup using crm. There we configured a single EIP acting as a floating IP that points to the active server. In this tutorial we again configure an active EIP as a floating IP, plus a backup EIP that takes over in case the active EIP stops functioning. We'll also switch from crm to pcs this time, since pcs appears to be the foundation's tool of choice, which should make the instructions easier to follow.

(both servers)

# Update and install 
$ sudo apt-get -y update
$ sudo apt-get install pacemaker
$ sudo apt install pcs
# Verify
$ pacemakerd --version
Pacemaker 1.1.18
Written by Andrew Beekhof
$ corosync -v
Corosync Cluster Engine, version '2.4.3'
Copyright (c) 2006-2009 Red Hat, Inc.

Then configure the corosync.conf file:

/etc/corosync/corosync.conf
totem {
    version: 2
    cluster_name: peer_cluster
    secauth: off
    transport: udpu
}

nodelist {
    node {
        ring0_addr: az-a-hb
        nodeid: 1
    }
    node {
        ring0_addr: az-b-hb
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
}

enable the services on system boot

$ sudo systemctl enable corosync
$ sudo systemctl enable pacemaker

Create Peer and Backup Peer Services

(both servers)

Create peer.service under /lib/systemd/system/

[Unit]
Description=Loopchain Peer
After=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
StandardError=null
StandardOutput=null
WorkingDirectory=/home/ubuntu
ExecStartPre=/home/ubuntu/cluster.sh
ExecStart=/usr/local/bin/docker-compose -f /home/ubuntu/docker-compose.yml up -d
ExecStop=/usr/sbin/pcs resource disable Backup
ExecStop=/usr/local/bin/docker-compose -f /home/ubuntu/docker-compose.yml down
[Install]
WantedBy=multi-user.target

and backup_peer.service also under /lib/systemd/system/

[Unit]
Description=Loopchain Backup_peer
After=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
StandardError=null
StandardOutput=null
WorkingDirectory=/home/ubuntu
ExecStartPre=/home/ubuntu/cluster.sh
ExecStart=/usr/local/bin/docker-compose -f /home/ubuntu/docker-compose.backup.yml up -d
ExecStop=/usr/local/bin/docker-compose -f /home/ubuntu/docker-compose.backup.yml down
[Install]
WantedBy=multi-user.target
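Since we added new unit files by hand, reload systemd so it picks them up before enabling them:

$ sudo systemctl daemon-reload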

enable the services on system boot

$ sudo systemctl enable peer.service
$ sudo systemctl enable backup_peer.service

Change DB Directory Name

(both servers)

The services we just created do more than the basic docker-compose up/down: they also run a bash script, cluster.sh, as ExecStartPre. Its job is to rename the LevelDB directory based on the node's IP. When the backup node takes over as master, the floating EIP moves, so the node's public IP changes, and loopchain names its LevelDB directory after that IP. Renaming the directory to match the current IP lets the new master reuse the existing chain data instead of re-syncing. Create the file in the same directory as docker-compose.yml:

#!/bin/bash
# Rename the loopchain LevelDB directory to match this node's current public IP.
DBDIR="/home/ubuntu/data/loopchain"

# Public IP of this instance, from the EC2 metadata service
MYIP=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)

# Current DB directory name (most recently modified) and the name it should have
ASISNAME=$(ls -t ${DBDIR}/.storage 2>/dev/null | head -1)
TOBENAME="db_${MYIP}:7100_icon_dex"

if [ "$ASISNAME" == "$TOBENAME" ]; then
    echo "Match"
else
    if [ ! -d "$DBDIR/.storage" ]; then
        # First run: no storage directory yet, create it with the correct name
        mkdir -p ${DBDIR}/.storage/${TOBENAME}
    else
        # Rename the existing DB directory to match the current IP
        mv ${DBDIR}/.storage/${ASISNAME} ${DBDIR}/.storage/${TOBENAME}
    fi
fi

Add executable permission to the file

$ sudo chmod +x cluster.sh
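You can run the script once by hand to verify it works; the .storage directory should end up containing a single db_<public-ip>:7100_icon_dex directory matching your instance's public IP:

$ sudo /home/ubuntu/cluster.sh
$ ls /home/ubuntu/data/loopchain/.storage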

AWS CLI Configuration

  1. Log in to your AWS Management Console.
  2. Click on your user name at the top right of the page.
  3. Click on the Security Credentials link from the drop-down menu.
  4. Find the Access Credentials section, and copy the latest Access Key ID.
  5. Click on the Show link in the same row, and copy the Secret Access Key.

$ sudo apt update
$ sudo apt install awscli
# change to root; this is necessary because Pacemaker runs the awseip resource agent as root, so the credentials must be configured for the root user
$ sudo su
$ aws configure

For the region name, enter the region your instances are in (us-east-2 in this tutorial).
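While you're at it, you can look up the allocation IDs of your Elastic IPs, which you'll need when creating the awseip resources later:

$ aws ec2 describe-addresses --region us-east-2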

Start Cluster

Now we're ready to start the cluster. First, set a password for the hacluster user that was created when pcs was installed
(both servers)

$ sudo passwd hacluster

Now start pcsd, the pcs daemon (it also serves the web GUI), and enable it at boot

$ sudo systemctl start pcsd
$ sudo systemctl enable pcsd

then configure the cluster

# authenticate the hacluster user first (you'll be prompted for the password set above)
$ sudo pcs cluster auth az-a-hb az-b-hb
$ sudo pcs cluster setup --name peer_cluster az-a-hb az-b-hb --transport udpu

start the cluster

$ sudo pcs cluster start --all --wait=60
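Give it a minute, then check that both nodes show up as online before adding resources:

$ sudo pcs status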

Cluster Resources

Create PCS config
(one server)

$ sudo pcs cluster cib tmp-cib.xml
$ sudo cp tmp-cib.xml tmp-cib.xml.deltasrc
$ sudo pcs -f tmp-cib.xml property set stonith-enabled=false

Create EIP agent
(one server)

$ sudo pcs -f tmp-cib.xml resource create awseip-peer ocf:heartbeat:awseip allocation_id={your EIP allocation ID} elastic_ip={your elastic IP} op migrate_from interval=0s timeout=30s migrate_to interval=0s timeout=30s monitor interval=20s timeout=30s start interval=0s timeout=30s stop interval=0s timeout=30s validate interval=0s timeout=10s

Enable Peer Service Resource

$ sudo pcs -f tmp-cib.xml resource create peerservice systemd:peer op monitor interval=60 timeout=100 start interval=0s timeout=100 stop interval=0s timeout=100

Backup Elastic IP

$ sudo pcs -f tmp-cib.xml resource create awseip-backup ocf:heartbeat:awseip allocation_id={your backup EIP allocation ID} elastic_ip={your backup elastic IP} op migrate_from interval=0s timeout=30s migrate_to interval=0s timeout=30s monitor interval=20s timeout=30s start interval=0s timeout=30s stop interval=0s timeout=30s validate interval=0s timeout=10s

Backup Peer Resource

$ sudo pcs -f tmp-cib.xml resource create backupservice systemd:backup_peer op monitor interval=60 timeout=100 start interval=0s timeout=100 stop interval=0s timeout=100

Peer Group

$ sudo pcs -f tmp-cib.xml resource group add Peer awseip-peer peerservice

Backup Group

$ sudo pcs -f tmp-cib.xml resource group add Backup awseip-backup backupservice

Add Colocation

$ sudo pcs -f tmp-cib.xml constraint colocation add Backup with Peer -INFINITY id=colocation-Backup-Peer--INFINITY

Apply Configuration

$ sudo pcs cluster cib-push tmp-cib.xml diff-against=tmp-cib.xml.deltasrc

Now check the status of our cluster with sudo pcs status. If all is well, the Peer group (awseip-peer and peerservice) will be running on one node and the Backup group (awseip-backup and backupservice) on the other, which reflects exactly what we showed in the first architecture diagram.

Testing Failover State

Right now the active node is az-a-hb. Let's put it into standby and see what happens.

$ sudo pcs cluster standby az-a-hb
$ sudo pcs status

After some time (about a minute), check AWS: you'll see that the active EIP has been reassociated with az-b-hb, effectively failing over to the backup node. To restore the node, run

$ sudo pcs cluster unstandby az-a-hb

The active node is still az-b-hb; reboot it and see what happens

$ sudo systemctl reboot

Check AWS again: as expected, the EIP points back to az-a-hb. We now have an HA cluster with nginx in front of our P-Rep node. Next, we will apply the IP whitelist renewal described in How to use nginx to prevent DDoS attacks once the automation script becomes available, and we will write a part 3 tutorial to conclude our node architecture setup.

Common Cluster Commands
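The original post showed these as a screenshot; as a rough substitute, here are a few standard pcs commands that are handy for day-to-day operation of this cluster:

$ sudo pcs status                        # cluster, node and resource status
$ sudo pcs status resources              # resource status only
$ sudo pcs cluster start --all           # start the cluster on all nodes
$ sudo pcs cluster stop --all            # stop the cluster on all nodes
$ sudo pcs cluster standby az-a-hb       # take a node out of service
$ sudo pcs cluster unstandby az-a-hb     # bring it back into service
$ sudo pcs resource restart peerservice  # restart a single resource
$ sudo pcs resource cleanup              # clear failed actions after recovery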
