HA Cluster with ElasticIP using Corosync and Pacemaker

Anand Thiyagarajan
6 min read · Oct 5, 2018

Let's build a three-node HA cluster on AWS infrastructure using Corosync and Pacemaker, with a floating IP implemented via an AWS Elastic IP, in detailed steps.

Corosync is an open source program that provides cluster membership and messaging capabilities, often referred to as the messaging layer, to client servers.

Pacemaker is an open source cluster resource manager (CRM), a system that coordinates resources and services that are managed and made highly available by a cluster. In essence, Corosync enables servers to communicate as a cluster, while Pacemaker provides the ability to control how the cluster behaves.

Prerequisites:

  1. Three EC2 instances created on AWS. (As you already know, each instance has its own hostname, private IP, and public IP.)
  2. An unassigned Elastic IP created in AWS.
  3. Root access on all three instances (able to do "su -").
  4. AWS credentials configured by running "aws configure" (as the root user, *very important*) on all instances.
  5. Preferably an open security group assigned to those instances until the setup is finished; tighten it afterwards.
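Before starting, a small helper can confirm the tools the setup relies on are actually installed. This is a minimal sketch; the check_prereqs helper is my own, not part of any of these packages:

```python
import shutil

def check_prereqs(commands):
    """Return the subset of required commands that are missing from PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]

# Commands this walkthrough relies on; 'aws' comes from the AWS CLI package.
missing = check_prereqs(["aws", "corosync", "pacemakerd", "pcs"])
if missing:
    print("Install before continuing:", ", ".join(missing))
```

Run it on each instance; an empty result means you are good to proceed.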

Installation:

On all 3 instances run the following commands:

Installing both corosync and pacemaker takes a single command, since pacemaker depends on corosync:

$ sudo apt-get install pacemaker

We need crmsh or pcs to configure the cluster's various functionalities and resources, so verify their presence on your instances by checking their versions.

crmsh is a cluster management shell for the Pacemaker High Availability stack.(for more info: https://crmsh.github.io/).
pcs — pacemaker/corosync configuration system.

# pacemakerd --version
Pacemaker 1.1.14
Written by Andrew Beekhof
# corosync -v
Corosync Cluster Engine, version '2.3.5'
Copyright (c) 2006-2009 Red Hat, Inc.
# pcs --version
0.9.149
# crm --version
crm 2.2.0

Corosync Configuration:

Create the corosync configuration file and dump the following content on our three instances:

# cat /etc/corosync/corosync.conf
totem {
version: 2
crypto_cipher: none
crypto_hash: none
cluster_name: HA # Any name for the cluster
rrp_mode: active
transport: udpu
token: 10000
}
quorum {
provider: corosync_votequorum
}
nodelist {
node {
ring0_addr: <Hostname of Node1>
name: master1
nodeid: 1
}
node {
ring0_addr: <Hostname of Node2>
name: master2
nodeid: 2
}
node {
ring0_addr: <Hostname of Node3>
name: master3
nodeid: 3
}
}
service {
# Load the Pacemaker Cluster Resource Manager
name: pacemaker
ver: 1
}
logging {
fileline: off
to_logfile: yes
to_syslog: yes
debug: off
logfile: /var/log/cluster/corosync.log
timestamp: on
logger_subsys
{
subsys: AMF
debug: off
}
}
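Since the nodelist block is the only part of this file that changes per cluster, it can help to template it rather than hand-edit three copies. A hedged sketch (the nodelist_section helper is illustrative, not a standard tool; the hostnames shown are examples):

```python
def nodelist_section(hostnames):
    """Render the corosync.conf nodelist block for the given hosts.

    Hostnames should match `uname -n` on each node (the EC2 private
    DNS short name), as the note below explains.
    """
    nodes = []
    for i, host in enumerate(hostnames, start=1):
        nodes.append(
            "node {\n"
            f"  ring0_addr: {host}\n"
            f"  name: master{i}\n"
            f"  nodeid: {i}\n"
            "}"
        )
    return "nodelist {\n" + "\n".join(nodes) + "\n}"

print(nodelist_section(["ip-172-31-13-151", "ip-172-31-14-243", "ip-172-31-7-70"]))
```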

Note:

  1. <Hostname of Node> is the output of the command “uname -n” on the corresponding host, i.e. the PrivateDNS name of the EC2 instance in AWS (not the FQDN).
  2. We can also provide the private IP address of each instance instead of its hostname; it works, but expect a warning when running “pcs status”: “WARNING: corosync and pacemaker node names do not match (IPs used in setup?)”, which can be ignored.

To understand the configuration, you can go through the following references:

  1. https://github.com/corosync/corosync/blob/master/man/corosync.conf.5
  2. https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_enabling_pacemaker.html

Starting Corosync:

Start corosync on all instances, and check the ring status (as the super user):

# service corosync start
# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = <>
status = ring 0 active with no faults

Once corosync has been started successfully on all the instances, check the membership of the cluster as follows:

# corosync-cmapctl | grep member
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(<node1-privateip>)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(<node2-privateip>)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
runtime.totem.pg.mrp.srp.members.3.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(<node3-privateip>)
runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.3.status (str) = joined
# pcs status corosync

Membership information
----------------------
Nodeid Votes Name
2 1 <Hostname of Node2>
3 1 <Hostname of Node3>
1 1 <Hostname of Node1> (local)
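If you want to script the membership check instead of eyeballing the cmapctl output, a small parser over the same key format shown above works. A minimal sketch (the joined_nodes helper is my own, not part of corosync):

```python
import re

def joined_nodes(cmapctl_output):
    """Extract node IDs whose membership status is 'joined' from
    `corosync-cmapctl | grep member` output."""
    joined = []
    for line in cmapctl_output.splitlines():
        m = re.match(
            r"runtime\.totem\.pg\.mrp\.srp\.members\.(\d+)\.status.*=\s*joined",
            line,
        )
        if m:
            joined.append(int(m.group(1)))
    return sorted(joined)

sample = """\
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
runtime.totem.pg.mrp.srp.members.3.status (str) = joined
"""
assert joined_nodes(sample) == [1, 2, 3]
```

A cluster is fully formed when all three node IDs show up as joined.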

Pacemaker Configuration:

Being a cluster resource manager, Pacemaker needs the resource (here, the Elastic IP) to be well defined.

For handling the Elastic IP as a resource, we need the ‘ocf:heartbeat:awseip’ resource agent, so it must be in place:

If the file: /usr/lib/ocf/resource.d/heartbeat/awseip doesn’t exist in our instances, then grab it from: https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/awseip

And make sure the file has root ownership and appropriate access (it must be executable).

Now add an Environment=AWS_DEFAULT_REGION=<AWS-Default-Region> line under the [Service] section of /etc/systemd/system/multi-user.target.wants/pacemaker.service, so the awseip agent knows which region to call, and run systemctl daemon-reload afterwards.

The pacemaker configuration for the Elastic IP resource can then be verified by:

# pcs resource describe ocf:heartbeat:awseip
ocf:heartbeat:awseip - Amazon AWS Elastic IP Address Resource Agent

Resource Agent for Amazon AWS Elastic IP Addresses.
It manages AWS Elastic IP Addresses with awscli.
Credentials needs to be setup by running "aws configure".
See https://aws.amazon.com/cli/ for more information about awscli.
Resource options:
awscli: command line tools for aws services
profile: Valid AWS CLI profile name (see ~/.aws/config and 'aws configure')
elastic_ip (required): reserved elastic ip for ec2 instance
allocation_id (required): reserved allocation id for ec2 instance
private_ip_address: predefined private ip address for ec2 instance
api_delay: a short delay between API calls, to avoid sending API too quick

Starting Pacemaker:

Start pacemaker on all instances as super user:

# service pacemaker start

Resource Setup for the Cluster (Elastic IP):

On any node, run:

# pcs resource create <resource-name> ocf:heartbeat:awseip \
    elastic_ip="<Elastic IP Address>" allocation_id="<Allocation ID of ElasticIP>" awscli="$(which aws)" \
    op start   timeout="60s" interval="0s"  on-fail="restart" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop    timeout="60s" interval="0s"  on-fail="block"
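A typo in the Elastic IP or allocation ID only surfaces when the agent first calls the AWS API, so a quick local sanity check of the two required parameters can save a failed resource start. A minimal sketch (the validate_awseip_params helper and the eipalloc-<hex> pattern are my assumptions based on the IDs shown in this article):

```python
import ipaddress
import re

def validate_awseip_params(elastic_ip, allocation_id):
    """Rough sanity check of the awseip resource parameters before
    handing them to `pcs resource create`. Returns a list of problems."""
    errors = []
    try:
        ipaddress.IPv4Address(elastic_ip)
    except ValueError:
        errors.append(f"not a valid IPv4 address: {elastic_ip!r}")
    # VPC Elastic IP allocation IDs look like eipalloc-<hex digits>.
    if not re.fullmatch(r"eipalloc-[0-9a-f]+", allocation_id):
        errors.append(f"unexpected allocation id format: {allocation_id!r}")
    return errors

assert validate_awseip_params("52.1.2.3", "eipalloc-00bd4721") == []
```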

Note:

  1. <resource-name>: for example, elastic-ip
  2. <Allocation ID of ElasticIP>: the Allocation ID of your created Elastic IP.

We can also use the crm command to create the resource; both are applicable:

$ sudo crm configure property stonith-enabled=false
$ sudo crm configure property no-quorum-policy=ignore
$ sudo crm configure primitive elastic-ip ocf:heartbeat:awseip \
    params elastic_ip="<Elastic IP Address>" awscli="$(which aws)" allocation_id="<Allocation ID of ElasticIP>" \
    op start   timeout="60s" interval="0s"  on-fail="restart" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop    timeout="60s" interval="0s"  on-fail="block" \
    meta migration-threshold="2" failure-timeout="60s" resource-stickiness="100"

As a result, we can view the resource by:

# pcs resource show
elastic-ip (ocf::heartbeat:awseip): Started <Hostname of the node the Elastic IP is assigned to>
# pcs resource show elastic-ip
Resource: elastic-ip (class=ocf provider=heartbeat type=awseip)
Attributes: elastic_ip=xxx.xxx.xxx.xxx allocation_id=eipalloc-00bd4721 awscli=/usr/bin/aws
Operations: start interval=0s timeout=60s on-fail=restart (elastic-ip-start-interval-0s)
monitor interval=10s timeout=60s on-fail=restart (elastic-ip-monitor-interval-10s)
stop interval=0s timeout=60s on-fail=block (elastic-ip-stop-interval-0s)
# crm_mon -1
Last updated: Fri Oct 5 08:27:03 2018 Last change: Fri Oct 5 06:08:29 2018 by root via cibadmin on <HostName>
Stack: corosync
Current DC: <HostName> (version 1.1.14-70404b0) - partition with quorum
3 nodes and 1 resource configured
Online: [ <HostName1> <HostName2> <HostName3> ]

elastic-ip (ocf::heartbeat:awseip): Started <HostName>

Checking pcs status or crm status:

# pcs status
Cluster name: HA
Last updated: Fri Oct 5 08:32:58 2018 Last change: Fri Oct 5 06:08:29 2018 by root via cibadmin on ip-172-31-14-243
Stack: corosync
Current DC: ip-172-31-13-151 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 1 resource configured
Online: [ ip-172-31-13-151 ip-172-31-14-243 ip-172-31-7-70 ]

Full list of resources:

elastic-ip (ocf::heartbeat:awseip): Started ip-172-31-13-151

PCSD Status:
ip-172-31-14-243: Unable to authenticate
ip-172-31-7-70: Unable to authenticate
ip-172-31-13-151: Unable to authenticate
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

Manual Elastic IP Failover demo:

We can move the resource (elastic-ip) from one node to another manually by:

# pcs resource move <Resource-name> <HostName>

You can verify the changes by running pcs status or pcs resource show.

Failover of the IP took 3–4 seconds in my runs.
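To put a number on that failover time in a repeatable way, you can poll the Elastic IP until it answers again after moving the resource. A sketch with an injectable probe (the wait_for_failover helper and its wiring are illustrative, not part of pacemaker; the probe would typically be a TCP connect attempt against the Elastic IP):

```python
import time

def wait_for_failover(is_reachable, timeout=30.0, interval=0.5):
    """Poll `is_reachable` until it returns True and report how many
    seconds the Elastic IP took to come back, or None on timeout.

    `is_reachable` is any zero-argument callable returning a bool,
    e.g. a function that tries a TCP connection to the Elastic IP.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if is_reachable():
            return time.monotonic() - start
        time.sleep(interval)
    return None
```

Call it right after issuing the pcs resource move, and compare the measured gap against the 3–4 seconds observed above.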

Thanks for your interest, and all the best with your implementation.

Let me know your questions regarding this note, by commenting below.
