HA Cluster with ElasticIP using Corosync and Pacemaker
Let's build a 3-node HA cluster on AWS infrastructure using Corosync and Pacemaker, with a floating IP implemented via an AWS Elastic IP, in detailed steps.
Corosync is an open source program that provides cluster membership and messaging capabilities, often referred to as the messaging layer, to client servers.
Pacemaker is an open source cluster resource manager (CRM), a system that coordinates resources and services that are managed and made highly available by a cluster. In essence, Corosync enables servers to communicate as a cluster, while Pacemaker provides the ability to control how the cluster behaves.
Prerequisite:
- Three EC2 instances created on AWS. (As you already know, each instance has its own hostname, private IP, and public IP.)
- An unassigned Elastic IP created in AWS.
- Root access on all three instances (i.e., able to do su -).
- AWS settings configured by running aws configure (as the root user, *very important*) on all instances.
- Preferably, an open security group assigned to those instances until we finish our setup; it can be locked down later.
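The prerequisites above can be sanity-checked with a small script. This is only a sketch: the checked paths are the usual awscli defaults, and the function name is our own.

```shell
#!/bin/sh
# Sketch: verify the AWS CLI and root credentials this guide relies on.
# check_prereqs is a helper name of our own; paths are awscli defaults.
check_prereqs() {
    command -v aws >/dev/null 2>&1 || echo "missing: aws CLI"
    [ -f "${HOME}/.aws/credentials" ] || echo "run 'aws configure' as root first"
    return 0
}
check_prereqs
```

Run it as root on each instance; an empty output means both checks passed.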
Installation:
On all 3 instances run the following commands:
Installation of both corosync and pacemaker can be achieved with a single command, since pacemaker depends on corosync:
$ sudo apt-get install pacemaker
We need crmsh or pcs to configure the various cluster functions and resources, so verify their presence on your instances by checking their versions.
crmsh is a cluster management shell for the Pacemaker High Availability stack (for more info: https://crmsh.github.io/).
pcs — pacemaker/corosync configuration system.
# pacemakerd --version
Pacemaker 1.1.14
Written by Andrew Beekhof
# corosync -v
Corosync Cluster Engine, version '2.3.5'
Copyright (c) 2006-2009 Red Hat, Inc.
# pcs --version
0.9.149
# crm --version
crm 2.2.0
Corosync Configuration:
Create the corosync configuration file with the following content on all three instances:
# cat /etc/corosync/corosync.conf
totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    cluster_name: HA # Any name for the cluster
    rrp_mode: active
    transport: udpu
    token: 10000
}
quorum {
    provider: corosync_votequorum
}
nodelist {
    node {
        ring0_addr: <Hostname of Node1>
        name: master1
        nodeid: 1
    }
    node {
        ring0_addr: <Hostname of Node2>
        name: master2
        nodeid: 2
    }
    node {
        ring0_addr: <Hostname of Node3>
        name: master3
        nodeid: 3
    }
}
service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 1
}
logging {
    fileline: off
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
Note:
- <Hostname of Node> is the output of the command "uname -n" on the corresponding host, or the Private DNS name of the EC2 instance in AWS (not the FQDN).
- You can also provide the private IP addresses of the instances instead of hostnames. This works, but expect a warning when running "pcs status": "WARNING: corosync and pacemaker node names do not match (IPs used in setup?)", which can be safely ignored.
To understand the configuration in detail, you can go through the following references:
- https://github.com/corosync/corosync/blob/master/man/corosync.conf.5
- https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_enabling_pacemaker.html
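Rather than hand-editing the nodelist on each instance, the block above can be generated from a list of hostnames. A minimal sketch (the example hostnames are placeholders; node names follow the master1/master2/... scheme used in the config):

```shell
#!/bin/sh
# Sketch: emit the nodelist block of corosync.conf for a list of hostnames.
# The master<N> naming matches the sample config above.
gen_nodelist() {
    i=1
    echo "nodelist {"
    for host in "$@"; do
        printf '    node {\n        ring0_addr: %s\n        name: master%d\n        nodeid: %d\n    }\n' "$host" "$i" "$i"
        i=$((i + 1))
    done
    echo "}"
}
# Example with placeholder hostnames:
gen_nodelist ip-10-0-0-1 ip-10-0-0-2 ip-10-0-0-3
```

Redirect the output into your config template, then copy the finished file to all three nodes.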
Starting Corosync:
Start corosync on all instances, and check the ring status (as super user):
# service corosync start
# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = <>
status = ring 0 active with no faults
Once corosync has been started successfully on all the instances, check the membership of the cluster as follows:
# corosync-cmapctl | grep member
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(<node1-privateip>)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(<node2-privateip>)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
runtime.totem.pg.mrp.srp.members.3.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(<node3-privateip>)
runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.3.status (str) = joined
# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
2 1 <Hostname of Node2>
3 1 <Hostname of Node3>
1 1 <Hostname of Node1> (local)
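Before moving on, you may want to wait until all three members report "joined". A small helper for parsing the output above (a sketch; on a live node you would pipe the real corosync-cmapctl output into it):

```shell
#!/bin/sh
# Sketch: count members whose status is "joined" in
# `corosync-cmapctl | grep member` output read from stdin.
joined_count() {
    grep -c 'status (str) = joined'
}
# On a live node:
#   corosync-cmapctl | grep member | joined_count
```

Looping until the count reaches 3 gives you a simple readiness check before starting pacemaker.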
Pacemaker Configuration:
Being a cluster resource manager, Pacemaker needs its resources (here, the Elastic IP) to be well defined.
To handle the Elastic IP as a resource, we use the 'ocf:heartbeat:awseip' agent, so that resource agent must be in place:
If the file /usr/lib/ocf/resource.d/heartbeat/awseip
doesn't exist on your instances, then grab it from: https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/awseip
and make sure the file has root ownership and appropriate (executable) permissions.
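A quick way to check the agent and fix its permissions is sketched below. The function name is our own, and the chown is silently skipped when you are not root.

```shell
#!/bin/sh
# Sketch: verify a resource-agent script exists and make it executable.
# ensure_agent is a helper name of our own, not part of any tool.
ensure_agent() {
    f="$1"
    if [ ! -f "$f" ]; then
        echo "agent missing: fetch awseip from the ClusterLabs resource-agents repo" >&2
        return 1
    fi
    chown root:root "$f" 2>/dev/null || true  # needs root; ignored otherwise
    chmod 755 "$f"
}
# On a real node:
#   ensure_agent /usr/lib/ocf/resource.d/heartbeat/awseip
```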
Now add Environment="AWS_DEFAULT_REGION=<AWS-Default-Region>"
under the [Service] section of /etc/systemd/system/multi-user.target.wants/pacemaker.service, so the awseip agent knows which region to call. (A bare KEY=VALUE line at the end of the unit file will not be read by systemd.)
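An alternative that survives package upgrades is a systemd drop-in. A sketch, where the aws.conf filename and the example region are placeholders of our own:

```shell
#!/bin/sh
# Sketch: write a systemd drop-in exporting the AWS region to pacemaker.
# The aws.conf filename and the region value are placeholders.
install_region_dropin() {
    dir="$1"; region="$2"
    mkdir -p "$dir"
    printf '[Service]\nEnvironment="AWS_DEFAULT_REGION=%s"\n' "$region" > "$dir/aws.conf"
}
# On a real node (as root), then run `systemctl daemon-reload`:
#   install_region_dropin /etc/systemd/system/pacemaker.service.d us-east-1
```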
The pacemaker configuration for the Elastic IP resource can then be verified by:
# pcs resource describe ocf:heartbeat:awseip
ocf:heartbeat:awseip - Amazon AWS Elastic IP Address Resource Agent
Resource Agent for Amazon AWS Elastic IP Addresses.
It manages AWS Elastic IP Addresses with awscli.
Credentials needs to be setup by running "aws configure".
See https://aws.amazon.com/cli/ for more information about awscli.
Resource options:
awscli: command line tools for aws services
profile: Valid AWS CLI profile name (see ~/.aws/config and 'aws configure')
elastic_ip (required): reserved elastic ip for ec2 instance
allocation_id (required): reserved allocation id for ec2 instance
private_ip_address: predefined private ip address for ec2 instance
api_delay: a short delay between API calls, to avoid sending API too quick
Starting Pacemaker:
Start pacemaker on all instances as super user:
# service pacemaker start
Resource Setup for the Cluster (Elastic IP):
On any node, run:
# pcs resource create <resource-name> ocf:heartbeat:awseip elastic_ip="<Elastic IP Address>" allocation_id="<Allocation ID of Elastic IP>" awscli="$(which aws)" op start timeout="60s" interval="0s" on-fail="restart" op monitor timeout="60s" interval="10s" on-fail="restart" op stop timeout="60s" interval="0s" on-fail="block"
Note:
- <resource-name>: for example, elastic-ip
- <Allocation ID of Elastic IP>: the Allocation ID of your created Elastic IP.
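If you don't have the Allocation ID handy, it can be looked up from the Elastic IP with awscli. A sketch (the function name is ours; it needs configured credentials and network access, so it only runs on a configured instance):

```shell
#!/bin/sh
# Sketch: look up the Allocation ID for a given Elastic IP via awscli.
# get_allocation_id is a helper name of our own.
get_allocation_id() {
    aws ec2 describe-addresses --public-ips "$1" \
        --query 'Addresses[0].AllocationId' --output text
}
# Example (placeholder address):
#   get_allocation_id 203.0.113.10
```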
We can also use the crm command to create the resource; both approaches work:
$ sudo crm configure property stonith-enabled=false
$ sudo crm configure property no-quorum-policy=ignore
$ sudo crm configure primitive elastic-ip ocf:heartbeat:awseip params elastic_ip="<Elastic IP Address>" awscli="$(which aws)" allocation_id="<Allocation ID of Elastic IP>" op start timeout="60s" interval="0s" on-fail="restart" op monitor timeout="60s" interval="10s" on-fail="restart" op stop timeout="60s" interval="0s" on-fail="block" meta migration-threshold="2" failure-timeout="60s" resource-stickiness="100"
As a result, we can view the resource with:
# pcs resource show
elastic-ip (ocf::heartbeat:awseip): Started <HostName of the node the Elastic IP is assigned to>
# pcs resource show elastic-ip
Resource: elastic-ip (class=ocf provider=heartbeat type=awseip)
Attributes: elastic_ip=xxx.xxx.xxx.xxx allocation_id=eipalloc-00bd4721 awscli=/usr/bin/aws
Operations: start interval=0s timeout=60s on-fail=restart (elastic-ip-start-interval-0s)
monitor interval=10s timeout=60s on-fail=restart (elastic-ip-monitor-interval-10s)
stop interval=0s timeout=60s on-fail=block (elastic-ip-stop-interval-0s)
# crm_mon -1
Last updated: Fri Oct 5 08:27:03 2018 Last change: Fri Oct 5 06:08:29 2018 by root via cibadmin on <HostName>
Stack: corosync
Current DC: <HostName> (version 1.1.14-70404b0) - partition with quorum
3 nodes and 1 resource configured
Online: [ <HostName1> <HostName2> <HostName3> ]
elastic-ip (ocf::heartbeat:awseip): Started <HostName>
Checking pcs status or crm status:
# pcs status
Cluster name: HA
Last updated: Fri Oct 5 08:32:58 2018 Last change: Fri Oct 5 06:08:29 2018 by root via cibadmin on ip-172-31-14-243
Stack: corosync
Current DC: ip-172-31-13-151 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 1 resource configured
Online: [ ip-172-31-13-151 ip-172-31-14-243 ip-172-31-7-70 ]
Full list of resources:
elastic-ip (ocf::heartbeat:awseip): Started ip-172-31-13-151
PCSD Status:
ip-172-31-14-243: Unable to authenticate
ip-172-31-7-70: Unable to authenticate
ip-172-31-13-151: Unable to authenticate
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Manual Elastic IP Failover demo:
We can move the resource (elastic-ip) from one node to another manually by:
# pcs resource move <Resource-name> <HostName>
You can verify the change by running pcs status or pcs resource show.
Failover of the IP took 3-4 seconds.
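The 3-4 second figure can be measured by probing the Elastic IP once per second during a manual move. A sketch, where `probe` is a placeholder command you must define for your own service:

```shell
#!/bin/sh
# Sketch: count seconds of downtime over N one-second probes.
# `probe` is a placeholder you must define first, e.g.:
#   probe() { curl -fsS -m 1 "http://<Elastic IP Address>/" >/dev/null 2>&1; }
measure_downtime() {
    n="$1"; down=0
    while [ "$n" -gt 0 ]; do
        probe || down=$((down + 1))
        sleep 1
        n=$((n - 1))
    done
    echo "$down"
}
```

Start the loop, run the `pcs resource move` command from another terminal, and the printed count approximates the failover window in seconds.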
Thanks for your interest, and all the best with your implementation.
Let me know your questions about this note by commenting below.