High Availability using Corosync + Pacemaker on Ubuntu 16.04

Lê Yên Thanh
Sep 17, 2017 · 6 min read

Once upon a time, I needed to set up High Availability for my servers. I had two servers: a main server, A (public IP 1.0.0.1, private IP 2.0.0.1), and a backup server, B (public IP 1.0.0.2, private IP 2.0.0.2). I also had a third public IP (1.0.0.3), which is the address my APIs are served from. The two servers are on the same private network.

Goal

Servers A and B run in an active/passive configuration. Server A normally holds the public IP (1.0.0.3); whenever server A goes down, server B takes over this IP and becomes the main server.

Solution

After some research, I decided to use Corosync and Pacemaker to set up High Availability for my servers.

Corosync is an open source program that provides cluster membership and messaging capabilities, often referred to as the messaging layer, to client servers.

Pacemaker is an open source cluster resource manager (CRM), a system that coordinates resources and services that are managed and made highly available by a cluster. In essence, Corosync enables servers to communicate as a cluster, while Pacemaker provides the ability to control how the cluster behaves.

Synchronizing time between servers

Whenever you have multiple servers communicating with each other, especially with clustering software, it is important to ensure their clocks are synchronized. Let’s use NTP (Network Time Protocol) to synchronize our servers. On both servers, run these commands and select the same timezone on each:
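The commands for this step are not shown here. A likely sequence on Ubuntu 16.04, assuming the stock ntp package and the interactive tzdata dialog, is:

```shell
# Pick the timezone interactively (choose the same one on both servers).
sudo dpkg-reconfigure tzdata
# Refresh the package index, then install the NTP daemon.
sudo apt-get update
sudo apt-get -y install ntp
```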

Configure Firewall

Corosync uses UDP transport on ports 5404, 5405 and 5406. If you are running a firewall, ensure that communication on those ports is allowed between the servers.

If you use ufw, you could allow traffic on these ports with these commands on both servers:
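A minimal sketch of the ufw rules, assuming the three UDP ports named above and no restriction by interface or source address:

```shell
# Allow Corosync's UDP ports through ufw.
sudo ufw allow 5404/udp
sudo ufw allow 5405/udp
sudo ufw allow 5406/udp
```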

Or if you use iptables, you could allow traffic on these ports and eth1 (the private network interface) with these commands:

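$ sudo iptables -A INPUT  -i eth1 -p udp -m multiport --dports 5404,5405,5406 -m conntrack --ctstate NEW,ESTABLISHED -j ACCEPT
$ sudo iptables -A OUTPUT  -o eth1 -p udp -m multiport --sports 5404,5405,5406 -m conntrack --ctstate ESTABLISHED -j ACCEPT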

Install Corosync and Pacemaker

Corosync is a dependency of Pacemaker, so we can install both of them using one command. Run this command on both servers:
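A sketch of the install step. The crmsh package, which provides the crm utility used later, is listed explicitly as an assumption in case it is not pulled in automatically:

```shell
sudo apt-get update
# Installing pacemaker also installs corosync as a dependency.
sudo apt-get -y install pacemaker crmsh
```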

Configure Authorization Key for two servers

Corosync must be configured so that our servers can communicate as a cluster.

On server A (main server), run these commands:
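The key is generated with corosync-keygen. The haveged package in this sketch is an assumption: it is a common way to supply enough entropy for key generation to finish quickly on a headless server.

```shell
# Optional entropy source (assumption), then generate /etc/corosync/authkey.
sudo apt-get -y install haveged
sudo corosync-keygen
```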

This will generate a 128-byte cluster authorization key and write it to /etc/corosync/authkey on server A. Now we need to run this command on server A to copy the authkey to server B (backup server):
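For example, with scp. The account name username is a placeholder, 2.0.0.2 is server B's private IP from the introduction, and /tmp is an assumed staging directory:

```shell
# Copy the key from server A to a temporary location on server B.
sudo scp /etc/corosync/authkey username@2.0.0.2:/tmp
```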

Then, on server B, run these commands:
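A sketch, assuming the key was copied to /tmp on server B. It moves the key into place and makes it readable by root only:

```shell
sudo mv /tmp/authkey /etc/corosync/authkey
sudo chown root:root /etc/corosync/authkey
sudo chmod 400 /etc/corosync/authkey
```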

Configure Corosync cluster

On both servers, open corosync.conf and replace its contents with the configuration below:

quorum {
  provider: corosync_votequorum
  two_node: 1
}
nodelist {
  node {
    ring0_addr: server_A_private_IP_address
    name: primary
    nodeid: 1
  }
  node {
    ring0_addr: server_B_private_IP_address
    name: secondary
    nodeid: 2
  }
}
logging {
  to_logfile: yes
  logfile: /var/log/corosync/corosync.log
  to_syslog: yes
  timestamp: on
}

You can read through the configuration and try to understand it. If you can’t, don’t worry about it :). There are only a few values you need to set:

  • server_A_private_IP_address: Private IP of server A
  • server_B_private_IP_address: Private IP of server B
  • private_binding_IP_address: The private address that both server A and server B bind to. To find it, run ifconfig on server A (or server B) and look at the private interface (usually eth1). In this setup the value is 2.0.0.255. Because the two servers are on the same private network, this value must be the same on both servers.
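The configuration shown above does not include the section where private_binding_IP_address is used. In a typical Corosync 2.x setup it is the bindnetaddr value inside a totem section, so here is a sketch under that assumption. The helper name render_totem, the cluster name hacluster, and the udpu transport are all assumptions. Corosync's documentation describes bindnetaddr as a network address or an address configured on the host, so verify the value against your own network.

```shell
# render_totem is a hypothetical helper: it prints a totem section for the
# bind address given as its first argument.
render_totem() {
  cat <<EOF
totem {
  version: 2
  cluster_name: hacluster
  transport: udpu
  interface {
    ringnumber: 0
    bindnetaddr: $1
    mcastport: 5405
  }
}
EOF
}

# Example: print the section using the address from this article.
render_totem "2.0.0.255"
```

The printed section can then be pasted at the top of corosync.conf on both servers.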

Enable and run Corosync

Next, we need to configure Corosync to allow the Pacemaker service. On both servers, create a file named pcmk in Corosync’s service directory:

Then add this configuration to the pcmk file:
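A sketch that creates the file and writes the service block in one step. The path /etc/corosync/service.d/pcmk and the block contents (name: pacemaker, ver: 1) are the usual values for this file and are assumptions here:

```shell
# Create the service directory if it is missing, then write the pcmk file.
sudo mkdir -p /etc/corosync/service.d
sudo tee /etc/corosync/service.d/pcmk > /dev/null <<'EOF'
service {
  name: pacemaker
  ver: 1
}
EOF
```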

Finally, open the file /etc/default/corosync and set START=yes (if there is already a line START=no, change that line instead of adding a new one).
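If you would rather script this edit than do it by hand, a small helper can handle both cases. The function name enable_corosync_start is hypothetical, and the sketch assumes GNU sed, as shipped with Ubuntu:

```shell
# enable_corosync_start is a hypothetical helper: it sets START=yes in the
# file given as its first argument, adding the line if none exists.
enable_corosync_start() {
  if grep -q '^START=' "$1"; then
    # A START= line exists: rewrite it in place.
    sed -i 's/^START=.*/START=yes/' "$1"
  else
    # No START= line yet: append one.
    echo 'START=yes' >> "$1"
  fi
}
```

From a root shell, it would be called as enable_corosync_start /etc/default/corosync.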

Now, start Corosync on both servers:
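Assuming the service name installed by the Ubuntu package, this can be done with:

```shell
sudo service corosync start
```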

Let’s check that everything is working with this command:
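One way to check membership, assuming the corosync-cmapctl tool that ships with Corosync 2.x:

```shell
# Show only the cluster membership entries from Corosync's runtime map.
sudo corosync-cmapctl | grep members
```

Each node should be listed with its private IP and a status of joined.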

The output should list both servers as cluster members (if not, wait 1 minute and run the command again).

Enable and Start Pacemaker

Pacemaker, which depends on the messaging capabilities of Corosync, is now ready to be started. On both servers, enable Pacemaker to start on system boot with this command:
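A sketch using update-rc.d. The start priority 20 is the value this article uses; the stop priority 01 is an assumption:

```shell
# Register Pacemaker's init script with start priority 20 and stop priority 01.
sudo update-rc.d pacemaker defaults 20 01
```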

Because Pacemaker needs to start after Corosync, we set Pacemaker’s start priority to 20, which is higher than Corosync's (19 by default).

Now let’s start Pacemaker:
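Again assuming the service name from the Ubuntu package, run this on both servers:

```shell
sudo service pacemaker start
```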

To interact with Pacemaker, we will use the crm utility. Check Pacemaker’s status with the command ‘sudo crm status’.

This should output something like this (if not, wait for 30 seconds and run the command again):

Online: [ primary secondary ]

Configure Pacemaker and add our Public IP as a Resource

First we need to configure some properties. We can run Pacemaker (crm) commands from either server, because Pacemaker automatically synchronizes all cluster-related changes across all member nodes. Let’s run these commands on server A:
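The properties themselves are not listed here. For a two-node cluster without fencing hardware, a common choice, assumed in this sketch, is to disable STONITH and to ignore loss of quorum:

```shell
# Assumption: no fencing device is available, so STONITH is turned off.
sudo crm configure property stonith-enabled=false
# Assumption: with two nodes, keep running resources when quorum is lost.
sudo crm configure property no-quorum-policy=ignore
```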

Now we will add our public IP (1.0.0.3) as a Resource with this command:
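A sketch of the resource definition. The resource name, agent, IP, and stickiness match what this article shows elsewhere; the netmask, the monitor interval, and the migration-threshold and failure-timeout values are assumptions:

```shell
# Define a floating IP resource managed by the IPaddr2 resource agent.
sudo crm configure primitive virtual_public_ip \
  ocf:heartbeat:IPaddr2 params ip="1.0.0.3" cidr_netmask="32" \
  op monitor interval="10s" \
  meta migration-threshold="2" failure-timeout="60s" resource-stickiness="100"
```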

NOTE: The setting resource-stickiness="100" means that once a server has taken over the resource (our public IP, 1.0.0.3) because the other server went down, it keeps the resource even after the other server comes back online.

Check Pacemaker’s status again with the command ‘sudo crm status’. You should see:

Online: [ primary secondary ]
Full list of resources:
virtual_public_ip   (ocf::heartbeat:IPaddr2):    Started primary

So we have one resource running, and the primary node (server A) holds it. This means server A is handling our public IP (1.0.0.3). To double-check, run this command on server A:
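For example, assuming the iproute2 ip tool is available:

```shell
# List IPv4 addresses and keep only the lines mentioning the public IP.
ip -4 addr show | grep -F '1.0.0.3'
```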

You should see 1.0.0.3 listed among server A’s addresses.

Testing: simulating server A going down

Now we simulate server A going down. In this case, server B should take over the public IP (1.0.0.3).

Of course you can shut down server A, but if you don’t want to, you can put the primary node in standby with this command:
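Using the node name primary from the nodelist, a sketch with the crm utility:

```shell
# Put the primary node in standby so its resources move to the other node.
sudo crm node standby primary
```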

On server B, check Pacemaker’s status with the command ‘sudo crm status’. You should see:

Node primary: standby
Online: [ secondary ]
Full list of resources:
virtual_public_ip (ocf::heartbeat:IPaddr2): Started secondary

Check server B’s IP addresses (for example with ip addr). You should see that server B now holds our public IP (1.0.0.3).

Now, to bring server A online again:
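A sketch, again using the node name primary from the nodelist:

```shell
# Take the primary node out of standby.
sudo crm node online primary
```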

Because we set resource-stickiness="100", the public IP stays on the secondary node even after the primary node is back. To return to the default setup, where the primary node holds our public IP, put the secondary node in standby and then bring it online again:
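A sketch of that sequence with the crm utility. Between the two commands you may want to confirm with ‘sudo crm status’ that the resource has started on primary:

```shell
# Standby forces the resource off the secondary node and back to primary.
sudo crm node standby secondary
# Bring the secondary node back so it is ready for the next failover.
sudo crm node online secondary
```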

Want to know more about me? https://lythanh.xyz