Red Hat — Active/Passive High Availability — Apache

Jerome
15 min read · Aug 15, 2023

OCCASIONALLY I get to travel back in time, and tackle projects which take me back to my days of being a Linux system administrator.

Days when I often had to do “more” with less…

The POC: deploy a highly available Active/Passive Apache cluster WITHOUT a load balancer or proper shared storage.

Fortunately with RHEL (Red Hat Enterprise Linux) and the “Red Hat Enterprise Linux High Availability Add-On”, we can make this happen.

But first — the why:

Why Highly Available?
Well, we want our site to always be available — via multiple web server nodes, so as to avoid a single point of failure.

Why Active/Passive and not Active/Active?

We aren’t planning on serving static content, so we only want one node serving and writing content to the storage at a time.

Why via the RHEL HA add-on?

Because it gives us a way to direct traffic to a single healthy node (without a load balancer) and guarantees that resources are fenced (storage in particular — so that only the primary node is making changes to it at any given time).

This avoids data-level corruption, inconsistent data and/or file locks.

Why not simply involve HAProxy to do the load balancing and use something like NFS for storage?

Because we need a way to reliably fence our storage (and ensure only one node accesses it at a time).

Why not do storage via the Red Hat “Resilient Storage Add-On”?

Cost and added complexity — which we simply don’t need in this Active/Passive setup.

Then the how:

For an Active/Passive configuration to work, we’ll need a few components.

  1. First we’ll need a VIP/DNS name, where we’ll access our web application
  2. Second we’ll need multiple web server nodes (in our case two) to achieve HA
  3. Third we’ll need shared storage that can be isolated/locked to whichever node is “Active” at the time — so as to ensure there’s no data divergence or corruption between the nodes.

This is illustrated by the architecture diagram in the Red Hat documentation:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/configuring_and_managing_high_availability_clusters/assembly_configuring-active-passive-http-server-in-a-cluster-configuring-and-managing-high-availability-clusters

The HA inner workings in a nutshell:

The cluster VIP binds to a single node. This node is elected by Pacemaker as the “Active” node to serve traffic.

The “Active” node is then entrusted to host the VIP and receive incoming traffic; in this setup the VIP is managed by Pacemaker through the IPaddr2 resource agent (no Keepalived or VRRP involved).

Once a failure occurs, the “Passive” node is promoted to “Active” and starts receiving traffic. Up to that point the “Passive” node is merely dormant and receives no web traffic.
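Once the cluster is up and running (everything below), a quick way to see which node is currently “Active” is to ask Pacemaker where the resources are, and to look for the VIP on the node’s interface. A minimal sketch, assuming the resource group and VIP created later in this post:

[root@web1 ~]# pcs status resources
[root@web1 ~]# ip -brief addr show | grep 192.168.13.20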

The concepts we need to get there:

STONITH:
Shoot The Other Node In The Head — This is the method used to remove an unresponsive node from our cluster.

This means that when the “Active” node becomes unresponsive, STONITH kicks in and either reboots it or isolates it from the resources (VIP, storage, web content), so that the “Passive” node can take over.

FENCING:
A mechanism to lock our resources (VIP, storage, web content) to our “Active” node.

This can be done as a resource-level fence — where we isolate resources by removing a node’s access to them (storage or network-level fencing) — OR as a node-level fence — where we simply reboot the errant node so that it stops holding resources.

FENCING MECHANISMS:
These can differ vastly, but they are essentially ways to “ring-fence” resources so that they are only ever locked to our “Active” node.

In our case we’ll be using VMware as our hypervisor — so we’ll focus on the fence_vmware_rest plugin.

For a full list of plugins/agents, see the output below:

# pcs stonith list
fence_amt_ws - Fence agent for AMT (WS)
fence_apc - Fence agent for APC over telnet/ssh
fence_apc_snmp - Fence agent for APC, Tripplite PDU over SNMP
fence_bladecenter - Fence agent for IBM BladeCenter
fence_brocade - Fence agent for HP Brocade over telnet/ssh
fence_cisco_mds - Fence agent for Cisco MDS
fence_cisco_ucs - Fence agent for Cisco UCS
fence_drac5 - Fence agent for Dell DRAC CMC/5
fence_eaton_snmp - Fence agent for Eaton over SNMP
fence_emerson - Fence agent for Emerson over SNMP
fence_eps - Fence agent for ePowerSwitch
fence_heuristics_ping - Fence agent for ping-heuristic based fencing
fence_hpblade - Fence agent for HP BladeSystem
fence_ibmblade - Fence agent for IBM BladeCenter over SNMP
fence_idrac - Fence agent for IPMI
fence_ifmib - Fence agent for IF MIB
fence_ilo - Fence agent for HP iLO
fence_ilo2 - Fence agent for HP iLO
fence_ilo3 - Fence agent for IPMI
fence_ilo3_ssh - Fence agent for HP iLO over SSH
fence_ilo4 - Fence agent for IPMI
fence_ilo4_ssh - Fence agent for HP iLO over SSH
fence_ilo5 - Fence agent for IPMI
fence_ilo5_ssh - Fence agent for HP iLO over SSH
fence_ilo_moonshot - Fence agent for HP Moonshot iLO
fence_ilo_mp - Fence agent for HP iLO MP
fence_ilo_ssh - Fence agent for HP iLO over SSH
fence_imm - Fence agent for IPMI
fence_intelmodular - Fence agent for Intel Modular
fence_ipdu - Fence agent for iPDU over SNMP
fence_ipmilan - Fence agent for IPMI
fence_ipmilanplus - Fence agent for IPMI
fence_kdump - fencing agent for use with kdump crash recovery service
fence_mpath - Fence agent for multipath persistent reservation
fence_redfish - I/O Fencing agent for Redfish
fence_rhevm - Fence agent for RHEV-M REST API
fence_rsa - Fence agent for IBM RSA
fence_rsb - I/O Fencing agent for Fujitsu-Siemens RSB
fence_sbd - Fence agent for sbd
fence_scsi - Fence agent for SCSI persistent reservation
fence_tripplite_snmp - Fence agent for APC, Tripplite PDU over SNMP
fence_virt - Fence agent for virtual machines
fence_vmware_rest - Fence agent for VMware REST API
fence_vmware_soap - Fence agent for VMWare over SOAP API
fence_watchdog - Dummy watchdog fence agent
fence_wti - Fence agent for WTI
fence_xvm - Fence agent for virtual machines
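Once you’ve picked an agent (fence_vmware_rest in our case), you can list the parameters it accepts before configuring anything. A quick sketch (the output is lengthy, so it isn’t reproduced here):

[root@web1 ~]# pcs stonith describe fence_vmware_rest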

The technical bits needed:

1x VIP/DNS

mywebapp.yoonix.xyz

3x RHEL 9.2 systems (1x Shared Storage node, 2x Apache web servers)

storage.yoonix.xyz
web1.yoonix.xyz
web2.yoonix.xyz
# cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="9.2 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.2"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.2 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.2
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.2"

1x Shared Storage — In this case my home lab doesn’t have access to highly available enterprise storage, so I’m just going to spin up some quick and dirty iSCSI storage for the purposes of this demo.

The deployment:

Let’s register ALL of our nodes against a Red Hat subscription — so we have access to updates and the required HA packages.

[root@storage ~]# subscription-manager register --username your_user --password your_pass --auto-attach --force
Registering to: subscription.rhsm.redhat.com:443/subscription
The system has been registered with ID: 76950c20-53a9-4173-9b01-2a3f22c6df7b
The registered system name is: storage.yoonix.xyz
Installed Product Current Status:
Product Name: Red Hat Enterprise Linux for x86_64
Status: Subscribed

[root@web1 ~]# subscription-manager register --username your_user --password your_pass --auto-attach --force
Registering to: subscription.rhsm.redhat.com:443/subscription
The system has been registered with ID: 255a0c35-a4f0-4e41-9af4-eb92b4a91877
The registered system name is: web1.yoonix.xyz
Installed Product Current Status:
Product Name: Red Hat Enterprise Linux for x86_64
Status: Subscribed

[root@web2 ~]# subscription-manager register --username your_user --password your_pass --auto-attach --force
Registering to: subscription.rhsm.redhat.com:443/subscription
The system has been registered with ID: 89ddf3a1-16ce-4578-a590-99bb1472d86c
The registered system name is: web2.yoonix.xyz
Installed Product Current Status:
Product Name: Red Hat Enterprise Linux for x86_64
Status: Subscribed

Let’s update them:

[root@storage ~]# yum update -y
[root@web1 ~]# yum update -y
[root@web2 ~]# yum update -y

Next we’ll deploy our iSCSI target — which will become our shared storage host — on the storage node:

[root@storage ~]# yum install targetcli -y
[root@storage ~]# systemctl enable --now target

[root@storage ~]# firewall-cmd --permanent --add-port=3260/tcp
[root@storage ~]# firewall-cmd --reload

[root@storage ~]# targetcli
Warning: Could not load preferences file /root/.targetcli/prefs.bin.
targetcli shell version 2.1.53
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.

/> iscsi/
/iscsi> create
Created target iqn.2003-01.org.linux-iscsi.storage.x8664:sn.e6e826300d10.
Created TPG 1.
Global pref auto_add_default_portal=true
Created default portal listening on all IPs (0.0.0.0), port 3260.

/iscsi> cd /backstores/fileio
/backstores/fileio> create shared_storage /tmp/shared_storage.img 1G write_back=false
Created fileio shared_storage with size 1073741824

/backstores/fileio> cd /iscsi/iqn.2003-01.org.linux-iscsi.storage.x8664:sn.e6e826300d10/tpg1/luns
/iscsi/iqn.20...d10/tpg1/luns> create /backstores/fileio/shared_storage
Created LUN 0.

/iscsi/iqn.20...d10/tpg1/luns> exit
Global pref auto_save_on_exit=true
Configuration saved to /etc/target/saveconfig.json
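To review what we’ve just built, targetcli can print the configured tree (backstores, target, TPG and LUNs) non-interactively. A quick sanity check:

[root@storage ~]# targetcli ls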

Then we’ll install our iSCSI initiator on our web server nodes — so they can access the storage:

[root@web1 ~]# yum install iscsi-initiator-utils -y
Updating Subscription Management repositories.
Last metadata expiration check: 2:09:24 ago on Mon 24 Jul 2023 15:24:46.
Package iscsi-initiator-utils-6.2.1.4-3.git2a8f9d8.el9.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!

[root@web1 ~]# systemctl enable --now iscsid

[root@web2 ~]# yum install iscsi-initiator-utils -y
Updating Subscription Management repositories.
Last metadata expiration check: 2:09:24 ago on Mon 24 Jul 2023 15:24:46.
Package iscsi-initiator-utils-6.2.1.4-3.git2a8f9d8.el9.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!

[root@web2 ~]# systemctl enable --now iscsid

Then we’ll grab their initiator names:

[root@web1 ~]# cat /etc/iscsi/initiatorname.iscsi 
InitiatorName=iqn.1994-05.com.redhat:c85d65d94abb

[root@web2 ~]# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1994-05.com.redhat:c883971b4c99

These will be added to a whitelist on the iSCSI target, which will allow these nodes to connect to and use the shared storage:

[root@storage ~]# targetcli
targetcli shell version 2.1.53
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.

/iscsi/iqn.20...d10/tpg1/luns> cd /iscsi/iqn.2003-01.org.linux-iscsi.storage.x8664:sn.e6e826300d10/tpg1/acls

/iscsi/iqn.20...d10/tpg1/acls> create iqn.1994-05.com.redhat:c85d65d94abb
Created Node ACL for iqn.1994-05.com.redhat:c85d65d94abb
Created mapped LUN 0.

/iscsi/iqn.20...d10/tpg1/acls> create iqn.1994-05.com.redhat:c883971b4c99
Created Node ACL for iqn.1994-05.com.redhat:c883971b4c99
Created mapped LUN 0.

Now we’ll point our iSCSI initiators (clients) at our iSCSI target (storage host) and discover it:

[root@web1 ~]# iscsiadm -m discovery -t st -p storage.yoonix.xyz
192.168.13.11:3260,1 iqn.2003-01.org.linux-iscsi.storage.x8664:sn.e6e826300d10

[root@web2 ~]# iscsiadm -m discovery -t st -p storage.yoonix.xyz
192.168.13.11:3260,1 iqn.2003-01.org.linux-iscsi.storage.x8664:sn.e6e826300d10
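On RHEL the discovered node records are typically created with automatic startup, so the initiators should log in when iscsid starts (or after the reboot below). If the LUN doesn’t show up, you can log in explicitly and check the session. A sketch using the target IQN created earlier:

[root@web1 ~]# iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.storage.x8664:sn.e6e826300d10 -p storage.yoonix.xyz --login
[root@web1 ~]# iscsiadm -m session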

Then we’ll reboot as a sanity check to make sure all of our services are enabled correctly, and indeed start at boot:

[root@storage ~]# reboot
[root@web1 ~]# reboot
[root@web2 ~]# reboot

Then we can confirm our storage is up (and, in doing so, that all services started correctly):

[root@web1 ~]# lsblk /dev/sdb
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sdb 8:16 0 1G 0 disk

[root@web2 ~]# lsblk /dev/sdb
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sdb 8:16 0 1G 0 disk

Next we’ll configure LVM on both web nodes so that volume groups are stamped with a system ID based on each node’s hostname, by setting system_id_source = "uname" in the global section of /etc/lvm/lvm.conf. This is what later allows the cluster to control which node may activate the shared volume group:

[root@web1 ~]# vi /etc/lvm/lvm.conf
[root@web2 ~]# vi /etc/lvm/lvm.conf

system_id_source = "uname"
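To double check what each node will stamp onto the volume groups it creates, LVM can report the system ID in use. A quick sketch:

[root@web1 ~]# lvm systemid
[root@web1 ~]# vgs -o vg_name,vg_systemid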

Then we’ll create our storage on our “Active” web server node and format it:

[root@web1 ~]# pvcreate /dev/sdb
Physical volume "/dev/sdb" successfully created.

[root@web1 ~]# vgcreate --setautoactivation n apache_vg /dev/sdb
Volume group "apache_vg" successfully created with system ID web1.yoonix.xyz

[root@web1 ~]# lvcreate --name apache_lv -l 100%FREE apache_vg
Logical volume "apache_lv" created.

[root@web1 ~]# mkfs.xfs /dev/apache_vg/apache_lv
meta-data=/dev/apache_vg/apache_lv isize=512 agcount=4, agsize=65024 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1 bigtime=1 inobtcount=1
data = bsize=4096 blocks=260096, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=1566, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0

We’ll then need to enable this device for use by LVM on our “Passive” node:

[root@web2 ~]# lvmdevices --adddev /dev/sdb
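As a sanity check, the passive node should now see the device in its devices file and report the volume group as foreign, since the VG carries web1’s system ID. A rough sketch:

[root@web2 ~]# lvmdevices
[root@web2 ~]# vgs --foreign -o vg_name,vg_systemid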

Next we’ll configure our Apache web server on both our web nodes:

[root@web1 ~]# yum install -y httpd

[root@web1 ~]# firewall-cmd --permanent --add-service=http
[root@web1 ~]# firewall-cmd --permanent --add-service=https
[root@web1 ~]# firewall-cmd --reload

[root@web1 ~]# cat <<-END > /etc/httpd/conf.d/status.conf
<Location /server-status>
SetHandler server-status
Require local
</Location>
END

A note on the next two snippets: because the apache resource agent (rather than systemd) manages httpd in the cluster, the Red Hat documentation has you adjust /etc/logrotate.d/httpd. The first snippet below is the line you remove from that file, and the three-line block after it is the replacement, a graceful reload that uses the PID file directly.

[root@web1 ~]# /bin/systemctl reload httpd.service > /dev/null 2>/dev/null || true

[root@web1 ~]# /usr/bin/test -f /run/httpd.pid >/dev/null 2>/dev/null &&
/usr/bin/ps -q $(/usr/bin/cat /run/httpd.pid) >/dev/null 2>/dev/null &&
/usr/sbin/httpd -f /etc/httpd/conf/httpd.conf \
-c "PidFile /run/httpd.pid" -k graceful > /dev/null 2>/dev/null || true

---

[root@web2 ~]# yum install -y httpd

[root@web2 ~]# firewall-cmd --permanent --add-service=http
[root@web2 ~]# firewall-cmd --permanent --add-service=https
[root@web2 ~]# firewall-cmd --reload

[root@web2 ~]# cat <<-END > /etc/httpd/conf.d/status.conf
<Location /server-status>
SetHandler server-status
Require local
</Location>
END

[root@web2 ~]# /bin/systemctl reload httpd.service > /dev/null 2>/dev/null || true

[root@web2 ~]# /usr/bin/test -f /run/httpd.pid >/dev/null 2>/dev/null &&
/usr/bin/ps -q $(/usr/bin/cat /run/httpd.pid) >/dev/null 2>/dev/null &&
/usr/sbin/httpd -f /etc/httpd/conf/httpd.conf \
-c "PidFile /run/httpd.pid" -k graceful > /dev/null 2>/dev/null || true

Then we’ll create some web content on our shared storage, and test our mount:

[root@web1 ~]# lvchange -ay apache_vg/apache_lv
[root@web1 ~]# mount /dev/apache_vg/apache_lv /var/www/
[root@web1 ~]# mkdir /var/www/html
[root@web1 ~]# mkdir /var/www/cgi-bin
[root@web1 ~]# mkdir /var/www/error
[root@web1 ~]# restorecon -R /var/www

[root@web1 ~]# cat <<-END >/var/www/html/index.html
<html>
<body>Hello</body>
</html>
END

[root@web1 ~]# umount /var/www

Now we can set up the HA components on our web nodes (pcs, Pacemaker and Corosync):

[root@web1 ~]# subscription-manager repos --enable rhel-9-for-x86_64-highavailability-rpms
Repository 'rhel-9-for-x86_64-highavailability-rpms' is enabled for this system.

[root@web1 ~]# yum install -y pcs pacemaker fence-agents-all pcp-zeroconf psmisc policycoreutils-python-utils lvm2 chrony iscsi-initiator-utils

[root@web1 ~]# timedatectl set-ntp yes

[root@web1 ~]# firewall-cmd --permanent --add-service=high-availability
[root@web1 ~]# firewall-cmd --reload

[root@web1 ~]# systemctl enable --now pcsd
Created symlink /etc/systemd/system/multi-user.target.wants/pcsd.service → /usr/lib/systemd/system/pcsd.service.

[root@web1 ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

---

[root@web2 ~]# subscription-manager repos --enable rhel-9-for-x86_64-highavailability-rpms
Repository 'rhel-9-for-x86_64-highavailability-rpms' is enabled for this system.

[root@web2 ~]# yum install -y pcs pacemaker fence-agents-all pcp-zeroconf psmisc policycoreutils-python-utils lvm2 chrony iscsi-initiator-utils

[root@web2 ~]# timedatectl set-ntp yes

[root@web2 ~]# firewall-cmd --permanent --add-service=high-availability
[root@web2 ~]# firewall-cmd --reload

[root@web2 ~]# systemctl enable --now pcsd
Created symlink /etc/systemd/system/multi-user.target.wants/pcsd.service → /usr/lib/systemd/system/pcsd.service.

[root@web2 ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Next we’ll authenticate the nodes and create and start the cluster from our “Active” web node:

[root@web1 ~]# pcs host auth web1.yoonix.xyz web2.yoonix.xyz
Username: hacluster
Password:
web1.yoonix.xyz: Authorized
web2.yoonix.xyz: Authorized

[root@web1 ~]# pcs cluster setup apache_cluster --start web1.yoonix.xyz web2.yoonix.xyz
No addresses specified for host 'web1.yoonix.xyz', using 'web1.yoonix.xyz'
No addresses specified for host 'web2.yoonix.xyz', using 'web2.yoonix.xyz'
Destroying cluster on hosts: 'web1.yoonix.xyz', 'web2.yoonix.xyz'...
web2.yoonix.xyz: Successfully destroyed cluster
web1.yoonix.xyz: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'web1.yoonix.xyz', 'web2.yoonix.xyz'
web1.yoonix.xyz: successful removal of the file 'pcsd settings'
web2.yoonix.xyz: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'web1.yoonix.xyz', 'web2.yoonix.xyz'
web1.yoonix.xyz: successful distribution of the file 'corosync authkey'
web1.yoonix.xyz: successful distribution of the file 'pacemaker authkey'
web2.yoonix.xyz: successful distribution of the file 'corosync authkey'
web2.yoonix.xyz: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'web1.yoonix.xyz', 'web2.yoonix.xyz'
web1.yoonix.xyz: successful distribution of the file 'corosync.conf'
web2.yoonix.xyz: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
Starting cluster on hosts: 'web1.yoonix.xyz', 'web2.yoonix.xyz'...

[root@web1 ~]# pcs cluster enable --all
web1.yoonix.xyz: Cluster Enabled
web2.yoonix.xyz: Cluster Enabled

We can then double check that all members have joined the cluster:

[root@web1 ~]# corosync-cmapctl | grep members
runtime.members.1.config_version (u64) = 0
runtime.members.1.ip (str) = r(0) ip(192.168.13.13)
runtime.members.1.join_count (u32) = 1
runtime.members.1.status (str) = joined
runtime.members.2.config_version (u64) = 0
runtime.members.2.ip (str) = r(0) ip(192.168.13.17)
runtime.members.2.join_count (u32) = 1
runtime.members.2.status (str) = joined

[root@web1 ~]# pcs status
Cluster name: apache_cluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Status of pacemakerd: 'Pacemaker is running' (last updated 2023-07-25 10:09:05 +02:00)
Cluster Summary:
* Stack: corosync
* Current DC: web1.yoonix.xyz (version 2.1.5-9.el9_2-a3f44794f94) - partition with quorum
* Last updated: Tue Jul 25 10:09:06 2023
* Last change: Tue Jul 25 10:07:20 2023 by hacluster via crmd on web1.yoonix.xyz
* 2 nodes configured
* 0 resource instances configured

Node List:
* Online: [ web1.yoonix.xyz web2.yoonix.xyz ]

Full List of Resources:
* No resources

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

Finally we can create the resources (LVM, filesystem, VIP, Apache) that Pacemaker will manage and monitor in our HA cluster — we’ll do this from our “Active” node:

[root@web1 ~]# pcs resource create my_lvm ocf:heartbeat:LVM-activate vgname=apache_vg vg_access_mode=system_id --group apachegroup

[root@web1 ~]# pcs resource create my_fs Filesystem device="/dev/apache_vg/apache_lv" directory="/var/www" fstype="xfs" --group apachegroup
Assumed agent name 'ocf:heartbeat:Filesystem' (deduced from 'Filesystem')

[root@web1 ~]# pcs resource create VirtualIP IPaddr2 ip=192.168.13.20 cidr_netmask=24 --group apachegroup
Assumed agent name 'ocf:heartbeat:IPaddr2' (deduced from 'IPaddr2')

[root@web1 ~]# pcs resource create Website apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status" --group apachegroup
Assumed agent name 'ocf:heartbeat:apache' (deduced from 'apache')

We’re almost done now; we just need our fencing mechanism in place before we can test failover — so we’ll test connectivity to vSphere next:

[root@web1 ~]# fence_vmware_rest -a yourvcenter.com -l your_vmware_user -p your_vmware_pass -z -o list | egrep "(web1|web2)"
web1,
web2

NOTE:
"web1" and "web2" are my VM names within vCenter.

We can also confirm VM power state:

[root@web1 ~]# fence_vmware_rest -a yourvcenter.com -l your_vmware_user -p your_vmware_pass -z -o status -n web1
Status: ON

Then we’ll create our STONITH fence device and confirm that all resources (including the fence) are running:

[root@web1 ~]# pcs stonith create vmfence fence_vmware_rest pcmk_host_map="web1.yoonix.xyz:web1;web2.yoonix.xyz:web2" ip=yourvcenter.com ssl=1 username=your_vmware_user password=your_vmware_pass
[root@web1 ~]# pcs status
Cluster name: apache_cluster
Status of pacemakerd: 'Pacemaker is running' (last updated 2023-07-25 10:25:06 +02:00)
Cluster Summary:
* Stack: corosync
* Current DC: web1.yoonix.xyz (version 2.1.5-9.el9_2-a3f44794f94) - partition with quorum
* Last updated: Tue Jul 25 10:25:07 2023
* Last change: Tue Jul 25 10:25:01 2023 by root via cibadmin on web1.yoonix.xyz
* 2 nodes configured
* 5 resource instances configured

Node List:
* Online: [ web1.yoonix.xyz web2.yoonix.xyz ]

Full List of Resources:
* Resource Group: apachegroup:
* my_lvm (ocf:heartbeat:LVM-activate): Started web1.yoonix.xyz
* my_fs (ocf:heartbeat:Filesystem): Started web1.yoonix.xyz
* VirtualIP (ocf:heartbeat:IPaddr2): Started web1.yoonix.xyz
* Website (ocf:heartbeat:apache): Started web1.yoonix.xyz
* vmfence (stonith:fence_vmware_rest): Started web2.yoonix.xyz

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

We’ll check resources and see that only our “Active” web server is serving traffic (both directly and via the VIP):

[root@web1 ~]# curl mywebapp.yoonix.xyz
<html>
<body>Hello</body>
</html>

[root@web1 ~]# curl web1.yoonix.xyz
<html>
<body>Hello</body>
</html>

[root@web1 ~]# curl web2.yoonix.xyz
curl: (7) Failed to connect to web2.yoonix.xyz port 80: Connection refused

Then we’ll test shutting down our “Active” web node to simulate an outage. The “Passive” node will then take over (until it in turn one day becomes unresponsive and hands the reins back).

[root@web1 ~]# init 0
[root@web1 ~]# Connection to web1.yoonix.xyz closed by remote host.

[root@web2 ~]# pcs status
Cluster name: apache_cluster
Status of pacemakerd: 'Pacemaker is running' (last updated 2023-07-25 11:09:52 +02:00)
Cluster Summary:
* Stack: corosync
* Current DC: web2.yoonix.xyz (version 2.1.5-9.el9_2-a3f44794f94) - partition with quorum
* Last updated: Tue Jul 25 11:09:53 2023
* Last change: Tue Jul 25 10:47:13 2023 by hacluster via crmd on web2.yoonix.xyz
* 2 nodes configured
* 5 resource instances configured

Node List:
* Online: [ web2.yoonix.xyz ]
* OFFLINE: [ web1.yoonix.xyz ]

Full List of Resources:
* Resource Group: apachegroup:
* my_lvm (ocf:heartbeat:LVM-activate): Started web2.yoonix.xyz
* my_fs (ocf:heartbeat:Filesystem): Started web2.yoonix.xyz
* VirtualIP (ocf:heartbeat:IPaddr2): Started web2.yoonix.xyz
* Website (ocf:heartbeat:apache): Started web2.yoonix.xyz
* vmfence (stonith:fence_vmware_rest): Started web2.yoonix.xyz

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

--

[root@web2 ~]# curl mywebapp.yoonix.xyz
<html>
<body>Hello</body>
</html>

[root@web2 ~]# curl web1.yoonix.xyz
curl: (7) Failed to connect to web1.yoonix.xyz port 80: No route to host

[root@web2 ~]# curl web2.yoonix.xyz
<html>
<body>Hello</body>
</html>

As we can see from the output above, our “Passive” node has automatically become our “Active” node.

When we switch our web1 node back on, we’ll find that it doesn’t resume its previous “Active” role; it remains “Passive” (until needed).

[root@web2 ~]# pcs status
Cluster name: apache_cluster
Status of pacemakerd: 'Pacemaker is running' (last updated 2023-07-25 11:12:25 +02:00)
Cluster Summary:
* Stack: corosync
* Current DC: web2.yoonix.xyz (version 2.1.5-9.el9_2-a3f44794f94) - partition with quorum
* Last updated: Tue Jul 25 11:12:26 2023
* Last change: Tue Jul 25 10:47:13 2023 by hacluster via crmd on web2.yoonix.xyz
* 2 nodes configured
* 5 resource instances configured

Node List:
* Online: [ web1.yoonix.xyz web2.yoonix.xyz ]

Full List of Resources:
* Resource Group: apachegroup:
* my_lvm (ocf:heartbeat:LVM-activate): Started web2.yoonix.xyz
* my_fs (ocf:heartbeat:Filesystem): Started web2.yoonix.xyz
* VirtualIP (ocf:heartbeat:IPaddr2): Started web2.yoonix.xyz
* Website (ocf:heartbeat:apache): Started web2.yoonix.xyz
* vmfence (stonith:fence_vmware_rest): Started web2.yoonix.xyz

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
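If you’d prefer web1 to take the “Active” role back straight away (for maintenance, or simply because you consider it the primary), pcs can move the group manually. A sketch; depending on your pcs version you may need to clear the location constraint that the move creates:

[root@web2 ~]# pcs resource move apachegroup web1.yoonix.xyz
[root@web2 ~]# pcs resource clear apachegroup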

We’ve simulated a power outage above, and all seems good — but what happens if our “Active” node “hangs” — will HA be able to fence/reboot it as needed?

As you can see below, we fence the web2 node (the current “Active” node), forcing the cluster to connect to VMware and issue a power off/power on event.

web1 is now serving the content.

[root@web2 ~]# uptime
11:16:14 up 11 min, 1 user, load average: 0.00, 0.05, 0.05

[root@web1 ~]# pcs stonith fence web2
Node: web2 fenced

[root@web1 ~]# curl mywebapp.yoonix.xyz
<html>
<body>Hello</body>
</html>

[root@web1 ~]# curl web1.yoonix.xyz
<html>
<body>Hello</body>
</html>

[root@web1 ~]# curl web2.yoonix.xyz
curl: (7) Failed to connect to web2.yoonix.xyz port 80: No route to host

[root@web2 ~]# uptime
11:17:22 up 0 min, 1 user, load average: 0.57, 0.14, 0.05

Additionally web1 is now the “Active” node:

[root@web1 ~]# pcs status
Cluster name: apache_cluster
Status of pacemakerd: 'Pacemaker is running' (last updated 2023-07-25 11:39:57 +02:00)
Cluster Summary:
* Stack: corosync
* Current DC: web1.yoonix.xyz (version 2.1.5-9.el9_2-a3f44794f94) - partition with quorum
* Last updated: Tue Jul 25 11:39:57 2023
* Last change: Tue Jul 25 10:47:13 2023 by hacluster via crmd on web2.yoonix.xyz
* 2 nodes configured
* 5 resource instances configured

Node List:
* Online: [ web1.yoonix.xyz web2.yoonix.xyz ]

Full List of Resources:
* Resource Group: apachegroup:
* my_lvm (ocf:heartbeat:LVM-activate): Started web1.yoonix.xyz
* my_fs (ocf:heartbeat:Filesystem): Started web1.yoonix.xyz
* VirtualIP (ocf:heartbeat:IPaddr2): Started web1.yoonix.xyz
* Website (ocf:heartbeat:apache): Started web1.yoonix.xyz
* vmfence (stonith:fence_vmware_rest): Started web1.yoonix.xyz

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

Caveats and concerns:

My advice would be to check the support policies for the Red Hat High Availability Add-On thoroughly before deploying — as there are some caveats to consider first.

Namely things like:

  1. Which STONITH devices are supported (for example, software-emulated watchdog devices are not supported).
  2. Which hypervisors are supported (for example, Hyper-V has no supported fencing integration and thus can’t be supported for HA, as there is no reliable fencing mechanism).
  3. Which versions of your hypervisor are supported (for example, at the time of writing, VMware vSphere 8 is not supported due to some outstanding bugs).

Additionally, you should consider factors that may influence how you fail over. If you have a node preference, or an order in which resources need to start, then you may want to look at cluster resource constraints (see the sketch below).

These aren’t concerns in our small cluster, but as you scale they could be.
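For illustration only (our resource group already enforces start order, so this two-node cluster doesn’t need them), location and ordering constraints might look roughly like this, reusing the resource and node names from this post:

[root@web1 ~]# pcs constraint location apachegroup prefers web1.yoonix.xyz=50
[root@web1 ~]# pcs constraint order start my_fs then Website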

Final thoughts:

Whilst a bit more old school, and not following the trend of containers and container orchestration, hopefully this has shown that you can build a basic HA cluster even when components like a dedicated load balancer or enterprise shared storage are lacking.
