Build a 6-Node Fault-Tolerant Multi-AD Redis Cluster on OCI
In this article we will deploy a multi-master, 6-node, highly available and fault-tolerant Redis cluster on OCI. Redis clustering provides a way to distribute data across multiple Redis instances to achieve scalability, high availability, and fault tolerance. Each Redis cluster consists of multiple master and replica (slave) nodes.
Introduction
Redis is an in-memory data structure store, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Redis supports different kinds of abstract data structures, such as strings, lists, maps, sets, sorted sets, HyperLogLogs, bitmaps, streams, and spatial indices. [1]
Redis Clustering Overview:
- Redis clustering uses a concept called cluster sharding. It divides the keyspace into 16384 hash slots. Each key belongs to one of these slots based on its hash (CRC16 of the key modulo 16384); see the keyslot example after this list.
- The cluster consists of multiple master nodes, each responsible for a subset of hash slots, and associated replica (slave) nodes that replicate the data from the master nodes.
- Redis uses a gossip protocol to maintain cluster state information, detect node failures, and handle reconfiguration automatically.
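To make the slot mapping concrete, once the cluster built later in this article is up you can ask any node which slot a given key hashes to. A minimal sketch with redis-cli; the key names are just illustrative examples:

# Ask the cluster which of the 16384 slots a key hashes to (CRC16 of the key modulo 16384)
$ redis-cli -h 10.180.2.115 -p 6379 cluster keyslot user:1000

# Keys that share a hash tag (the part inside {}) always map to the same slot,
# which is what allows multi-key operations on related keys in a cluster
$ redis-cli -h 10.180.2.115 -p 6379 cluster keyslot {user:1000}.following
$ redis-cli -h 10.180.2.115 -p 6379 cluster keyslot {user:1000}.followers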
Redis Cluster High Availability and Fault Tolerance:
- Master Nodes: Each master node in the Redis cluster is responsible for specific hash slots. The data is distributed across the masters based on the hash slot mapping.
- Slave Nodes: Each master has one or more replica (slave) nodes. Replicas replicate data from their corresponding master. If a master fails, one of its replicas is promoted to a new master to ensure high availability and fault tolerance.
- Automatic Failover: When a master node fails, the cluster uses the gossip protocol to detect the failure. It promotes a replica to a master and updates the cluster’s state. Clients are redirected to the new master automatically.
- Hash Slot Resharding: Redis allows for online cluster resharding. You can add or remove nodes and move hash slots between them (using redis-cli --cluster reshard or rebalance) without downtime, keeping data balanced across the cluster; a sketch follows this list.
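A rough sketch of what online resharding looks like with redis-cli. The new node IP 10.180.2.200 and the node IDs are placeholders, not part of the cluster built in this article:

# Add a hypothetical new master node to an existing cluster
$ redis-cli --cluster add-node 10.180.2.200:6379 10.180.2.115:6379

# Move 1000 hash slots from one master to another; node IDs come from "cluster nodes" output
$ redis-cli --cluster reshard 10.180.2.115:6379 \
    --cluster-from <source-node-id> --cluster-to <target-node-id> \
    --cluster-slots 1000 --cluster-yes

# Or let redis-cli even out the slot distribution across the masters
$ redis-cli --cluster rebalance 10.180.2.115:6379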
Architecture:
We will create this architecture in a multi-AD OCI region with a multi-AD VCN. Each AD will host one master and one slave Redis node. We will use a public network load balancer with a backend set containing all six Redis nodes to connect to the cluster from our application.
Prerequisites:
- An OCI tenancy with IAM permissions to create and manage OCI VMs and VCNs
- An OCI VCN already created with multi-AD subnets
- Ports 6379 and 16379 (the cluster bus port) whitelisted in the network security list of the private subnet, and port 7000 whitelisted in the network security list of the public subnet
- Basic working knowledge of Redis and redis-cli
- Bastion Host is installed in Public subnet with access to all the Redis nodes
# AD Spanning Redis Nodes
redis-master-1: 10.180.2.115 - AD1
redis-master-2: 10.180.2.119 - AD2
redis-master-3: 10.180.2.189 - AD3
redis-slave-1 : 10.180.2.154 - AD1
redis-slave-2 : 10.180.2.241 - AD2
redis-slave-3 : 10.180.2.110 - AD3
Important Note: Ports 6379 and 16379 must be whitelisted in the network security list of the private subnet, and port 7000 must be whitelisted in the network security list of the public subnet.
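Before going further, it is worth confirming that the bastion host can reach every node over SSH, since the helper scripts below depend on it. An optional quick check, assuming key-based SSH access (the sessions later in this article run as the opc user):

# Run from the bastion host: verify SSH reachability to each Redis node
$ for ip in 10.180.2.115 10.180.2.119 10.180.2.189 10.180.2.154 10.180.2.241 10.180.2.110; do
    ssh -o ConnectTimeout=5 "$ip" hostname || echo "Cannot reach $ip"
  done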
Setting Up a Multi-AD Redis Cluster:
Create 6 VMs with Oracle Linux 8 across the availability domains of a multi-AD OCI region of your choice. In this example I'm using us-ashburn-1.
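If you prefer the OCI CLI over the console, a hedged sketch for launching one of the nodes is below. Every OCID, the AD name, and the shape are placeholders to replace with values from your own tenancy (run oci compute instance launch --help for the full option list); repeat per node, changing the availability domain and display name.

# Placeholder OCIDs and names -- replace with your own compartment, AD, image, and private subnet
$ oci compute instance launch \
    --compartment-id ocid1.compartment.oc1..aaaa... \
    --availability-domain "xxxx:US-ASHBURN-AD-1" \
    --display-name redis-master-1 \
    --shape VM.Standard.E4.Flex \
    --shape-config '{"ocpus": 1, "memoryInGBs": 8}' \
    --image-id ocid1.image.oc1..aaaa... \
    --subnet-id ocid1.subnet.oc1..aaaa... \
    --assign-public-ip false \
    --ssh-authorized-keys-file ~/.ssh/id_rsa.pub

Once the VMs are up, add their private IPs to /etc/hosts on the bastion host: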
$ sudo vi /etc/hosts
## Redis Nodes ##
10.180.2.115 redis-master-1
10.180.2.119 redis-master-2
10.180.2.189 redis-master-3
10.180.2.154 redis-slave-1
10.180.2.241 redis-slave-2
10.180.2.110 redis-slave-3
Step 1. Run the below commands on all the above Redis VMs, both master and slave nodes.
$ sudo dnf install https://rpms.remirepo.net/enterprise/remi-release-8.rpm -y
$ sudo dnf module install redis:remi-6.2 -y
$ sudo systemctl enable redis.service
$ sudo firewall-cmd --permanent --add-port=6379/tcp
$ sudo firewall-cmd --permanent --add-port=16379/tcp
$ sudo firewall-cmd --reload
$ sudo firewall-cmd --list-ports
I've created a simple shell script to execute these steps remotely from the bastion host on all nodes, saving you some time. Create two files, redis_install.sh and hosts.txt, on the bastion host where you added the entries to /etc/hosts.
- redis_install.sh
#!/bin/bash
# Read the hosts (one "IP hostname" pair per line) from hosts.txt
while IFS= read -r host; do
  # Extract the IP address and hostname from the line
  ip=$(echo "$host" | awk '{print $1}')
  hostname=$(echo "$host" | awk '{print $2}')
  echo "Executing commands on $hostname ($ip)"
  # SSH into the remote host and execute the commands (the heredoc runs on the remote node)
  ssh "$ip" <<EOF
echo "Installing remi-release..."
sudo dnf install https://rpms.remirepo.net/enterprise/remi-release-8.rpm -y
echo "Installing Redis module..."
sudo dnf module install redis:remi-6.2 -y
echo "Enabling Redis service..."
sudo systemctl enable redis.service
echo "Adding Redis ports to firewall..."
sudo firewall-cmd --permanent --add-port=6379/tcp
sudo firewall-cmd --permanent --add-port=16379/tcp
sudo firewall-cmd --reload
sudo firewall-cmd --list-ports
echo "Commands executed successfully on $hostname ($ip)"
EOF
  echo "--------------------------------------------------"
done < hosts.txt
- hosts.txt
10.180.2.115 redis-master-1
10.180.2.119 redis-master-2
10.180.2.189 redis-master-3
10.180.2.154 redis-slave-1
10.180.2.241 redis-slave-2
10.180.2.110 redis-slave-3
- Execute redis_install.sh
$ chmod +x redis_install.sh
$ ./redis_install.sh
This will install Redis on all the master and slave nodes.
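Before moving on, you can quickly verify the installation from the bastion host. An optional check loop reusing the same hosts.txt (the exact version string will depend on what the remi repository currently ships):

# Run from the bastion host: confirm Redis is installed, enabled, and the firewall ports are open
$ while read -r ip name; do
    echo "== $name ($ip) =="
    ssh -n "$ip" 'redis-server --version; sudo systemctl is-enabled redis; sudo firewall-cmd --list-ports'
  done < hosts.txt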
Step 2. Configure each Redis instance by editing /etc/redis.conf on all the master and slave nodes and changing the parameters below.
Make sure any conflicting default entries (for example the bind line) are commented out or updated so that the following parameters are set:
$ sudo vi /etc/redis.conf
# bind the Redis listener to all the interfaces on the host
bind 0.0.0.0
protected-mode no
port 6379
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 15000
appendonly yes
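If you would rather not edit the file by hand on six hosts, the same changes can be scripted. A hedged sketch, assuming the default /etc/redis.conf layout shipped by the redis:remi-6.2 module (port 6379 is already the default; review the resulting file before starting Redis):

# Run on each node (or wrap in the same ssh loop used by redis_install.sh)
sudo cp /etc/redis.conf /etc/redis.conf.bak                       # keep a backup
sudo sed -i 's/^bind .*/bind 0.0.0.0/' /etc/redis.conf            # listen on all interfaces
sudo sed -i 's/^protected-mode yes/protected-mode no/' /etc/redis.conf
sudo sed -i 's/^# *cluster-enabled .*/cluster-enabled yes/' /etc/redis.conf
sudo sed -i 's/^# *cluster-config-file .*/cluster-config-file nodes.conf/' /etc/redis.conf
sudo sed -i 's/^# *cluster-node-timeout .*/cluster-node-timeout 15000/' /etc/redis.conf
sudo sed -i 's/^appendonly no/appendonly yes/' /etc/redis.conf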
For more information, refer to the Redis clustering blog [3].
Start the Redis service on all the nodes:
$ sudo systemctl start redis.service
Check that Redis is up and listening on ports 6379 and 16379:
$ netstat -an | grep 6379
tcp 0 0 0.0.0.0:16379 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:6379 0.0.0.0:* LISTEN
Check for any errors in file /var/log/redis/redis.log
$ sudo less /var/log/redis/redis.log
Step 3. Create the Redis cluster from any one node
Remotely log in to each node from any one node to verify connectivity before proceeding with cluster creation:
[opc@redis-master-1 ~]$ redis-cli -h 10.180.2.115
10.180.2.115:6379> exit
[opc@redis-master-1 ~]$ redis-cli -h 10.180.2.119
10.180.2.119:6379> exit
[opc@redis-master-1 ~]$ redis-cli -h 10.180.2.189
10.180.2.189:6379> exit
[opc@redis-master-1 ~]$ redis-cli -h 10.180.2.154
10.180.2.154:6379> exit
[opc@redis-master-1 ~]$ redis-cli -h 10.180.2.241
10.180.2.241:6379> exit
[opc@redis-master-1 ~]$ redis-cli -h 10.180.2.110
10.180.2.110:6379> exit
Important Note: The first 3 nodes will be the master nodes, and the rest will be the slave nodes. The cluster-replicas parameter value of 1 means there is one slave node for each master node.
## Add a Redis Cluster with 3 master nodes and 3 slave nodes
$ redis-cli --cluster create 10.180.2.115:6379 10.180.2.119:6379 10.180.2.189:6379 10.180.2.154:6379 10.180.2.241:6379 10.180.2.110:6379 --cluster-replicas 1
Output:
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 10.180.2.241:6379 to 10.180.2.115:6379
Adding replica 10.180.2.110:6379 to 10.180.2.119:6379
Adding replica 10.180.2.154:6379 to 10.180.2.189:6379
M: 78117539772f3920028ecfa4795b8e77914abade 10.180.2.115:6379
slots:[0-5460] (5461 slots) master
M: 8630a21874aaa07fd1e55faac79fe94db45fe2d6 10.180.2.119:6379
slots:[5461-10922] (5462 slots) master
M: d07b492d3d6d1ae05ad3f0c4b7c57d6945ad22de 10.180.2.189:6379
slots:[10923-16383] (5461 slots) master
S: 58e57e98fb7855feb9b1c00c2778085f403d2f44 10.180.2.154:6379
replicates d07b492d3d6d1ae05ad3f0c4b7c57d6945ad22de
S: 315ff636b0a278531521545acde54a1141f356a3 10.180.2.241:6379
replicates 78117539772f3920028ecfa4795b8e77914abade
S: 0951e2f4bcbfce0fd00190d9b570f4d41d162508 10.180.2.110:6379
replicates 8630a21874aaa07fd1e55faac79fe94db45fe2d6
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
.
>>> Performing Cluster Check (using node 10.180.2.115:6379)
M: 78117539772f3920028ecfa4795b8e77914abade 10.180.2.115:6379
slots:[0-5460] (5461 slots) master
1 additional replica(s)
S: 0951e2f4bcbfce0fd00190d9b570f4d41d162508 10.180.2.110:6379
slots: (0 slots) slave
replicates 8630a21874aaa07fd1e55faac79fe94db45fe2d6
S: 315ff636b0a278531521545acde54a1141f356a3 10.180.2.241:6379
slots: (0 slots) slave
replicates 78117539772f3920028ecfa4795b8e77914abade
S: 58e57e98fb7855feb9b1c00c2778085f403d2f44 10.180.2.154:6379
slots: (0 slots) slave
replicates d07b492d3d6d1ae05ad3f0c4b7c57d6945ad22de
M: 8630a21874aaa07fd1e55faac79fe94db45fe2d6 10.180.2.119:6379
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
M: d07b492d3d6d1ae05ad3f0c4b7c57d6945ad22de 10.180.2.189:6379
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Check Status of Cluster:
# Check overall status of the cluster:
$ redis-cli -h 10.180.2.115 -p 6379 cluster nodes
0951e2f4bcbfce0fd00190d9b570f4d41d162508 10.180.2.110:6379@16379 slave 8630a21874aaa07fd1e55faac79fe94db45fe2d6 0 1689601070000 2 connected
315ff636b0a278531521545acde54a1141f356a3 10.180.2.241:6379@16379 slave 78117539772f3920028ecfa4795b8e77914abade 0 1689601069000 1 connected
78117539772f3920028ecfa4795b8e77914abade 10.180.2.115:6379@16379 myself,master - 0 1689601069000 1 connected 0-5460
58e57e98fb7855feb9b1c00c2778085f403d2f44 10.180.2.154:6379@16379 slave d07b492d3d6d1ae05ad3f0c4b7c57d6945ad22de 0 1689601071742 3 connected
8630a21874aaa07fd1e55faac79fe94db45fe2d6 10.180.2.119:6379@16379 master - 0 1689601070739 2 connected 5461-10922
d07b492d3d6d1ae05ad3f0c4b7c57d6945ad22de 10.180.2.189:6379@16379 master - 0 1689601070000 3 connected 10923-16383
# Check status of Master Nodes:
$ redis-cli -h 10.180.2.115 -p 6379 cluster nodes | grep master
78117539772f3920028ecfa4795b8e77914abade 10.180.2.115:6379@16379 myself,master - 0 1689601117000 1 connected 0-5460
8630a21874aaa07fd1e55faac79fe94db45fe2d6 10.180.2.119:6379@16379 master - 0 1689601120905 2 connected 5461-10922
d07b492d3d6d1ae05ad3f0c4b7c57d6945ad22de 10.180.2.189:6379@16379 master - 0 1689601118000 3 connected 10923-16383
# Check status of Slave Nodes:
$ redis-cli -h 10.180.2.115 -p 6379 cluster nodes | grep slave
0951e2f4bcbfce0fd00190d9b570f4d41d162508 10.180.2.110:6379@16379 slave 8630a21874aaa07fd1e55faac79fe94db45fe2d6 0 1689601134000 2 connected
315ff636b0a278531521545acde54a1141f356a3 10.180.2.241:6379@16379 slave 78117539772f3920028ecfa4795b8e77914abade 0 1689601135951 1 connected
58e57e98fb7855feb9b1c00c2778085f403d2f44 10.180.2.154:6379@16379 slave d07b492d3d6d1ae05ad3f0c4b7c57d6945ad22de 0 1689601135000 3 connected
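With the cluster healthy, a quick data smoke test confirms that keys are routed across the masters. The key names are arbitrary examples; the -c flag makes redis-cli follow MOVED redirections to whichever master owns the slot:

# Write and read a couple of test keys through any node; -c follows cluster redirections
$ redis-cli -c -h 10.180.2.115 -p 6379 set user:1000 "alice"
$ redis-cli -c -h 10.180.2.115 -p 6379 set user:2000 "bob"
$ redis-cli -c -h 10.180.2.115 -p 6379 get user:1000

# End-to-end consistency check of slot coverage and replication
$ redis-cli --cluster check 10.180.2.115:6379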
Test Failover:
On master 3 (10.180.2.189), stop the Redis service:
$ sudo systemctl stop redis
Now, let's check the output of the cluster nodes command from another node:
$ redis-cli -h 10.180.2.115 -p 6379 cluster nodes
0951e2f4bcbfce0fd00190d9b570f4d41d162508 10.180.2.110:6379@16379 slave 8630a21874aaa07fd1e55faac79fe94db45fe2d6 0 1689601577000 2 connected
315ff636b0a278531521545acde54a1141f356a3 10.180.2.241:6379@16379 slave 78117539772f3920028ecfa4795b8e77914abade 0 1689601577000 1 connected
78117539772f3920028ecfa4795b8e77914abade 10.180.2.115:6379@16379 myself,master - 0 1689601576000 1 connected 0-5460
58e57e98fb7855feb9b1c00c2778085f403d2f44 10.180.2.154:6379@16379 master - 0 1689601578481 7 connected 10923-16383
8630a21874aaa07fd1e55faac79fe94db45fe2d6 10.180.2.119:6379@16379 master - 0 1689601577475 2 connected 5461-10922
d07b492d3d6d1ae05ad3f0c4b7c57d6945ad22de 10.180.2.189:6379@16379 master,fail - 1689601534233 1689601531218 3 disconnected
We can see that the redis-master-3 node with IP 10.180.2.189 is now showing status fail/disconnected, and its slave with IP 10.180.2.154 has been promoted to master.
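To complete the failover test, bring the old master back. It does not reclaim its slots automatically; it rejoins the cluster as a replica of the newly promoted master (10.180.2.154). A quick way to verify:

# On the failed node (10.180.2.189): start Redis again
$ sudo systemctl start redis

# From any other node: 10.180.2.189 should now appear as a slave of the new master
$ redis-cli -h 10.180.2.115 -p 6379 cluster nodes | grep 10.180.2.189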
Step 4. Create a public network load balancer to front the Redis cluster
OCI Network Load balancer can be used to distribute traffic efficiently within your Redis cluster.
A. Create an OCI NLB in your VCN and public subnet
B. Add a listener on TCP port 7000
C. Add backends: add all the Redis master and slave nodes to the backend set on TCP port 6379
D. Specify a health check on TCP port 6379 for the backend set
Once the NLB is created, check that the backends report healthy, then connect with redis-cli through the NLB public IP and verify:
$ redis-cli -h 158.x.x.x -p 7000 cluster nodes | grep myself
315ff636b0a278531521545acde54a1141f356a3 10.180.2.241:6379@16379 myself,slave 78117539772f3920028ecfa4795b8e77914abade 0 1689606894000 1 connected
$ redis-cli -h 158.x.x.x -p 7000 cluster nodes | grep myself
78117539772f3920028ecfa4795b8e77914abade 10.180.2.115:6379@16379 myself,master - 0 1689606892000 1 connected
$ redis-cli -h 158.x.x.x -p 7000 cluster nodes | grep myself
315ff636b0a278531521545acde54a1141f356a3 10.180.2.241:6379@16379 myself,slave 78117539772f3920028ecfa4795b8e77914abade 0 1689606898000 1 connected
$ redis-cli -h 158.x.x.x -p 7000 cluster nodes | grep myself
58e57e98fb7855feb9b1c00c2778085f403d2f44 10.180.2.154:6379@16379 myself,master - 0 1689606990000 7 connected
Note: Every time you make a connection to the public IP of the NLB, it may land on a different node, because the NLB uses a 5-tuple hash policy to distribute connections. Ideally you should have a separate backend set for the master nodes and a separate backend set for the slave nodes.
If you are using a public IP to connect to your Redis cluster, make sure you enable authentication on all the nodes. Refer to the Redis security best practices [4].
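A minimal sketch of enabling authentication: set the same password on every node in /etc/redis.conf (the placeholder below is not a real password), restart Redis, and pass it to redis-cli with -a. In a cluster, masterauth is needed as well so replicas can authenticate to their masters:

# /etc/redis.conf on every node -- replace the placeholder with your own strong password
requirepass YourStrongPasswordHere
masterauth YourStrongPasswordHere

# Restart Redis and authenticate from the client
$ sudo systemctl restart redis
$ redis-cli -h 158.x.x.x -p 7000 -a YourStrongPasswordHere cluster nodes | grep myself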
Step 5. Cleanup
To remove Redis from all the nodes, you can run this shell script using the same hosts.txt file:
#!/bin/bash
# Read the hosts from the file
while IFS= read -r host; do
  # Extract the IP address and hostname from the line
  ip=$(echo "$host" | awk '{print $1}')
  hostname=$(echo "$host" | awk '{print $2}')
  echo "Executing commands on $hostname ($ip)"
  # SSH into the remote host and execute the commands
  ssh "$ip" <<EOF
echo "Disabling Redis service..."
sudo systemctl stop redis.service
sudo systemctl disable redis.service
echo "Removing Redis module..."
sudo dnf module remove redis:remi-6.2 -y
sudo rm -rf /etc/redis.conf
echo "Commands executed successfully on $hostname ($ip)"
EOF
  echo "--------------------------------------------------"
done < hosts.txt
By deploying the Redis cluster across multiple ADs, you can ensure high availability and fault tolerance for your Redis database. If an entire AD goes down, the cluster can continue operating with the remaining nodes, fronted by the NLB. The automatic failover mechanism ensures that the cluster remains operational and minimizes your application downtime.
References:
[1] Redis on Wikipedia: https://en.wikipedia.org/wiki/Redis
[2] Redis Docs: https://redis.io/docs/
[3] Redis Clustering Blog: https://severalnines.com/blog/installing-redis-cluster-cluster-mode-enabled-auto-failover/
[4] Redis Security Best Practices: https://redis.io/docs/management/security/