Redis Sentinel — High Availability: Everything you need to know from DEV to PROD: Complete Guide

Amila Iddamalgoda
Apr 21, 2018 · 13 min read

What does the term ‘Redis’ actually mean?

It means REmote DIctionary Server.

Alright! There are plenty of Redis articles out there, but I wanted to share my experience as a developer by writing one “all in one” article that covers the essentials a developer or DevOps engineer needs to build a highly available Redis cluster with Sentinel.

So let’s get started…

Redis is an open source, in-memory data structure store. It is a popular choice among developers as a cache, as a message broker, and as a NoSQL key-value database for many use cases.

In this post, I’m going to discuss and demo Redis Master/Slave replication, high availability with Redis Sentinel, automatic failover, some production-level optimization tips, and monitoring. Along the way I’ll mention the issues and errors I faced while implementing Redis Sentinel on Ubuntu. The OS and Redis versions used are as follows.

OS version: Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-119-generic x86_64)
Redis Version: 4.0.9

Additionally, I want to highlight that the Redis documentation is very informative and it’s ‘the go to place’ if you need any further clarification on Redis.

Moving on to Redis basics: Redis is an in-memory database, which simply means that it keeps its data in RAM. Redis supports several data structures, such as strings, hashes, lists, sets, sorted sets with range queries, and bitmaps. It also supports atomic operations such as appending to a string, incrementing a value in a hash, and pushing an element to a list.

Well, let’s get things started with Redis High Availability.

How does Redis offer High Availability and Automatic Failover?

Redis Sentinel is the high availability solution offered by Redis. If something fails in your Redis cluster, Sentinel automatically detects the point of failure and brings the cluster back to a stable state without any human intervention.

What really happens inside Redis Sentinel?

Sentinel constantly checks the MASTER and SLAVE instances in the Redis cluster to see whether they are working as expected. If Sentinel detects a failure of the MASTER node in a given cluster, it starts a failover process. As a result, Sentinel picks a SLAVE instance and promotes it to MASTER. Finally, the remaining SLAVE instances are automatically reconfigured to use the new MASTER instance.

Sentinel also acts as a configuration provider, or source of authority, for client service discovery.

What does that mean? Simply put, application clients connect to the Sentinels, and the Sentinels give them the address of the current Redis MASTER.
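
To make the service discovery step concrete, here is a minimal Python sketch of what a client does under the hood: it sends the SENTINEL get-master-addr-by-name command to a Sentinel and reads back the master's IP and port. The helper names (encode_command, parse_bulk_array, get_master_address) are my own for illustration, the parser only handles the simple array-of-strings reply this command returns, and get_master_address needs a live Sentinel to talk to.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def parse_bulk_array(reply: bytes):
    """Parse a RESP array of bulk strings; return None for a null array."""
    lines = reply.split(b"\r\n")
    if not lines[0].startswith(b"*"):
        raise ValueError(f"unexpected reply: {reply!r}")
    count = int(lines[0][1:])
    if count < 0:
        return None  # "*-1": the Sentinel does not know this master name
    # Each element is a "$<len>" header line followed by the payload line.
    return [lines[2 + 2 * i].decode() for i in range(count)]


def get_master_address(sentinel_host: str, master_name: str, port: int = 26379):
    """Ask one Sentinel for the current master address (hypothetical helper)."""
    with socket.create_connection((sentinel_host, port), timeout=2) as conn:
        conn.sendall(
            encode_command("SENTINEL", "get-master-addr-by-name", master_name)
        )
        reply = parse_bulk_array(conn.recv(4096))
    return None if reply is None else (reply[0], int(reply[1]))
```

In a real application you would normally let your Redis client library do this for you; the sketch is only meant to show that "asking Sentinel for the master" is a plain request and reply.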

Furthermore, Sentinel is a robust distributed system in which multiple Sentinels need to agree that a given master is no longer available. Only then does the failover process start and select a new MASTER node. This agreement is governed by the quorum value.

What is quorum?

The quorum value is the number of Sentinels that need to agree that the master is not reachable. However, the quorum is only used to detect the failure. To actually perform a failover, one of the Sentinels needs to be elected leader for the failover and be authorized to proceed. This only happens with the vote of the majority of the Sentinel processes.
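
The difference between "quorum" and "majority" is easy to mix up, so here is a small Python model of the two checks. It is a simplified sketch, not Sentinel's actual code: the function names are mine, it assumes every reachable Sentinel both reports the master as down and votes for the same leader, and the rule that the leader needs max(quorum, majority) votes is my reading of the Sentinel documentation.

```python
def majority(total_sentinels: int) -> int:
    """Smallest number of Sentinels that forms a majority."""
    return total_sentinels // 2 + 1


def is_objectively_down(reports_down: int, quorum: int) -> bool:
    """Failure detection: enough Sentinels agree the master is unreachable."""
    return reports_down >= quorum


def can_failover(total_sentinels: int, reachable_sentinels: int, quorum: int) -> bool:
    """Failover needs detection (quorum) plus an authorized leader (majority).

    Assumption: all reachable Sentinels report the master down and vote
    for the same leader.
    """
    votes_needed = max(quorum, majority(total_sentinels))
    return (
        is_objectively_down(reachable_sentinels, quorum)
        and reachable_sentinels >= votes_needed
    )
```

With 3 Sentinels and a quorum of 2, can_failover(3, 2, 2) is true: two Sentinels are enough both to detect the failure and to form a majority. With a quorum of 1, a single isolated Sentinel can detect the failure but can_failover(3, 1, 1) is false, because it cannot collect a majority of votes.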

Let’s get our hands dirty with Redis Sentinel.

We’ll stick to the basic setup with 3 server instances.

Please refer to the above diagram, which illustrates the basic setup with 3 server instances. First of all, make sure your Ubuntu instances are up to date and have the relevant build dependencies. You might also need jemalloc. The following steps install Redis on each server instance.

sudo apt-get update
sudo apt-get install build-essential tcl

# Optional: jemalloc development headers
sudo apt-get install libjemalloc-dev
curl -O http://download.redis.io/redis-stable.tar.gz
tar xzvf redis-stable.tar.gz
cd redis-stable
make
make test
sudo make install

Now, in the redis-stable directory, you should see both the redis.conf and sentinel.conf configuration files.

Before we run Redis, let’s do the basic configuration needed to get a Redis cluster up and running. The following are the IP addresses used in this setup.

10.52.209.46 (Initial Master Node)
10.52.209.47 (Initial Slave Node)
10.52.209.49 (Initial Slave Node)

The default port for the Redis server is 6379, and for Sentinel it is 26379. Make sure you open these ports:

sudo ufw allow 6379
sudo ufw allow 26379

The Redis configuration (both redis.conf and sentinel.conf) on the above servers should be set up as follows.

For the basic setup, the above configuration will be enough, but for production please consider the tips in the latter part of this post. The only difference between the redis.conf files on the 3 servers is that every slave must have the following setting, where 10.52.209.46 is the Master IP address.

slaveof 10.52.209.46 6379

slaveof makes this particular server instance a SLAVE of the given MASTER node (10.52.209.46).

In sentinel.conf, the following setting tells the Sentinels to start monitoring the cluster with these initial values. Sentinel may later rewrite this setting automatically, for example after a failover.

sentinel monitor mymaster 10.52.209.46 6379 2
(This tells Sentinel to monitor the master node under the name mymaster. The last argument, 2, is the quorum value.)

In addition, sentinel.conf includes the following settings.

sentinel down-after-milliseconds mymaster 5000
(The master must be unresponsive for 5 seconds before a Sentinel classifies it as +sdown. Once enough Sentinels agree, they +vote to elect the Sentinel that will lead the failover and promote a new master node.)
sentinel failover-timeout mymaster 10000
(Specifies the failover timeout in milliseconds.)

Okie Dokie! Now… Let’s run Redis.

There are many ways to run Redis. In this demo I’ll stick to the following commands.

(Run these from the redis-stable directory; each command changes into the src folder.)
cd src/ && redis-server ../redis.conf &
cd src/ && redis-server ../sentinel.conf --sentinel &
(Alternatively, you can simply use cd utils && ./install_server.sh.)

After that, you can check the Redis processes with the ps -ef | grep redis command. Each server instance should be running both a Redis process and a Sentinel process. If all goes to plan, there should be 2 processes running, as follows.

Now connect with the Redis client using one of the following commands and test whether Redis is working.

redis-cli ping
or
redis-cli -h IP_Address ping
You should get PONG as the output.

Awesome! Now you have Redis up and running. Let’s focus on the Master/Slave replication.

Master/Slave Replication

Now if you check the redis.log of each instance (located in the place we defined in redis.conf), you can see that the Master/Slave synchronization occurred.

Master node — redis.log

Slave node — redis.log

Checking Replication Status

You can also check the replication information with the info replication command in the Redis CLI. The role attribute shows whether that particular node is a MASTER or a SLAVE (yellow box). In addition, on the Master node it displays the details of all the connected slaves (green box).
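
If you want to check the replication status from a script rather than by eye, the info replication output is easy to parse because it is just key:value lines. Below is a minimal Python sketch; the function name is my own, and the offset and lag numbers in SAMPLE are made-up illustrative values (only the IP addresses come from this setup).

```python
def parse_info_replication(text: str) -> dict:
    """Turn the text of INFO replication into a dictionary."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# Replication" header
        key, _, value = line.partition(":")
        if key.startswith("slave") and key[5:].isdigit():
            # slaveN lines hold comma-separated field=value pairs
            value = dict(item.split("=", 1) for item in value.split(","))
        info[key] = value
    return info


# Illustrative sample of what a master might report (numbers are made up).
SAMPLE = """\
# Replication
role:master
connected_slaves:2
slave0:ip=10.52.209.47,port=6379,state=online,offset=1288,lag=0
slave1:ip=10.52.209.49,port=6379,state=online,offset=1288,lag=1
"""

if __name__ == "__main__":
    status = parse_info_replication(SAMPLE)
    print(status["role"], status["connected_slaves"], status["slave0"]["ip"])
```

A health check could, for example, alert when role is master but connected_slaves is lower than expected.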

Now let’s examine what sentinel.log shows (it is located in the place we defined in sentinel.conf).

Furthermore, if you check the sentinel.conf file, you can see that it has been automatically updated with the latest configuration, including the sentinel known-slave and sentinel known-sentinel values.

Cool! Now let’s create a sample value on the Master and read it back from all nodes.

127.0.0.1:6379> set demokey "Amila"

As you can see in the above diagram, SLAVES are READ ONLY, so you can only write data to the Master. Since the Master replicates asynchronously to all the slaves, you can retrieve the inserted value from any slave using the same key. In addition, KEYS * lists all the keys inserted (fine for a demo, but it scans the whole keyspace, so avoid it on a busy production instance). The above diagram illustrates what we just talked about.

Now let’s check how Redis Sentinel Automatic Failover works.

Redis Sentinel Automatic Failover

Okay! Let’s simulate an automatic failover. To do so, you can simply stop the Redis server or kill the Redis process on the MASTER instance. You can even put the Redis process to sleep. Choose whichever way you prefer.

kill -9 <process id>
or
redis-cli -p 6379 DEBUG sleep 30
or
redis-cli -p 6379 DEBUG SEGFAULT

As illustrated in the above diagram, if the MASTER node fails, the 2 remaining Sentinels evaluate the failure, and if both agree (since the quorum value is 2), a new MASTER is elected from the 2 remaining nodes.

The following shows the log tail for this failover scenario.

redis.log of Slave nodes.

sentinel.log of Slave nodes

Now let’s check the replication status via the info replication command.

To elaborate on the log tail:

Each Sentinel detects the master is down with an +sdown event. (+sdown means the specified instance is now in Subjectively Down state.)

+new-epoch means the current epoch was updated.

+sdown event is later escalated to +odown, which means that multiple Sentinels agree about the fact the master is not reachable. (+odown means that the specified instance is now in Objectively Down state.)

Sentinels +vote for the Sentinel that will start the first failover attempt.

The failover happens.

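Further, the following shows Upstart jobs for Redis and Sentinel. (Ubuntu 16.04 boots with systemd by default, so these jobs apply only where Upstart is in use.)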

Upstart for Redis

description "Redis Server"
start on runlevel [2345]
stop on runlevel [!2345]
script
echo $$ > /var/run/redis.pid
su - amila -c "cd /home/amila/redis-stable/src/; redis-server ../redis.conf"
end script
post-stop script
rm -f /var/run/redis.pid
end script

Upstart for Sentinel

description "Redis Sentinel Server"
start on runlevel [2345]
stop on runlevel [!2345]
script
# Use a separate PID file so it does not clash with the Redis server job
echo $$ > /var/run/redis-sentinel.pid
su - amila -c "cd /home/amila/redis-stable/src/; redis-server ../sentinel.conf --sentinel"
end script
post-stop script
rm -f /var/run/redis-sentinel.pid
end script
end script

Congratulations! That’s it. That’s how Redis handles High Availability and Automatic Failover. Now let’s have a look at some interesting Redis facts before we jump into optimization tips and tricks.

Interesting facts about Redis.

Image Reference: http://download.redis.io/logocontest/

Redis can handle up to 2³² keys, and was tested in practice to handle at least 250 million keys per instance.

An empty instance uses ~ 3MB of memory.

1 Million small Keys -> String Value pairs use ~ 85MB of memory.

1 Million Keys -> Hash value, representing an object with 5 fields, use ~ 160 MB of memory.
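
These figures make for a handy back-of-the-envelope calculation when sizing a server. The Python sketch below simply scales the numbers above linearly; that linearity is my assumption, real usage depends on key and value sizes, and the names (MB_PER_MILLION_KEYS, estimate_memory_mb, the "hash5" label) are made up for this example.

```python
# Approximate figures quoted above, in MB per one million keys.
MB_PER_MILLION_KEYS = {
    "string": 85,   # small key -> string value pairs
    "hash5": 160,   # key -> hash with 5 fields
}


def estimate_memory_mb(num_keys: int, kind: str = "string") -> float:
    """Rough linear estimate of memory use; treat the result as a ballpark."""
    return MB_PER_MILLION_KEYS[kind] * num_keys / 1_000_000


if __name__ == "__main__":
    # 250 million small string keys, the tested figure mentioned above.
    print(estimate_memory_mb(250_000_000, "string"), "MB")
```

By this estimate, 250 million small string keys would need roughly 21 GB of RAM, which is a useful sanity check before you pick an instance size.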

Redis is single threaded. How can I exploit multiple CPU / cores?

CPU rarely becomes the bottleneck with Redis, as Redis is usually either memory or network bound. If it does, scaling your Redis instances horizontally or vertically will help reduce CPU-related bottlenecks.

Redis is an in-memory but persistent on disk database.

Redis persistence options:
RDB: point-in-time snapshots of your dataset at specified intervals. (Data backup)
AOF: logs every write operation received by the server. (More durable)

If you are using Java, you can use Jedis, a Java client, to connect your application to Redis. Note: Jedis uses Apache Commons Pool (GenericObjectPool) for connection pooling. A single Jedis instance is not thread-safe! To avoid problems, use JedisPool, which is a thread-safe pool of network connections. The default maximum pool size is 8.

Now let’s focus on the issues/errors that you might get and some production performance optimizing tips and tricks for Redis.

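Issues/Errors that you might get & production performance optimization tips and tricks

First of all, the order of the settings in redis.conf and sentinel.conf matters, so don’t rearrange them. In sentinel.conf, for example, the sentinel monitor line for a master must come before any other setting that refers to that master’s name. Otherwise you will get config errors like the following.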

*** FATAL CONFIG FILE ERROR ***
Reading the configuration file, at line 98
>>> 'sentinel down-after-milliseconds mymaster 5000'
No such master with specified name.
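
The error above appears when a setting names a master before Sentinel has seen the sentinel monitor line for it. If you generate or hand-edit sentinel.conf often, a small pre-flight check can catch this before Sentinel refuses to start. Here is a Python sketch under my own assumptions: the function name is hypothetical, and PER_MASTER lists only the per-master settings I chose to check, not every directive Sentinel supports.

```python
# Per-master settings checked by this sketch (not an exhaustive list).
PER_MASTER = {
    "down-after-milliseconds",
    "failover-timeout",
    "parallel-syncs",
    "auth-pass",
}


def find_order_errors(conf_text: str) -> list:
    """Return (line_number, master_name) for settings placed before their monitor line."""
    known_masters = set()
    errors = []
    for number, raw in enumerate(conf_text.splitlines(), start=1):
        words = raw.split()
        # Ignore comments, short lines, and anything that is not a sentinel setting.
        if len(words) < 3 or words[0].startswith("#") or words[0].lower() != "sentinel":
            continue
        directive, name = words[1].lower(), words[2]
        if directive == "monitor":
            known_masters.add(name)
        elif directive in PER_MASTER and name not in known_masters:
            errors.append((number, name))
    return errors


if __name__ == "__main__":
    broken = (
        "sentinel down-after-milliseconds mymaster 5000\n"
        "sentinel monitor mymaster 10.52.209.46 6379 2\n"
    )
    print(find_order_errors(broken))
```

An empty list means every checked setting comes after the monitor line of the master it refers to.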

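From a security perspective, set up a password to authenticate with the master and slaves. You can do this by adding the following settings to redis.conf and sentinel.conf. (masterauth is the password a slave uses to authenticate with its master; the password itself is set on each node with requirepass. Since any node can become master after a failover, use the same password on all of them.)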

In sentinel.conf
sentinel auth-pass <master-name> <password>
In redis.conf
masterauth <master-password>

Make sure the TCP backlog setting can actually be enforced. Redis asks for a backlog of 511 pending connections (tcp-backlog in redis.conf), but the kernel caps it at net.core.somaxconn, which is 128 here. You can raise that limit (considering your server specification) with the following steps.

sudo nano /etc/rc.local
Add this:
sysctl -w net.core.somaxconn=65535
After the next reboot, the kernel will allow a backlog of up to 65535 pending connections instead of 128 as before. When you add the line to rc.local, make sure it comes before the exit 0 line.
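
To see why this matters, remember that the backlog Redis asks for is silently capped by the kernel limit. The Python sketch below models that cap and reads the current limit from /proc on Linux; both function names are my own, and the 511 default is the tcp-backlog value mentioned above.

```python
def effective_backlog(tcp_backlog: int, somaxconn: int) -> int:
    """The kernel caps the requested listen backlog at net.core.somaxconn."""
    return min(tcp_backlog, somaxconn)


def read_somaxconn(path: str = "/proc/sys/net/core/somaxconn"):
    """Return the current kernel limit, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    limit = read_somaxconn()
    if limit is None:
        print("Could not read somaxconn (not Linux?)")
    else:
        print("Redis asks for 511, kernel grants", effective_backlog(511, limit))
```

With the limit at 128, Redis effectively gets a backlog of 128; once somaxconn is raised above 511, the full 511 is honored.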

Some of the warnings that you might come across in redis.log are:

WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.

You can add the setting with the following command (run as root), and then use cat to make sure the value is set properly.

echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf
cat /etc/sysctl.conf

You might need to disable THP (Transparent Huge Pages).

WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.

Fix:
sudo nano /etc/rc.local
Add this:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
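
To confirm the setting took effect, you can read the same file back: the kernel lists all the modes and puts square brackets around the active one, for example "always madvise [never]". Here is a small Python sketch that extracts the active mode; the function name is my own.

```python
THP_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(contents: str) -> str:
    """Return the mode shown in square brackets, e.g. 'never'."""
    for word in contents.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active mode found in {contents!r}")


if __name__ == "__main__":
    try:
        with open(THP_PATH) as handle:
            print("THP mode:", active_thp_mode(handle.read()))
    except OSError:
        print("THP setting not available on this system")
```

If this prints anything other than never, Redis will keep logging the warning above.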

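Using a UNIX socket instead of TCP port 6379 can also improve performance for clients running on the same host as Redis. Note that port 0 disables TCP entirely, so it only suits a standalone instance: slaves and Sentinels on other hosts still need the TCP port.

Reference: https://redis.io/topics/benchmarks

To achieve this, change redis.conf to the following settings.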

unixsocket /var/run/redis.sock
unixsocketperm 777
# Accept connections on the specified port, default is 6379.
# If port 0 is specified Redis will not listen on a TCP socket.
port 0

Additionally, depending on your use case you can configure the Redis persistence option as well.

If you want your data to exist only as long as the server is running, you can disable persistence altogether. Note: you can disable RDB persistence by commenting out all the “save” lines in redis.conf, as follows.

Comment out these 3 lines in redis.conf:
# save 900 1
# save 300 10
# save 60 10000
You can also set the following:
rdbcompression no
rdbchecksum no
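
In case the save lines look cryptic: each one reads save <seconds> <changes>, meaning "take a snapshot if at least <changes> writes happened within <seconds> seconds". The Python sketch below models that rule so you can see why commenting the lines out switches RDB snapshots off. It is a simplified model with hypothetical function names, not how Redis implements it internally.

```python
def parse_save_rules(conf_text: str) -> list:
    """Collect (seconds, changes) pairs from active 'save' lines."""
    rules = []
    for raw in conf_text.splitlines():
        words = raw.split()
        # Commented-out lines start with '#', so they never match here.
        if len(words) == 3 and words[0] == "save":
            rules.append((int(words[1]), int(words[2])))
    return rules


def snapshot_due(rules: list, seconds_since_save: int, changes_since_save: int) -> bool:
    """A snapshot is due when any rule has both its time and change count met."""
    return any(
        seconds_since_save >= seconds and changes_since_save >= changes
        for seconds, changes in rules
    )


if __name__ == "__main__":
    defaults = parse_save_rules("save 900 1\nsave 300 10\nsave 60 10000\n")
    print(snapshot_due(defaults, 120, 5))       # too few changes so far
    print(snapshot_due(defaults, 60, 10000))    # busy instance
    disabled = parse_save_rules("# save 900 1\n# save 300 10\n# save 60 10000\n")
    print(snapshot_due(disabled, 10**6, 10**6))  # no rules, never due
```

With no active save lines there are no rules left to satisfy, so no automatic RDB snapshot is ever triggered.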

AOF (Append Only File) persistence is disabled by default in redis.conf.

appendonly no

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Monitoring Redis

From a monitoring perspective, there are several tools for monitoring Redis. New Relic provides advanced capabilities to monitor and analyze Redis, including “Most time consumed db operations”, “operations by throughput”, and “operations by query time”.

More information about New Relic Redis is mentioned here. Additionally, redis-stat is a good open source Redis monitoring tool.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

AWS ElastiCache

AWS also offers an in-memory cache cloud service named “ElastiCache”, which supports Redis. It is an effective, easy-to-use cloud service that offloads most of the manual configuration and administrative tasks.

ElastiCache is a web service that makes it easier to launch, manage, and scale a distributed in-memory cache in the cloud.

Interestingly, it ships with advanced features such as Cluster Mode with sharding within a few clicks, Multi-AZ with Auto-Failover, encryption at rest, encryption in transit, importing an RDB file from S3, automatic backups, and many more.

The coolest feature of AWS ElastiCache is its comprehensive monitoring dashboard for your Redis cluster, which covers CPU and memory utilization (swap and freeable memory), connection count, item count, evictions, string, key, and hash based command counts, replication lag, and many more.

Well… That’s pretty much it for this post!

As I mentioned at the beginning of this post, there are plenty of Redis articles out there, but I wanted to create one “all in one” article covering the most essential facts and tips that a developer or DevOps engineer needs to build a highly available Redis cluster with Sentinel.

Before finishing up this post, one of the interesting articles that I found was How Flickr implemented Redis Sentinel. Please make sure to check that post as well.

OK. So thank you so much for reading this article. I look forward to coming up with another interesting article soon, sharing my experience as a developer. Till then, cheers and happy coding!

Amila Iddamalgoda

Written by

• Full-Stack Engineer • AWS Certified Solutions Architect — Associate • Technology Enthusiast https://www.linkedin.com/in/amila-iddamalgoda-81055a61/
