How Redis suits Centrifugo

In this post I want to share some details about Redis as one of the Centrifugo built-in engines.

In Russian, “redis” (редис) means radish :)

You may know from the project docs, or from my first article here, Four years in Centrifuge, that Centrifugo has two built-in engines: the in-memory engine (suitable for a single-node deployment) and the Redis engine (for a multi-node Centrifugo deployment). Besides enabling multi-node support through its PUB/SUB feature, Redis lets us keep channel data, such as the message history cache and client presence information, and gives all nodes access to this data. But let’s start from the beginning and describe everything in more detail.

What’s an engine?

An engine in Centrifugo must be able to do several things:

  • It’s responsible for subscribing a node to channels and unsubscribing it from them
  • It’s responsible for publishing messages into channels
  • It’s responsible for maintaining the message history cache for channels (and expiring it)
  • It’s responsible for maintaining channel presence information (and expiring it)

Too many responsibilities, yeah?:)
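To make these responsibilities a bit more concrete, here is a minimal Python sketch of what such an engine interface could look like. The class and method names are invented for illustration and are not Centrifugo's actual API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Engine(ABC):
    """Hypothetical engine interface; names are illustrative, not Centrifugo's API."""

    @abstractmethod
    def subscribe(self, channel: str) -> None:
        """Subscribe this node to a channel."""

    @abstractmethod
    def unsubscribe(self, channel: str) -> None:
        """Unsubscribe this node from a channel."""

    @abstractmethod
    def publish(self, channel: str, data: bytes) -> None:
        """Publish a message into a channel (and save it to history)."""

    @abstractmethod
    def history(self, channel: str) -> List[bytes]:
        """Return the cached message history for a channel."""

    @abstractmethod
    def add_presence(self, channel: str, client_id: str, info: bytes) -> None:
        """Add or refresh presence information for a client in a channel."""

    @abstractmethod
    def presence(self, channel: str) -> Dict[str, bytes]:
        """Return non-expired presence information for a channel."""
```

With an interface like this, an in-memory engine and a Redis engine would simply be two implementations of the same set of methods.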

Not much server software can provide all of the features above out of the box. I can only think of RethinkDB (though it has no built-in expiration) and Tarantool (it has no proper PUB/SUB yet). Redis is a perfect candidate here, as it has everything Centrifugo needs to work. Let’s look in detail at how Centrifugo uses Redis features.

Publish/Subscribe

Centrifugo is a PUB/SUB server: clients subscribe to channels (topics) and wait for messages published into those channels (by another client or by the application backend). There are many similar real-time solutions in the wild. Redis makes Centrifugo more scalable, as it provides a way to run several Centrifugo nodes and load-balance clients between them. Those nodes are connected together using Redis as a PUB/SUB broker.

Every Centrifugo node subscribes to several channels in Redis:

  • centrifugo.control channel: carries all internal communication between Centrifugo nodes, such as ping messages, node statistics, and propagation of some API commands like disconnect and unsubscribe
  • centrifugo.admin channel: allows admin websocket connections to live on any node, as messages are delivered to all of them over this PUB/SUB channel
  • and the most important: Centrifugo subscribes to centrifugo.message.CHANNEL channels in Redis as soon as some websocket or SockJS client subscribes to channel CHANNEL.

Now let’s look at what happens. Let’s look again at the simple scheme I already showed in my previous post: 4 Centrifugo nodes, all connected via Redis.

Now if one client connects to any Centrifugo node, another client connects to any other Centrifugo node, and both clients subscribe to the same CHANNEL, the application can simply PUBLISH a message to any one Centrifugo node. That node then publishes the message into Redis, so every node that has an interested client (subscribed to CHANNEL) receives the message and forwards it to the client connection.
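The flow just described can be sketched with a tiny in-memory model. This is only an illustration: `Broker` stands in for Redis PUB/SUB, `Node` for a Centrifugo node, and both classes and their method names are made up for this example.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for Redis PUB/SUB (illustration only)."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # Redis channel -> subscribed nodes

    def subscribe(self, redis_channel, node):
        self.subscribers[redis_channel].add(node)

    def publish(self, redis_channel, message):
        # Deliver to every subscribed node and return the receiver count
        nodes = list(self.subscribers[redis_channel])
        for node in nodes:
            node.on_message(redis_channel, message)
        return len(nodes)


class Node:
    """Hypothetical Centrifugo node holding client connections."""

    PREFIX = "centrifugo.message."

    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.clients = defaultdict(list)  # channel -> list of client inboxes

    def client_subscribe(self, channel):
        # The node subscribes in the broker once a client subscribes to a channel
        inbox = []
        self.clients[channel].append(inbox)
        self.broker.subscribe(self.PREFIX + channel, self)
        return inbox

    def api_publish(self, channel, message):
        # The application publishes via any node; the node forwards to the broker
        return self.broker.publish(self.PREFIX + channel, message)

    def on_message(self, redis_channel, message):
        # Forward the message to every local client subscribed to the channel
        channel = redis_channel[len(self.PREFIX):]
        for inbox in self.clients[channel]:
            inbox.append(message)


broker = Broker()
nodes = [Node(f"node{i}", broker) for i in range(4)]
alice = nodes[0].client_subscribe("news")
bob = nodes[2].client_subscribe("news")
receivers = nodes[3].api_publish("news", "hello")
print(receivers, alice, bob)  # 2 ['hello'] ['hello']
```

Note that the publishing node (node3) has no subscribers of its own here, and the message still reaches both clients through the broker.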

Internal communication over the centrifugo.control PUB/SUB channel gives each node information about the other running nodes. There is no need to build a full mesh of connected nodes: with Redis as a PUB/SUB proxy everything is very simple. To add a new Centrifugo node to the cluster, all you need is to provide the Redis server address in its configuration.

Publish new messages into channels

As I mentioned above, to be delivered to clients a message must be PUBLISHed into Redis. Centrifugo has an HTTP API to receive new messages from the application; after a new message is prepared, it is published into Redis.

The interesting thing here is how to publish efficiently. As Redis is single-threaded and supports pipelining, the most efficient way is to publish new messages to Redis over a single connection and use a batching mechanism: each pipeline request carries all the messages collected during the RTT interval since the previous pipeline request was sent. This may not be very clear from my words, so take a look at the Smart Batching article, which describes this simple technique.
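Here is a minimal Python sketch of the batching idea, not Centrifugo's actual implementation. `FakePipelineConn`, `next_batch` and the `max_batch` limit are assumptions made for this example; the connection only records what would be sent in each round trip.

```python
import queue


class FakePipelineConn:
    """Stand-in for a single Redis connection; records each pipelined round trip."""

    def __init__(self):
        self.round_trips = []

    def send_pipeline(self, commands):
        self.round_trips.append(list(commands))


def next_batch(pending, max_batch=512):
    """Block for the first command, then take whatever else is already queued."""
    batch = [pending.get()]
    while len(batch) < max_batch:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break
    return batch


pending = queue.Queue()
conn = FakePipelineConn()

# One message arrives while the connection is idle: it is sent alone.
pending.put(("PUBLISH", "centrifugo.message.news", "m0"))
conn.send_pipeline(next_batch(pending))

# Five messages pile up during that round trip: they go out together.
for i in range(1, 6):
    pending.put(("PUBLISH", "centrifugo.message.news", f"m{i}"))
conn.send_pipeline(next_batch(pending))

print([len(batch) for batch in conn.round_trips])  # [1, 5]
```

The batch size adapts by itself: under low load every message goes out immediately, and under high load more messages share one round trip.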

By the way, Centrifugo uses pipelining for the SUBSCRIBE/UNSUBSCRIBE commands described above too.

Combining pipelining and batching increased publish throughput by more than 20 times. Many thanks to Paul Banks, who contributed this improvement to Centrifugo.

Message history cache

Let’s go further and talk about message history. Centrifugo provides message history of limited size and lifetime for channels. The history for each channel is kept in a Redis LIST data structure. Every time a message is added to the history list, an LTRIM command is called to keep the list within a fixed maximum size. An EXPIRE command is called every time too. So history does not grow without bound, and old unused channels do not leak memory.

The big win of Redis providing both PUB/SUB and storage capabilities is that publishing a message and saving it into the channel history list can be done atomically. When publishing a message, Centrifugo actually calls a Lua script that combines both operations into one atomic step. Here is the script:

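-- Publish the message to subscribers; n is the number of receivers
local n = redis.call("publish", ARGV[1], ARGV[2])
-- Push it onto the head of the history list; m is the new list length
local m = redis.call("lpush", KEYS[1], ARGV[2])
if m > 0 then
  -- Keep only the newest entries and refresh the list lifetime
  redis.call("ltrim", KEYS[1], 0, ARGV[3])
  redis.call("expire", KEYS[1], ARGV[4])
end
return n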

KEYS contains the names of the Redis keys the script operates on, and the ARGV array holds the script arguments: the channel name to publish to, the data to publish, and some Centrifugo-specific channel options that determine how to deal with the message.
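To see what the script does to the history list, here is a small in-memory Python model of the four commands it uses. It is a sketch, not a Redis client: `FakeRedis`, `publish_with_history` and the `history:news` key name are made up for this example, and I assume that ARGV[3] carries the last index to keep (LTRIM's stop index is inclusive).

```python
import time


class FakeRedis:
    """Tiny in-memory model of the commands used by the script (not a Redis client)."""

    def __init__(self):
        self.lists = {}
        self.expire_at = {}
        self.published = []

    def publish(self, channel, data):
        self.published.append((channel, data))
        return 0  # number of receivers; there are no subscribers in this model

    def lpush(self, key, value):
        self.lists.setdefault(key, []).insert(0, value)
        return len(self.lists[key])

    def ltrim(self, key, start, stop):
        # LTRIM keeps indexes start..stop inclusive (non-negative indexes only here)
        self.lists[key] = self.lists[key][start:stop + 1]

    def expire(self, key, seconds):
        self.expire_at[key] = time.time() + seconds


def publish_with_history(r, history_key, channel, data, last_index, lifetime):
    # Same order of operations as the Lua script
    n = r.publish(channel, data)
    m = r.lpush(history_key, data)
    if m > 0:
        r.ltrim(history_key, 0, last_index)
        r.expire(history_key, lifetime)
    return n


r = FakeRedis()
for i in range(5):
    publish_with_history(r, "history:news", "centrifugo.message.news", f"m{i}", 2, 60)
print(r.lists["history:news"])  # ['m4', 'm3', 'm2'], newest first
```

In real Redis the atomicity comes from running these steps inside one Lua script; the model above only shows the resulting list contents.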

Again, the combination of PUB/SUB, a data store, and flexible atomic Lua scripts makes Redis unique and very well suited to Centrifugo’s needs.

Presence information

Another important Centrifugo feature is presence information: sometimes an application needs to know which clients are currently connected and subscribed to certain channels. Presence information must expire.

Presence information in Centrifugo is implemented using a combination of 2 built-in Redis data structures: a sorted set (ZSET, as the script uses ZADD) and a HASH. Below is a Lua script that updates presence information for a client in a channel:

redis.call("zadd", KEYS[1], ARGV[2], ARGV[3])
redis.call("hset", KEYS[2], ARGV[3], ARGV[4])
redis.call("expire", KEYS[1], ARGV[1])
redis.call("expire", KEYS[2], ARGV[1])

KEYS just contains the key for the sorted set and the key for the HASH, both built from the channel name.

ARGV[1]: expiration in seconds for both keys

ARGV[2]: expire-at time in Unix seconds, used as the score of the sorted set member

ARGV[3]: unique connection ID in Centrifugo

ARGV[4]: encoded connection information

To get presence information we use the following Lua code:

local expired = redis.call("zrangebyscore", KEYS[1], "0", ARGV[1])
if #expired > 0 then
  for num = 1, #expired do
    redis.call("hdel", KEYS[2], expired[num])
  end
  redis.call("zremrangebyscore", KEYS[1], "0", ARGV[1])
end
return redis.call("hgetall", KEYS[2])

Again, KEYS are just keys.

ARGV[1]: current time in Unix seconds.

I don’t know of any other database that would let us implement the presence feature in such a short and elegant manner. And again, we do it atomically, as a Redis Lua script blocks the Redis thread until it completes.
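The logic of the two presence scripts can be modelled in a few lines of Python, with one dict playing the role of the sorted set (member to score) and another the role of the HASH. This is only a sketch: the function names and connection IDs are invented, and the whole-key EXPIRE calls are left out.

```python
def add_presence(zset, hash_, expire_at, client_id, info):
    # ZADD: the score is the expiration time; HSET: connection info by client ID
    zset[client_id] = expire_at
    hash_[client_id] = info


def get_presence(zset, hash_, now):
    # ZRANGEBYSCORE 0..now finds expired members (the score range is inclusive)
    expired = [cid for cid, score in zset.items() if 0 <= score <= now]
    for cid in expired:
        hash_.pop(cid, None)  # HDEL
        del zset[cid]         # what ZREMRANGEBYSCORE does for this member
    return dict(hash_)        # HGETALL


zset, hash_ = {}, {}
add_presence(zset, hash_, expire_at=100, client_id="conn-1", info="alice")
add_presence(zset, hash_, expire_at=160, client_id="conn-2", info="bob")
print(get_presence(zset, hash_, now=120))  # {'conn-2': 'bob'}
```

The sorted set is what makes per-client expiration cheap: Redis can only expire whole keys, so the expire-at scores let the read script find and drop stale members itself.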

High availability

Redis as a central broker could be a single point of failure.

Centrifugo supports the official way to add high availability to Redis: Redis Sentinel. If the Redis master instance crashes, the Sentinels will soon elect a new master, and Centrifugo will switch to the newly elected instance. There will be only a short downtime while the Sentinels elect the new master.

Horizontal scalability

We all know that Redis is insanely fast. A single Redis instance can process more than 100 000 PUB/SUB messages per second. But its speed and throughput are not unlimited, so in a large Centrifugo setup Redis could become a bottleneck.

Redis Cluster can automatically shard stored data, but it is not a good solution for scaling PUB/SUB.

I haven’t yet heard stories from Centrifugo users about Redis being a bottleneck, but I constantly think about the horizontal scalability problem, as solving it can take Centrifugo to the next level.

One approach that can be used already is to create several separate Centrifugo+Redis clusters and shard user connections between them somehow. This connection sharding logic depends on your application; unfortunately there is no universal recipe. And it’s pretty difficult, to be fair.

But what I want to do next is consistent hashing on the Centrifugo side. As all engine methods operate on a channel, why not spread PUB/SUB and history/presence data across several Redis instances? Hopefully I’ll merge support for this soon, as a pull request with the changes is already open.
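As a rough illustration of the idea (not the code from that pull request), here is a minimal consistent hashing ring in Python that maps a channel name to one of several Redis instances. The `HashRing` class, the shard addresses, the choice of SHA-256 and the number of virtual points per shard are all assumptions made for this sketch.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent hashing ring with virtual points per shard."""

    def __init__(self, shards, replicas=100):
        self._ring = []  # sorted list of (point, shard)
        for shard in shards:
            for i in range(replicas):
                self._ring.append((self._hash(f"{shard}#{i}"), shard))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        # Take the first 8 bytes of SHA-256 as a 64-bit position on the ring
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def shard_for(self, channel):
        # The first point clockwise from the channel's hash owns the channel
        idx = bisect.bisect(self._points, self._hash(channel)) % len(self._ring)
        return self._ring[idx][1]


shards = ["redis-1:6379", "redis-2:6379", "redis-3:6379"]
ring = HashRing(shards)
channels = [f"channel-{i}" for i in range(1000)]
before = {ch: ring.shard_for(ch) for ch in channels}

bigger = HashRing(shards + ["redis-4:6379"])
moved = [ch for ch in channels if bigger.shard_for(ch) != before[ch]]
print(f"{len(moved)} of {len(channels)} channels moved after adding a shard")
```

Because every engine operation is keyed by channel, the PUB/SUB subscription, history list and presence data of one channel would all land on the same instance. And when a shard is added, only the channels that now belong to the new shard move; the rest stay where they were.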

Conclusion

The story does not end here; these were just some notes about Redis usage in Centrifugo. You can find more details in the source code.

At work we have been using Redis for several years: as the Centrifugo engine, as a cache, and as a broker for Celery. Our development department has some stories about various software bugs and failures, but none of them is related to Redis. It’s an incredible server: stable, feature-rich and (most important) very simple in all aspects. Many thanks to Salvatore Sanfilippo and all Redis contributors.