Explain Redlock in Depth


Previously, I introduced two types of locks, mutex locks and barriers, and I used Redis as an example to explain the differences between the two types of locks. Shortly after that, I received a reply saying that if you want to implement distributed locks, the Redis approach is not enough and you should refer to Redlock.

Well, his opinion is basically right.

Although Redlock is implemented through Redis, it can achieve a very high level of consistency and is one of the best paradigms for implementing distributed locking. However, Redlock has a very high cost behind it and is not suitable for all organizations and services.

Nevertheless, I will introduce Redlock and present my thoughts on why Redlock is impractical.

Redis is not reliable

Before getting started, I should emphasize that Redis persistence is unreliable, even with the strictest settings.

However, in a cluster environment, a Master may have several Slaves for redundancy. Is Redis still unreliable under these conditions? The answer is yes.

In Redis, replication is asynchronous: the Master acknowledges a write to the client without waiting for the Slaves to confirm it, so a successful write does not mean the data has been replicated. One sequence of events that leads to the problem is as follows.

Client 1 successfully locks via the original Master, but the Master dies before the data is replicated. Client 2 can then successfully acquire the same lock via the original Slave, which has become the new Master.

This is the reason why Redis clusters are still unreliable.
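
The race described above can be replayed with a small in-memory simulation. This is only a sketch: `Node`, `simulate_failover`, and the `pending` list are hypothetical names that stand in for Redis instances and the replication stream, and no real Redis is involved.

```python
class Node:
    """A toy stand-in for a Redis instance (hypothetical, in-memory only)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def set_nx(self, key, value):
        """Set the key only if it is absent; return True when the lock is taken."""
        if key in self.data:
            return False
        self.data[key] = value
        return True


def simulate_failover():
    master = Node("master")
    slave = Node("slave")
    pending = []  # writes acknowledged by the master but not yet replicated

    # Client 1 locks on the master; the write is acknowledged immediately.
    client1_ok = master.set_nx("lock:job", "client-1")
    pending.append(("lock:job", "client-1"))

    # The master dies before the pending write reaches the slave.
    pending.clear()
    new_master = slave  # the slave is promoted

    # Client 2 locks on the new master, which never saw client 1's lock.
    client2_ok = new_master.set_nx("lock:job", "client-2")
    return client1_ok, client2_ok


print(simulate_failover())  # both clients believe they hold the lock
```

Both calls report success, which is exactly the situation a mutex is supposed to rule out.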

Redlock Concept

As we have seen, a single Redis is not reliable, and neither is a cluster of multiple Redis. So how does Redlock build a reliable lock on top of Redis?

The answer is majority consensus. Since one Redis is not reliable, we form a committee of multiple Redis. The lock takes effect if and only if more than half of the committee members agree; otherwise, the lock is invalid. Each member can be a single instance, a master-slave pair, or even a cluster, but the members must be independent of each other. In other words, they are not replicas of each other, let alone part of the same cluster.

According to the majority consensus algorithm, the committee should have an odd number of members, and a request is approved when a majority of the members, floor(N/2) + 1, agree, where N is the total number of members.
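
As a quick illustration of the majority rule, the helper below (the name `quorum` is my own) computes the number of agreeing members required and how many failed members the committee can tolerate.

```python
def quorum(n: int) -> int:
    """Smallest number of members that forms a strict majority of n."""
    if n < 1:
        raise ValueError("the committee needs at least one member")
    return n // 2 + 1


for n in (3, 4, 5):
    # members, votes needed, failures tolerated
    print(n, quorum(n), n - quorum(n))
```

Three and four members both tolerate only one failure, which is why an even-sized committee adds cost without adding fault tolerance.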

The detailed process is written in the official Redis documentation on distributed locking, so I’ll briefly describe the process below.

Suppose our committee is composed of three Redis.

The following is the process of successfully locking, doing the task, and unlocking.

  1. The client that wants the lock generates a globally unique ID. The official document recommends a random value and mentions a microsecond timestamp combined with a client ID as a simpler, less safe alternative.
  2. Try to use this ID to get the consent of all committee members by setting the lock key on each of them only if it does not already exist. The official document does this with SET and the NX and PX options, so that the key also carries an expiry; plain SETNX would not set one.
  3. If 2 members agree, the lock is successfully in place.
  4. After getting the lock, the client can do the task it needs to do.
  5. The next step is to unlock on every member, whether or not locking succeeded on that member.

The process for unsuccessful locking is similar. As long as Redis1 or Redis2 also fails to respond, fewer than two members agree and the lock cannot be acquired. In that case you must not do the task, but you still have to unlock on all the members.
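
The steps above can be sketched against in-memory stand-ins. Everything here is an assumption made for illustration: `FakeRedis`, `acquire`, and `release` are hypothetical names, the ID comes from `secrets.token_hex`, and the elapsed-time check that a real implementation performs after collecting the votes is left out for brevity.

```python
import secrets
import time


class FakeRedis:
    """In-memory stand-in for one independent Redis (hypothetical helper)."""

    def __init__(self, up=True):
        self.up = up
        self.store = {}  # key -> (value, expiry as a monotonic timestamp)

    def set_nx_px(self, key, value, ttl_ms):
        """Mimic setting a key with an expiry only if it does not exist."""
        if not self.up:
            raise ConnectionError("node unreachable")
        now = time.monotonic()
        current = self.store.get(key)
        if current is not None and current[1] > now:
            return False  # someone else holds an unexpired lock
        self.store[key] = (value, now + ttl_ms / 1000)
        return True

    def delete_if_equal(self, key, value):
        """Delete the key only if it still holds our ID."""
        if not self.up:
            raise ConnectionError("node unreachable")
        current = self.store.get(key)
        if current is not None and current[0] == value:
            del self.store[key]


def release(nodes, key, token):
    """Step 5: unlock on every member, ignoring unreachable ones."""
    for node in nodes:
        try:
            node.delete_if_equal(key, token)
        except ConnectionError:
            pass  # best effort; the expiry removes the key eventually


def acquire(nodes, key, ttl_ms):
    """Steps 1 to 3: return the ID on success, None otherwise."""
    token = secrets.token_hex(16)  # step 1: unique ID (assumed generator)
    votes = 0
    for node in nodes:  # step 2: ask every member
        try:
            if node.set_nx_px(key, token, ttl_ms):
                votes += 1
        except ConnectionError:
            pass
    if votes >= len(nodes) // 2 + 1:  # step 3: majority reached
        return token
    release(nodes, key, token)  # failed: still unlock everywhere
    return None


nodes = [FakeRedis(), FakeRedis(), FakeRedis(up=False)]
token = acquire(nodes, "lock:job", ttl_ms=3000)
print(token is not None)  # 2 of 3 members agreed
release(nodes, "lock:job", token)
```

Note that unlocking compares the stored ID before deleting, so a client can only remove a lock it created itself.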

Redlock Issues

After describing the Redlock process above, I’d like to explain why I rarely consider such an approach.

Firstly, the entire Redlock process, as described in the previous section, is time-consuming. If you want to hold the lock for 3 seconds, you may find that only 2 seconds or less remain once the locking process finishes. The lock request must be sent to every Redis first, and even when the requests are sent in parallel, network delay and packet loss make the communication slow and complicated.

Secondly, in order to make unreliable Redis reliable, many independent Redis instances must be launched. From a site reliability engineering perspective, maintaining so many Redis instances takes a lot of effort, and making every participating client aware of all of them is a problem of its own. Such an approach is impractical in terms of cost, maintenance effort, and implementation complexity.

Furthermore, the core of this approach is the globally unique ID. If IDs are duplicated, both locking and unlocking may produce false positives and unpredictable results, and such a collision is very hard to detect when it happens. The timestamp-based ID that the official documentation mentions as a simpler option is a very weak guarantee. In a distributed system it is difficult to ensure that all instances have the same time, a problem known as clock skew in system design.
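
To put rough numbers on the first point: the time left to work under the lock is the requested duration minus the time spent acquiring it, minus an allowance for clock drift. The function name and the figures below are illustrative assumptions, not measurements.

```python
def remaining_validity_ms(ttl_ms, elapsed_ms, drift_ms=0):
    """Time left to work under the lock once acquisition has finished."""
    return max(0, ttl_ms - elapsed_ms - drift_ms)


print(remaining_validity_ms(3000, 1000))      # 2000
print(remaining_validity_ms(3000, 1000, 30))  # 1970
print(remaining_validity_ms(3000, 3200))      # 0: the lock is already useless
```

When the result is zero, the client has technically won the vote but must treat the lock as not acquired.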
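
A small sketch of why a duplicated ID is dangerous. The helper names are hypothetical, and the fixed timestamp simply models two clients whose clocks read the same value.

```python
import secrets


def timestamp_id(now_ms: int) -> str:
    """ID derived only from the clock reading (weak)."""
    return str(now_ms)


def random_id() -> str:
    """ID drawn from a cryptographically strong random source."""
    return secrets.token_hex(16)


def can_unlock(stored_id: str, my_id: str) -> bool:
    """Unlocking is allowed only when the stored ID matches the caller's."""
    return stored_id == my_id


# Two clients whose clocks happen to read the same millisecond.
client_a = timestamp_id(1_700_000_000_000)
client_b = timestamp_id(1_700_000_000_000)

# Client A holds the lock, yet client B passes the ownership check.
print(can_unlock(client_a, client_b))
```

With random IDs of this length, a collision between two clients is not a realistic concern, whereas two identical clock readings only require two requests in the same tick.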

To sum up, Redlock is an expensive approach with a lot of technical depth. Although many people have implemented packages in various programming languages based on official documents, does each user understand the potential risks behind the simple use of the packages?

Conclusion

The main reason I don’t use Redlock is that making unreliable Redis reliable is putting the cart before the horse. I always tell my team members, “Be aware that data in Redis can disappear without warning.” If you want to keep the data persistent, you should consider a more persistent database rather than a cache.

When implementing distributed locking, instead of using Redis, we should use a more reliable database, such as MySQL, which is strongly consistent, or MongoDB, which is my personal preference. But even if we use a database, we should pay attention to the implementation details of the database. Take MongoDB as an example: if we want to implement a lock, we need to be aware of read-after-write consistency.
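
One common way to build a lock on a database is to rely on a unique key: whoever inserts the row first owns the lock. The sketch below uses Python's built-in `sqlite3` purely as a runnable stand-in for a server database such as MySQL; the table name, column names, and function names are my own assumptions, and there is no expiry handling.

```python
import sqlite3


def open_lock_table(path=":memory:"):
    """Create the lock table; the primary key enforces one owner per name."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS locks ("
        "name TEXT PRIMARY KEY, owner TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def try_lock(conn, name, owner):
    """Insert the lock row; a duplicate key means someone else holds it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO locks (name, owner) VALUES (?, ?)", (name, owner)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def unlock(conn, name, owner):
    """Delete the row only if the caller is the recorded owner."""
    with conn:
        cur = conn.execute(
            "DELETE FROM locks WHERE name = ? AND owner = ?", (name, owner)
        )
    return cur.rowcount == 1


conn = open_lock_table()
print(try_lock(conn, "job", "client-1"))  # first insert wins
print(try_lock(conn, "job", "client-2"))  # rejected by the primary key
print(unlock(conn, "job", "client-1"))
```

A production version would also need a way to recover locks whose owner crashed, for example an expiry column checked on acquisition.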

There are many aspects to consider behind the system design, and it is not enough to just make the function work. How to control the budget? How to allocate manpower? How to maintain day-to-day? How to troubleshoot? All of these factors involve the capacity of the team, which I believe is far more important than the actual functionality.

When considering the use of distributed locks, I first ask myself, “Do we really need locks? Is there a way to avoid possible race conditions through architecture design?” It is far more effective to avoid locking by improving the architecture than to seek synchronization in a distributed system. If we really have to use a distributed lock, and we need to keep the usage to a minimum, then we don’t need to use Redis but can use a relatively slow database to implement it.

Complexity is killing software developers

And Redlock is one of the most complicated approaches. In my opinion, it should be avoided.

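A technical column created by a group of engineers who want to write good articles. One original article and one newsletter every Tuesday. Subscriptions are welcome! Main site: https://weekly.starbugs.dev/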

Chunting Wu

Architect at Lang Inc. Experienced in system design, backend development, and embedded systems. Sponsor me if you like: https://www.buymeacoffee.com/MfGjSk6