An Easy Integration of a Distributed Lock

Published in Sahibinden Technology, Apr 7, 2022

We recently launched a microservice for sahibinden.com that moderates classified ads with AI. It is one of the first microservices created during sahibinden.com's microservice migration, so it was the first time we ran into some of the problems it raised.

In this article, I will describe the problem caused by a scheduled job running on several instances at the same time, and how we solved it with a distributed lock.

Some Theory

The Problem

Scheduled tasks that are not designed to be executed in parallel should not run at the same time, because concurrent runs can affect a system in two ways:

Efficiency: doing the same work more than once increases cost.
Correctness: several runs working on the same resource at once can leave it in an inconsistent state.

The problem we faced was efficiency. Our scheduled job is not designed to run in parallel, and it does not need to. However, because the service runs on several nodes, the scheduler on each node can trigger the job at the same time.

As a result, when the job ran on more than one instance, it could query the same IDs several times, and the query results were then written back to the data source. Since the same ID queried at the same time returns the same result, these concurrent runs only did redundant work. This was an efficiency problem, and the extra queries could also mean paying more in the future.

The Solution

One way to solve this problem is a distributed lock. It prevents jobs from running at the same time by having each of them check a shared data source that all instances can access. Martin Kleppmann explains the purpose of a lock as follows:

The purpose of a lock is to ensure that among several nodes that might try to do the same piece of work, only one actually does it (at least only one at a time)

When an instance starts the job, it first checks the shared data source for a lock. If there is no lock, it creates a lock record and starts working. If a lock already exists, it skips the run and waits for the next scheduled execution. Repeating this cycle ensures that only one instance runs the job at a time.
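The check-then-create cycle can be sketched in Java as follows. This is a minimal, single-JVM illustration under stated assumptions: a ConcurrentHashMap stands in for the shared data source, and `SharedLockStore`, `SchedulerTick`, and the lock name are hypothetical, not ShedLock's API. The key detail is that checking for the lock and creating it happen in one atomic step; otherwise two instances could both see "no lock" and both start. Each lock also carries an end time, so a crashed holder cannot block the job forever.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a data source that all instances can reach.
// A ConcurrentHashMap is used only so the sketch runs in one JVM.
class SharedLockStore {
    // lock name -> instant until which the lock is held
    private final Map<String, Instant> locks = new ConcurrentHashMap<>();

    // Checks for a lock and creates it in one atomic step.
    // Returns true if the caller now holds the lock, false if someone else does.
    boolean tryLock(String name, Instant now, Duration holdFor) {
        boolean[] acquired = {false};
        locks.compute(name, (key, lockedUntil) -> {
            if (lockedUntil == null || !lockedUntil.isAfter(now)) {
                acquired[0] = true;        // no lock, or an expired one: create ours
                return now.plus(holdFor);
            }
            return lockedUntil;            // lock exists: leave it untouched
        });
        return acquired[0];
    }

    // Removes the lock record so the next run, on any instance, can take it.
    void unlock(String name) {
        locks.remove(name);
    }
}

class SchedulerTick {
    // What each instance does on every scheduled run ("moderation-job" is a hypothetical name).
    static boolean runIfLockFree(SharedLockStore store, Instant now, Runnable job) {
        if (!store.tryLock("moderation-job", now, Duration.ofSeconds(10))) {
            return false;                  // another instance is working: skip this run
        }
        try {
            job.run();
            return true;
        } finally {
            store.unlock("moderation-job");
        }
    }
}
```

A real implementation would perform the same atomic step in the database, for example as a conditional update on a unique lock key.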

Some Research

So how did we apply this mechanism to the scheduled jobs of a service running on multiple instances? Although the mechanism is simple to implement, we started by looking at existing implementations. Because our service uses MongoDB, we needed one with MongoDB support.

Given how easy it is to apply and how many data sources it supports, ShedLock was the most suitable candidate, so we decided to go with it.

ShedLock on GitHub: lukas-krecan/ShedLock, “Distributed lock for your scheduled tasks”. In the words of its README, ShedLock makes sure that your scheduled tasks are executed at most once at the same time.

Note from ShedLock: the locks are time-based, and ShedLock assumes that the clocks on the nodes are synchronized.
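Why synchronized clocks matter can be shown with a small, hypothetical example. A node decides whether a time-based lock has expired by comparing the stored lock end time with its own clock, so a node whose clock runs ahead sees the lock as free too early. The class name and the 6-second skew are illustrative assumptions.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrates why time-based locks depend on synchronized clocks.
class ClockSkewDemo {
    // A node treats a lock as free once its *local* clock reaches lockUntil.
    static boolean looksFree(Instant lockUntil, Instant localNow) {
        return !lockUntil.isAfter(localNow);
    }

    public static void main(String[] args) {
        Instant trueNow = Instant.parse("2022-04-07T10:00:05Z");
        // Lock written by node A, valid until 10:00:10.
        Instant lockUntil = Instant.parse("2022-04-07T10:00:10Z");
        Duration skew = Duration.ofSeconds(6); // hypothetical: node B's clock runs 6 s fast
        System.out.println("node with correct clock: " + looksFree(lockUntil, trueNow));
        System.out.println("node B (skewed clock):   " + looksFree(lockUntil, trueNow.plus(skew)));
    }
}
```

With this skew, node B considers the lock expired and may start the job while node A still believes it holds the lock.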

Some Practice

After adding the library to your project by following the instructions on ShedLock's GitHub page, all you have to do is acquire and release locks through ShedLock's LockProvider.

ShedLock also provides an annotation, @SchedulerLock, that locks and unlocks automatically, and you can manage your locks with it. In our case, however, only jobs running on active instances in the primary datacenter were supposed to use the lock. We therefore wanted to acquire the lock only after the job had started and passed certain checks in an if statement. The annotation could not express this, so we wrote the implementation described below instead.
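For comparison, the annotation-based form looks roughly like the fragment below. It is a sketch assuming Spring scheduling, ShedLock 4.x with its Spring integration (where the duration attributes are strings), and an existing LockProvider bean; the class names, lock name, and schedule are hypothetical. Because ShedLock acquires the lock before the annotated method body runs, every instance competes for the lock before any check inside the method, which is why the annotation did not fit the requirement described above.

```java
import net.javacrumbs.shedlock.spring.annotation.EnableSchedulerLock;
import net.javacrumbs.shedlock.spring.annotation.SchedulerLock;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Configuration
@EnableScheduling
@EnableSchedulerLock(defaultLockAtMostFor = "PT10S")
class SchedulingConfig {
}

@Component
class AnnotatedModerationJob {

    // The lock is taken before run() starts and released after it returns,
    // so checks inside run() happen only on the instance that won the lock.
    @Scheduled(fixedDelay = 60_000) // hypothetical schedule
    @SchedulerLock(name = "moderation-job", lockAtLeastFor = "PT2S", lockAtMostFor = "PT10S")
    public void run() {
        // job body
    }
}
```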

To use our implementation, we first pass our MongoClient to ShedLock's MongoLockProvider class, which provides the MongoDB integration, and expose the result as a LockProvider bean. ShedLock then creates its own collection in our MongoDB database to hold the lock records.

In our setup, the mongoClient comes from MongoReplicaSetFactory, part of sahibinden.com's Mongo infrastructure, so you will need to adapt this part to your own infrastructure. All you need is a mongoClient that you can pass to MongoLockProvider.
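A configuration along these lines is sketched below as a Spring fragment. It assumes ShedLock 4.x with the shedlock-provider-mongo module, whose MongoLockProvider accepts a MongoDatabase, and the synchronous MongoDB Java driver. The class name and database name are hypothetical, and the MongoClient is simply injected rather than built by any particular factory. Constructors have differed between ShedLock versions, so check the README for the version you use.

```java
import com.mongodb.client.MongoClient;
import net.javacrumbs.shedlock.core.LockProvider;
import net.javacrumbs.shedlock.provider.mongo.MongoLockProvider;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ShedLockConfig {

    // Hypothetical database name: use the database your service already works with.
    private static final String DATABASE_NAME = "my-service-db";

    // The MongoClient comes from your own infrastructure; here it is injected as a bean.
    @Bean
    public LockProvider lockProvider(MongoClient mongoClient) {
        // ShedLock stores its lock records in a collection of this database
        // (by default a collection named "shedLock").
        return new MongoLockProvider(mongoClient.getDatabase(DATABASE_NAME));
    }
}
```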

With the LockProvider configured, we implemented a service that makes the LockProvider easier to use across our services.

The service has a method for requesting a lock.

We request locks through this method, which takes three parameters. The first is a key used as the lock ID, so that different jobs or methods get separate lock records. The second and third define the lock window: the minimum and maximum time the lock is held (ShedLock calls these lockAtLeastFor and lockAtMostFor). For example, passing 2000 ms and 10000 ms keeps the lock for at least 2 seconds, even if the job finishes sooner, and at most 10 seconds, after which it expires even if it has not been released.
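The effect of the two window parameters can be checked with a little arithmetic. The helper below is hypothetical and only models the rule stated above: the lock is held until the job ends, but for at least lockAtLeastFor and at most lockAtMostFor after it was acquired.

```java
import java.time.Duration;

// Hypothetical helper modelling the lock window:
// lockAtLeastFor = minimum hold, lockAtMostFor = maximum hold.
class LockWindow {

    // How long the lock is effectively held, measured from acquisition,
    // for a job that actually runs for jobRuntime.
    static Duration effectiveHold(Duration jobRuntime, Duration lockAtLeastFor, Duration lockAtMostFor) {
        if (jobRuntime.compareTo(lockAtMostFor) >= 0) {
            // The job overran: the lock expires while the job is still running,
            // and another instance may start the same job.
            return lockAtMostFor;
        }
        // Released when the job ends, but never before the minimum hold time.
        return jobRuntime.compareTo(lockAtLeastFor) > 0 ? jobRuntime : lockAtLeastFor;
    }

    public static void main(String[] args) {
        Duration min = Duration.ofMillis(2000);
        Duration max = Duration.ofMillis(10000);
        System.out.println(effectiveHold(Duration.ofMillis(500), min, max));   // PT2S
        System.out.println(effectiveHold(Duration.ofMillis(5000), min, max));  // PT5S
        System.out.println(effectiveHold(Duration.ofMillis(15000), min, max)); // PT10S
    }
}
```

The third case is the one that matters for correctness: the lock has already expired while the job is still running.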

Releasing the lock is handled automatically through AutoCloseable. When the returned lock object is used in a try-with-resources statement, the lock is released as soon as the try block finishes, even if the block throws an exception.
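A hypothetical sketch of such a service follows. An in-memory map stands in for ShedLock's MongoDB collection so that the code runs on its own, and the names `DistributedLockService` and `Lock` are illustrative. With ShedLock itself, `lock` would delegate to `LockProvider.lock(LockConfiguration)`, which returns an `Optional<SimpleLock>`, and `close()` would call `SimpleLock.unlock()`. The sketch shows the three parameters (key, minimum and maximum hold time) and the AutoCloseable release.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical lock service; the map stands in for the lock collection in MongoDB.
class DistributedLockService {

    // Handle returned to callers. close() declares no checked exception,
    // so try-with-resources needs no catch block.
    interface Lock extends AutoCloseable {
        @Override
        void close();
    }

    private final Map<String, Instant> lockedUntil = new ConcurrentHashMap<>();
    private final Supplier<Instant> clock;

    DistributedLockService(Supplier<Instant> clock) {
        this.clock = clock;
    }

    // key: lock id; lockAtLeastForMs / lockAtMostForMs: the lock window.
    Optional<Lock> lock(String key, long lockAtLeastForMs, long lockAtMostForMs) {
        Instant start = clock.get();
        Instant expiresAt = start.plusMillis(lockAtMostForMs);
        boolean[] acquired = {false};
        // Check and create atomically: take the lock only if it is absent or expired.
        lockedUntil.compute(key, (k, until) -> {
            if (until == null || !until.isAfter(start)) {
                acquired[0] = true;
                return expiresAt;          // even if never released, the lock ends here
            }
            return until;
        });
        if (!acquired[0]) {
            return Optional.empty();       // another instance holds the lock
        }
        Instant earliestRelease = start.plusMillis(lockAtLeastForMs);
        Lock handle = () -> {
            Instant now = clock.get();
            Instant releaseAt = now.isAfter(earliestRelease) ? now : earliestRelease;
            // Only shorten our own record; if it expired and another instance
            // re-acquired the lock, the stored value differs and nothing changes.
            lockedUntil.replace(key, expiresAt, releaseAt);
        };
        return Optional.of(handle);
    }
}
```

Returning an Optional makes the "lock already held" case explicit at the call site, and keeping `close()` free of checked exceptions keeps the try-with-resources usage short.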

One point about this implementation deserves attention. In our case, concurrent runs were only an efficiency problem, not a data consistency problem: at worst, data with the same ID would be updated more than once with the same result. If your use case does depend on data consistency, keep in mind that when your code runs longer than the lock's expiry time (lockAtMostFor), the lock expires and another instance can start the same job while the first one is still running. Kleppmann's article describes how to guard against this:
Making the lock safe with fencing: https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
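The fencing technique from Kleppmann's article can be sketched as follows. It is not something ShedLock provides; the class names are hypothetical, and the in-memory storage stands in for whatever resource the job writes to. Each lock grant carries a number that only increases, and the storage rejects a write whose token is older than one it has already accepted, so a job whose lock expired mid-run cannot overwrite newer data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lock service that hands out a monotonically increasing fencing token.
class FencedLockService {
    private final AtomicLong counter = new AtomicLong();

    // Called whenever a client is granted the lock.
    long grant() {
        return counter.incrementAndGet();
    }
}

// Hypothetical storage that checks fencing tokens on every write.
class FencedStorage {
    private final Map<String, String> data = new HashMap<>();
    private long highestTokenSeen = 0;

    // Accepts the write only if its token is at least as new as any token seen so far.
    synchronized boolean write(String key, String value, long token) {
        if (token < highestTokenSeen) {
            return false;                  // stale lock holder: reject the write
        }
        highestTokenSeen = token;
        data.put(key, value);
        return true;
    }

    synchronized String read(String key) {
        return data.get(key);
    }
}
```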

Usage Example

To use the service, we request a lock with its lock method before the job's code runs, making the call in a try-with-resources statement. Thanks to AutoCloseable, the lock is released once the job's code has finished.
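A minimal sketch of that calling pattern follows. It assumes a simplified in-memory lock service that ignores the time window, hypothetical flags for the primary-datacenter and active-instance checks, and that instances failing those checks simply skip the run. The checks come first, and the lock is requested only afterwards, inside try-with-resources.

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified lock service: just enough to show the calling pattern.
// It ignores the time bounds that a real, time-based lock would enforce.
class LockService {
    interface Lock extends AutoCloseable {
        @Override
        void close();
    }

    private final Set<String> held = ConcurrentHashMap.newKeySet();

    Optional<Lock> lock(String key, long lockAtLeastForMs, long lockAtMostForMs) {
        if (!held.add(key)) {
            return Optional.empty();       // someone else holds this key
        }
        Lock handle = () -> held.remove(key);
        return Optional.of(handle);
    }
}

// Hypothetical job; the flags stand in for the primary-datacenter and active-instance checks.
class ModerationJob {
    private final LockService lockService;
    private final boolean inPrimaryDatacenter;
    private final boolean activeInstance;
    int completedRuns = 0;

    ModerationJob(LockService lockService, boolean inPrimaryDatacenter, boolean activeInstance) {
        this.lockService = lockService;
        this.inPrimaryDatacenter = inPrimaryDatacenter;
        this.activeInstance = activeInstance;
    }

    // Invoked by the scheduler on every instance.
    void run() {
        if (!inPrimaryDatacenter || !activeInstance) {
            return;                        // checks first: this instance skips the run
        }
        Optional<LockService.Lock> maybeLock = lockService.lock("moderation-job", 2000, 10000);
        if (!maybeLock.isPresent()) {
            return;                        // another instance is running the job
        }
        try (LockService.Lock lock = maybeLock.get()) {
            completedRuns++;               // the real job body would go here
        }                                  // lock.close() runs here, even on exceptions
    }
}
```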

With this library and integration, we avoid the problems caused by jobs that are not designed to run in parallel.

Thank you for reading.

References

[1] Martin Kleppmann: “How to do distributed locking”, martin.kleppmann.com, Feb 08, 2016

[2] Lukáš Křečan: “ShedLock”, github.com, 2020–2022

[3] “Shedlock and Micronaut, a super easy distributed lock solution”, blog.javapapo.com, Nov 20, 2020

[4] “Java — Try with Resources”, baeldung.com, March 10, 2022