How to Replace Quartz with ShedLock

Published in TAV Technologies, 8 min read, Nov 16, 2023

In modern microservice architectures, there is a growing need for a reliable and scalable job scheduling solution. Quartz, an open-source job scheduling framework, has emerged as the de facto standard for this purpose. Quartz offers a robust set of features that make it well-suited for microservice environments, including high availability, scalability, flexibility, and ease of use, with plenty of documentation and use cases.

Why is Quartz so important in microservice environments?

Microservice architectures are often composed of many independent services, each of which may have its own scheduling needs. Quartz provides a unified way to manage scheduling across all of these services.
Microservice environments are often dynamic and can change frequently. Quartz is a resilient scheduling solution that can handle these changes without disrupting the flow of jobs.
Microservice environments often require high availability and scalability. Quartz is a highly available and scalable scheduling solution that can meet these requirements.

Why is Quartz the de facto standard for job scheduling in microservice environments?

It is a mature and well-tested framework. Quartz has been around for many years and is used by a wide range of organizations.
It is a highly performant and scalable framework. Quartz can handle a large number of scheduled jobs with low latency.
It is an open-source framework. Quartz is freely available and can be customized to meet the specific needs of any organization.

Why would we replace such a mature solution with something else?

First and foremost, it’s important to acknowledge that no solution is perfect. While Quartz excels at handling sophisticated job definitions, our usage centers on repetitive tasks: outbox-pattern jobs that fetch messages from the event store and send them to Kafka, and hourly cleanup of messages that were successfully delivered to Kafka.

However, the large number of microservices we use created a substantial challenge. With over 60 microservices, each with its own database and Quartz logic, the clustering mechanism implemented by Quartz, which relies on locking rows, resulted in long-lasting locks that adversely impacted database performance.

During routine maintenance, the DB team discovered an issue caused by frequent inserts and deletions on the FIRED_TRIGGERS table. Although the table was empty, its size exceeded 40 GB due to fragmentation, because each fired job inserts a row into the table and removes it upon completion.

Furthermore, some services do not require a database for operation, but we were forced to use one due to the need for each instance to run its scheduled jobs exclusively. To implement Quartz and clustering, we had to connect it to a database solely for Quartz operation.

Additionally, we encountered instances where jobs failed to execute altogether due to discrepancies in server time across a few Kubernetes workers. This synchronization issue disrupted the clustering operation.

Our reliance on Quartz and the various issues we encountered prompted us to seek a lightweight alternative. Our requirements became clear: instead of a distributed scheduler, we needed a distributed lock mechanism that prevents parallel job execution among multiple instances of the same service and that does not depend solely on an RDBMS to work.

At this point, we could have considered implementing a custom locking solution for relational databases (RDBMS) or a distributed lock using Redis for non-RDBMS services. However, this approach would have incurred significant costs in terms of implementation, deployment, and testing. Therefore, we explored existing production-ready solutions and discovered ShedLock.

ShedLock? Who?

ShedLock makes sure that your scheduled tasks are executed at most once at the same time. If a task is being executed on one node, it acquires a lock which prevents execution of the same task from another node (or thread).
ShedLock is designed to be used in situations where you have scheduled tasks that are not ready to be executed in parallel, but can be safely executed repeatedly.

ShedLock is a lightweight Java library, initiated by Lukas Krecan, that provides a simple and flexible mechanism for implementing distributed locks. It is designed to be easy to use and integrate with existing applications, and it supports a variety of lock providers, including relational databases, NoSQL databases, and distributed caching systems.

Simplicity: provides a simple and easy-to-use API for acquiring and releasing locks.
Flexibility: supports a variety of lock providers, including relational databases, NoSQL databases, and distributed caching systems.
Reliability: designed to be reliable and fault-tolerant. It can handle node failures and network outages without losing data.
Extensibility: extensible and can be customized to meet specific needs.

How does ShedLock overcome the hurdles we’ve encountered?

ShedLock’s lightweight design and ease of use make it a good choice for replacing a complex and heavyweight solution like Quartz.
ShedLock’s distributed lock mechanism prevents parallel job execution among multiple instances of the same service, which eliminates the need for Quartz’s database-based clustering mechanism.
ShedLock’s support for a variety of lock providers allows you to choose the best option for your environment.
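Reduced database load: Rather than taking row-level update locks, ShedLock decides whether a job is already running by checking the lock’s row in the SHEDLOCK table. With a JDBC lock provider this is a single small table holding one row per lock name, and that row is updated in place rather than inserted and deleted on every run. Services without a relational database can use a different lock provider and need no table at all.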

Taking into account the entire discussion up until this point, it’s apparent that ShedLock effectively addresses the challenges we’ve outlined and meets our specific requirements. Its lightweight design, ease of use, flexibility, and reliability make it a compelling choice for replacing a complex solution like Quartz and implementing distributed locking mechanisms for our microservices architecture.

All hands on deck

Remember, when searching for ShedLock examples, you’ll likely encounter code that generates a LockProvider with a default LockAtMostFor definition, followed by the creation of a Scheduler class annotated with @Scheduled and @SchedulerLock with lock definitions. Our approach will be slightly more sophisticated, providing greater flexibility in scheduling your tasks. Let’s create our LockProvider class.

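import javax.sql.DataSource;
import net.javacrumbs.shedlock.core.LockProvider;
import net.javacrumbs.shedlock.provider.jdbctemplate.JdbcTemplateLockProvider;
import org.springframework.context.annotation.Bean;
import org.springframework.jdbc.core.JdbcTemplate;

// Declared inside a Spring @Configuration class
@Bean
public LockProvider lockProvider(DataSource dataSource) {
    return new JdbcTemplateLockProvider(
        JdbcTemplateLockProvider.Configuration.builder()
            .withJdbcTemplate(new JdbcTemplate(dataSource))
            .usingDbTime() // Use the database clock; works on Postgres, MySQL, MariaDb, MS SQL, Oracle, DB2, HSQL and H2
            .build()
    );
}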

We certainly don’t want to abandon the programmatic approach we’ve used with Quartz by implementing Spring’s InitializingBean. Back in the days when we implemented our jobs programmatically with triggers and job definitions, we utilized a QuartzSchedulerInitializer. We’ll need something similar for ShedLock.

A quick solution would be to employ ThreadPoolTaskScheduler for this operation. We can create our runnable class and programmatically run our runnable classes with specified intervals. However, this approach lacks ShedLock’s job locking mechanism. Instead, you would end up with local schedulers across all your services, which is definitely not the desired outcome. Let’s delve into ShedLock’s documentation to discover how to lock our jobs.

Upon quick review, it appears we already have the answer. To utilize ShedLock’s job locking mechanism for your tasks, we’ll need to create a wrapper class around our actual runnable class. This wrapper class will be responsible for acquiring a ShedLock lock before executing the runnable task and releasing the lock upon completion. Let’s create that class.
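As a rough illustration of the wrapper idea, here is a minimal sketch that uses only the JDK. The `LockingExecutor` interface and `InMemoryLockingExecutor` class are hypothetical stand-ins invented for this example, not ShedLock types; in the library itself the corresponding pieces are `LockingTaskExecutor`, `DefaultLockingTaskExecutor` and `LockConfiguration`. The getters mirror the ones the scheduling code later in this article calls, but the constructor shape is an assumption.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a lock-aware executor; NOT a ShedLock type. */
interface LockingExecutor {
    /** Runs the task only if the named lock can be taken; returns true if it ran. */
    boolean executeWithLock(Runnable task, String lockName,
                            Duration lockAtMostFor, Duration lockAtLeastFor);
}

/** Wraps the real job so that every scheduled run first tries to take the lock. */
class BaseLockedRunnableJob implements Runnable {
    private final Runnable task;
    private final LockingExecutor executor;
    private final String lockName;
    private final String cron;
    private final Duration lockAtLeastFor;
    private final Duration lockAtMostFor;

    BaseLockedRunnableJob(Runnable task, LockingExecutor executor, String lockName,
                          String cron, Duration lockAtLeastFor, Duration lockAtMostFor) {
        this.task = task;
        this.executor = executor;
        this.lockName = lockName;
        this.cron = cron;
        this.lockAtLeastFor = lockAtLeastFor;
        this.lockAtMostFor = lockAtMostFor;
    }

    @Override
    public void run() {
        // The scheduler calls run(); the task itself executes only under the lock.
        executor.executeWithLock(task, lockName, lockAtMostFor, lockAtLeastFor);
    }

    String getLockName() { return lockName; }
    String getCron() { return cron; }
    Duration getLockAtLeastFor() { return lockAtLeastFor; }
    Duration getLockAtMostFor() { return lockAtMostFor; }
}

/**
 * Single-JVM stand-in for a shared lock store (assumption for the sketch).
 * It ignores the lock name and both durations; a real provider persists them.
 */
class InMemoryLockingExecutor implements LockingExecutor {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    @Override
    public boolean executeWithLock(Runnable task, String lockName,
                                   Duration lockAtMostFor, Duration lockAtLeastFor) {
        if (!locked.compareAndSet(false, true)) {
            return false; // somebody else holds the lock: skip this run
        }
        try {
            task.run();
            return true;
        } finally {
            locked.set(false); // always release, even if the task throws
        }
    }
}
```

The point of the design is that the scheduler only ever sees a plain `Runnable`, so the locking concern stays out of both the scheduler configuration and the job logic.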

Now that we have a wrapper class that can be locked with the desired configuration, let’s submit our jobs using ThreadPoolTaskScheduler.

For consistency, we’ll create a new class called ShedlockSchedulerInitializer to handle the responsibilities previously handled by QuartzSchedulerInitializer. This new class will be responsible for generating LockableJobs and registering them using TaskScheduler.

To create the actual jobs/runnables, we’ll use the RunnableTaskFactory class. This class is responsible for generating the actual runnables that we will later wrap with the BaseLockedRunnableJob class.
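One possible shape for such a factory is sketched below with plain JDK types. The `register` and `create` methods and the job name used in the usage note are assumptions made for illustration; the article's actual RunnableTaskFactory may be organized differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical factory: maps a job name to a supplier of the plain, unlocked runnable. */
class RunnableTaskFactory {
    private final Map<String, Supplier<Runnable>> suppliers = new LinkedHashMap<>();

    /** Registers how to build the runnable for a job name; returns this for chaining. */
    RunnableTaskFactory register(String jobName, Supplier<Runnable> supplier) {
        suppliers.put(jobName, supplier);
        return this;
    }

    /** Builds a fresh runnable for the given job name, or fails fast if it is unknown. */
    Runnable create(String jobName) {
        Supplier<Runnable> supplier = suppliers.get(jobName);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown job: " + jobName);
        }
        return supplier.get();
    }
}
```

A caller might register something like `factory.register("outboxPublisher", () -> outboxService::publishPending)` (both names are made up here) and then hand the result of `create(...)` to the locking wrapper before scheduling it.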

After completing all the operations, we simply call the ShedlockSchedulerInitializer’s generate method to register our wrapped jobs to the task schedulers as locked.

    /**
     * <p>Schedules the given lockable job on the task scheduler using the job's cron expression.</p>
     *
     * @param threadPoolTaskScheduler thread pool task scheduler instance to be used for scheduling
     * @param lockableJob             lockable job instance which contains runnable task and lock provider
     */
    private void generate(ThreadPoolTaskScheduler threadPoolTaskScheduler,
                          BaseLockedRunnableJob lockableJob) {
        threadPoolTaskScheduler.schedule(lockableJob, new CronTrigger(lockableJob.getCron()));
        log.info("{} is scheduled to run with cron {} and durations; lockAtLeastFor: {}, lockAtMostFor: {}",
                lockableJob.getLockName(), lockableJob.getCron(), lockableJob.getLockAtLeastFor(), lockableJob.getLockAtMostFor());
    }

That’s all we need to do to replace Quartz with ShedLock.

Avoiding Parallel Execution

As discussed earlier, ShedLock is not a distributed scheduler and does not rely on the row-level update locks that Quartz uses to keep jobs from executing concurrently. Therefore, when using ShedLock, it is crucial to choose the lockAtLeastFor and lockAtMostFor parameters carefully to ensure proper job scheduling and prevent conflicts.

lockAtLeastFor specifies the minimum amount of time for which the lock should be kept. Its main purpose is to prevent execution from multiple nodes in case of really short tasks and clock difference between the nodes.
lockAtMostFor makes sure that the lock is released even if the node dies. Please note that lockAtMostFor is just a safety net in case the node executing the task dies, so set it to a time that is significantly larger than the maximum estimated execution time. If the task takes longer than lockAtMostFor, it may be executed again and the results will be unpredictable (more processes will hold the lock).

Choosing too generous a lockAtMostFor value can lead to situations where a job remains locked long after the executing process has died: no other node can run it until the lock expires, unless someone intervenes manually to release it.

On the other hand, setting lockAtMostFor to a low value can result in parallel execution of the same job on multiple nodes if a slow operation occurs due to internal or external dependencies, such as slow third-party web service calls or database performance issues.
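To make the trade-off concrete, here is a small in-memory model of a single lock record, written with plain JDK types. It is a simplification for illustration and not ShedLock's code: the `LockRow` class and its method names are assumptions, and time is passed in explicitly so the behaviour can be followed step by step.

```java
import java.time.Duration;
import java.time.Instant;

/** Simplified in-memory model of one lock record (illustrative, not ShedLock code). */
class LockRow {
    private Instant lockedAt = Instant.EPOCH;
    private Instant lockUntil = Instant.EPOCH;

    /** A node may take the lock only once the previous "locked until" time has passed. */
    boolean tryLock(Instant now, Duration lockAtMostFor) {
        if (lockUntil.isAfter(now)) {
            return false; // still held, or held by a node that died before releasing
        }
        lockedAt = now;
        lockUntil = now.plus(lockAtMostFor); // safety net in case the holder dies
        return true;
    }

    /** On normal completion the lock stays in force until lockedAt + lockAtLeastFor. */
    void unlock(Instant now, Duration lockAtLeastFor) {
        Instant earliestRelease = lockedAt.plus(lockAtLeastFor);
        lockUntil = earliestRelease.isAfter(now) ? earliestRelease : now;
    }
}
```

In this model, a job that finishes after 2 seconds with a lockAtLeastFor of 10 seconds keeps other nodes out until the 10 seconds have elapsed. A job that takes 45 seconds with a lockAtMostFor of 30 seconds lets a second node acquire the lock while the first is still running. A node that dies with a lockAtMostFor of one hour blocks the job everywhere for the full hour.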

So how do we overcome this? Let’s dive into ShedLock’s documentation again.

KeepAliveLockProvider

KeepAliveLockProvider extends the lock in the middle of the lockAtMostFor interval. For example, if the lockAtMostFor is 10 minutes the lock is extended every 5 minutes for 10 minutes until the lock is released. Please note that the minimal lockAtMostFor time supported by this provider is 30s. The scheduler is used only for the lock extension, single thread should be enough.
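The extension schedule described above can be worked through with a little arithmetic. The sketch below is a simplified model using only JDK types, not the provider's implementation: the `KeepAliveTimeline` class is hypothetical, and rejecting values under 30 seconds with an exception is a modelling choice based on the documented minimum.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Simplified model of keep-alive extensions (illustrative, not ShedLock code). */
class KeepAliveTimeline {
    static final Duration MIN_LOCK_AT_MOST_FOR = Duration.ofSeconds(30);

    /**
     * Returns every "locked until" value written while the task runs:
     * the initial one, plus one per extension made at each half interval.
     */
    static List<Instant> lockUntilValues(Instant start, Duration lockAtMostFor,
                                         Duration taskDuration) {
        if (lockAtMostFor.compareTo(MIN_LOCK_AT_MOST_FOR) < 0) {
            throw new IllegalArgumentException("lockAtMostFor must be at least 30s");
        }
        Duration half = lockAtMostFor.dividedBy(2);
        List<Instant> values = new ArrayList<>();
        values.add(start.plus(lockAtMostFor)); // set when the lock is acquired
        // Extend at each half interval for as long as the task is still running.
        for (Duration elapsed = half; elapsed.compareTo(taskDuration) < 0;
             elapsed = elapsed.plus(half)) {
            values.add(start.plus(elapsed).plus(lockAtMostFor));
        }
        return values;
    }
}
```

For a task that starts at 12:00 and runs for 12 minutes with a lockAtMostFor of 10 minutes, the model yields 12:10, then 12:15 (extended at 12:05), then 12:20 (extended at 12:10), so the lock always outlives the running task. A 4-minute task never reaches the first extension and keeps its initial 12:10 value until it releases the lock.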

So, as the explanation states, when you use KeepAliveLockProvider and the executing node stays alive, ShedLock keeps extending the lock by lockAtMostFor for as long as the job holds it, which prevents parallel job execution. Let’s convert our LockProvider to a KeepAliveLockProvider.

A singleThreadScheduledExecutor is enough, since the scheduler is used only to extend existing locks.

That’s all we actually need to do to replace Quartz with ShedLock. Do not forget to analyze your needs and act accordingly instead of blindly adopting it.

You can find the source code in my GitHub repository.

Bonus: Spring Native

As a bonus, the whole project is designed to run as a native executable thanks to GraalVM and Spring Native. You can effortlessly build the native executable on your local machine or in a Docker environment.

Just a reminder: since native builds require more resources, make sure that your Docker installation has at least 16 GB of memory and 8 CPUs dedicated to the build process.

I’m genuinely impressed by its capabilities. It’s worth every second of building!

quartz-shedlock-migration-app-0-1     | 2023-11-16T12:32:11.881Z  INFO 1 --- [           main] c.k.QuartzShedlockMigrationApplication   : Started QuartzShedlockMigrationApplication in 0.222 seconds (process running for 0.225)
quartz-shedlock-migration-app-1-1     | 2023-11-16T12:32:12.187Z  INFO 1 --- [           main] c.k.QuartzShedlockMigrationApplication   : Started QuartzShedlockMigrationApplication in 0.157 seconds (process running for 0.159)

There will be more articles about Spring Native coming from me in the near future, so stay tuned.