Creating Distributed Task Scheduler using Hazelcast

Anil Kurmi
Microservices Architecture
4 min readApr 10, 2023

A distributed task scheduler is a system that coordinates the scheduling and execution of tasks across multiple nodes in a distributed system.

Distributed task schedulers are commonly used in large-scale distributed systems, such as data centers and cloud computing environments, where tasks may need to be executed on a large number of machines simultaneously. They are typically designed to be fault-tolerant, so that if a node fails, tasks can be automatically rerouted to other nodes in the system.

Examples of distributed task schedulers include Apache Mesos, Kubernetes, and Apache Hadoop YARN. These systems provide a centralized framework for managing resources and scheduling tasks, and allow developers to focus on writing code instead of worrying about the underlying infrastructure.

For Example, running 10,000 compliance checks/policies on 500,000 machine in data center for scanning.

How to run 5–10 Billion task daily?

In this article we are creating distributed scheduler using spring boot and hazelcast clustering.

To create a distributed task scheduler using Spring Boot and Hazelcast, we can follow these general steps:

  1. Set up a Spring Boot project and add the necessary dependencies. You will need to add the Hazelcast dependency to your project, as well as any other dependencies you might need for your specific use case.
  2. Configure Hazelcast in your Spring Boot application. You will need to create a Hazelcast configuration bean and configure it to enable the distributed task scheduling feature.
 <dependency>
<groupId>com.hazelcast</groupId>
<artifactId>hazelcast-spring</artifactId>
<version>5.0.1</version>
</dependency>
@Bean
public Config hazelcastConfig() {
Config config = new Config();
config.getNetworkConfig().getJoin()
.getMulticastConfig().setEnabled(false);
config.getNetworkConfig().getJoin()
.getTcpIpConfig().setEnabled(true)
.setMembers(Arrays.asList("127.0.0.1"));
config.addExecutorConfig(new ExecutorConfig("distributed-scheduler")
.setPoolSize(10)
.setQueueCapacity(1000));
return config;
}

3. Create Scheduler Service, here we schedule a job which will execute task on leader node and distribute the task to all members.

@Service
public class DistributedScheduler {
@Autowired
private HazelcastInstance instance;

@Scheduled(fixedDelay = 10000, initialDelay = 60000)
void startJob() {
String leaderAddress = getOldestMember().getSocketAddress().toString();
String currentAddress = instance.getCluster().getLocalMember().getSocketAddress().toString();

//Run the task only on leader node

if (currentAddress.equals(leaderAddress)) {
System.out.println("I am leader, use me to poll database, distribute task etc");
IScheduledExecutorService scheduler = instance.getScheduledExecutorService("distributed-scheduler");
scheduler.schedule(new MyScheduledTask(), 10, TimeUnit.SECONDS);
}
}

private Member getOldestMember() {
Cluster cluster = instance.getCluster();
Member oldestMember = null;
for (Member member : cluster.getMembers()) {
if (oldestMember == null || member.getUuid().compareTo(oldestMember.getUuid()) < 0) {
oldestMember = member;
}
}
return oldestMember;
}

static class MySchedulerJob implements Serializable, Runnable {
private static final long serialVersionUID = 1L;

@Override
public void run() {
System.out.println("My Task");
}
}
}

The node that will execute the Task depends on the configuration of the Hazelcast cluster and the availability of resources at the time the task is scheduled.

In a Hazelcast cluster, all nodes are equal and any node can execute a task. When a task is scheduled using the IScheduledExecutorService, it is added to a queue and will be executed by the next available node in the cluster that has the resources to run the task.

Hazelcast automatically manages the distribution of tasks across the cluster, so you do not need to worry about manually assigning tasks to specific nodes.

Hazelcast is designed to be fault-tolerant and provide high availability for data and services in a distributed environment. Hazelcast uses a number of techniques to ensure fault tolerance, including:

  1. Data Replication: Hazelcast replicates data across multiple nodes in the cluster, so if a node fails, the data is still available on other nodes. Replication ensures that the data remains available even if some nodes go down.
  2. Node Clustering: Hazelcast allows multiple nodes to join together to form a cluster. If a node fails, other nodes in the cluster can take over its responsibilities and ensure that the cluster remains operational.
  3. Automatic Failover: Hazelcast automatically detects node failures and redirects requests to other available nodes in the cluster.
  4. Quorum Rule: Hazelcast allows you to define a quorum rule to ensure that a certain number of nodes are available before operations can be executed. This helps to prevent data loss or inconsistencies in case of network partitioning or other issues.

In summary, Hazelcast is designed to be fault-tolerant and provide high availability for data and services in a distributed environment. Hazelcast uses a combination of data replication, data partitioning, node clustering, automatic failover, and quorum rules to ensure that the cluster remains operational even in case of node failures or other issues.

References

--

--