Task Coordination in Distributed Deployment of Micro Integrator

Nirothipan Ram
Published in Think Integration
5 min read · Apr 25, 2021
Distributed Consensus

Wondering how the picture above relates to the topic? Read on to find out!

Why Task Coordination?

It is often necessary to deploy more than one instance of the Micro Integrator to meet two main production needs: scalability and availability. This brings with it the need for distributed consensus. That is, when more than one server acts as a single unit, the servers need to coordinate the work among themselves so that it is not duplicated and every client sees the same data regardless of which server it connects to. Although most mediation scenarios in Micro Integrator are stateless, tasks require coordination among the nodes. Tasks are the core functional unit of the following three constructs.

  1. Message Processors
  2. Scheduled Tasks
  3. Polling Inbound Endpoints

To ensure that these tasks are not duplicated, the system needs to keep a record of which node is running each task, along with the task’s state. For instance, if a task is scheduled to transfer a file to an FTP server every day at 12 p.m. and more than one server executes it, the result is duplication and inconsistency.
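As a rough illustration of why such a record matters, the sketch below has two nodes fire the same daily schedule against a shared registry. The TaskRegistry class, the task name, and the "RUNNING" state are hypothetical names used only for this example; they are not part of Micro Integrator.

```python
from dataclasses import dataclass, field


@dataclass
class TaskRegistry:
    """Hypothetical shared record: task name -> (owner node, state)."""
    records: dict = field(default_factory=dict)

    def claim(self, task, node):
        # setdefault stores the record only if the task has no owner yet,
        # so the first node to claim a task stays its owner.
        owner, _state = self.records.setdefault(task, (node, "RUNNING"))
        return owner == node


registry = TaskRegistry()
transfers = []

# Both nodes reach the 12 p.m. trigger; only the owner performs the transfer.
for node in ["node-1", "node-2"]:
    if registry.claim("daily-ftp-transfer", node):
        transfers.append(node)

print(transfers)
```

Without the claim check, both nodes would append to transfers, which is the duplication described above.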

How Is Task Coordination Handled?

Many algorithms exist for achieving consensus in a distributed system, and the topic is still actively debated. Two of the well-known ones are Raft and Paxos, with Raft designed to be an easier-to-understand alternative to Paxos.

Micro Integrator uses its own RDBMS-based algorithm to tackle this problem. It relies on a central database through which all the nodes in the cluster communicate. This approach has two main advantages.

  1. The quorum is always 1, which means that just 2 nodes are enough to achieve availability, unlike most other clusters, which demand at least 3.
  2. Network partitioning, one of the common problems in distributed systems, is avoided by ensuring that there is only a single leader node at any given time, because leader election is tied to the primary key of a table in the database.
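The idea of tying leader election to a primary key can be sketched with any relational database. The example below uses an in-memory SQLite database purely for illustration; the leader table, its columns, and the helper functions are assumptions made for this sketch and do not reflect Micro Integrator's actual schema or code.

```python
import sqlite3


def try_become_leader(conn, group_id, node_id):
    """Try to claim leadership of a group.

    The primary key on group_id allows only one leader row per group,
    so at most one node's INSERT can succeed.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO leader (group_id, node_id) VALUES (?, ?)",
                (group_id, node_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def resign(conn, group_id, node_id):
    """Remove this node's leader row so another node can take over."""
    with conn:
        conn.execute(
            "DELETE FROM leader WHERE group_id = ? AND node_id = ?",
            (group_id, node_id),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leader (group_id TEXT PRIMARY KEY, node_id TEXT NOT NULL)"
)

first = try_become_leader(conn, "mi-cluster", "node-1")   # wins the election
second = try_become_leader(conn, "mi-cluster", "node-2")  # rejected by the key
resign(conn, "mi-cluster", "node-1")
third = try_become_leader(conn, "mi-cluster", "node-2")   # takes over
print(first, second, third)
```

A real deployment would also need some way to detect a leader that crashed without resigning, for example a heartbeat timestamp on the row; that part is left out of this sketch.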

Setting up Micro Integrator with Distributed Deployment

  1. Set up the database by executing the {db_type}_cluster.sql script in the {MI_HOME}/dbscripts/{db_type} directory.
  2. Add the following entry to the deployment.toml file in the {MI_HOME}/conf directory to enable the server to communicate with the database.
TOML Configuration

  3. Place the relevant database driver in the {MI_HOME}/lib directory and start the server.

And that’s all there is to setting up!

Verifying the Deployment

Now, let’s verify our deployment by adding a scheduled task and confirming that it runs on only a single node and that availability is maintained when the node running the task becomes unavailable.

Note: To validate this, you need at least two instances of Micro Integrator running at the same time, with coordination enabled as described in the previous step.

Create a simple task with the following configuration and deploy it to the server.

Message Injector Task
Message Log Sequence

For more information on how to create a task via Integration Studio and deploy it to Micro Integrator, refer to the official documentation.

If the log shown below appears on only a single node, the task is running on only one instance.

Task Execution Message

To examine availability, shut down the server on which you observed the above log. You should now see the same log on the other node, which means that the task has switched over to it and availability is preserved.

What more to do with Task Coordination?

Let’s find out what else we can fine-tune when scheduling tasks in a distributed deployment.

Task Resolver: The functional unit responsible for dividing the tasks among the available nodes in the cluster.

Distributing tasks equally in number to all the available instances

By default, Micro Integrator uses an active-passive task resolving mechanism. That is, all the tasks are scheduled on a single node, and only if that node becomes unavailable are they moved to another node.
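The active-passive behavior can be pictured with a small sketch. The resolve_active_passive function and the rule of picking the first live node as the active one are assumptions made for illustration, not the resolver shipped with Micro Integrator.

```python
def resolve_active_passive(tasks, live_nodes):
    """Assign every task to one active node (assumed: the first live node)."""
    if not live_nodes:
        return {}
    active = live_nodes[0]
    return {task: active for task in tasks}


tasks = ["task-a", "task-b", "task-c"]

# While node-1 is up, it runs everything and node-2 stays idle.
before = resolve_active_passive(tasks, ["node-1", "node-2"])

# Once node-1 goes away, all tasks move to node-2.
after = resolve_active_passive(tasks, ["node-2"])
print(before)
print(after)
```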

This behavior can be changed by switching the task resolver to a round-robin implementation, which distributes the tasks to the nodes in round-robin fashion. To achieve this, add the following configuration to the deployment.toml file.

Here, task_server_count is the number of servers that need to be in the cluster before the distribution of tasks starts. This ensures that all the tasks are not scheduled on a single node before the other nodes join the cluster.
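A minimal sketch of round-robin resolution with this threshold could look like the following. The resolve_round_robin function is hypothetical, and the choice to return no assignments until enough servers have joined is an assumption based on the description above.

```python
from itertools import cycle


def resolve_round_robin(tasks, live_nodes, task_server_count):
    """Spread tasks over live nodes in turn, once enough nodes have joined."""
    if len(live_nodes) < task_server_count:
        # Assumed behavior: hold off scheduling until the cluster is large enough.
        return {}
    # zip stops when the task list is exhausted, so cycle() never runs forever.
    return dict(zip(tasks, cycle(live_nodes)))


tasks = ["task-a", "task-b", "task-c"]

waiting = resolve_round_robin(tasks, ["node-1"], task_server_count=2)
assigned = resolve_round_robin(tasks, ["node-1", "node-2"], task_server_count=2)
print(waiting)
print(assigned)
```

With only one node present, nothing is scheduled; once the second node joins, the tasks alternate between the two.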

Running Tasks on Particular Node

Say we need to run a task only on particular nodes. In that case, we have to name the nodes and use the Task Node Resolver implementation. This can be achieved by adding the following configuration to the deployment.toml file.

Running tasks in Particular Node

In the above config, we have named the present node “node-1” and, under the task_resolver section, instructed that the tasks should run only on the nodes named “node-1” and “node-x”. That is, if there are multiple nodes, only those with the id “node-1” or “node-x” will run the tasks. Each node_id should be unique to a node.
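The filtering described above can be sketched as follows. The resolve_by_node_id function is a hypothetical stand-in, and the way tasks are spread across the eligible nodes (in turn) is an assumption; the actual resolver may distribute them differently.

```python
def resolve_by_node_id(tasks, live_nodes, allowed_node_ids):
    """Assign tasks only to live nodes whose id is in the allowed list."""
    eligible = [node for node in live_nodes if node in allowed_node_ids]
    if not eligible:
        return {}
    # Assumed policy: hand tasks to the eligible nodes in turn.
    return {
        task: eligible[i % len(eligible)] for i, task in enumerate(tasks)
    }


tasks = ["task-a", "task-b"]
allowed = ["node-1", "node-x"]

# node-2 is alive but not named in the allowed list, so it gets nothing.
placement = resolve_by_node_id(tasks, ["node-1", "node-2"], allowed)
print(placement)
```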

Still want more flexibility to schedule tasks?

Well, in this case, it’s going to take more development work. You’ll have to write your own task resolver class that implements TaskLocationResolver with your own task resolving logic, and deploy it to the server as a regular or OSGi bundle. The following example implementations might help if you are doing so.

For further information, do visit Micro Integrator 4.0.0 Documentation.

And thank you!
