Remote Chunking with Spring Batch (Integration) [Part-1]

Bayonne Sensei
2 min read · Dec 18, 2022

Remote chunking is a manager/worker approach which allows Spring Batch developers to scale batch applications.

manager — workers

In remote chunking, the Step processing is split across multiple processes, communicating with each other through some messaging middleware.

When to use?

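Developers can use remote chunking when the bottleneck is processing or writing the data rather than reading it. Reading stays on the single manager process, so only the processing and writing work is spread across the workers.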

How does it work?

The manager and the workers communicate with each other through a messaging system

The manager component is a single process, and the workers are multiple remote processes.

  • The manager is responsible for reading the data (e.g. reading a flat file, exporting data from a database, consuming from Kafka, etc.).
  • The manager sends the data as chunk requests (ChunkRequest<T>) to a messaging middleware system (such as Kafka, RabbitMQ, Google Pub/Sub, etc.).
  • The worker(s) consume the chunk requests (ChunkRequest<T>) from the messaging middleware system, process the data, and write it to the target destination (e.g. a database, a file, a REST endpoint, etc.).
  • Once a worker finishes processing a chunk, it sends the result of the execution (ChunkResponse) back to the messaging system.
  • The manager consumes the responses (ChunkResponse) and computes the result of the job execution.
chunk requests and replies
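
To make this flow more concrete, here is a minimal sketch in plain Java (17+) that imitates it inside a single JVM. It is not Spring Batch code: two in-memory BlockingQueues stand in for the messaging middleware, worker threads stand in for the remote worker processes, and the ChunkRequest and ChunkResponse records, the runStep and work methods, and the target queue are hypothetical names that only borrow the framework's vocabulary.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class RemoteChunkingSketch {

    // Hypothetical stand-ins, NOT the Spring Batch Integration classes of the same name.
    record ChunkRequest<T>(int sequence, List<T> items) {}
    record ChunkResponse(int sequence, int writeCount) {}

    /** Simulates one step; returns the write count the manager sums up from the replies. */
    static int runStep(List<String> source, int chunkSize, int workerCount, Queue<String> target) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        // The two queues play the role of the request and reply destinations in the middleware.
        BlockingQueue<ChunkRequest<String>> requests = new LinkedBlockingQueue<>();
        BlockingQueue<ChunkResponse> replies = new LinkedBlockingQueue<>();

        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        for (int w = 0; w < workerCount; w++) {
            workers.submit(() -> work(requests, replies, target));
        }
        try {
            // Manager: read the source and send it out chunk by chunk.
            int sent = 0;
            for (int from = 0; from < source.size(); from += chunkSize) {
                int to = Math.min(from + chunkSize, source.size());
                requests.put(new ChunkRequest<>(sent++, List.copyOf(source.subList(from, to))));
            }
            // Manager: wait for one reply per request and aggregate the result.
            int totalWritten = 0;
            for (int i = 0; i < sent; i++) {
                totalWritten += replies.take().writeCount();
            }
            return totalWritten;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("manager interrupted", e);
        } finally {
            workers.shutdownNow(); // interrupts the workers blocked on take()
        }
    }

    /** Worker loop: consume a request, process and write its items, send a reply. */
    static void work(BlockingQueue<ChunkRequest<String>> requests,
                     BlockingQueue<ChunkResponse> replies,
                     Queue<String> target) {
        try {
            while (true) {
                ChunkRequest<String> request = requests.take();
                for (String item : request.items()) {
                    target.add(item.toUpperCase()); // "process" = upper-case, "write" = add to target
                }
                replies.put(new ChunkResponse(request.sequence(), request.items().size()));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shutdownNow() ends the loop
        }
    }

    public static void main(String[] args) {
        Queue<String> target = new ConcurrentLinkedQueue<>();
        int written = runStep(List.of("a", "b", "c", "d", "e"), 2, 3, target);
        System.out.println("written=" + written + ", target size=" + target.size());
    }
}
```

With five items and a chunk size of two, the manager sends three requests and sums three replies, so the program prints written=5, target size=5. In a real setup the queues would be replaced by channels bound to the middleware, and the workers would run as separate processes.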

Part two is here

Conclusion:

Remote chunking is a good pattern for scaling a Spring Batch application. Because the manager and the workers talk through a messaging system, developers can scale out by running many worker instances (multiple instances of the same application). For example, with Apache Kafka as the messaging middleware, we can start three instances of our worker app in the same consumer group. Giving the topic three or more partitions then lets all three instances work at the same time.
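
The partition count matters because, within a Kafka consumer group, each partition is read by at most one consumer at a time. The sketch below illustrates that rule with a simple round-robin assignment in plain Java. It is a simplification and not Kafka's actual assignor, and the assign and busyWorkers methods and the worker names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionAssignmentSketch {

    /** Spreads partitions 0..partitionCount-1 over the workers, one owner per partition. */
    static Map<String, List<Integer>> assign(int partitionCount, List<String> workers) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String worker : workers) {
            assignment.put(worker, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = workers.get(partition % workers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    /** Counts the workers that own at least one partition and can therefore receive chunks. */
    static long busyWorkers(int partitionCount, List<String> workers) {
        return assign(partitionCount, workers).values().stream()
                .filter(partitions -> !partitions.isEmpty())
                .count();
    }

    public static void main(String[] args) {
        List<String> workers = List.of("worker-1", "worker-2", "worker-3");
        // Three partitions: every worker instance owns one.
        System.out.println(assign(3, workers));
        // One partition: two of the three worker instances stay idle.
        System.out.println(assign(1, workers));
    }
}
```

With three partitions each of the three workers owns one, while with a single partition only worker-1 receives anything. This is why the topic needs at least as many partitions as worker instances.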

Follow the full video series on my YouTube channel.
