Mastering Batch Processing Techniques in Mulesoft

Sri Uday Kumar Dhanala
6 min read · Sep 5, 2023


Batch Processing Overview

Batch processing allows for the efficient processing of records in groups or batches. This method capitalizes on parallel execution to optimize throughput.

Breaking it down in simple terms

Imagine you have a toy factory. You receive a big box of toy parts every day, and your job is to assemble them into finished toys.

Step 1 — Input: Every morning, you open the box and put all the toy parts on a conveyor belt. This is like MuleSoft taking in a large set of data or records to process.

Step 2 — Processing: As the toy parts move along the conveyor belt, there are stations where workers assemble them, paint them, and check for defects. Similarly, in MuleSoft’s batch processing, as each piece of data moves along, it goes through various steps — transforming the data, filtering it, or enriching it with additional information.

Step 3 — Output: At the end of the conveyor belt, the finished toys are packed into boxes and ready for shipment. Similarly, MuleSoft will output the processed data, maybe sending it to another system, saving it to a database, or generating a report.

Handling Defects: Sometimes, a toy part might be defective, and it’s removed from the conveyor belt. Similarly, if there’s an issue with a piece of data (maybe it’s missing some information or it’s in the wrong format), MuleSoft can catch that and handle it separately, ensuring that the rest of the data keeps moving smoothly.

In technical terms, MuleSoft’s batch processing allows for the efficient handling of large volumes of data by breaking them into manageable chunks, processing each chunk, and then either completing the process or handling any errors that might occur.

I hope this analogy helps simplify the concept of MuleSoft batch processing for you!

Key Applications of Batch Jobs:

  1. Near real-time synchronization between systems, e.g., updating account details from an ERP system to Salesforce.
  2. ETL (extract, transform, load) scenarios, such as processing files or database records.
  3. Bulk data processing.

Mechanics of Batch Processing in Mule 4:

When you add a Batch Job scope in Mule 4, a flow containing the batch job is automatically generated. Consequently, the batch job expects a payload either from the flow’s source section or from an invoking flow.

Batch Job Configuration:

Imagine a situation where 1,000 records undergo batch processing. Mule runtime splits a batch job instance’s records into blocks based on the configured Batch Block Size: with the default block size of 100, the 1,000 records become 10 blocks of 100 records each, which are dispatched to processing threads. Batch job instances themselves execute according to the set Scheduling Strategy.

Configuration Parameters:

  • Name: Denotes the batch job name.
  • Max Failed Records: Defines the threshold of failed records before the batch job halts. The default of 0 means halting upon a single failure; a value of -1 means the job never halts, regardless of the number of failed records.
  • Scheduling Strategy: Determines how batch job instances are executed. Choices include:
      • ORDERED_SEQUENTIAL (default): Job instances run one at a time, in order of their creation timestamp.
      • ROUND_ROBIN: All available batch job instances execute, sharing resources via a round-robin algorithm.
  • Job Instance ID: An expression that assigns a custom, unique ID to each batch job instance.
  • Batch Block Size: Defaults to 100. Defines the number of records handed to each execution thread as a block.
  • Max Concurrency: Caps the number of threads that process record blocks concurrently.
  • Target: Names a variable in which to store the batch job result (statistics).
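The parameters above map directly onto attributes of the Batch Job scope in Mule XML. A minimal configuration sketch, assuming a hypothetical job and step name:

```xml
<batch:job jobName="syncAccountsBatch"
           maxFailedRecords="10"
           schedulingStrategy="ORDERED_SEQUENTIAL"
           blockSize="100"
           maxConcurrency="4"
           target="batchStats">
    <batch:process-records>
        <batch:step name="transformStep">
            <!-- per-record processors go here -->
        </batch:step>
    </batch:process-records>
</batch:job>
```

With this configuration, up to 4 threads each receive blocks of 100 records, the job tolerates up to 10 failed records before halting, and the result statistics land in the `batchStats` variable.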

Batch Job Sections:

  1. Process Records: This section can contain multiple batch steps, where each step holds processors that handle individual records. Processors can perform enrichment, transformation, or routing.
  2. On Complete: This optional section executes once all records have been processed. It primarily provides insight into the batch run, such as the number of records processed, succeeded, or failed.
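Sketched in Mule XML, the two sections look roughly like this (job and step names are illustrative):

```xml
<batch:job jobName="orderProcessingBatch">
    <batch:process-records>
        <batch:step name="enrichStep">
            <!-- enrich each record, e.g., look up reference data -->
        </batch:step>
        <batch:step name="transformStep">
            <!-- transform each record to the target format -->
        </batch:step>
    </batch:process-records>
    <batch:on-complete>
        <!-- here the payload is the batch job result (statistics) object -->
        <logger level="INFO" message="#[payload]" />
    </batch:on-complete>
</batch:job>
```

Each record flows through `enrichStep` and then `transformStep`; the On Complete block runs once, after every record has finished.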

Internal Mechanism (Within Mule Runtime):

Batch jobs undergo three primary phases:

  1. Load and Dispatch (Phase): Mule prepares for batch job processing. During this phase, batch job instances and associated persistent queues are created.
  2. Process (Phase): Actual processing begins, with records processed asynchronously. Each record’s status is monitored, with failures marked appropriately.
  3. On Complete (Phase): An optional phase that provides a summary of the batch processing. The output typically contains details such as the total number of records, processing time, and failure count.
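In the On Complete phase, the payload is the batch job result object, whose fields expose those statistics. A hedged sketch of logging a summary (the `successfulRecords` and `failedRecords` field names follow the Mule 4 batch result object):

```xml
<batch:on-complete>
    <logger level="INFO"
            message="#['Successful: $(payload.successfulRecords), Failed: $(payload.failedRecords)']" />
</batch:on-complete>
```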

Variable Management in Mule 4 Batch Processing

In Mule 4’s batch processing framework, variables declared within a batch step are intrinsically linked to the record currently undergoing processing. These variables retain their values and are accessible across all subsequent batch steps. This ensures that any variable initialized or modified in one step can be leveraged in all following steps.

The table below illustrates this concept: when a variable is set or updated in a particular batch step, its value remains consistent and accessible throughout all subsequent batch steps.
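For instance, a variable set in one batch step remains visible in later steps for the same record (step names and the `type` field are illustrative):

```xml
<batch:step name="classifyStep">
    <!-- record-scoped: each record gets its own copy of the variable -->
    <set-variable variableName="accountType" value="#[payload.type]" />
</batch:step>
<batch:step name="routeStep">
    <!-- the value set for this record in the previous step is still available -->
    <logger level="INFO" message="#[vars.accountType]" />
</batch:step>
```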

Error Handling in Batch Jobs

When working with batch jobs, understanding error handling is paramount. It is essential to note that any errors that occur inside a batch job will not be propagated to the main flow error handler. However, if an error occurs in the Source section, it will be routed to the main flow error handler.

Key Points:

  1. Errors in the Batch Job: Errors originating from the “Process Records” and “On Complete” sections of a batch job won’t be forwarded to the main flow error handler. This is to ensure that the main flow remains unaffected by any issues within the batch job processing.
  2. Configuration of Batch Job: It’s crucial to set the “Max Failed Records” value appropriately during configuration:
      • 0: Processing halts immediately upon encountering a failed record.
      • -1: Processing continues indefinitely, regardless of the number of failed records.
      • Specific Integer: Processing continues until the specified maximum number of failed records is reached.
  3. Error Handling Options within Batch Job: Depending on the business requirements, various error-handling strategies can be employed:
      • Utilizing a Try scope (try-catch) in Batch Steps: This approach captures the exact error trace for a record, facilitating tailored error handling. After the error is captured, two main scenarios can unfold:
          • On Error Continue: Select this if the intention is to absorb the error and proceed past the Try scope. In this scenario:
              1. An error is raised.
              2. The logger inside the Try scope records the error.
              3. Execution resumes after the Try scope.
              4. The “Batch_Step_Failed_Records” step isn’t invoked, since the record isn’t flagged as failed.
          • On Error Propagate: Opt for this if the aim is to escalate the error to the batch job level, thereby halting processing for the problematic record. The sequence in this case is:
              1. An error is raised.
              2. The logger inside the Try scope records the error.
              3. The “Batch_Step_Failed_Records” step is invoked, as the record is flagged as failed.
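A sketch of this pattern in Mule XML, assuming a hypothetical `validateAndTransformRecord` flow, followed by a step that accepts only failed records (with on-error-continue the record is never flagged, so that step is skipped):

```xml
<batch:step name="processStep">
    <try>
        <!-- hypothetical per-record processing that may raise an error -->
        <flow-ref name="validateAndTransformRecord" />
        <error-handler>
            <!-- swap for on-error-propagate to flag the record as failed -->
            <on-error-continue>
                <logger level="WARN" message="#[error.description]" />
            </on-error-continue>
        </error-handler>
    </try>
</batch:step>
<batch:step name="Batch_Step_Failed_Records" acceptPolicy="ONLY_FAILURES">
    <logger level="ERROR"
            message="#['Failed record: ' ++ write(payload, 'application/json')]" />
</batch:step>
```

The `acceptPolicy="ONLY_FAILURES"` attribute is what routes only flagged records into the follow-up step.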

When handling errors in batch jobs, it’s essential to strike a balance between robust error capture and smooth batch processing. Tailoring the error-handling mechanism to the specific needs of the batch job is key to maintaining system integrity while ensuring efficient processing.

Insights on Batch Processing:

While batch jobs are renowned for parallel processing in MuleSoft, they aren’t the only tools available. Other functionalities like Scatter-Gather, Async, and For-Each Parallel also facilitate parallel execution, each tailored for specific use cases.

