Step scoped beans in a Spring Batch Job

Safina
Published in Javarevisited
6 min read · Apr 4, 2021

The concept of bean scope in Spring lets us focus on business logic without having to worry about data inconsistency.

The lifespan of a bean in a Spring application is defined with respect to the application context (singleton), each individual request for the bean (prototype), or, in web-aware applications, the HTTP session (session), the HTTP request (request), the ServletContext lifecycle (application), or a WebSocket session (websocket).

In other words, a singleton-scoped bean (the default scope) is created once when the container is initialized (the application is started) and is destroyed when the container is closed (the application is shut down).

Similarly, a prototype-scoped bean, declared with @Scope("prototype"), is created every time the bean is requested from the container. A bean annotated with @SessionScope is created once per HTTP session, and a bean annotated with @RequestScope is created for every HTTP request.
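To make the difference between the first two scopes concrete, here is a minimal plain-Java model. It does not use Spring at all: Counter, SingletonProvider, PrototypeProvider and ScopeDemo are hypothetical names invented for this illustration, and the real container does considerably more than this.

```java
import java.util.function.Supplier;

// Hypothetical stateful bean, used only for this illustration.
class Counter {
    private int value;
    void increment() { value++; }
    int value() { return value; }
}

// Models singleton scope: the factory runs once, every lookup returns the same instance.
class SingletonProvider<T> implements Supplier<T> {
    private final T instance;
    SingletonProvider(Supplier<T> factory) { this.instance = factory.get(); }
    @Override public T get() { return instance; }
}

// Models prototype scope: every lookup runs the factory again.
class PrototypeProvider<T> implements Supplier<T> {
    private final Supplier<T> factory;
    PrototypeProvider(Supplier<T> factory) { this.factory = factory; }
    @Override public T get() { return factory.get(); }
}

class ScopeDemo {
    public static void main(String[] args) {
        Supplier<Counter> singleton = new SingletonProvider<>(Counter::new);
        Supplier<Counter> prototype = new PrototypeProvider<>(Counter::new);

        singleton.get().increment();
        prototype.get().increment();

        // The singleton lookup sees the earlier increment; the prototype lookup gets a fresh Counter.
        System.out.println("singleton value: " + singleton.get().value());
        System.out.println("prototype value: " + prototype.get().value());
    }
}
```

The shared instance is what makes singleton state visible everywhere, and it is also why mutable singleton beans become a problem once several threads use them at the same time.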

As can be seen, we have a wide range of bean scopes to choose from for every custom application need: some are specific to web applications, others are tied to the application context.

So far so good. But in the case of a Spring Batch job, the above-mentioned scopes only take us so far. Since a Spring Batch job is made up of steps, it is only natural and practical to define beans with respect to a step, in addition to the above-mentioned scopes.

Luckily for us, Spring Batch defines two more scopes, namely step scope and job scope, that help us define beans whose lifecycles are tied to the lifecycle of a step and a job respectively.

Step Scope

The lifespan of a step-scoped bean is tied to the lifecycle of a step, i.e. the bean is created once the step starts and destroyed when the step ends.

The annotation used to declare a step-scoped bean is @StepScope.

Step scope is especially useful when we are executing steps in parallel, as it then becomes essential to isolate the state of the bean. Otherwise, multiple threads could modify the state of the bean simultaneously, rendering it inconsistent.

In our example, we have configured a Spring Batch job where each step performs the same action: read from a database, do some transformation on the data, and write it to a file.

The idea here is to read data from a database, process it, and move it to a file. Once the data has been successfully processed and written to the file, we erase it from the database.

The above-mentioned sequence is the same across steps; the only difference is the database (one database configuration per step) that each step reads from and subsequently deletes all data from.

This job contains three steps: step1, step2 and step3. step1 and step2 are configured to run in parallel, while step3 executes after both step1 and step2 have completed.
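The ordering constraint (step1 and step2 in parallel, step3 only after both are done) can be modelled in plain Java with CompletableFuture. This is only a sketch of the control flow, not Spring Batch configuration; in a real job the same shape is typically expressed as a split flow backed by a TaskExecutor. The class name ParallelFlowSketch and the placeholder step bodies are assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

class ParallelFlowSketch {

    // Runs two placeholder steps concurrently, then a third one, and returns the completion order.
    static List<String> run() {
        // Thread-safe list, because step1 and step2 may append from different threads.
        List<String> finished = new CopyOnWriteArrayList<>();

        CompletableFuture<Void> step1 = CompletableFuture.runAsync(() -> finished.add("step1"));
        CompletableFuture<Void> step2 = CompletableFuture.runAsync(() -> finished.add("step2"));

        // Block until both parallel steps are done before starting step3.
        CompletableFuture.allOf(step1, step2).join();
        finished.add("step3");

        return finished;
    }

    public static void main(String[] args) {
        // step1 and step2 may appear in either order; step3 is always last.
        System.out.println(run());
    }
}
```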

The read, process, write part is handled by the step but what about the additional data deletion aspect?

It is a generic action that must be performed after each step completes execution.

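Spring Batch provides us with a step listener for exactly this purpose. When a StepExecutionListener is registered with a step, the framework calls its afterStep method once the step finishes and before moving on to the next step. Note that afterStep is called whether the step succeeded or failed, so a listener that deletes data should check the step's status first.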

We have defined a generic step listener class that would handle the data deletion part of our process.

It takes parameters such as dbname, dbUserName, etc. in its constructor and deletes all the data in that database when the framework invokes the listener's afterStep method after step completion.

Defining a generic listener enables us to use the same listener class with each step. We just have to ensure that each of the three steps gets its own instance with its own configuration.

We need the listener to exist within the lifecycle of the step and to have a different configuration for each step. This is exactly what step scope gives us.

Step scope ensures that the listener's processing is isolated for each step. Furthermore, the listener instance ceases to exist once the step completes execution, thus avoiding memory leaks.
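The lifecycle described here can be sketched without any framework. In the dependency-free model below, CleanupListener stands in for the generic listener (in Spring Batch it would implement StepExecutionListener and be declared with @StepScope), and StepRunner plays the role of the framework. Both class names, the boolean success flag, and the log list that replaces a real database call are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the generic cleanup listener.
class CleanupListener {
    private final String dbName;
    private final List<String> log; // records deletions instead of touching a real database

    CleanupListener(String dbName, List<String> log) {
        this.dbName = dbName;
        this.log = log;
    }

    // Invoked once the step has finished; data is only deleted if the step succeeded.
    void afterStep(boolean stepSucceeded) {
        if (stepSucceeded) {
            log.add("deleted all rows in " + dbName);
        }
    }
}

// Models step scope: a listener is built for one step and discarded when that step ends.
class StepRunner {
    static void runStep(String dbName, Runnable work, List<String> log) {
        CleanupListener listener = new CleanupListener(dbName, log); // fresh instance per step
        boolean succeeded = true;
        try {
            work.run();
        } catch (RuntimeException e) {
            succeeded = false;
        }
        listener.afterStep(succeeded);
        // The listener becomes unreachable here, mirroring the end of a step-scoped bean's life.
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        runStep("db1", () -> { }, log);
        runStep("db2", () -> { throw new IllegalStateException("step failed"); }, log);
        System.out.println(log); // only db1 is cleaned up, because the db2 step failed
    }
}
```

Because every step constructs its own listener with its own database name, no configuration or state leaks from one step into another, which is the isolation the article relies on.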

Step scope also gives us the ability to

· pass information between steps through the step execution context and a StepExecutionListener

· perform late binding of parameters by annotating them with @Value placeholders such as #{stepExecutionContext['count']}

Consider a scenario where each step needs to pass the number of records it processed to the next step. We have three steps: step1, step2 and step3. step1 should pass the total number of records it processed to step2, and so on.

The problem arises when we have to pass the count information from one step to another. This is where the StepExecutionListener comes into the picture yet again.

As mentioned above, when a listener is registered with a step, the afterStep method of the listener is invoked once the step completes execution. The listener has two methods, beforeStep and afterStep, which are invoked before the step starts execution and after it finishes execution respectively.

Thus the listener helps us bridge the gap between steps by temporarily holding the count information between steps.

The steps now only have to ensure that they pass this information to the listener before terminating. How do we pass this data from the step to the listener?

Fortunately, we have something called the stepExecutionContext, which holds context information pertaining to the step. The writer of each step sets the count value in the stepExecutionContext, and the listener retrieves it from the context.

We use a placeholder to auto-populate the count variable. Initially this value is 0, and the writer of step1 adds the number of records it just processed to it. So if step1 has read 10 records, count is updated to 10 (0 + 10).

The placeholder allows us to retrieve the count of the records processed by all preceding steps. The placeholder is resolved at runtime by retrieving the count value from the stepExecutionContext.

However, the lifespan of stepExecutionContext is tied to a step. What is saved in the stepExecutionContext of step1 is not visible in the stepExecutionContext of step2.

Note that this time around our listener is not step scoped but job scoped. This means only one instance of the listener is created per job execution, so the same instance of the listener is shared among all the steps of the job.

We leverage this to hold the current count in the listener and pass it from one step to the next.

We retrieve the count from the stepExecutionContext before exiting the step (through the afterStep method of the listener) and update the currentCount of the listener to reflect the total number of records processed so far.

We earlier mentioned that the initial count value is set to 0 before step1 begins processing. But who sets it? This is done in the beforeStep method of the listener which checks if the current step is the first step and sets it to 0 if true.
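The whole hand-off can be sketched in plain Java. In the model below, a HashMap stands in for each step's stepExecutionContext, CountListener plays the job-scoped listener, and CountingJob simulates the framework and the writers. All of these names, the "count" key, and the firstStep flag are assumptions made for illustration; this is a model of the data flow, not the Spring Batch API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a job-scoped listener: one instance shared by all steps of a job run.
class CountListener {
    private long currentCount;

    // Seeds the step's context with the running total; the first step starts from 0.
    void beforeStep(Map<String, Object> stepContext, boolean firstStep) {
        if (firstStep) {
            currentCount = 0;
        }
        stepContext.put("count", currentCount);
    }

    // Reads back the total the writer stored and remembers it for the next step.
    void afterStep(Map<String, Object> stepContext) {
        currentCount = (Long) stepContext.get("count");
    }

    long currentCount() { return currentCount; }
}

class CountingJob {

    // Simulates a writer adding the number of records it wrote to the count in the step's context.
    static void write(Map<String, Object> stepContext, int recordsWritten) {
        long soFar = (Long) stepContext.get("count");
        stepContext.put("count", soFar + recordsWritten);
    }

    // Runs one simulated step per array element and returns the total seen by the listener.
    static long run(int... recordsPerStep) {
        CountListener listener = new CountListener(); // one listener for the whole job run
        for (int i = 0; i < recordsPerStep.length; i++) {
            Map<String, Object> stepContext = new HashMap<>(); // each step gets its own context
            listener.beforeStep(stepContext, i == 0);
            write(stepContext, recordsPerStep[i]);
            listener.afterStep(stepContext);
        }
        return listener.currentCount();
    }

    public static void main(String[] args) {
        System.out.println(run(10, 5, 7)); // running total after three steps
    }
}
```

Each step's map is thrown away after the step, so the only thing that survives from one step to the next is the listener's currentCount field. That is why the listener has to outlive a single step.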

This is how step scope not only helps us define the lifespan of our beans but also allows us to pass data across steps.

P.S.: You can find a working example at https://github.com/SafinaAh/SpringBatchExamples . It does not have everything discussed in this article, but it will be updated as and when I can squeeze in some time to incorporate the rest.
