Introduction to Spring Batch. Take 1

1. What is a batch process?

A batch process is an automated task that processes a large volume of data. It can run without human interaction and can be repeated periodically.

What a batch process is not: it is not a scheduled task (a cron job). Batch processes are commonly scheduled, but scheduling is not required.

2. What should we care about in a batch process?

  • Transactionality, because we want to roll back data that has been invalidated.
  • Fault tolerance, because we don't want the application to exit whenever an exception occurs.
  • Retries, because sometimes shit happens.
  • Logs and statistics, because from time to time we need to know what happens inside the batch process.
  • Stopping and restarting batch processes.
  • Web administration, because it is cool.
  • Partitioning, because the work can be shared among different machines.

We will discuss each of these concerns. We have prepared a repository with commits that go along with these bullets, so you can follow the differences in the code.

3. Spring Batch concepts

Spring Batch is a project that provides a framework for developing batch processing applications. It is a mature project, built with contributions from large enterprises with wide experience in batch processing.

Jobs, Steps, ItemReader, ItemProcessor and ItemWriter

Source: Spring Batch Reference Documentation https://docs.spring.io/spring-batch/trunk/reference/html/domain.html

In Spring Batch we run jobs. Every job has one or more steps, and a typical (chunk-oriented) step has a component to read the items (ItemReader), a component to process them (ItemProcessor) and a component to write them (ItemWriter). The processing component is optional.
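To make the read, process and write cycle concrete, here is a framework-free toy model in plain Java. The interfaces `Reader`, `Processor` and `Writer` and the method `runStep` are hypothetical stand-ins that only mimic the shape of Spring Batch's ItemReader, ItemProcessor and ItemWriter; this is a sketch of the idea, not how Spring Batch is implemented. In this model a reader returning null ends the input, a processor returning null filters the item out, and the surviving items are written in chunks.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of a chunk-oriented step. These interfaces are hypothetical:
// they only mirror the shape of ItemReader / ItemProcessor / ItemWriter.
class ChunkLoopSketch {

    interface Reader<T> {
        T read(); // returns null when there are no more items
    }

    interface Processor<I, O> {
        O process(I item); // returns null to filter the item out
    }

    interface Writer<T> {
        void write(List<T> items); // receives one chunk at a time
    }

    // Reads up to chunkSize items, processes them, and writes the non-filtered
    // results as one chunk; repeats until the reader returns null.
    static <I, O> void runStep(Reader<I> reader, Processor<I, O> processor,
                               Writer<O> writer, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        boolean exhausted = false;
        while (!exhausted) {
            List<O> chunk = new ArrayList<>();
            for (int read = 0; read < chunkSize; read++) {
                I item = reader.read();
                if (item == null) {       // end of the item source
                    exhausted = true;
                    break;
                }
                O processed = processor.process(item);
                if (processed != null) {  // null means "filtered out"
                    chunk.add(processed);
                }
            }
            if (!chunk.isEmpty()) {
                writer.write(chunk);
            }
        }
    }

    // Adapts an Iterator to the reader contract: null once it is exhausted.
    static <T> Reader<T> fromIterator(Iterator<T> it) {
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        List<List<String>> written = new ArrayList<>();
        Reader<String> reader = fromIterator(List.of("a", "b", "", "c", "d").iterator());
        // Upper-cases each item and drops empty strings.
        Processor<String, String> processor = s -> s.isEmpty() ? null : s.toUpperCase();
        Writer<String> writer = written::add;
        runStep(reader, processor, writer, 2);
        System.out.println(written); // [[A, B], [C], [D]]
    }
}
```

In this toy model the chunk size counts items read, not items written, which is why the second chunk holds only `C`: the empty string was read but filtered out.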

Generally, we need a JobBuilderFactory instance to declare a Job and a StepBuilderFactory instance to declare a Step. No problem: Spring gives them to us.
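As a sketch, assuming Spring Batch 4.x (where @EnableBatchProcessing exposes JobBuilderFactory and StepBuilderFactory as injectable beans; in Spring Batch 5 these factories are deprecated in favor of JobBuilder and StepBuilder), a job with a single chunk-oriented step could be declared as below. The names `GreetingJobConfig`, `greetingJob`, `greetingStep` and the sample data are hypothetical. It needs Spring Batch and a database driver on the classpath, so it is a configuration fragment rather than a standalone program.

```java
import java.util.Arrays;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Spring Batch 4.x style configuration (hypothetical names).
@Configuration
@EnableBatchProcessing
public class GreetingJobConfig {

    // A job made of a single step.
    @Bean
    public Job greetingJob(JobBuilderFactory jobs, Step greetingStep) {
        return jobs.get("greetingJob")
                .start(greetingStep)
                .build();
    }

    @Bean
    public Step greetingStep(StepBuilderFactory steps) {
        // Reader: serves the items of an in-memory list, then returns null.
        ListItemReader<String> reader = new ListItemReader<>(Arrays.asList("ana", "bob", "eve"));
        // Processor (optional): transforms each item.
        ItemProcessor<String, String> processor = name -> "Hello, " + name;
        // Writer: receives the processed items one chunk at a time.
        ItemWriter<String> writer = items -> items.forEach(System.out::println);

        return steps.get("greetingStep")
                .<String, String>chunk(2) // commit interval: 2 items per chunk
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
```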

JobExecution and JobLauncher

A job can be launched by a JobLauncher instance. During the execution, information can be stored and shared in the job's execution context and in each step's execution context (both are ExecutionContext instances).

JobLauncher returns a JobExecution instance, which gives us information about the execution, such as its resulting status: COMPLETED, FAILED…
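A hedged sketch of launching a job by hand and reading the result, assuming Spring Batch 4.x with the default synchronous JobLauncher, so `run` returns once the job has finished. The class `JobTrigger`, the parameter `launchedAt` and the context key `processedCount` are hypothetical; the key would only be present if some step had stored it in the job's ExecutionContext.

```java
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.stereotype.Component;

// Hypothetical component that launches a job on demand and inspects the result.
@Component
public class JobTrigger {

    private final JobLauncher jobLauncher;
    private final Job job;

    public JobTrigger(JobLauncher jobLauncher, Job job) {
        this.jobLauncher = jobLauncher;
        this.job = job;
    }

    public BatchStatus trigger() throws Exception {
        // Illustrative parameter: a different value means a different JobInstance.
        JobParameters params = new JobParametersBuilder()
                .addLong("launchedAt", System.currentTimeMillis())
                .toJobParameters();

        JobExecution execution = jobLauncher.run(job, params);

        // A step could have stored the value with, e.g.:
        // chunkContext.getStepContext().getStepExecution().getJobExecution()
        //         .getExecutionContext().putInt("processedCount", n);
        Object processedCount = execution.getExecutionContext().get("processedCount");

        System.out.println("Status: " + execution.getStatus()
                + ", processedCount: " + processedCount);
        return execution.getStatus(); // e.g. COMPLETED or FAILED
    }
}
```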

JobInstance, JobParameters and RunIdIncrementer

A JobInstance is the combination of a Job and its JobParameters. One of Spring Batch's rules is that a JobInstance cannot be run again once one of its JobExecutions has COMPLETED. To run the same job several times, we can use a RunIdIncrementer instance, which changes the job parameters internally (it increments a run.id parameter), so each run becomes a new JobInstance.
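The rule above can be illustrated with a toy model in plain Java. This is not Spring Batch code: the hypothetical class `JobInstanceRuleSketch` keeps completed instances in memory and identifies an instance by job name plus parameters, and its `nextParameters` method imitates the idea behind RunIdIncrementer by incrementing a `run.id` parameter.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model (not Spring Batch code) of JobInstance identity and the
// "a COMPLETED instance cannot run again" rule.
class JobInstanceRuleSketch {

    private final Set<String> completedInstances = new HashSet<>();

    // Identity of a JobInstance: job name plus its (sorted) parameters.
    static String instanceKey(String jobName, Map<String, Long> params) {
        return jobName + new TreeMap<>(params);
    }

    // Returns false if this instance already completed; Spring Batch would
    // throw JobInstanceAlreadyCompleteException instead.
    boolean run(String jobName, Map<String, Long> params) {
        String key = instanceKey(jobName, params);
        if (completedInstances.contains(key)) {
            return false;
        }
        completedInstances.add(key); // pretend the execution COMPLETED
        return true;
    }

    // Imitates the RunIdIncrementer idea: copy the parameters and bump "run.id".
    static Map<String, Long> nextParameters(Map<String, Long> params) {
        Map<String, Long> next = new HashMap<>(params);
        next.merge("run.id", 1L, Long::sum);
        return next;
    }

    public static void main(String[] args) {
        JobInstanceRuleSketch repository = new JobInstanceRuleSketch();
        Map<String, Long> params = new HashMap<>();
        System.out.println(repository.run("importJob", params)); // true: first run
        System.out.println(repository.run("importJob", params)); // false: already completed
        Map<String, Long> next = nextParameters(params);
        System.out.println(repository.run("importJob", next));   // true: run.id=1 is a new instance
    }
}
```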

JobRepository

Through a JobRepository bean, Spring Batch completely manages a database that stores information about job executions. In the development environment, we only need to declare an H2 dependency.

4. Minimum necessary to run Spring Batch

  • The Spring Boot Starter Batch dependency (spring-boot-starter-batch), which is where the magic lives.
  • A database driver dependency.
  • The @EnableBatchProcessing annotation on a Spring configuration class.
  • A Job bean definition.
  • A Step bean definition whose task prints “Reading…”. Since the task returns null, Spring Batch interprets it as the end of the item source, so the job finishes.
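A minimal sketch of those pieces, assuming Spring Boot 2.x with Spring Batch 4.x and with spring-boot-starter-batch plus H2 on the classpath; in that setup Spring Boot launches the job at startup. `MinimalBatchApplication`, `minimalJob` and `minimalStep` are hypothetical names, and this is not necessarily the code in the demo repository.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
@EnableBatchProcessing
public class MinimalBatchApplication {

    public static void main(String[] args) {
        SpringApplication.run(MinimalBatchApplication.class, args);
    }

    @Bean
    public Job minimalJob(JobBuilderFactory jobs, Step minimalStep) {
        return jobs.get("minimalJob").start(minimalStep).build();
    }

    @Bean
    public Step minimalStep(StepBuilderFactory steps) {
        // Prints "Reading…" and returns null: the item source is empty,
        // so the step, and therefore the job, finishes.
        ItemReader<String> reader = () -> {
            System.out.println("Reading…");
            return null;
        };
        // The chunk step builder expects a writer; this one does nothing.
        ItemWriter<String> writer = items -> { };

        return steps.get("minimalStep")
                .<String, String>chunk(1)
                .reader(reader)
                .writer(writer)
                .build();
    }
}
```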

You can clone this demo repository we have prepared for this post and look at the first commit, which adds the minimum code needed to run Spring Batch.

This should be enough for now, as we are preparing the next entry in this Introduction to Spring Batch series. Stay tuned.