Introduction to Spring Batch

Rostyslav Ivankiv
6 min read · Nov 11, 2023

Introduction

Spring Batch[1] is a powerful framework for building robust and scalable batch-processing applications.

In this article, we will explore the key concepts of Spring Batch.

Picture 1 — Key concepts of Spring Batch framework[2]

Simplified description of the components:

  • Spring Job Launcher: Starts and executes batch jobs.
  • Job Repository: Stores metadata about batch jobs, allowing for job restartability.
  • Step: Represents a unit of work within a batch job.
    - Chunk-Oriented Step: Processes data in chunks.
    - Tasklet Step: Executes a single task.
  • Reader: Retrieves data from a source, like a file or database.
  • Processor: Transforms or processes the input data.
  • Writer: Writes the processed data to a destination, like a file or database.
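
To make the chunk-oriented flow more concrete, here is a minimal plain-Java sketch of how a reader, a processor and a writer cooperate. It is only a model under simplifying assumptions (no transactions, no restart, no skip logic), not how Spring Batch is implemented; the class name ChunkSketch and the method runStep are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Simplified model of a chunk-oriented step. This is NOT Spring Batch's implementation.
final class ChunkSketch {

    // Reads up to chunkSize items, processes them, and writes the results in one call.
    // A processor result of null means "filter this item out".
    static <I, O> void runStep(Iterator<I> reader, Function<I, O> processor,
                               Consumer<List<O>> writer, int chunkSize) {
        while (reader.hasNext()) {
            // 1. Read: collect up to chunkSize items from the source.
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize && reader.hasNext()) {
                items.add(reader.next());
            }
            // 2. Process: transform each item and drop the ones mapped to null.
            List<O> outputs = new ArrayList<>();
            for (I item : items) {
                O output = processor.apply(item);
                if (output != null) {
                    outputs.add(output);
                }
            }
            // 3. Write: hand the whole chunk to the writer at once.
            if (!outputs.isEmpty()) {
                writer.accept(outputs);
            }
        }
    }

    public static void main(String[] args) {
        List<List<String>> written = new ArrayList<>();
        ChunkSketch.<String, String>runStep(
                List.of("a", "b", "skip", "c", "d").iterator(),
                s -> s.equals("skip") ? null : s.toUpperCase(),
                written::add,
                2);
        System.out.println(written); // prints [[A, B], [C], [D]]
    }
}
```

With a chunk size of 2, the five input items are written in three calls, and the filtered item never reaches the writer.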

Code

Let’s create a simple Java application that uses the components from the picture above. The application downloads a CSV file from a URL, reads it, processes it (some basic filtering) and then writes the result to a PostgreSQL database.

As usual, we’ll start with pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.1.5</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.example</groupId>
    <artifactId>spring-batch-example</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>spring-batch-example</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>17</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <version>42.6.0</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

As you can see, we’ve added the Spring Boot starter for Spring Batch and the PostgreSQL JDBC driver.

Our standard application entry point

@SpringBootApplication
public class SpringBatchExampleApplication {
    public static void main(String[] args) {
        SpringApplication.run(SpringBatchExampleApplication.class, args);
    }
}

Our application.properties

spring.datasource.username=postgresql
spring.datasource.password=postgresql
spring.datasource.url=jdbc:postgresql://localhost:5432/spring-batch-example
spring.sql.init.mode=always
spring.sql.init.schema-locations=classpath:db/create_app_tables.sql

spring.batch.jdbc.initialize-schema=always
spring.batch.jdbc.table-prefix=spring_batch.BATCH_
spring.batch.jdbc.schema=classpath:db/create_spring_batch_tables.sql

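Below are the two SQL scripts: create_app_tables.sql (the people table and the SPRING_BATCH schema), followed by create_spring_batch_tables.sql (the Spring Batch metadata tables, starting at the "Autogenerated" comment).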

CREATE TABLE IF NOT EXISTS people
(
    id         SERIAL PRIMARY KEY,
    first_name VARCHAR,
    last_name  VARCHAR
);
CREATE SCHEMA IF NOT EXISTS SPRING_BATCH;

-- Autogenerated: do not edit this file

CREATE TABLE SPRING_BATCH.BATCH_JOB_INSTANCE
(
    JOB_INSTANCE_ID BIGINT NOT NULL PRIMARY KEY,
    VERSION BIGINT,
    JOB_NAME VARCHAR(100) NOT NULL,
    JOB_KEY VARCHAR(32) NOT NULL,
    constraint JOB_INST_UN unique (JOB_NAME, JOB_KEY)
);

CREATE TABLE SPRING_BATCH.BATCH_JOB_EXECUTION
(
    JOB_EXECUTION_ID BIGINT NOT NULL PRIMARY KEY,
    VERSION BIGINT,
    JOB_INSTANCE_ID BIGINT NOT NULL,
    CREATE_TIME TIMESTAMP NOT NULL,
    START_TIME TIMESTAMP DEFAULT NULL,
    END_TIME TIMESTAMP DEFAULT NULL,
    STATUS VARCHAR(10),
    EXIT_CODE VARCHAR(2500),
    EXIT_MESSAGE VARCHAR(2500),
    LAST_UPDATED TIMESTAMP,
    constraint JOB_INST_EXEC_FK foreign key (JOB_INSTANCE_ID)
        references SPRING_BATCH.BATCH_JOB_INSTANCE (JOB_INSTANCE_ID)
);

CREATE TABLE SPRING_BATCH.BATCH_JOB_EXECUTION_PARAMS
(
    JOB_EXECUTION_ID BIGINT NOT NULL,
    PARAMETER_NAME VARCHAR(100) NOT NULL,
    PARAMETER_TYPE VARCHAR(100) NOT NULL,
    PARAMETER_VALUE VARCHAR(2500),
    IDENTIFYING CHAR(1) NOT NULL,
    constraint JOB_EXEC_PARAMS_FK foreign key (JOB_EXECUTION_ID)
        references SPRING_BATCH.BATCH_JOB_EXECUTION (JOB_EXECUTION_ID)
);

CREATE TABLE SPRING_BATCH.BATCH_STEP_EXECUTION
(
    STEP_EXECUTION_ID BIGINT NOT NULL PRIMARY KEY,
    VERSION BIGINT NOT NULL,
    STEP_NAME VARCHAR(100) NOT NULL,
    JOB_EXECUTION_ID BIGINT NOT NULL,
    CREATE_TIME TIMESTAMP NOT NULL,
    START_TIME TIMESTAMP DEFAULT NULL,
    END_TIME TIMESTAMP DEFAULT NULL,
    STATUS VARCHAR(10),
    COMMIT_COUNT BIGINT,
    READ_COUNT BIGINT,
    FILTER_COUNT BIGINT,
    WRITE_COUNT BIGINT,
    READ_SKIP_COUNT BIGINT,
    WRITE_SKIP_COUNT BIGINT,
    PROCESS_SKIP_COUNT BIGINT,
    ROLLBACK_COUNT BIGINT,
    EXIT_CODE VARCHAR(2500),
    EXIT_MESSAGE VARCHAR(2500),
    LAST_UPDATED TIMESTAMP,
    constraint JOB_EXEC_STEP_FK foreign key (JOB_EXECUTION_ID)
        references SPRING_BATCH.BATCH_JOB_EXECUTION (JOB_EXECUTION_ID)
);

CREATE TABLE SPRING_BATCH.BATCH_STEP_EXECUTION_CONTEXT
(
    STEP_EXECUTION_ID BIGINT NOT NULL PRIMARY KEY,
    SHORT_CONTEXT VARCHAR(2500) NOT NULL,
    SERIALIZED_CONTEXT TEXT,
    constraint STEP_EXEC_CTX_FK foreign key (STEP_EXECUTION_ID)
        references SPRING_BATCH.BATCH_STEP_EXECUTION (STEP_EXECUTION_ID)
);

CREATE TABLE SPRING_BATCH.BATCH_JOB_EXECUTION_CONTEXT
(
    JOB_EXECUTION_ID BIGINT NOT NULL PRIMARY KEY,
    SHORT_CONTEXT VARCHAR(2500) NOT NULL,
    SERIALIZED_CONTEXT TEXT,
    constraint JOB_EXEC_CTX_FK foreign key (JOB_EXECUTION_ID)
        references SPRING_BATCH.BATCH_JOB_EXECUTION (JOB_EXECUTION_ID)
);

CREATE SEQUENCE SPRING_BATCH.BATCH_STEP_EXECUTION_SEQ MAXVALUE 9223372036854775807 NO CYCLE;
CREATE SEQUENCE SPRING_BATCH.BATCH_JOB_EXECUTION_SEQ MAXVALUE 9223372036854775807 NO CYCLE;
CREATE SEQUENCE SPRING_BATCH.BATCH_JOB_SEQ MAXVALUE 9223372036854775807 NO CYCLE;

And the data objects, PersonCsv.java and PersonDb.java:

public record PersonCsv(
        String person_ID,
        String name,
        String first,
        String last,
        String middle,
        String email,
        String phone,
        String fax,
        String title
) {
}

public record PersonDb(String firstName, String lastName) {
}

The input file[3] to be read looks like this:

╔═══════════╦═══════════════════════╦═════════╦════════╦════════╦══════════════════╦═══════════════════╦══════════════╦═════════════════════╗
║ person_ID ║ name ║ first ║ last ║ middle ║ email ║ phone ║ fax ║ title ║
╠═══════════╬═══════════════════════╬═════════╬════════╬════════╬══════════════════╬═══════════════════╬══════════════╬═════════════════════╣
║ 3130 ║ "Burks, Rosella " ║ Rosella ║ Burks ║ ║ BurksR@univ.edu ║ 963.555.1253 ║ 963.777.4065 ║ Professor ║
║ 3297 ║ "Avila, Damien " ║ Damien ║ Avila ║ ║ AvilaD@univ.edu ║ 963.555.1352 ║ 963.777.7914 ║ Professor ║
║ 3547 ║ "Olsen, Robin " ║ Robin ║ Olsen ║ ║ OlsenR@univ.edu ║ 963.555.1378 ║ 963.777.9262 ║ Assistant Professor ║
║ 1538 ║ "Moises, Edgar Estes" ║ Edgar ║ Moises ║ Estes ║ MoisesE@univ.edu ║ 963.555.2731x3565 ║ 963.777.8264 ║ Professor ║
║ 2941 ║ "Brian, Heath Pruitt" ║ Heath ║ Brian ║ Pruitt ║ BrianH@univ.edu ║ 963.555.2800 ║ 963.777.7249 ║ Associate Curator ║
╚═══════════╩═══════════════════════╩═════════╩════════╩════════╩══════════════════╩═══════════════════╩══════════════╩═════════════════════╝
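
To illustrate how one line of such a file ends up in a PersonCsv record, here is a simplified plain-Java sketch of position-based mapping for a comma-delimited line with quoted fields. It only imitates the idea behind a delimited reader and is not Spring Batch code. The class CsvLineSketch, its methods and the exact raw form of the sample line (reconstructed from the first row of the table above) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: maps one delimited line to a record by column position.
final class CsvLineSketch {

    record PersonCsv(String person_ID, String name, String first, String last, String middle,
                     String email, String phone, String fax, String title) {
    }

    // Splits on commas, except commas inside double quotes. The quote characters are dropped.
    // Escaped quotes ("") inside a field are not supported in this sketch.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                tokens.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString());
        return tokens;
    }

    // Assigns the tokens to the record components in column order.
    static PersonCsv toPerson(String line) {
        List<String> t = tokenize(line);
        if (t.size() != 9) {
            throw new IllegalArgumentException("Expected 9 columns but got " + t.size());
        }
        return new PersonCsv(t.get(0), t.get(1), t.get(2), t.get(3), t.get(4),
                t.get(5), t.get(6), t.get(7), t.get(8));
    }

    public static void main(String[] args) {
        // Assumed raw form of the first data row shown in the table.
        String line = "3130,\"Burks, Rosella \",Rosella,Burks,,BurksR@univ.edu,"
                + "963.555.1253,963.777.4065,Professor";
        PersonCsv person = toPerson(line);
        System.out.println(person.first() + " " + person.last() + " / " + person.title());
    }
}
```

The quoted name column contains a comma, which is why a plain split on "," would not be enough here.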

And finally here is our configuration with Jobs and Steps:

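import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

import javax.sql.DataSource;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class SpringBatchConfig {
    private static final Logger logger = LoggerFactory.getLogger(SpringBatchConfig.class);

    private final JobRepository jobRepository;
    private final PlatformTransactionManager transactionManager;
    private final DataSource dataSource;

    public SpringBatchConfig(
            final JobRepository jobRepository,
            final PlatformTransactionManager transactionManager,
            final DataSource dataSource
    ) {
        this.jobRepository = jobRepository;
        this.transactionManager = transactionManager;
        this.dataSource = dataSource;
    }

    // Job 1: downloads the CSV file with a single tasklet step
    @Bean
    public Job downloadCsvFileJob(Step downloadCsvFileStep) {
        return new JobBuilder("downloadCsvFileJob", jobRepository)
                .incrementer(new RunIdIncrementer())
                .start(downloadCsvFileStep)
                .build();
    }

    @Bean
    public Step downloadCsvFileStep(Tasklet downloadCsvFileTasklet) {
        return new StepBuilder("downloadCsvFileStep", jobRepository)
                .tasklet(downloadCsvFileTasklet, transactionManager)
                .build();
    }

    // Step-scoped so that the job parameters can be injected when the step starts
    @Bean
    @StepScope
    public Tasklet downloadCsvFileTasklet(
            @Value("#{jobParameters['sourceFileUrl']}") String sourceFileUrl,
            @Value("#{jobParameters['targetFilePath']}") String targetFilePath
    ) throws MalformedURLException {
        return new DownloadCsvFileTasklet(new URL(sourceFileUrl), Paths.get(targetFilePath));
    }

    // Job 2: loads the downloaded CSV file into the database with a chunk-oriented step
    @Bean
    public Job loadCsvToDatabaseJob(Step loadCsvToDatabaseStep) {
        return new JobBuilder("loadCsvToDatabaseJob", jobRepository)
                .start(loadCsvToDatabaseStep)
                .build();
    }

    @Bean
    public Step loadCsvToDatabaseStep(
            ItemReader<PersonCsv> reader,
            ItemProcessor<PersonCsv, PersonDb> processor,
            ItemWriter<PersonDb> writer
    ) {
        return new StepBuilder("loadCsvToDatabaseStep", jobRepository)
                .<PersonCsv, PersonDb>chunk(10, transactionManager)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }

    @Bean
    @StepScope
    public FlatFileItemReader<PersonCsv> reader(
            @Value("#{jobParameters['targetFilePath']}") FileSystemResource fileSystemResource
    ) {
        return new FlatFileItemReaderBuilder<PersonCsv>()
                .name("personItemReader")
                .resource(fileSystemResource)
                .linesToSkip(1) // skip the header line
                .delimited()
                .names("person_ID", "name", "first", "last", "middle", "email", "phone", "fax", "title")
                .targetType(PersonCsv.class)
                .build();
    }

    @Bean
    public ItemProcessor<PersonCsv, PersonDb> processor() {
        return personCsv -> {
            // Returning null filters the item out, so it never reaches the writer
            if (personCsv.title().contains("Professor")) {
                return null;
            }
            return new PersonDb(personCsv.first(), personCsv.last());
        };
    }

    @Bean
    public JdbcBatchItemWriter<PersonDb> writer() {
        return new JdbcBatchItemWriterBuilder<PersonDb>()
                .sql("INSERT INTO people (first_name, last_name) VALUES (:firstName, :lastName)")
                .dataSource(dataSource)
                .beanMapped()
                .build();
    }

    private static class DownloadCsvFileTasklet implements Tasklet {
        private final URL url;
        private final Path path;

        public DownloadCsvFileTasklet(final URL url, final Path path) {
            this.url = url;
            this.path = path;
        }

        @Override
        public RepeatStatus execute(final StepContribution contribution, final ChunkContext chunkContext) {
            downloadCsvFile(url, path);
            return RepeatStatus.FINISHED;
        }

        private static void downloadCsvFile(final URL url, final Path path) {
            try (InputStream in = url.openStream()) {
                Files.copy(in, path, StandardCopyOption.REPLACE_EXISTING);
                logger.info("File '{}' has been downloaded from '{}'", path, url);
            } catch (IOException e) {
                // Rethrow so that the step fails instead of completing after a failed download
                throw new UncheckedIOException("Failed to download CSV file from " + url, e);
            }
        }
    }

}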
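
The filtering rule in the processor bean above is easy to overlook, so here is a small plain-Java sketch that applies the same rule to the five sample rows from the input table. The class ProcessorFilterSketch and the trimmed-down Row record are hypothetical stand-ins used only for illustration; the rule itself (return null when the title contains "Professor") is copied from the configuration.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the processor's filtering rule, without any Spring Batch classes.
final class ProcessorFilterSketch {

    // Trimmed-down stand-in for PersonCsv: only the columns the processor uses.
    record Row(String first, String last, String title) {
    }

    record PersonDb(String firstName, String lastName) {
    }

    // Same rule as the processor bean: null means "filter this item out".
    static PersonDb process(Row row) {
        if (row.title().contains("Professor")) {
            return null;
        }
        return new PersonDb(row.first(), row.last());
    }

    // Applies the rule to every row and keeps only the items that were not filtered.
    static List<PersonDb> processAll(List<Row> rows) {
        return rows.stream()
                .map(ProcessorFilterSketch::process)
                .filter(Objects::nonNull)
                .toList();
    }

    public static void main(String[] args) {
        List<Row> sample = List.of(
                new Row("Rosella", "Burks", "Professor"),
                new Row("Damien", "Avila", "Professor"),
                new Row("Robin", "Olsen", "Assistant Professor"),
                new Row("Edgar", "Moises", "Professor"),
                new Row("Heath", "Brian", "Associate Curator"));
        System.out.println(processAll(sample)); // prints [PersonDb[firstName=Heath, lastName=Brian]]
    }
}
```

Note that "Assistant Professor" also contains "Professor", so of these five rows only the Associate Curator would be written to the people table.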

How to run

First, we need to start the database in Docker:

docker run --name spring-batch-postgres -p 5432:5432 -e POSTGRES_USER=postgresql -e POSTGRES_PASSWORD=postgresql -e POSTGRES_DB=spring-batch-example -d postgres:13

To run the application and launch the first job, let’s execute:

./mvnw spring-boot:run \
-Dspring-boot.run.jvmArguments="-Dspring.batch.job.name=downloadCsvFileJob" \
-Dspring-boot.run.arguments="sourceFileUrl=https://raw.githubusercontent.com/lawlesst/vivo-sample-data/master/data/csv/people.csv targetFilePath=src/main/resources/data/people.csv"

The command above launches the job named “downloadCsvFileJob” and produces the following logs:

INFO 224623 --- [main] o.s.b.a.b.JobLauncherApplicationRunner   : Running default command line with: [sourceFileUrl=https://raw.githubusercontent.com/lawlesst/vivo-sample-data/master/data/csv/people.csv, targetFilePath=src/main/resources/data/people.csv]
INFO 224623 --- [main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=downloadCsvFileJob]] launched with the following parameters: [{'targetFilePath':'{value=src/main/resources/data/people.csv, type=class java.lang.String, identifying=true}','sourceFileUrl':'{value=https://raw.githubusercontent.com/lawlesst/vivo-sample-data/master/data/csv/people.csv, type=class java.lang.String, identifying=true}','run.id':'{value=2, type=class java.lang.Long, identifying=true}'}]
INFO 224623 --- [main] o.s.batch.core.job.SimpleStepHandler : Executing step: [downloadCsvFileStep]
INFO 224623 --- [main] c.e.s.SpringBatchConfig : File 'src/main/resources/data/people.csv' has been downloaded from 'https://raw.githubusercontent.com/lawlesst/vivo-sample-data/master/data/csv/people.csv'
INFO 224623 --- [main] o.s.batch.core.step.AbstractStep : Step: [downloadCsvFileStep] executed in 731ms
INFO 224623 --- [main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=downloadCsvFileJob]] completed with the following parameters: [{'targetFilePath':'{value=src/main/resources/data/people.csv, type=class java.lang.String, identifying=true}','sourceFileUrl':'{value=https://raw.githubusercontent.com/lawlesst/vivo-sample-data/master/data/csv/people.csv, type=class java.lang.String, identifying=true}','run.id':'{value=2, type=class java.lang.Long, identifying=true}'}] and the following status: [COMPLETED] in 777ms
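
The run.id parameter in the log above comes from the RunIdIncrementer configured on downloadCsvFileJob. As far as we understand it, Spring Batch identifies a job instance by the job name together with its identifying parameters, and refuses to run an instance that has already completed (with a JobInstanceAlreadyCompleteException). The plain-Java sketch below models that idea only; the class JobInstanceSketch and its methods are hypothetical, and the real framework stores this information in the job repository tables rather than in memory.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory model of job instance identity. Not Spring Batch code.
final class JobInstanceSketch {

    private final Set<String> completedInstances = new HashSet<>();
    private long lastRunId = 0;

    // Identity = job name + identifying parameters (sorted so that order does not matter).
    static String instanceKey(String jobName, Map<String, String> parameters) {
        return jobName + "|" + new TreeMap<>(parameters);
    }

    // Returns true if the instance ran, false if the same instance had already completed.
    // (The real framework reports the second case with an exception instead.)
    boolean launch(String jobName, Map<String, String> parameters) {
        return completedInstances.add(instanceKey(jobName, parameters));
    }

    // Mimics a run-id incrementer: adds a new run.id so that every launch is a new instance.
    Map<String, String> withNextRunId(Map<String, String> parameters) {
        Map<String, String> next = new TreeMap<>(parameters);
        next.put("run.id", Long.toString(++lastRunId));
        return next;
    }

    public static void main(String[] args) {
        JobInstanceSketch repository = new JobInstanceSketch();
        Map<String, String> parameters = Map.of("targetFilePath", "src/main/resources/data/people.csv");

        // Without an incrementer, the same parameters mean the same instance.
        System.out.println(repository.launch("loadCsvToDatabaseJob", parameters)); // prints true
        System.out.println(repository.launch("loadCsvToDatabaseJob", parameters)); // prints false

        // With an incrementer, each launch gets a fresh run.id and therefore a new instance.
        System.out.println(repository.launch("downloadCsvFileJob", repository.withNextRunId(parameters))); // prints true
        System.out.println(repository.launch("downloadCsvFileJob", repository.withNextRunId(parameters))); // prints true
    }
}
```

This is why downloadCsvFileJob can be launched repeatedly with the same arguments, while a job without an incrementer would need a changed parameter to run again after it has completed.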

To run the second job, we will execute:

./mvnw spring-boot:run \
-Dspring-boot.run.jvmArguments="-Dspring.batch.job.name=loadCsvToDatabaseJob" \
-Dspring-boot.run.arguments="targetFilePath=src/main/resources/data/people.csv"

The previous command launches the job named “loadCsvToDatabaseJob”. See the logs:

INFO 224834 --- [main] o.s.b.a.b.JobLauncherApplicationRunner   : Running default command line with: [targetFilePath=src/main/resources/data/people.csv]
INFO 224834 --- [main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=loadCsvToDatabaseJob]] launched with the following parameters: [{'targetFilePath':'{value=src/main/resources/data/people.csv, type=class java.lang.String, identifying=true}'}]
INFO 224834 --- [main] o.s.batch.core.job.SimpleStepHandler : Executing step: [loadCsvToDatabaseStep]
INFO 224834 --- [main] o.s.batch.core.step.AbstractStep : Step: [loadCsvToDatabaseStep] executed in 156ms
INFO 224834 --- [main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=loadCsvToDatabaseJob]] completed with the following parameters: [{'targetFilePath':'{value=src/main/resources/data/people.csv, type=class java.lang.String, identifying=true}'}] and the following status: [COMPLETED] in 192ms

What is also quite interesting is that Spring Batch writes metadata about jobs, steps, parameters, etc. to its own tables in the database (the BATCH_* tables in the SPRING_BATCH schema created above).

Picture 2 — Spring Batch tables

And here is what the final result looks like:

Picture 3 — Final results

Conclusions

Batch processing is a powerful technique for handling large volumes of data efficiently. Spring Batch is a framework for the JVM that provides a reliable way to create batch jobs by implementing commonly used processing patterns. By combining Spring Batch with Spring Boot and other components from the Spring ecosystem, you can develop robust batch applications for critical business operations.

References

  1. https://spring.io/projects/spring-batch
  2. https://spring.io/batch
  3. https://raw.githubusercontent.com/lawlesst/vivo-sample-data/master/data/csv/people.csv
