Introduction to Spring Batch

Nov 11, 2023

Introduction

Spring Batch[1] is a powerful framework for building robust, scalable batch-processing applications.

In this article, we will explore the key concepts of Spring Batch.

Picture 1 — Key concepts of Spring Batch framework[2]

Simplified description of the components:

Job Launcher: Starts and executes batch jobs.
Job Repository: Stores metadata about batch jobs, which makes jobs restartable.
Step: Represents a unit of work within a batch job.
- Chunk-Oriented Step: processes data in chunks.
- Tasklet Step: executes a single task.
Reader: Retrieves data from a source, like a file or database.
Processor: Transforms or processes the input data.
Writer: Writes the processed data to a destination, like a file or database.
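
To make the reader, processor, and writer roles concrete, here is a minimal plain-Java sketch of a chunk-oriented loop. It does not use Spring Batch at all: the class name ChunkLoopSketch and the run method are hypothetical stand-ins, and the real framework additionally wraps each chunk in a transaction and records progress in the job repository.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical illustration of the read -> process -> write cycle; not Spring Batch code.
final class ChunkLoopSketch {

    /** Runs the cycle until the reader is exhausted and returns the number of chunks written. */
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int chunksWritten = 0;
        while (reader.hasNext()) {
            // Read phase: collect up to chunkSize items.
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize && reader.hasNext()) {
                inputs.add(reader.next());
            }
            // Process phase: transform each item; null means "filter this item out".
            List<O> outputs = new ArrayList<>();
            for (I input : inputs) {
                O output = processor.apply(input);
                if (output != null) {
                    outputs.add(output);
                }
            }
            // Write phase: hand the surviving items to the writer in one call.
            // (Simplification: this sketch skips the writer when nothing survived.)
            if (!outputs.isEmpty()) {
                writer.accept(outputs);
                chunksWritten++;
            }
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            source.add(i);
        }
        int chunks = ChunkLoopSketch.<Integer, String>run(
            source.iterator(),
            n -> n % 2 == 0 ? "item-" + n : null, // keep even numbers only
            chunk -> System.out.println("writing " + chunk),
            10);
        System.out.println(chunks + " chunks written");
    }
}
```

With 25 inputs and a chunk size of 10, the reader is drained in three rounds (10, 10, and 5 items), and the writer receives only the items the processor did not filter out.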

Code

Let’s build a simple Java application that uses the components from the picture above. The application downloads a CSV file from a URL, processes it (some basic filtering), and then writes the result to a PostgreSQL database.

As usual, we’ll start with pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.1.5</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.example</groupId>
    <artifactId>spring-batch-example</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>spring-batch-example</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>17</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <version>42.6.0</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

As you can see, we’ve added the Spring Boot starter for Spring Batch and the PostgreSQL JDBC driver.

Our standard application entry point:

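import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication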
public class SpringBatchExampleApplication {
    public static void main(String[] args) {
        SpringApplication.run(SpringBatchExampleApplication.class, args);
    }
}

Our application.properties:

spring.datasource.username=postgresql
spring.datasource.password=postgresql
spring.datasource.url=jdbc:postgresql://localhost:5432/spring-batch-example
spring.sql.init.mode=always
spring.sql.init.schema-locations=classpath:db/create_app_tables.sql

spring.batch.jdbc.initialize-schema=always
spring.batch.jdbc.table-prefix=spring_batch.BATCH_
spring.batch.jdbc.schema=classpath:db/create_spring_batch_tables.sql

See create_app_tables.sql (the first statement below) and create_spring_batch_tables.sql (everything from CREATE SCHEMA onward):

CREATE TABLE IF NOT EXISTS people
(
    id         SERIAL PRIMARY KEY,
    first_name VARCHAR,
    last_name  VARCHAR
);

CREATE SCHEMA IF NOT EXISTS SPRING_BATCH;

-- Autogenerated: do not edit this file

CREATE TABLE SPRING_BATCH.BATCH_JOB_INSTANCE
(
    JOB_INSTANCE_ID BIGINT       NOT NULL PRIMARY KEY,
    VERSION         BIGINT,
    JOB_NAME        VARCHAR(100) NOT NULL,
    JOB_KEY         VARCHAR(32)  NOT NULL,
    constraint JOB_INST_UN unique (JOB_NAME, JOB_KEY)
);

CREATE TABLE SPRING_BATCH.BATCH_JOB_EXECUTION
(
    JOB_EXECUTION_ID BIGINT    NOT NULL PRIMARY KEY,
    VERSION          BIGINT,
    JOB_INSTANCE_ID  BIGINT    NOT NULL,
    CREATE_TIME      TIMESTAMP NOT NULL,
    START_TIME       TIMESTAMP DEFAULT NULL,
    END_TIME         TIMESTAMP DEFAULT NULL,
    STATUS           VARCHAR(10),
    EXIT_CODE        VARCHAR(2500),
    EXIT_MESSAGE     VARCHAR(2500),
    LAST_UPDATED     TIMESTAMP,
    constraint JOB_INST_EXEC_FK foreign key (JOB_INSTANCE_ID)
        references SPRING_BATCH.BATCH_JOB_INSTANCE (JOB_INSTANCE_ID)
);

CREATE TABLE SPRING_BATCH.BATCH_JOB_EXECUTION_PARAMS
(
    JOB_EXECUTION_ID BIGINT       NOT NULL,
    PARAMETER_NAME   VARCHAR(100) NOT NULL,
    PARAMETER_TYPE   VARCHAR(100) NOT NULL,
    PARAMETER_VALUE  VARCHAR(2500),
    IDENTIFYING      CHAR(1)      NOT NULL,
    constraint JOB_EXEC_PARAMS_FK foreign key (JOB_EXECUTION_ID)
        references SPRING_BATCH.BATCH_JOB_EXECUTION (JOB_EXECUTION_ID)
);

CREATE TABLE SPRING_BATCH.BATCH_STEP_EXECUTION
(
    STEP_EXECUTION_ID  BIGINT       NOT NULL PRIMARY KEY,
    VERSION            BIGINT       NOT NULL,
    STEP_NAME          VARCHAR(100) NOT NULL,
    JOB_EXECUTION_ID   BIGINT       NOT NULL,
    CREATE_TIME        TIMESTAMP    NOT NULL,
    START_TIME         TIMESTAMP DEFAULT NULL,
    END_TIME           TIMESTAMP DEFAULT NULL,
    STATUS             VARCHAR(10),
    COMMIT_COUNT       BIGINT,
    READ_COUNT         BIGINT,
    FILTER_COUNT       BIGINT,
    WRITE_COUNT        BIGINT,
    READ_SKIP_COUNT    BIGINT,
    WRITE_SKIP_COUNT   BIGINT,
    PROCESS_SKIP_COUNT BIGINT,
    ROLLBACK_COUNT     BIGINT,
    EXIT_CODE          VARCHAR(2500),
    EXIT_MESSAGE       VARCHAR(2500),
    LAST_UPDATED       TIMESTAMP,
    constraint JOB_EXEC_STEP_FK foreign key (JOB_EXECUTION_ID)
        references SPRING_BATCH.BATCH_JOB_EXECUTION (JOB_EXECUTION_ID)
);

CREATE TABLE SPRING_BATCH.BATCH_STEP_EXECUTION_CONTEXT
(
    STEP_EXECUTION_ID  BIGINT        NOT NULL PRIMARY KEY,
    SHORT_CONTEXT      VARCHAR(2500) NOT NULL,
    SERIALIZED_CONTEXT TEXT,
    constraint STEP_EXEC_CTX_FK foreign key (STEP_EXECUTION_ID)
        references SPRING_BATCH.BATCH_STEP_EXECUTION (STEP_EXECUTION_ID)
);

CREATE TABLE SPRING_BATCH.BATCH_JOB_EXECUTION_CONTEXT
(
    JOB_EXECUTION_ID   BIGINT        NOT NULL PRIMARY KEY,
    SHORT_CONTEXT      VARCHAR(2500) NOT NULL,
    SERIALIZED_CONTEXT TEXT,
    constraint JOB_EXEC_CTX_FK foreign key (JOB_EXECUTION_ID)
        references SPRING_BATCH.BATCH_JOB_EXECUTION (JOB_EXECUTION_ID)
);

CREATE SEQUENCE SPRING_BATCH.BATCH_STEP_EXECUTION_SEQ MAXVALUE 9223372036854775807 NO CYCLE;
CREATE SEQUENCE SPRING_BATCH.BATCH_JOB_EXECUTION_SEQ MAXVALUE 9223372036854775807 NO CYCLE;
CREATE SEQUENCE SPRING_BATCH.BATCH_JOB_SEQ MAXVALUE 9223372036854775807 NO CYCLE;

And the data objects, PersonCsv.java and PersonDb.java:

public record PersonCsv(
    String person_ID,
    String name,
    String first,
    String last,
    String middle,
    String email,
    String phone,
    String fax,
    String title
) {
}

public record PersonDb(String firstName, String lastName) {
}

The input file[3] to be read looks like this:

╔═══════════╦═══════════════════════╦═════════╦════════╦════════╦══════════════════╦═══════════════════╦══════════════╦═════════════════════╗
║ person_ID ║ name                  ║ first   ║ last   ║ middle ║ email            ║ phone             ║ fax          ║ title               ║
╠═══════════╬═══════════════════════╬═════════╬════════╬════════╬══════════════════╬═══════════════════╬══════════════╬═════════════════════╣
║ 3130      ║ "Burks, Rosella "     ║ Rosella ║ Burks  ║        ║ BurksR@univ.edu  ║ 963.555.1253      ║ 963.777.4065 ║ Professor           ║
║ 3297      ║ "Avila, Damien "      ║ Damien  ║ Avila  ║        ║ AvilaD@univ.edu  ║ 963.555.1352      ║ 963.777.7914 ║ Professor           ║
║ 3547      ║ "Olsen, Robin "       ║ Robin   ║ Olsen  ║        ║ OlsenR@univ.edu  ║ 963.555.1378      ║ 963.777.9262 ║ Assistant Professor ║
║ 1538      ║ "Moises, Edgar Estes" ║ Edgar   ║ Moises ║ Estes  ║ MoisesE@univ.edu ║ 963.555.2731x3565 ║ 963.777.8264 ║ Professor           ║
║ 2941      ║ "Brian, Heath Pruitt" ║ Heath   ║ Brian  ║ Pruitt ║ BrianH@univ.edu  ║ 963.555.2800      ║ 963.777.7249 ║ Associate Curator   ║
╚═══════════╩═══════════════════════╩═════════╩════════╩════════╩══════════════════╩═══════════════════╩══════════════╩═════════════════════╝
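
Note that the name column contains a comma inside double quotes, so splitting a line naively on "," would yield one field too many. Delimited readers such as the one configured below are expected to treat quoted commas as data. The following plain-Java sketch illustrates the idea; QuotedCsvSketch and split are hypothetical names, the raw CSV line is reconstructed from the first row of the table above, and this is not the tokenizer Spring Batch itself uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of quote-aware CSV splitting; not Spring Batch code.
final class QuotedCsvSketch {

    /** Splits one CSV line on commas, treating commas inside double quotes as data. */
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // a doubled quote inside a quoted field is a literal quote
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote; not part of the value
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // First data row of the sample table, written as a raw CSV line (assumed layout).
        String line = "3130,\"Burks, Rosella \",Rosella,Burks,,"
            + "BurksR@univ.edu,963.555.1253,963.777.4065,Professor";
        System.out.println("naive split:       " + line.split(",").length + " fields");
        System.out.println("quote-aware split: " + split(line).size() + " fields");
        System.out.println("name field:        '" + split(line).get(1) + "'");
    }
}
```

The quote-aware version returns nine fields, matching the nine columns of the table, while the naive split cuts the name in two.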

And finally here is our configuration with Jobs and Steps:

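import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import javax.sql.DataSource;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration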
public class SpringBatchConfig {
    private static final Logger logger = LoggerFactory.getLogger(SpringBatchConfig.class);

    private final JobRepository jobRepository;
    private final PlatformTransactionManager transactionManager;
    private final DataSource dataSource;

    public SpringBatchConfig(
        final JobRepository jobRepository,
        final PlatformTransactionManager transactionManager,
        final DataSource dataSource
    ) {
        this.jobRepository = jobRepository;
        this.transactionManager = transactionManager;
        this.dataSource = dataSource;
    }

    @Bean
    public Job downloadCsvFileJob(Step downloadCsvFileStep) {
        return new JobBuilder("downloadCsvFileJob", jobRepository)
            .incrementer(new RunIdIncrementer())
            .start(downloadCsvFileStep)
            .build();
    }

    @Bean
    public Step downloadCsvFileStep(Tasklet downloadCsvFileTasklet) {
        return new StepBuilder("downloadCsvFileStep", jobRepository)
            .tasklet(downloadCsvFileTasklet, transactionManager)
            .build();
    }

    @Bean
    @StepScope
    public Tasklet downloadCsvFileTasklet(
        @Value("#{jobParameters['sourceFileUrl']}") String sourceFileUrl,
        @Value("#{jobParameters['targetFilePath']}") String targetFilePath
    ) throws MalformedURLException {
        return new DownloadCsvFileTasklet(new URL(sourceFileUrl), Paths.get(targetFilePath));
    }

    @Bean
    public Job loadCsvToDatabaseJob(Step loadCsvToDatabaseStep) {
        return new JobBuilder("loadCsvToDatabaseJob", jobRepository)
            .start(loadCsvToDatabaseStep)
            .build();
    }

    @Bean
    public Step loadCsvToDatabaseStep(
        ItemReader<PersonCsv> reader,
        ItemProcessor<PersonCsv, PersonDb> processor,
        ItemWriter<PersonDb> writer
    ) {
        return new StepBuilder("loadCsvToDatabaseStep", jobRepository)
            .<PersonCsv, PersonDb>chunk(10, transactionManager)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
    }

    @Bean
    @StepScope
    public FlatFileItemReader<PersonCsv> reader(
        @Value("#{jobParameters['targetFilePath']}") FileSystemResource fileSystemResource
    ) {
        return new FlatFileItemReaderBuilder<PersonCsv>()
            .name("personItemReader")
            .resource(fileSystemResource)
            .linesToSkip(1)
            .delimited()
            .names("person_ID", "name", "first", "last", "middle", "email", "phone", "fax", "title")
            .targetType(PersonCsv.class)
            .build();
    }

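    @Bean
    public ItemProcessor<PersonCsv, PersonDb> processor() {
        return personCsv -> {
            // Returning null tells Spring Batch to filter the item out, so it never reaches the writer.
            // Note that contains("Professor") also matches titles such as "Assistant Professor".
            if (personCsv.title().contains("Professor")) {
                return null;
            }
            return new PersonDb(personCsv.first(), personCsv.last());
        };
    }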

    @Bean
    public JdbcBatchItemWriter<PersonDb> writer() {
        return new JdbcBatchItemWriterBuilder<PersonDb>()
            .sql("INSERT INTO people (first_name, last_name) VALUES (:firstName, :lastName)")
            .dataSource(dataSource)
            .beanMapped()
            .build();
    }

    private static class DownloadCsvFileTasklet implements Tasklet {
        private final URL url;
        private final Path path;

        public DownloadCsvFileTasklet(final URL url, final Path path) {
            this.url = url;
            this.path = path;
        }

        @Override
        public RepeatStatus execute(final StepContribution contribution, final ChunkContext chunkContext)
            throws IOException {
            downloadCsvFile(url, path);
            return RepeatStatus.FINISHED;
        }

        // Lets an IOException propagate so that a failed download marks the step as FAILED
        // instead of being logged and silently reported as COMPLETED.
        private static void downloadCsvFile(final URL url, final Path path) throws IOException {
            try (InputStream in = url.openStream()) {
                Files.copy(in, path, StandardCopyOption.REPLACE_EXISTING);
                logger.info("File '{}' has been downloaded from '{}'", path, url);
            }
        }
    }

}
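
To see what the processor's filter rule does to the sample rows shown earlier, here is a small plain-Java sketch. The nested records are trimmed-down, hypothetical stand-ins for PersonCsv and PersonDb (only the fields the processor touches), and ProfessorFilterSketch is not part of the application.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the processor's filter rule; not Spring Batch code.
final class ProfessorFilterSketch {

    // Trimmed-down stand-ins for the article's records.
    record PersonCsv(String first, String last, String title) {}
    record PersonDb(String firstName, String lastName) {}

    /** Same rule as the processor bean: null means "drop this item". */
    static PersonDb process(PersonCsv person) {
        if (person.title().contains("Professor")) {
            return null;
        }
        return new PersonDb(person.first(), person.last());
    }

    /** Applies the rule to every row and keeps only the survivors. */
    static List<PersonDb> keep(List<PersonCsv> rows) {
        List<PersonDb> kept = new ArrayList<>();
        for (PersonCsv row : rows) {
            PersonDb result = process(row);
            if (result != null) {
                kept.add(result);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The five sample rows from the input file table.
        List<PersonCsv> rows = List.of(
            new PersonCsv("Rosella", "Burks", "Professor"),
            new PersonCsv("Damien", "Avila", "Professor"),
            new PersonCsv("Robin", "Olsen", "Assistant Professor"),
            new PersonCsv("Edgar", "Moises", "Professor"),
            new PersonCsv("Heath", "Brian", "Associate Curator"));
        System.out.println(keep(rows));
    }
}
```

Of the five sample rows, only Heath Brian (Associate Curator) survives, because contains("Professor") also matches "Assistant Professor".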

How to run

First, we need to start the database in Docker:

docker run --name spring-batch-postgres -p 5432:5432 -e POSTGRES_USER=postgresql -e POSTGRES_PASSWORD=postgresql -e POSTGRES_DB=spring-batch-example -d postgres:13

To run the code and launch the first job, let’s execute:

./mvnw spring-boot:run \
-Dspring-boot.run.jvmArguments="-Dspring.batch.job.name=downloadCsvFileJob" \
-Dspring-boot.run.arguments="sourceFileUrl=https://raw.githubusercontent.com/lawlesst/vivo-sample-data/master/data/csv/people.csv targetFilePath=src/main/resources/data/people.csv"

The command above launches the job named “downloadCsvFileJob” and produces the following logs:

INFO 224623 --- [main] o.s.b.a.b.JobLauncherApplicationRunner   : Running default command line with: [sourceFileUrl=https://raw.githubusercontent.com/lawlesst/vivo-sample-data/master/data/csv/people.csv, targetFilePath=src/main/resources/data/people.csv]
INFO 224623 --- [main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [SimpleJob: [name=downloadCsvFileJob]] launched with the following parameters: [{'targetFilePath':'{value=src/main/resources/data/people.csv, type=class java.lang.String, identifying=true}','sourceFileUrl':'{value=https://raw.githubusercontent.com/lawlesst/vivo-sample-data/master/data/csv/people.csv, type=class java.lang.String, identifying=true}','run.id':'{value=2, type=class java.lang.Long, identifying=true}'}]
INFO 224623 --- [main] o.s.batch.core.job.SimpleStepHandler     : Executing step: [downloadCsvFileStep]
INFO 224623 --- [main] c.e.s.SpringBatchConfig                  : File 'src/main/resources/data/people.csv' has been downloaded from 'https://raw.githubusercontent.com/lawlesst/vivo-sample-data/master/data/csv/people.csv'
INFO 224623 --- [main] o.s.batch.core.step.AbstractStep         : Step: [downloadCsvFileStep] executed in 731ms
INFO 224623 --- [main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [SimpleJob: [name=downloadCsvFileJob]] completed with the following parameters: [{'targetFilePath':'{value=src/main/resources/data/people.csv, type=class java.lang.String, identifying=true}','sourceFileUrl':'{value=https://raw.githubusercontent.com/lawlesst/vivo-sample-data/master/data/csv/people.csv, type=class java.lang.String, identifying=true}','run.id':'{value=2, type=class java.lang.Long, identifying=true}'}] and the following status: [COMPLETED] in 777ms
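
The download step boils down to copying a URL's stream to a file. One thing to keep in mind is that Files.copy does not create missing parent directories, so the target directory (src/main/resources/data in the command above) has to exist beforehand. The sketch below shows one way to handle that; DownloadSketch and download are hypothetical names, and a local file:// URL stands in for the remote CSV so the example runs offline.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of the download logic; not the article's tasklet.
final class DownloadSketch {

    /** Copies the resource at url to target, creating missing parent directories first. */
    static long download(URL url, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent); // no-op if the directory already exists
        }
        try (InputStream in = url.openStream()) {
            // Returns the number of bytes copied; any IOException reaches the caller.
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary local file plays the role of the remote CSV (assumption for the demo).
        Path source = Files.createTempFile("people", ".csv");
        Files.writeString(source, "person_ID,name\n3130,\"Burks, Rosella \"\n");

        // The "data" directory does not exist yet; download() creates it.
        Path target = Files.createTempDirectory("batch-demo").resolve("data/people.csv");
        long bytes = download(source.toUri().toURL(), target);
        System.out.println(bytes + " bytes copied to " + target);
    }
}
```

Letting the IOException reach the caller matters in a tasklet: an exception thrown from execute is what tells Spring Batch that the step failed.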

To run the second job, we execute:

./mvnw spring-boot:run \
-Dspring-boot.run.jvmArguments="-Dspring.batch.job.name=loadCsvToDatabaseJob" \
-Dspring-boot.run.arguments="targetFilePath=src/main/resources/data/people.csv"

The previous command launches the job named “loadCsvToDatabaseJob”. See the logs:

INFO 224834 --- [main] o.s.b.a.b.JobLauncherApplicationRunner   : Running default command line with: [targetFilePath=src/main/resources/data/people.csv]
INFO 224834 --- [main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [SimpleJob: [name=loadCsvToDatabaseJob]] launched with the following parameters: [{'targetFilePath':'{value=src/main/resources/data/people.csv, type=class java.lang.String, identifying=true}'}]
INFO 224834 --- [main] o.s.batch.core.job.SimpleStepHandler     : Executing step: [loadCsvToDatabaseStep]
INFO 224834 --- [main] o.s.batch.core.step.AbstractStep         : Step: [loadCsvToDatabaseStep] executed in 156ms
INFO 224834 --- [main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [SimpleJob: [name=loadCsvToDatabaseJob]] completed with the following parameters: [{'targetFilePath':'{value=src/main/resources/data/people.csv, type=class java.lang.String, identifying=true}'}] and the following status: [COMPLETED] in 192ms
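
Comparing the two logs, the first job carries an extra run.id parameter (added by its RunIdIncrementer), while the second job has only targetFilePath. Spring Batch identifies a job instance by the job name plus its identifying parameters, and the schema above stores a 32-character JOB_KEY for that purpose. As a result, rerunning loadCsvToDatabaseJob with the same targetFilePath points at an instance that has already completed, which the framework typically refuses to run again. The sketch below illustrates the idea with an MD5 hash over sorted parameters; JobKeySketch and jobKey are hypothetical names, and this is a simplified stand-in rather than the framework's actual key generator.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of parameter-based job identity; not Spring Batch code.
final class JobKeySketch {

    /** Hashes the sorted name=value pairs into a 32-character hex key (illustrative only). */
    static String jobKey(Map<String, String> identifyingParams) {
        StringBuilder joined = new StringBuilder();
        // Sorting by name makes the key independent of the order the parameters were given in.
        new TreeMap<>(identifyingParams)
            .forEach((name, value) -> joined.append(name).append('=').append(value).append(';'));
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(joined.toString().getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        String path = "src/main/resources/data/people.csv";

        // Same parameters twice: same key, hence the same job instance.
        String first = jobKey(Map.of("targetFilePath", path));
        String second = jobKey(Map.of("targetFilePath", path));
        System.out.println("same instance: " + first.equals(second));

        // An incrementing run.id changes the key, so every launch is a new instance.
        String run1 = jobKey(Map.of("targetFilePath", path, "run.id", "1"));
        String run2 = jobKey(Map.of("targetFilePath", path, "run.id", "2"));
        System.out.println("new instance per run.id: " + !run1.equals(run2));
    }
}
```

If the load job should be runnable repeatedly with the same file, one option is to give it an incrementer as well, as the download job does.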

It is also worth noting that Spring Batch records metadata about jobs, steps, parameters, and so on in the job repository tables (the SPRING_BATCH.BATCH_* tables created above).

See what it looks like as the final result

Conclusions

Batch processing is a powerful technique for handling large volumes of data efficiently. Spring Batch is a framework for the JVM that provides a reliable way to create batch jobs by implementing commonly used processing patterns. By combining Spring Batch with Spring Boot and other components from the Spring ecosystem, you can develop the mission-critical batch applications that business operations depend on.

Written by Rostyslav Ivankiv