Spring Batch: retrieve data from the CSV file and save it to Database H2

Devalère T KAMGUIA
8 min read · May 10, 2023


Spring Batch is a lightweight, open-source framework that provides a comprehensive set of APIs and tools for building batch-processing applications in Java. It is built on top of the Spring Framework and follows the same programming model and design principles.

Batch processing is a common requirement in many business applications where large volumes of data need to be processed periodically or on demand. Batch processing involves reading data from various sources, processing it in a batch, and writing the processed data to other destinations. This type of processing is different from real-time or online processing where data is processed continuously and results are immediately available.

The Spring Batch reference model.

Spring Batch provides a wide range of features that make it easy to develop and manage batch processing applications, such as:

  • Job execution and scheduling: Spring Batch provides a job launcher that can execute batch jobs on demand or on a predefined schedule.
  • Parallel processing: Spring Batch supports parallel processing of batch jobs, allowing you to process large volumes of data in a shorter amount of time.
  • Chunk-based processing: Spring Batch allows you to process data in chunks, which means reading a large volume of data in smaller chunks and processing them one by one.
  • Item readers and writers: Spring Batch provides a set of predefined item readers and writers that make it easy to read and write data from various sources and destinations.
  • Transaction management: Spring Batch provides transaction management features that allow you to ensure data integrity and consistency during batch processing.
  • Restartability: Spring Batch supports job restartability, allowing you to restart a failed or interrupted job from the point of failure.

Overall, Spring Batch provides a powerful and flexible framework for building batch processing applications in Java. It simplifies the development process, reduces the amount of boilerplate code, and provides a wide range of features for handling different batch processing scenarios.
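To make the chunk-oriented idea concrete, here is a small stand-alone sketch in plain Java (no Spring): items are read one by one, "processed", and then committed chunk by chunk, just as a Spring Batch step commits one transaction per chunk. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone illustration of chunk-oriented processing:
// process each item, accumulate results, and "commit" a whole chunk at once.
public class ChunkDemo {

    static List<List<Integer>> processInChunks(List<Integer> items, int chunkSize) {
        List<List<Integer>> committedChunks = new ArrayList<>();
        List<Integer> chunk = new ArrayList<>();
        for (Integer item : items) {
            chunk.add(item * 2);            // "process" step: double each item
            if (chunk.size() == chunkSize) {
                committedChunks.add(chunk); // "write" step: commit the chunk
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            committedChunks.add(chunk);     // flush the final partial chunk
        }
        return committedChunks;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5);
        System.out.println(processInChunks(input, 2)); // [[2, 4], [6, 8], [10]]
    }
}
```

With a chunk size of 100, as in the configuration later in this article, only one database transaction is opened per 100 rows instead of one per row.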

The purpose of this lesson is not to cover Spring Batch entirely, but to show you how to create a Spring Batch project from scratch.

Here are the points you will master by the end of this lesson:

  • Some core concepts of Spring Batch: what Jobs and Steps are, and what the components of a Step are;
  • Create a Spring Batch project from zero;
  • Create a Step with a Reader, Processor, and Writer;
  • Create a Job with one Step;
  • Generate the table “BATCH_JOB_INSTANCE” if it is not created automatically;
  • Run a Job when calling an endpoint of the application.

Components of Spring Batch: Step, Job, Tasklet, Job Launcher

Spring Batch Components and Architecture.

Spring Batch lets you configure tasks to be executed at different times, in parallel, or when calling a service such as an endpoint. These tasks, called jobs, are composed of one or more steps, and the steps are generally connected. For example, consider a job whose purpose is to update a user’s data: it would have the steps import user, delete user, and modify user.
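In Spring Batch 5, such a chain of steps can be declared with the JobBuilder. The following is a sketch only, and all bean names (updateUserJob, importUserStep, deleteUserStep, modifyUserStep) are hypothetical:

```java
// Sketch: a job made of three sequential steps (hypothetical bean names).
// Each step starts only after the previous one has completed.
@Bean
public Job updateUserJob(JobRepository jobRepository,
                         Step importUserStep,
                         Step deleteUserStep,
                         Step modifyUserStep) {
    return new JobBuilder("updateUserJob", jobRepository)
            .start(importUserStep)   // 1. import user
            .next(deleteUserStep)    // 2. delete user
            .next(modifyUserStep)    // 3. modify user
            .build();
}
```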

In this approach, the processing logic is divided into multiple steps, and each step is executed one after another in a predefined order. The input data is read at the beginning of the process, and the output data is written at the end of the process.

The linear processing approach is straightforward and easy to understand. It is suitable for scenarios where the processing logic is simple, and the data volume is not too large. It also provides a clear view of the data flow and the processing steps, making it easier to debug and troubleshoot any issues that may arise during processing.

However, the linear processing approach may not be suitable for complex processing scenarios that involve multiple data sources, transformations, and outputs. In these scenarios, it may be challenging to design the processing logic as a sequence of steps, and the processing time may increase significantly.

For those who remember basic electricity from secondary school: with lamps wired in series, when one of them fails, the others no longer light. A purely linear job behaves the same way: when one step fails, the following steps do not run.

To address these limitations, batch processing frameworks such as Spring Batch provide more advanced processing techniques, such as parallel processing, chunk-based processing, and partitioning. These techniques allow processing to be performed in parallel, allowing for faster processing of large volumes of data. They also allow for more complex processing logic to be executed, such as data transformations, aggregations, and filtering.

Retrieve data from the CSV file and save it to Database H2

Let us create a real example of a spring batch project.

Let’s consider the situation where everyone who visits a company is registered at reception and this data is saved in a CSV file. We want to save this data in a database to secure it better and also to perform further processing on it.

This flow is similar to our case.

We are going to use:

  • Java 17
  • Spring Boot 3.0.6
  • Spring Batch 5.0.1
  • IntelliJ (you can use Spring Initializr to generate your project too)

Now let’s go, open IntelliJ and create your project.

Click Next, then choose the dependencies.

Here are the six dependencies of the project: Spring Batch, Spring Web, Spring Data JPA, the H2 database, Lombok, and DevTools.

Let’s start with the I/O section to choose Spring Batch, which is the subject of this mini-project.

In the Web category, we have chosen Spring Web. We will use Spring MVC to create our RestController, which will be used to consult the data and the results of the operations, or to trigger the job.

In the SQL part, we have Spring Data JPA, which will be used to persist records in our relational database. For our mini-project, we choose the H2 database.

Under the heading Developer Tools, we select Lombok and DevTools. Lombok reduces boilerplate (getters, setters, constructors), and DevTools automatically enables the H2 web console in development, which will let us consult our H2 database.

As soon as the project is created, let’s compile and run it to make sure that everything works.

I organize my packages as shown below.

Create your entity Visitors.java; the code is below:

import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.Transient;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import lombok.ToString;

import java.util.Date;

@Entity
@Data
@NoArgsConstructor
@AllArgsConstructor
@ToString
public class Visitors {

    @Id
    private Long id;
    private String firstName;
    private String lastName;
    private String emailAddress;
    private String phoneNumber;
    private String address;
    private Date visitDate;

    // Raw date string read from the CSV; not persisted
    @Transient
    private String strVisitDate;
}

with which you will map the data from the CSV file.

visitor_id,first_name,last_name,email_address,phone_number,address,visit_date
12345,John,Smith,john.smith@example.com,555-1234,123 Main St.,05/05/2023-09:44
23456,Jane,Doe,jane.doe@example.com,555-5678,456 Elm St.,06/05/2023-10:44-18:13
34567,Robert,Johnson,robert.johnson@example.com,555-9012,789 Oak St.,07/05/2023-09:44
45678,Sarah,Lee,sarah.lee@example.com,555-3456,987 Maple St.,08/05/2023-13:15
56789,David,Wang,david.wang@example.com,555-7890,654 Birch St.,09/05/2023-18:13
67890,Emily,Chen,emily.chen@example.com,555-2345,321 Pine St.,10/05/2023-18:13
78901,Michael,Lin,michael.lin@example.com,555-6789,654 Cedar St.,13/05/2023-18:13
89012,Stephanie,Tan,stephanie.tan@example.com,555-1234,987 Spruce St.,14/05/2023-13:15
90123,William,Chen,william.chen@example.com,555-5678,123 Poplar St.,15/05/2023-18:13
01234,Amanda,Liu,amanda.liu@example.com,555-9012,456 Fir St.,16/05/2023-18:13
12346,James,Lee,james.lee@example.com,555-3456,789 Willow St.,17/05/2023-13:15
23457,Jessica,Chang,jessica.chang@example.com,555-7890,654 Elm St.,18/05/2023-15:51
34568,Andrew,Lin,andrew.lin@example.com,555-2345,321 Oak St.,19/05/2023-01:13
45679,Karen,Liu,karen.liu@example.com,555-6789,987 Maple St.,20/05/2023-06:51
56790,Rachel,Wu,rachel.wu@example.com,555-1234,654 Cedar St.,21/05/2023-06:21

Copy this data to create a sample CSV file, visitors.csv, in src/main/resources (it will be read from the classpath).

Now let’s move on to the configuration of our job.

We’re going to name it BatchConfig.java; the following code is the content of our batch configuration.

import lombok.RequiredArgsConstructor;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
@RequiredArgsConstructor
public class BatchConfig {

    // With Spring Boot 3 / Spring Batch 5, the JobRepository and the
    // PlatformTransactionManager are auto-configured; we simply inject them.
    private final VisitorsRepository visitorsRepository;
    private final JobRepository jobRepository;
    private final PlatformTransactionManager transactionManager;

    @Bean
    public Job importVisitorsJob(Step importVisitorsStep) {
        return new JobBuilder("importVisitorsJob", jobRepository)
                .start(importVisitorsStep)
                .build();
    }

    @Bean
    public Step importVisitorsStep(FlatFileItemReader<Visitors> flatFileItemReader) {
        return new StepBuilder("importVisitorsStep", jobRepository)
                .<Visitors, Visitors>chunk(100, transactionManager)
                .reader(flatFileItemReader)
                .processor(itemProcessor())
                .writer(writer())
                .build();
    }

    @Bean
    public ItemProcessor<Visitors, Visitors> itemProcessor() {
        return new VisitorsItemProcessor();
    }

    @Bean
    public ItemWriter<Visitors> writer() {
        return visitorsRepository::saveAll;
    }

    @Bean
    public FlatFileItemReader<Visitors> flatFileItemReader(@Value("${inputFile}") Resource inputFile) {
        FlatFileItemReader<Visitors> flatFileItemReader = new FlatFileItemReader<>();
        flatFileItemReader.setName("DEVAL");
        flatFileItemReader.setLinesToSkip(1); // skip the CSV header line
        flatFileItemReader.setResource(inputFile);
        flatFileItemReader.setLineMapper(lineMapper());
        return flatFileItemReader;
    }

    @Bean
    public LineMapper<Visitors> lineMapper() {
        DefaultLineMapper<Visitors> defaultLineMapper = new DefaultLineMapper<>();
        DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
        lineTokenizer.setDelimiter(",");
        lineTokenizer.setNames("id", "firstName", "lastName", "emailAddress",
                "phoneNumber", "address", "strVisitDate");
        lineTokenizer.setStrict(false); // tolerate lines with missing columns
        defaultLineMapper.setLineTokenizer(lineTokenizer);
        BeanWrapperFieldSetMapper<Visitors> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(Visitors.class);
        defaultLineMapper.setFieldSetMapper(fieldSetMapper);
        return defaultLineMapper;
    }
}
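As an aside, the writer is just a method reference to saveAll. Spring Batch also ships a dedicated RepositoryItemWriter that does the same job; here is a sketch of the equivalent bean, assuming the same VisitorsRepository:

```java
// Sketch: an equivalent writer built with Spring Batch's RepositoryItemWriter.
@Bean
public RepositoryItemWriter<Visitors> repositoryItemWriter(VisitorsRepository repository) {
    return new RepositoryItemWriterBuilder<Visitors>()
            .repository(repository)
            .methodName("save")   // repository method invoked for each item
            .build();
}
```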

We also need to create the repository and the ItemProcessor, called VisitorsRepository and VisitorsItemProcessor respectively.

import java.text.SimpleDateFormat;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Component;

public interface VisitorsRepository extends JpaRepository<Visitors, Long> {
}

@Component
public class VisitorsItemProcessor implements ItemProcessor<Visitors, Visitors> {

    // Note: SimpleDateFormat is not thread-safe; this is fine for a
    // single-threaded step, but needs care if the step is parallelized.
    private final SimpleDateFormat dateFormat = new SimpleDateFormat("dd/MM/yyyy-HH:mm");

    @Override
    public Visitors process(Visitors item) throws Exception {
        // Convert the raw CSV string into a java.util.Date before persisting
        item.setVisitDate(dateFormat.parse(item.getStrVisitDate()));
        return item;
    }
}
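Since SimpleDateFormat is not thread-safe, a java.time-based variant of the same parsing is safer if the step is ever parallelized. This is a sketch with a hypothetical class name. Be aware that, unlike the lenient SimpleDateFormat, java.time rejects trailing text, so a value such as the 06/05 row in the sample CSV (which has a second time appended) would raise a DateTimeParseException instead of being silently truncated.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Thread-safe alternative to SimpleDateFormat for the same "dd/MM/yyyy-HH:mm"
// pattern. DateTimeFormatter is immutable, so one shared instance is safe.
public class VisitDateParser {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd/MM/yyyy-HH:mm");

    static LocalDateTime parseVisitDate(String raw) {
        return LocalDateTime.parse(raw, FORMAT);
    }

    public static void main(String[] args) {
        // First visit_date value from the sample CSV above
        System.out.println(parseVisitDate("05/05/2023-09:44")); // 2023-05-05T09:44
    }
}
```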

We can now create the launcher that will allow us to start our Job. To do this, let’s create the JobController controller, in which we will implement this logic.

import java.util.Date;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class JobController {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job job;

    @GetMapping("/runJob")
    public BatchStatus load() throws Exception {
        // A unique timestamp parameter so each call creates a new job instance
        JobParameters jobParameters = new JobParametersBuilder()
                .addDate("timestamp", new Date())
                .toJobParameters();
        // The default JobLauncher is synchronous: run() only returns once the
        // job has finished, so we can return its final status directly.
        JobExecution jobExecution = jobLauncher.run(job, jobParameters);
        return jobExecution.getStatus();
    }
}
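By default, jobLauncher.run() is synchronous and only returns once the job has finished. If you prefer the endpoint to return immediately and check the status separately, you can declare an asynchronous launcher. This is a sketch using Spring Batch 5’s TaskExecutorJobLauncher; the bean name asyncJobLauncher is an assumption:

```java
// Sketch: an asynchronous JobLauncher (hypothetical bean name).
// With this launcher, run() returns as soon as the job is submitted.
@Bean
public JobLauncher asyncJobLauncher(JobRepository jobRepository) throws Exception {
    TaskExecutorJobLauncher launcher = new TaskExecutorJobLauncher();
    launcher.setJobRepository(jobRepository);
    launcher.setTaskExecutor(new SimpleAsyncTaskExecutor());
    launcher.afterPropertiesSet();
    return launcher;
}
```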

We can now launch our project. Let’s configure the application.properties file so that the Job doesn’t start as soon as the application starts, but only when the controller is called.

We will set up also the configuration for our database H2, as well as the path to our CSV file.

inputFile=classpath:/visitors.csv
spring.datasource.url=jdbc:h2:mem:testdb
spring.main.allow-circular-references=true
spring.batch.job.enabled=false
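DevTools enables the H2 web console automatically in development. If the console is not reachable, it can also be switched on explicitly with the standard Spring Boot properties:

```properties
spring.h2.console.enabled=true
spring.h2.console.path=/h2-console
```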

Once this is done, we can launch the project.

Let’s check that our test database was actually created. To do this, go to the address:

http://localhost:8080/h2-console

We should see the H2 console login page; connect without changing anything to verify that our tables were created correctly.

If the table “BATCH_JOB_INSTANCE” is not found, it means it was not created automatically, and you have to create it manually. The SQL schema for H2 can be found here:

spring-batch/schema-h2.sql at main · spring-projects/spring-batch · GitHub

Do not modify that file: copy it, paste it into the H2 console, and run it (only if you don’t already see the batch tables alongside the Visitors table).

After running it, reload the page. If everything is OK, you will see those tables.

Now call your JobController at localhost:8080/runJob in another tab. You should get the message “COMPLETED”.

Reload the H2 page, click on the VISITORS table, and run SELECT * FROM VISITORS; to view your data.

And with that, we’re done.

Have a good read.

Your feedback and comments will help me to improve my work, and I’ll appreciate every single one of them. Leave your questions in the comments, or your email address if you need the source code.


Devalère T KAMGUIA

I'm a full-stack developer with 7+ years of experience. Technologies: Angular, VueJS, Java, J2EE, Spring, Spring Boot, Spring Batch, Ionic, Flutter, PHP, Symfony.