Query Batching in Spring Boot

Jesper Håsteen, Predictly on Tech
May 2, 2023

Optimizing database performance can be a daunting task for developers, with many different factors to consider. However, there are a few simple techniques that can make a big impact. One such technique is query batching, which involves reducing the number of round trips to the database by sending multiple queries in a single batch. In this article, we will explore how to implement query batching in Spring Boot to improve database performance.

[Image: cars vs bus traffic comparison, from Wikipedia]

As in our other articles, I will use the MySQL Employees database, but this time I have deployed it in a cloud environment so that the response times are more realistic. The JPA model of the Employee entity looks like this. I have included the Persistable interface, which is required for batching with manually assigned IDs (read more about Persistable here).

@Entity
@Table(name = "employees")
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@EqualsAndHashCode(onlyExplicitlyIncluded = true)
public class Employee implements Persistable<Integer> {

    @Id
    @Column(name = "emp_no")
    @EqualsAndHashCode.Include
    private int employeeId;

    @Column(name = "birth_date")
    private LocalDate birthDate;

    @Column(name = "first_name")
    private String firstName;

    @Column(name = "last_name")
    private String lastName;

    // gender is an ENUM('M', 'F') column in the employees schema,
    // and the test below sets it through the builder
    @Enumerated(EnumType.STRING)
    @Column(name = "gender")
    private Gender gender;

    @Column(name = "hire_date")
    private LocalDate hireDate;

    @OneToMany(mappedBy = "employeeId", cascade = CascadeType.ALL)
    @ToString.Exclude
    private Set<Salary> salaries;

    @OneToMany(mappedBy = "employeeId", cascade = CascadeType.ALL)
    @ToString.Exclude
    private Set<Title> titles;

    @Transient
    private boolean isNew = true;

    @Override
    public boolean isNew() {
        return isNew;
    }

    @PrePersist
    @PostLoad
    void markNotNew() {
        this.isNew = false;
    }

    @Override
    public Integer getId() {
        return employeeId;
    }
}

And we need a Spring Data repository

@Repository
public interface EmployeeRepository extends JpaRepository<Employee, Integer> {

}

Now let's make a test that can measure the performance. Instead of using warm-ups, we're going to run the loop enough times that any startup differences won't make much of a difference.

@Test
@Transactional
void entity_save_time_loop() {
    long start = System.currentTimeMillis();
    for (int i = 0; i < 10000; i++) {
        Employee entity = Employee.builder()
                .employeeId(i)
                .firstName("John the %d".formatted(i))
                .lastName("Doe")
                .birthDate(LocalDate.now().minusYears(25).minusDays(i))
                .hireDate(LocalDate.now().minusYears(2).minusDays(i))
                .gender(Gender.M)
                .salaries(Set.of())
                .titles(Set.of())
                .build();
        employeeRepository.save(entity);
    }
    // Flush to make sure everything was actually sent to the db
    employeeRepository.flush();
    long end = System.currentTimeMillis();
    // DurationFormatUtils is a class from apache commons-lang3
    String formattedDuration = DurationFormatUtils.formatDuration(end - start, "HH:mm:ss.SSS");
    System.out.printf("Time elapsed: %s\n", formattedDuration);
}

Which gives this result:

Time elapsed: 00:02:36.685

Session Metrics {
14295417 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
1101264220 nanoseconds spent preparing 10000 JDBC statements;
153687733623 nanoseconds spent executing 10000 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
156406479958 nanoseconds spent executing 1 flushes (flushing a total of 10000 entities and 20000 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
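As a side note, the Session Metrics block above is what Hibernate logs at the end of a session when its statistics are enabled. If you want to reproduce these numbers yourself, here is a minimal sketch of the property, assuming it is not already set in your project:

spring:
  jpa:
    properties:
      hibernate:
        # Collect and log session-level statistics, including the
        # "Session Metrics" block shown above
        generate_statistics: true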

And that is really slow, partly inflated by the distance to the cloud database, but it could still reflect some real-world scenarios.

Batching configuration

The main reason the queries were so slow is that Hibernate makes a round trip to the database for every employee we want to save, and each of those takes about 15 ms. Let's try enabling batching with this configuration:

spring:
  jpa:
    properties:
      hibernate:
        order_inserts: true
        order_updates: true
        jdbc:
          batch_size: 100
          batch_versioned_data: true

Here we have order_inserts and order_updates, which make Hibernate rearrange our statements so that inserts and updates against the same table end up next to each other. Rearranging is needed when batching inserts of, for example, Employees that also contain Titles and/or Salaries: without it, Hibernate interleaves the employee, title and salary inserts and falls back to single-row statements, whereas with ordering it can send all the employee inserts as one batch, then the titles, then the salaries.

Be aware that if your entity dependencies are very complex, this might cause some overhead while Hibernate figures out the order.

Next we choose our batch size. In this case I chose 100, but generally a larger batch will be faster, at the cost of the memory needed to hold the pending changes and of a bigger query payload, which can grow too large for your network or server settings. You will have to test your own limits and needs, because they can differ a lot from case to case.

Then we have batch_versioned_data on the last row, which enables batching of versioned data. This is only needed if you use optimistic locking, but it can give a massive performance improvement because the version checks and updates are sent as part of the batch instead of one statement at a time.
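For context, optimistic locking means the entity carries a version column that Hibernate checks on every update. The Employees schema used here has no such column, so the snippet below is purely a hypothetical sketch of what a versioned entity would look like:

@Entity
@Table(name = "salaries")
public class Salary {

    // ... id and the other columns omitted ...

    // Hypothetical column: Hibernate appends "where version = ?" to every
    // update and increments the value. With batch_versioned_data enabled,
    // these versioned updates can be sent as JDBC batches as well.
    @Version
    @Column(name = "version")
    private long version;
}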

Batching with MySQL

By default the MySQL JDBC driver accepts batched statements but still sends them to the server one at a time, so batching gains us very little. To get real batching we have to tell the driver to rewrite the batch into multi-row statements by adding ?rewriteBatchedStatements=true to your JDBC URL.
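In a Spring Boot application the flag goes straight onto the datasource URL. A minimal sketch, assuming a standard datasource configuration; the host, schema and credentials below are placeholders:

spring:
  datasource:
    # rewriteBatchedStatements makes Connector/J collapse a JDBC batch
    # into multi-row INSERT statements instead of sending them one by one
    url: jdbc:mysql://my-cloud-host:3306/employees?rewriteBatchedStatements=true
    username: employees_user
    password: ${DB_PASSWORD}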

Results

Now how fast did my query run with these settings?

Time elapsed: 00:00:03.717

Session Metrics {
14306791 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
7100542 nanoseconds spent preparing 1 JDBC statements;
0 nanoseconds spent executing 0 JDBC statements;
3226560211 nanoseconds spent executing 100 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
3428044417 nanoseconds spent executing 1 flushes (flushing a total of 10000 entities and 20000 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

That is roughly 42 times faster! Granted, this is a somewhat unfair comparison because of the slow connection to my cloud database, but it still shows that the potential gains are absolutely massive.
