4 min readAug 31, 2023

Spring Data JPA : how to improve bulk insert performance

In this article lets go through the steps involved to bulk insert data into database leveraging Spring Data JPA.

We are going to use JpaRepository interface provided by Spring Framework. JpaRespository further extends ListCrudRepository which has methods to do operations on Collection of entities. Particularly, we are going to leverage saveAll() to do bulk insert operation, whose method signature is provided below:

Initial setup:

This is the initial state of my project where I have used GenerationType strategy as IDENTITY for the entity primary keys.

public class PostEntity {
 
 @Id
 @GeneratedValue(strategy = GenerationType.IDENTITY)
 private Long id;

 @NotBlank(message = "Post name is required")
 @Size(max = 10, message = "Post name must be atmost 10 characters")
 @Column(name = "name")
 private String name;
 
 @NotBlank(message = "Post content is required")
 @Size(max = 50, message = "Post content must be atmost 50 characters")
 @Column(name = "content")
 private String content;

 @Column(name = "createdDateTime")
 private LocalDateTime createdDate;

 // All the operations should be cascaded from Parent entity to Child entity
 @OneToMany(mappedBy = "post", cascade = CascadeType.ALL, fetch=FetchType.LAZY, orphanRemoval = true)
 @Singular
 private List<CommentEntity> comments;
}

public class CommentEntity {

 @Id
 @GeneratedValue(strategy = GenerationType.IDENTITY)
 private Long id;

 @NotBlank(message = "Review is required")
 @Size(max = 50, message = "Review must be atmost 50 characters")
 @Column(name = "review")
 private String review;
 
 @Enumerated(EnumType.STRING)
 @Column(name = "status")
 private Status status;
 
 @Column(name = "createdDateTime")
 private LocalDateTime createdDateTime;
 
 @ManyToOne
 //@JoinColumn(name = "post_det_id")
 private PostEntity post;
}

And, leveraged saveAll() method from the Spring’s JPA repository.

public class DBRunnerJob implements CommandLineRunner {
 
 private static int TOTALCOUNT = 10000;
 @Autowired
 private PostRepository postRepository;

 @Override
 public void run(String... args) throws Exception {
 
  List<PostEntity> postEntityList = new ArrayList<>();
  
  for (int index = 0; index < TOTALCOUNT; index++) {
   PostEntity postEntity = PostEntity.builder().name("Post " + index).content("Sample post " + index + " in Twitter").createdDate(LocalDateTime.now()).build();
   postEntity.setComments(Arrays.asList(CommentEntity.builder().review("Sample review " + index)
     .status(Status.APPROVED).createdDateTime(LocalDateTime.now()).post(postEntity).build()));
   postEntityList.add(postEntity);
  }
  postRepository.saveAll(postEntityList);
 }
}

We can enable the statistics to see the metrics to analyze time taken to complete the entire bulk insertion. This can be achieved through the configuration as shown below:

spring.jpa.properties.hibernate.generate_statistics=true

From the metrics, we can see it took 40 seconds for saving 20k records in the H2 database. Also, it is evident that records were not inserted inside a batch.

 2023-08-31T15:51:26.044-04:00[0;39m [32m INFO[0;39m [35m2760[0;39m [2m---[0;39m [2m[           main][0;39m [36mi.StatisticalLoggingSessionEventListener[0;39m [2m:[0;39m Session Metrics {
    29300 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    3010346380 nanoseconds spent preparing 20000 JDBC statements;
    40301450700 nanoseconds spent executing 20000 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    3450670790 nanoseconds spent executing 1 flushes (flushing a total of 20000 entities and 10000 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

The reason behind is that batching isn’t switched on by default in some cases. So to improve the performance of the insert, we had to change the way we were inserting the records. It took a couple of iterations it took to improve the performance of the insert operation.

Step 1: Switch on bulk insert

spring.jpa.properties.hibernate.jdbc.batch_size=1000
spring.jpa.properties.hibernate.order_inserts=true

The property “batch_size” tells Hibernate to collect inserts in batches of size 1000.
The “order_inserts” property tells Hibernate to group inserts by entity, through batching.

This reduced the time by a little, dropping from 40 secs to 30 Secs, not so much improvement, but there is another step further.

Step 2: Change the ID Generation type strategy

Earlier we have used the GenerationType strategy as IDENTITY. This doesn’t support batch insert since Hibernate has to rely on the database to insert the id value. Please refer here for more details. Now, updated the strategy as SEQUENCE along with a Sequence generator.

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "seqGen")
@SequenceGenerator(name = "seqGen", sequenceName = "post_id_seq", allocationSize = 1)
 private Long id;

Now the performance has improved a lot. Please see below table containing the stats:

Conclusion

In this blog, we explore a couple of ways to improve the performance of bulk insert using Spring JPA. This is the GitHub link of the project. Please share if you have any feedback, thank you!

Spring Data JPA : how to improve bulk insert performance

Conclusion

Written by Sakthi