Predictly on Tech
Published in

Predictly on Tech

Insert without Select using JPA

When working with larger datasets, JPA has a noticeable drawback in that it will track and synchronize every entity individually of each other. This means that it will issue one SQL statement for each entity created or modified. While very easy to work with as a developer, it adds an overhead that is very significant when ingesting entities in large batches.

Photo by Nana Smirnova on Unsplash

As in our other articles, I will use the MySQL Employees database via the docker container genschsa/mysql-employees. The JPA model of the Employee entity is shown below

Batch Ingest

Suppose we have a service that each month ingests a file of employee records to be loaded into a database. We know that the entries in the file are not yet available in our database.

For this example we also want to enable statistics and logging to see what’s happening behind the scenes.

Executing the code above will produce the following output

As can be seen, for every Employee we get two additional statements to load the corresponding entity and child records from the database. The result is that when ingesting 500 employees and their titles, we get 1,000 SELECT statements issued towards the database.

Spring Data Persistable

Fortunately, Spring Data JPA provides a solution to this issue called Persistable. It enables us to instruct JPA that the entity that we are about to persist is new so that JPA does not need to do this redundant check for us. To use it, we add a base class that takes care of the boilerplate for us.

By implementing the isNew method we can ensure that Hibernate does not perform an extra query to check whether the entity is new or not. We use JPA annotations @PostLoad and @PostPersist to set the isNew field to false when the entity is loaded or persisted to automatically handle its state.

This produces the following result

As can be seen, we’ve managed to reduce the number of queries issued to ingest the 500 records in half which will save both time and resources on our database server. More noteably, we cut the JDBC statement execution time from 14 seconds to 6.5, more than a 50% reduction in execution time.

These numbers are when running the application and MySQL instance on two different machines connected via WiFi. This does not represent a product environment but neither does running the application with the MySQL instance available on localhost, since then the network delay is removed completely.

There is still more work that can be done to improve on these results, more on that in a later article.

--

--

Discoveries made while delivering technical solutions

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store