Bulk Insert; Auto Increment; Hibernate; YES it’s possible

Pasindu Senanayake
ParallaxTec
Published in
3 min readJul 28, 2021

When it comes to object relation mapping hibernate is one of the pioneers. According to the creators of hibernate, it is designed to generate the most optimal query that can be derived with static information. Well, I would say not always!!

When you have a table with an auto-increment primary id like below it is impossible to perform a bulk insert with hibernate and it is frustrating.

CREATE TABLE `student` (
`student_id` int(11) NOT NULL AUTO_INCREMENT,
`student_name` varchar(45) NOT NULL,
`student_age` varchar(45) NOT NULL,
PRIMARY KEY (`student_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;

The problem here is we have to use @GeneratedValue(strategy = GenerationType.IDENTITY) in the model and when it is used, here baeldung says and I quote

When we want to use batching for inserts/updates, we should be aware of the primary key generation strategy. If our entities use GenerationType.IDENTITY identifier generator, Hibernate will silently disable batch inserts/updates.

Let’s dig deep into this. First, let’s see what is actually going on. The student entity has 3 attributes and they are mapped as follows.

And the Dao layer is as follows.

The last piece is as follows

Once this is executed hibernate generates the following queries for us.

{"name":"Debug-Logger", "time":1, "success":true, "type":"Prepared", "batch":false, "querySize":1, "batchSize":0, "query":["insert into student (student_age, student_name) values (?, ?)"], "params":[["20","testA"]]}
{"name":"Debug-Logger", "time":1, "success":true, "type":"Prepared", "batch":false, "querySize":1, "batchSize":0, "query":["insert into student (student_age, student_name) values (?, ?)"], "params":[["25","testB"]]}

All good except that it generates two queries. And if you look carefully you would see batch: false. But the weird fact is when it comes to the database (at least MySQL) a query like the below is perfectly fine.

insert into student (student_id,student_name,student_age) values (null, 'TestA', 20), (null, 'TestB', 25)

So how can we achieve that? Let’s see.
The first option is if this @GeneratedValue(strategy = GenerationType.IDENTITY) is the one who makes our lives miserable let’s remove it and try. So the execution without the above annotation wasn’t smooth.

java.lang.I1LlegalStateException: Failed to execute ApplicationRunner
Caused by: org.springframework.orm.jpa.JpaSystemException: ids for this class must be manually assigned before calling save(): com.example.demo.model.Student; nested exception is org.hibernate.id.IdentifierGenerationException:
ids for this class must be manually assigned before calling save(): com.example.demo.model.Student

Oh, Poor hibernate! It says that the entities can’t be there as unassigned. The reason is under the hood hibernate leverages the uniqueness of the primary id to uniquely identifies the entities in a collection. So here comes the hack-1. Let’s show some id and quickly remove it !!

Here, we try to trick hibernate. First, we give a unique primary id using an atomic integer. And we unwrapped the entityManager.persist method as above so that we can clear the primary id after session.persist. The thought process behind that is in order to uniquely identify entities, an entity has a unique id and it is used when the entities are persisted to the session. But, once persisted the id is removed before the data is flushed and committed so that student_ids of the generated query would be always null. Since we didn’t use the Identity annotation hibernate should be able to generate the single batch insert query. But here is the results.

java.lang.IllegalStateException: Failed to execute ApplicationRunner
Caused by: org.springframework.orm.jpa.JpaSystemException: identifier of an instance of com.example.demo.model.Student was altered from 8 to null; nested exception is org.hibernate.HibernateException: identifier of an instance
of com.example.demo.model.Student was altered from 8 to null

Hibernate is very politely asking ‘how dare you try to trick me’ !! . So hibernate tracks the changes but the question is how far is it willing to go on that road?

In that case, we decided to try an alternative approach which made it impossible for hibernate to compute the difference. We wrapped the integer with a wrapper class and wrote the converters to that custom wrapper. (This is the standard way of introducing a custom data type to hibernate)

And we updated the model with CustomInteger entity.

After all these modifications the execution results were as follows.

{“name”:”Debug-Logger”, “time”:2, “success”:true, “type”:”Prepared”, “batch”:true, “querySize”:1, “batchSize”:2, “query”:[“insert into student (student_age, student_name, student_id) values (?, ?, ?)”], “params”:[[“20”,”testA”,null],[“25”,”testB”,null]]}

Finally, our wrapper class did the trick and made both hibernate and database happy :).

It’s important to mention that the above implementation is not discussed or presented in any of the hibernate forums or communities. So unless you are confident about the implementation it is advised to stick with individual inserts. With this, we have seen a massive performance improvement with the same accuracy when it comes to larger datasets, but if you aren’t confident enough, remember that most of the time accuracy matters over performance.

Bulk Insert: https://github.com/PasinduSenanayake/bulk-insert-demo/tree/bulk-insert
Individual Insert: https://github.com/PasinduSenanayake/bulk-insert-demo/tree/individual-insert.

--

--

Pasindu Senanayake
ParallaxTec

Graduate of Department of Computer Science and Engineering at University of Moratuwa