Hibernate sessions and effective batch update

Sourin Sutradhar
Programming Notes
Published in
6 min readDec 12, 2016

--

I am discussing some key concepts and their applications that needs to be understood to implements effective batch update in Hibernate. This is not a tutorial and I am presenting the aspects that I keep in mind while implementing batch update using hibernate.

Beginning with Sessions

As always, a proper session management is the first step for any DB operation in Hibernate. I often tend to use a SessionManager that takes care of my connections and pooling.

For connection pooling, you can explore good tools like C3P0, HikariCP or BoneCP. According to most people HikariCP is the fastest followed by BoneCP followed by C3P0. Here I demonstrate some setting I tend to keep in hibernate.cfg.xml

<!-- C3PO configurations --><property name="hibernate.c3p0.min_size">5</property><property name="hibernate.c3p0.max_size">20</property><property name="hibernate.c3p0.timeout">300</property><property name="hibernate.c3p0.max_statements">50</property><property name="hibernate.c3p0.idle_test_period">3000</property>

Now to implement the SessionManager,

public class SessionManager {private static final SessionFactory sessionFactory = buildSessionFactory();private static SessionFactory buildSessionFactory() {    try {       // Create the SessionFactory from hibernate.cfg.xml       return new Configuration().configure().buildSessionFactory();    } catch (Throwable ex) {    // Make sure you log the exception, as it might be swallowed    System.err.println("Initial SessionFactory creation failed." + ex);    throw new ExceptionInInitializerError(ex);    }}
public static SessionFactory getSessionFactory() {
return sessionFactory;}
public static void shutdown() {
// Close caches and connection pools getSessionFactory().close();}} // Closing class

Transaction practises

The session objects should not be kept open for a long time because they are not usually thread safe and they should be created and destroyed them as needed. The main function of the Session is to offer create, read and delete operations for instances of mapped entity classes. Instances may exist in one of the following three states at a given point in time:

  • transient: A new instance of a a persistent class which is not associated with a Session and has no representation in the database and no identifier value is considered transient by Hibernate.
  • persistent: You can make a transient instance persistent by associating it with a Session. A persistent instance has a representation in the database, an identifier value and is associated with a Session.
  • detached: Once we close the Hibernate Session, the persistent instance will become a detached instance.

A Session instance is serializable if its persistent classes are serializable. A typical transaction should use the following idiom:

Session session = factory.openSession();Transaction tx = null;try {    tx = session.beginTransaction();    // do some work    ...    tx.commit();}catch (Exception e) {    if (tx!=null) tx.rollback();    e.printStackTrace();}finally {    session.close();}

If the Session throws an exception, the transaction must be rolled back and the session must be discarded.

Now there are some keys methods that must be understood in this context :

session.save() : Hibernate basically remembers somewhere inside session that this object has to be saved. It can decide if he wants to issue INSERT INTO... immediately, some time later or on commit. This is a performance improvement, allowing Hibernate to batch inserts or avoid them if transaction is rolled back.

session.flush() : Hibernate is forced to issue INSERT INTO... against the database. The entity is stored in the database, but not yet committed. Depending on transaction isolation level it won’t be seen by other running transactions. But now the database knows about the record. What session.flush() does is to empty the internal SQL instructions cache, and execute it immediately to the database.

When you call transaction.rollback(), Hibernate rolls-back the database transaction. Database handles rollback, thus removing newly created object.

Flushing the Session simply makes the data that is currently in the session synchronized with what is in the database.
However, just because you have flushed, doesn’t mean the data can’t be rolled back.

Commit does flush the session, but it also ends the unit of work.

Batch updates

Consider a requirement when you want to insert a large number of records in database using Hibernate. The code looks like as below.

Session session = SessionManager.getSessionFactory().openSession();Transaction tx = session.beginTransaction();for ( int i=0; i<100000; i++ ) {    Employee emp = new Employee(.....);    session.save(emp);}tx.commit();session.close();

This code may throw OutOfMemoryError somewhere around 50,000th row. Because Hibernate caches all the newly inserted Employee objects in the session level cache. We can solve this problem using hibernate batch processing.

We need to set hibernate.jdbc.batch_size in hibernate.cfg.xml as below :

<property name="hibernate.jdbc.batch_size">40</property>

So, Hibernate executes every 40 rows as a batch.

And the above code has to be changed to :

Session session = SessionManager.getSessionFactory().openSession();Transaction tx = session.beginTransaction();for ( int i=0; i<100000; i++ ) {    Employee emp = new Employee(.....);    session.save(emp);    if( i % 40 == 0 ) { // Same as the JDBC batch size        //flush a batch of inserts and release memory:        session.flush();        session.clear();    }}tx.commit();session.close();

Batch processing helps to avoid OutOfMemoryError.

flush() will synchronize your database with the current state of object/objects held in the memory but it does not commit the transaction. So, if you get any exception after flush() is called, then the transaction will be rolled back. You can synchronize your database with small chunks of data using flush() instead of committing a large data at once using commit() and face the risk of getting an Out Of Memory Exception. Flushing the session runs a “dirty check” on all objects attached to it. This means that Hibernate needs to check all attached objects, determine if any property of the object has changed, and remember it in a map for later generating of necessary SQL to persist those changes. A dirty check is usually an expensive operation as Hibernate also checks the state of every dependent object. Once all objects are checked for changes, Hibernate executes necessary SQL for the updates while updating the query cache.

commit() will make data stored in the database permanent. There is no way you can rollback your transaction once the commit() succeeds. After commit, to be sure we released all the resources, we need to additionally clear the session. The consequence of this, is that any changes that were not flushed before calling clear, will be lost.

For huge size batch uploads, just clearing the session might not be enough as our session will still hold lists of post-transaction executions which will be cleared only after committing the transaction.

Object State

One important thing I will like to discuss in regard to this is how the state of the object in session and database varies during these operations:

For example, consider an entry in database,

SQL> SELECT * FROM EMPLOYEE;EMPNO      EMPNAME        EMPDEPT---------- -------------- --------------------1          Sam            Finance      

Now in application code,

Session session = SessionManager.getSessionFactory().openSession();session.getTransaction().begin();Employee employee = (Employee)session.get(Employee.class, 1);System.out.println("Before updating database");System.out.println("EMP NAME : " + employee.getEmpName());System.out.println("EMP DEPT : " + employee.getEmpDept());/** Insert break point, go to the database and modifySam's name to "New Sam"*/System.out.println("After updating database");System.out.println("EMP NAME :" + employee.getEmpName());System.out.println("EMP DEPT :" + employee.getEmpDept());session.getTransaction().commit();session.close();// Output
Before updating database
EMP NAME : SamEMP DEPT : FinanceAfter updating databaseEMP NAME : SamEMP DEPT : Finance

Here, session.get(Employee.class, 1) hits the database and fetches the data from database and assign the data to employee entity object and this entity object is cached on session object. When we update the data on the database it will not get the updated data. It just shows the session cached data. Thats the reason the updated values are not printed.

Consider another approach,

Session session1 = SessionManager.getSessionFactory().openSession();Session session2 = SessionManager.getSessionFactory().openSession();session1.getTransaction().begin();
session2.getTransaction().begin();
Employee employee = (Employee)session1.get(Employee.class, 1);System.out.println("Before updating database");System.out.println("EMP NAME : " + employee.getEmpName());System.out.println("EMP DEPT : " + employee.getEmpDept());/** Insert break point, go to the database and modifySam's name to "New Sam"*/employee = (Employee)session2.get(Employee.class, 1);System.out.println("After updating database");System.out.println("EMP NAME :" + employee.getEmpName());System.out.println("EMP DEPT :" + employee.getEmpDept());session1.getTransaction().commit();session2.getTransaction().commit();session1.close();session2.close();// Output
Before updating database
EMP NAME : SamEMP DEPT : FinanceAfter updating databaseEMP NAME : New SamEMP DEPT : Finance

Here session2 dont have any associated objects. So get() method hits the database,fetches the records from database and displayed the updated records. Problem is here we need to create new session object to get the updated record. Instead we can use refresh() method to get the updated records from the database.

Consider this approach using refresh() method,

Session session = SessionManager.getSessionFactory().openSession();session.getTransaction().begin();Employee employee = (Employee)session.get(Employee.class, 1);System.out.println("Before updating database");System.out.println("EMP NAME : " + employee.getEmpName());System.out.println("EMP DEPT : " + employee.getEmpDept());/** Insert break point, go to the database and modifySam's name to "New Sam"*/session.refresh(employee);System.out.println("After updating database");System.out.println("EMP NAME :" + employee.getEmpName());System.out.println("EMP DEPT :" + employee.getEmpDept());session.getTransaction().commit();session.close();// Output
Before updating database
EMP NAME : SamEMP DEPT : FinanceAfter updating databaseEMP NAME : New SamEMP DEPT : Finance

Here after calling session.refresh(employee), Hibernate compares database data and employee object data. If it finds any difference it will again execute select query and update the employee object data.

A method that we often discuss in relation to this is the clear(), which basically completely clears the session and is used to dissociate/disconnect all the objects from the session.

Now we understand, that .save() saves the state in the session, .flush() forces the SQL insert and .refresh() re-reads the state (after the trigger executes).

Hope that helps. Thanks.

--

--