Fast Reads and Writes of Complex JPA Objects

Patrik Hörlin
Published in Predictly on Tech
7 min read · Apr 3, 2024

Working with JPA using spring-data-jpa is a great way to improve your productivity as a developer. No matter the complexity of your object, as long as it is properly modeled it will be persisted and fetched without you having to think about it.

The downside is that reading and writing these objects can become very slow. We talked about how to perform fast writes using JPA in a previous article where we discussed batch inserts, but we didn’t pay much attention to the requirements of our JPA implementation.

It will get there, eventually — Photo by Nick Abrams on Unsplash

Complex Objects Create Complex Problems

In this article we will work with a complex CustomerOrder object that has a child collection which in turn has its own child collection.

import java.io.Serializable;
import java.math.BigDecimal;
import java.time.LocalDateTime;
import java.util.HashSet;
import java.util.Set;

import org.springframework.data.domain.Persistable;

import jakarta.persistence.*;

@Entity
@Table(name = "customer_order")
public class CustomerOrderEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String customerReference;
    private String customerName;
    private String customerEmail;
    private LocalDateTime orderTimestamp;

    @OneToMany(mappedBy = "customerOrder", cascade = CascadeType.ALL, fetch = FetchType.EAGER, orphanRemoval = true)
    private Set<CustomerOrderLineEntity> lines = new HashSet<>();
}

@Entity
@Table(name = "customer_order_line")
public class CustomerOrderLineEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @ManyToOne(cascade = CascadeType.DETACH)
    @JoinColumn(name = "customer_order_id")
    private CustomerOrderEntity customerOrder;
    private Integer lineNumber;
    private String articleId;
    private BigDecimal quantity;
    private BigDecimal unitPrice;
    private BigDecimal deliveredQuantity = BigDecimal.ZERO;

    @OneToMany(mappedBy = "customerOrderLine", cascade = CascadeType.ALL, fetch = FetchType.LAZY, orphanRemoval = true)
    private Set<COLineAttributeEntity> attributes = new HashSet<>();

    @Entity
    @Table(name = "customer_order_line_attribute")
    @IdClass(COLineAttributeEntity.COLineAttributeEntityId.class)
    public static class COLineAttributeEntity implements Persistable<COLineAttributeEntity.COLineAttributeEntityId> {

        @Id
        @ManyToOne(cascade = CascadeType.DETACH)
        @JoinColumn(name = "customer_order_line_id")
        private CustomerOrderLineEntity customerOrderLine;
        @Id
        @Column(name = "attr_key")
        private String key;
        @Column(name = "attr_value")
        private String value;

        // getId() and isNew() required by Persistable are omitted in this excerpt

        // Composite id class; field names and types mirror the @Id fields above
        private static class COLineAttributeEntityId implements Serializable {

            private static final long serialVersionUID = -1875048762324910075L;

            private Long customerOrderLine;
            private String key;
        }
    }
}

A CustomerOrder has a child collection of CustomerOrderLine entities, and each line can have any number of COLineAttribute key-value pairs. Both CustomerOrderEntity and CustomerOrderLineEntity use IDENTITY as the strategy for generating unique ids.
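
Before measuring anything, here is a minimal sketch of how a full order graph can be persisted in one call. The setters and customerOrderRepository (a plain Spring Data JpaRepository) are assumptions for illustration; they are not shown in the entities above.

CustomerOrderEntity order = new CustomerOrderEntity();
order.setCustomerReference("REF-1001");       // hypothetical setter
order.setOrderTimestamp(LocalDateTime.now());

for (int i = 1; i <= 50; i++) {
    CustomerOrderLineEntity line = new CustomerOrderLineEntity();
    line.setLineNumber(i);
    line.setCustomerOrder(order);             // keep both sides of the association in sync
    order.getLines().add(line);

    for (String key : List.of("color", "size", "batch")) {
        var attribute = new CustomerOrderLineEntity.COLineAttributeEntity();
        attribute.setCustomerOrderLine(line); // part of the composite id
        attribute.setKey(key);
        attribute.setValue("v");
        line.getAttributes().add(attribute);
    }
}

// CascadeType.ALL on both collections means this single call persists the whole graph
customerOrderRepository.save(order);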

Assume that we have an order with 50 lines and that each line has three attributes, a total of 1 + 50 + 50 * 3 = 201 rows to be written to the database. With batch inserts configured, we get the following results:

2024-04-03T15:25:45.111+02:00  INFO 3178207 --- [employee-service-v1] [nio-8080-exec-1] i.StatisticalLoggingSessionEventListener : Session Metrics {
234768043 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
38397976 nanoseconds spent preparing 101 JDBC statements;
256144869 nanoseconds spent executing 51 JDBC statements;
559908072 nanoseconds spent executing 50 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
35958936 nanoseconds spent executing 2 flushes (flushing a total of 402 entities and 104 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

The reason we are not seeing batch inserts for the order lines is that we are using IDENTITY as the strategy for generating their unique ids. Since the database generates the id at insert time, Hibernate must send each row as an individual statement to retrieve the generated key, and that disables JDBC batching.
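
For reference, batch inserts were enabled with Hibernate's standard properties; a minimal sketch in application.properties form (the values are examples, and generate_statistics is what produces the Session Metrics output above):

# Send inserts/updates to the database in JDBC batches of up to 50 statements
spring.jpa.properties.hibernate.jdbc.batch_size=50
# Order statements by entity so consecutive inserts of the same table can share a batch
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
# Log the Session Metrics shown in this article
spring.jpa.properties.hibernate.generate_statistics=true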

Sequence as Generation Strategy

To overcome this issue we can switch to a sequence as generation strategy, since that allows us to calculate the next value of the sequence in the memory of our application instead of asking the database for it on every insert.

@Entity
@Table(name = "customer_order_line")
public class CustomerOrderLineEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "customer_order_line_id_seq")
    @SequenceGenerator(name = "customer_order_line_id_seq", sequenceName = "col_id_seq", allocationSize = 40)
    private Long id;

}

With this configuration, we get the following output:

Hibernate:
    insert
    into
        customer_order
        (customer_email, customer_name, customer_reference, order_timestamp)
    values
        (?, ?, ?, ?)
Hibernate:
    select
        next_val as id_val
    from
        col_id_seq for update
Hibernate:
    update
        col_id_seq
    set
        next_val= ?
    where
        next_val=?
Hibernate:
    select
        next_val as id_val
    from
        col_id_seq for update
...
Hibernate:
    insert
    into
        customer_order_line
        (article_id, customer_order_id, delivered_quantity, line_number, quantity, unit_price, id)
    values
        (?, ?, ?, ?, ?, ?, ?)
...
2024-04-03T15:32:42.210+02:00 INFO 3200215 --- [employee-service-v1] [nio-8080-exec-1] i.StatisticalLoggingSessionEventListener : Session Metrics {
259294665 nanoseconds spent acquiring 3 JDBC connections;
205008 nanoseconds spent releasing 2 JDBC connections;
4817597 nanoseconds spent preparing 7 JDBC statements;
22735084 nanoseconds spent executing 5 JDBC statements;
32578453 nanoseconds spent executing 2 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
68606595 nanoseconds spent executing 2 flushes (flushing a total of 402 entities and 104 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

First, since this example runs on MySQL, which does not have native support for sequences, Hibernate queries next_val from the table col_id_seq that emulates a sequence. When it does this, it also locks the row against any other application instances until it has updated the value. In this example I set allocationSize to 40, which means that for 50 lines it needs to perform two allocations.

If a new order with at most 30 lines (2 * 40 - 50 = 30 ids remain from those allocations) is then placed, Hibernate can insert its lines without allocating new values.
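
The table backing the emulated sequence is roughly this shape; a sketch inferred from the statements in the log above, not the exact DDL Hibernate emits:

-- Table-backed sequence emulation on MySQL (sketch)
CREATE TABLE col_id_seq (
    next_val BIGINT
);
INSERT INTO col_id_seq VALUES (1);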

Second, Hibernate can now utilize JDBC batching, which is much more performant when inserting large sets of data. Inserting the same amount of data drops from roughly 1.2 to 0.38 seconds.

UUID as Generation Strategy

An alternative to using a sequence is random UUIDs, since these can also be generated in the memory of your application, and JPA supports them out of the box.

@Entity
@Table(name = "customer_order_line")
public class CustomerOrderLineEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.UUID)
    private String id;
}

Since these ids are generated randomly, they will create problems for your database when inserting. As the database grows, an insert of 50 lines will have its id values spread evenly over the complete index value space. This forces your database to read, write and potentially rearrange several different pages of the index. You will very rapidly start to see a lot of I/O on your database, and performance will be severely limited.

By utilizing a sequence, which is ordered, we only work on the most recent pages of our index space, which means that our database does not have to swap pages in and out all the time.

Reading a CustomerOrder

If we now want to read a CustomerOrder object from the database, we get the following result:

Hibernate:
    select
        coe1_0.id,
        coe1_0.customer_email,
        coe1_0.customer_name,
        coe1_0.customer_reference,
        l1_0.customer_order_id,
        l1_0.id,
        l1_0.article_id,
        l1_0.delivered_quantity,
        l1_0.line_number,
        l1_0.quantity,
        l1_0.unit_price,
        coe1_0.order_timestamp
    from
        customer_order coe1_0
    left join
        customer_order_line l1_0
            on coe1_0.id=l1_0.customer_order_id
    where
        coe1_0.id=?
Hibernate:
    select
        a1_0.customer_order_line_id,
        a1_0.attr_key,
        a1_0.attr_value
    from
        customer_order_line_attribute a1_0
    where
        a1_0.customer_order_line_id=?
...
2024-04-03T15:55:00.969+02:00 INFO 3270276 --- [employee-service-v1] [nio-8080-exec-1] i.StatisticalLoggingSessionEventListener : Session Metrics {
289787297 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
11803018 nanoseconds spent preparing 51 JDBC statements;
325803108 nanoseconds spent executing 51 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

Hibernate does a good job of fetching all lines together with the initial query, but then it loads the attributes of each line individually, which forces 50 extra round trips to our database. There are two ways of resolving this.

First, we can write our own custom JPQL query that performs a LEFT JOIN FETCH, as in a previous article I wrote.
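
A sketch of what such a repository method could look like; the interface and method name are hypothetical, but the entity and field names match the entities above:

import java.util.Optional;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

public interface CustomerOrderRepository extends JpaRepository<CustomerOrderEntity, Long> {

    // Loads the order, its lines and each line's attributes in a single round trip
    @Query("""
           SELECT DISTINCT co
           FROM CustomerOrderEntity co
           LEFT JOIN FETCH co.lines l
           LEFT JOIN FETCH l.attributes
           WHERE co.id = :id
           """)
    Optional<CustomerOrderEntity> findByIdWithLinesAndAttributes(@Param("id") Long id);
}

Note that fetching two collection levels in one query multiplies the rows returned (lines times attributes per line), so for very large graphs the batching approach below can be the cheaper option.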

Alternatively, we can instruct Hibernate to load the children in batches instead of issuing a separate SELECT statement for each line.

@Entity
@Table(name = "customer_order_line")
public class CustomerOrderLineEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "customer_order_line_id_seq")
    @SequenceGenerator(name = "customer_order_line_id_seq", sequenceName = "col_id_seq", allocationSize = 40)
    private Long id;

    ...

    @BatchSize(size = 50)
    @OneToMany(mappedBy = "customerOrderLine", cascade = CascadeType.ALL, fetch = FetchType.LAZY, orphanRemoval = true)
    private Set<COLineAttributeEntity> attributes = new HashSet<>();
}

With this configuration, we instead get:

Hibernate:
    select
        coe1_0.id,
        coe1_0.customer_email,
        coe1_0.customer_name,
        coe1_0.customer_reference,
        l1_0.customer_order_id,
        l1_0.id,
        l1_0.article_id,
        l1_0.delivered_quantity,
        l1_0.line_number,
        l1_0.quantity,
        l1_0.unit_price,
        coe1_0.order_timestamp
    from
        customer_order coe1_0
    left join
        customer_order_line l1_0
            on coe1_0.id=l1_0.customer_order_id
    where
        coe1_0.id=?
Hibernate:
    select
        a1_0.customer_order_line_id,
        a1_0.attr_key,
        a1_0.attr_value
    from
        customer_order_line_attribute a1_0
    where
        a1_0.customer_order_line_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
        ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
        ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2024-04-03T16:06:50.831+02:00 INFO 3307516 --- [employee-service-v1] [nio-8080-exec-1] i.StatisticalLoggingSessionEventListener : Session Metrics {
239367578 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
4376500 nanoseconds spent preparing 2 JDBC statements;
194662592 nanoseconds spent executing 2 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

Instead of 51 queries sent to the database, we now send only two, which saves both time and resources in your system.
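
As a final note, instead of annotating each collection with @BatchSize, Hibernate can batch-load all lazy associations by default; a sketch with an example value:

# Load lazy associations with IN (...) batches of up to 50 parents, like @BatchSize but global
spring.jpa.properties.hibernate.default_batch_fetch_size=50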
