Insert without Select using JPA

Jesper Håsteen
Published in Predictly on Tech
5 min read · Apr 3, 2023

When working with larger datasets, JPA has a noticeable drawback: it tracks and synchronizes each entity independently of the others, issuing one SQL statement for every entity created or modified. While this makes it very easy to work with as a developer, it adds significant overhead when ingesting entities in large batches.


As in our other articles, I will use the MySQL Employees database via the docker container genschsa/mysql-employees. The JPA model of the Employee entity is shown below:

@Entity
@Table(name = "employees")
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
public class Employee {

    @Id
    @Column(name = "emp_no")
    private int employeeId;

    @Column(name = "birth_date")
    private LocalDate birthDate;

    @Column(name = "first_name")
    private String firstName;

    @Column(name = "last_name")
    private String lastName;

    @Column(name = "hire_date")
    private LocalDate hireDate;

    @OneToMany(mappedBy = "employeeId", cascade = CascadeType.ALL, orphanRemoval = true)
    private Set<Salary> salaries;

    @OneToMany(mappedBy = "employeeId", cascade = CascadeType.ALL, orphanRemoval = true)
    private Set<Title> titles;
}

Batch Ingest

Suppose we have a service that each month ingests a file of employee records to be loaded into a database. We know that the entries in the file are not yet available in our database.

@Slf4j
@Service
@RequiredArgsConstructor
class EmployeeImportService {

    private final EmployeeRepository employeeRepo;

    @Transactional
    void persistEmployeeImportRecords(List<EmployeeIngest> employees, Map<Integer, List<TitleIngest>> titles) {
        Assert.notNull(employees, "employees must be provided");
        Assert.notNull(titles, "titles must be provided");
        var persistenceEmployees = employees.stream()
                .map(e -> toEmployee(e, titles.get(e.employeeId())))
                .toList();
        employeeRepo.saveAllAndFlush(persistenceEmployees);
    }
}
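The toEmployee mapper and the EmployeeIngest/TitleIngest types referenced above are not shown in the article. A minimal sketch of what they might look like, using plain records as simplified stand-ins for the JPA entities (all names and fields beyond those in the service are assumptions on my part):

```java
import java.time.LocalDate;
import java.util.List;

class EmployeeMappingSketch {

    // Hypothetical shapes for the records parsed from the monthly import file.
    record EmployeeIngest(int employeeId, LocalDate birthDate,
                          String firstName, String lastName, LocalDate hireDate) {}
    record TitleIngest(String title, LocalDate fromDate, LocalDate toDate) {}

    // Simplified stand-ins for the JPA entities (the real ones use Lombok).
    record Title(int employeeId, String title, LocalDate fromDate, LocalDate toDate) {}
    record Employee(int employeeId, LocalDate birthDate, String firstName,
                    String lastName, LocalDate hireDate, List<Title> titles) {}

    // Maps one ingest record plus its titles to an Employee, tolerating a
    // missing title list (titles.get(...) in the service may return null).
    static Employee toEmployee(EmployeeIngest e, List<TitleIngest> titles) {
        var mapped = (titles == null ? List.<TitleIngest>of() : titles).stream()
                .map(t -> new Title(e.employeeId(), t.title(), t.fromDate(), t.toDate()))
                .toList();
        return new Employee(e.employeeId(), e.birthDate(), e.firstName(),
                e.lastName(), e.hireDate(), mapped);
    }

    public static void main(String[] args) {
        var ingest = new EmployeeIngest(10001, LocalDate.of(1990, 1, 1),
                "Georgi", "Facello", LocalDate.of(2023, 1, 15));
        var emp = toEmployee(ingest, List.of(
                new TitleIngest("Engineer", LocalDate.of(2023, 1, 15), LocalDate.of(9999, 1, 1))));
        System.out.println(emp.titles().size()); // prints 1
    }
}
```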

For this example we also want to enable statistics and logging to see what’s happening behind the scenes.

spring:
  # Datasource configuration here
  jpa:
    open-in-view: false
    show-sql: true
    hibernate:
      ddl-auto: validate
    properties:
      hibernate:
        format_sql: false
        generate_statistics: true

Executing the code above will produce the following output:

Hibernate:
    select
        employee0_.emp_no as emp_no1_2_1_,
        employee0_.birth_date as birth_da2_2_1_,
        employee0_.first_name as first_na3_2_1_,
        employee0_.hire_date as hire_dat5_2_1_,
        employee0_.last_name as last_nam6_2_1_,
        employeehi1_.emp_no as emp_no2_1_3_,
        employeehi1_.dept_no as dept_no1_1_3_,
        employeehi1_.dept_no as dept_no1_1_0_,
        employeehi1_.emp_no as emp_no2_1_0_,
        employeehi1_.from_date as from_dat3_1_0_,
        employeehi1_.to_date as to_date4_1_0_
    from
        employees employee0_
    left outer join
        dept_emp employeehi1_
            on employee0_.emp_no=employeehi1_.emp_no
    where
        employee0_.emp_no=?
Hibernate:
    select
        title0_.emp_no as emp_no1_4_0_,
        title0_.from_date as from_dat2_4_0_,
        title0_.title as title3_4_0_,
        title0_.to_date as to_date4_4_0_
    from
        titles title0_
    where
        title0_.emp_no=?
        and title0_.from_date=?
        and title0_.title=?
Hibernate:
    insert
    into
        employees
        (birth_date, first_name, hire_date, last_name, emp_no)
    values
        (?, ?, ?, ?, ?)
Hibernate:
    insert
    into
        titles
        (to_date, emp_no, from_date, title)
    values
        (?, ?, ?, ?)
StatisticalLoggingSessionEventListener : Session Metrics {
    32028393 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    432359630 nanoseconds spent preparing 2466 JDBC statements;
    14030242602 nanoseconds spent executing 2466 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    5701517420 nanoseconds spent executing 2 flushes (flushing a total of 2466 entities and 1000 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

As can be seen, for every Employee, Hibernate issues two additional SELECT statements to load the corresponding entity and its child records from the database. When ingesting 500 employees and their titles, that amounts to 1,000 SELECT statements issued against the database.
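The extra SELECTs come from how save is implemented: when an entity arrives with a pre-assigned id, Spring Data JPA's SimpleJpaRepository considers it "not new" and calls EntityManager.merge, which must first load the current database state. Below is a toy model of that decision, not Spring's actual code; it only illustrates which path is taken:

```java
// Toy model of the persist-vs-merge choice in Spring Data JPA's
// SimpleJpaRepository.save(...). The real code delegates to
// EntityManager.persist(...) or EntityManager.merge(...).
class SaveDecisionSketch {

    record Employee(int employeeId) {}

    // Default "is new?" heuristic for a primitive id: 0 means not yet assigned.
    // Imported employees carry real ids, so this reports false for them.
    static boolean isNewByDefault(Employee e) {
        return e.employeeId() == 0;
    }

    // Which operation save() would pick, and therefore whether a SELECT
    // precedes the INSERT.
    static String saveAction(boolean isNew) {
        return isNew ? "persist: INSERT only"
                     : "merge: SELECT first, then INSERT or UPDATE";
    }

    public static void main(String[] args) {
        Employee imported = new Employee(10001); // id assigned from the file
        System.out.println(saveAction(isNewByDefault(imported)));
        // prints "merge: SELECT first, then INSERT or UPDATE"
    }
}
```

Overriding isNew() via Persistable, as shown next, flips this decision to the persist branch without changing the id mapping.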

Spring Data Persistable

Fortunately, Spring Data JPA provides a solution to this issue: the Persistable interface. It lets us tell JPA that the entity we are about to persist is new, so the redundant existence check can be skipped entirely. To use it, we add a base class that takes care of the boilerplate for us.

@MappedSuperclass
public abstract class AbstractPersistableEntity<T extends Serializable>
        implements Persistable<T> {

    @Transient
    private boolean isNew = true;

    @Override
    public boolean isNew() {
        return isNew;
    }

    public void setNew(boolean isNew) {
        this.isNew = isNew;
    }

    @PostLoad
    @PostPersist
    void markNotNew() {
        this.isNew = false;
    }
}

By implementing the isNew method, we ensure that Hibernate does not issue an extra query to check whether the entity already exists. The JPA callbacks @PostLoad and @PostPersist automatically set the isNew field to false once the entity has been loaded or persisted, so its state is handled for us.

@Entity
@Table(name = "employees")
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@EqualsAndHashCode(callSuper = false)
public class Employee extends AbstractPersistableEntity<Integer> {

    @Id
    @Column(name = "emp_no")
    private int employeeId;

    @Column(name = "birth_date")
    private LocalDate birthDate;

    @Column(name = "first_name")
    private String firstName;

    @Column(name = "last_name")
    private String lastName;

    @Column(name = "hire_date")
    private LocalDate hireDate;

    @OneToMany(mappedBy = "employeeId", fetch = FetchType.EAGER, cascade = CascadeType.ALL, orphanRemoval = true)
    private Set<Salary> salaries;

    @OneToMany(mappedBy = "employeeId", fetch = FetchType.EAGER, cascade = CascadeType.ALL, orphanRemoval = true)
    private Set<Title> titles;

    @Override
    public Integer getId() {
        return employeeId;
    }
}

This produces the following result:

Hibernate:
    insert
    into
        employees
        (birth_date, first_name, gender, hire_date, last_name, emp_no)
    values
        (?, ?, ?, ?, ?, ?)
Hibernate:
    insert
    into
        titles
        (to_date, emp_no, from_date, title)
    values
        (?, ?, ?, ?)
StatisticalLoggingSessionEventListener : Session Metrics {
    78588493 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    208607581 nanoseconds spent preparing 1233 JDBC statements;
    6474843328 nanoseconds spent executing 1233 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    6966643471 nanoseconds spent executing 2 flushes (flushing a total of 2466 entities and 1000 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

As can be seen, we’ve cut the number of statements issued to ingest the 500 records in half, which saves both time and resources on our database server. More notably, the JDBC statement execution time dropped from 14 seconds to 6.5, a reduction of more than 50%.

These numbers come from running the application and the MySQL instance on two different machines connected via WiFi. This does not represent a production environment, but neither does running the application with MySQL on localhost, since that removes the network delay completely.

There is still more work that can be done to improve on these results, more on that in a later article.
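One likely candidate for further improvement (my assumption, not a preview of that article): the session metrics above still report 0 JDBC batches, so the 1,000 remaining INSERTs are sent one at a time. Hibernate can group them via its JDBC batching properties; a sketch of the relevant configuration, following the same application.yml structure used earlier:

spring:
  jpa:
    properties:
      hibernate:
        jdbc:
          # Group up to 50 statements into a single JDBC batch
          batch_size: 50
        # Sort inserts by entity type so statements for the same table
        # can actually be batched together
        order_inserts: true

Note that with the assigned-id mapping used here this works as-is; with database-generated IDENTITY ids, Hibernate disables insert batching.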
