User Guide - 12. Batching - 《Hibernate ORM 5.4 Document》

12. Batching

12. Batching

12.1. JDBC batching

JDBC offers support for batching together SQL statements that can be represented as a single PreparedStatement. Implementation wise this generally means that drivers will send the batched operation to the server in one call, which can save on network calls to the database. Hibernate can leverage JDBC batching. The following settings control this behavior.

hibernate.jdbc.batch_size

Controls the maximum number of statements Hibernate will batch together before asking the driver to execute the batch. Zero or a negative number disables this feature.

hibernate.jdbc.batch_versioned_data

Some JDBC drivers return incorrect row counts when a batch is executed. If your JDBC driver falls into this category this setting should be set to false. Otherwise, it is safe to enable this which will allow Hibernate to still batch the DML for versioned entities and still use the returned row counts for optimistic lock checks. Since 5.0, it defaults to true. Previously (versions 3.x and 4.x), it used to be false.

hibernate.jdbc.batch.builder

Names the implementation class used to manage batching capabilities. It is almost never a good idea to switch from Hibernate’s default implementation. But if you wish to, this setting would name the org.hibernate.engine.jdbc.batch.spi.BatchBuilder implementation to use.

hibernate.order_updates

Forces Hibernate to order SQL updates by the entity type and the primary key value of the items being updated. This allows for more batching to be used. It will also result in fewer transaction deadlocks in highly concurrent systems. Comes with a performance hit, so benchmark before and after to see if this actually helps or hurts your application.

hibernate.order_inserts

Forces Hibernate to order inserts to allow for more batching to be used. Comes with a performance hit, so benchmark before and after to see if this actually helps or hurts your application.

Since version 5.2, Hibernate allows overriding the global JDBC batch size given by the hibernate.jdbc.batch_size configuration property on a per Session basis.

Example 430. Hibernate specific JDBC batch size configuration on a per Session basis

entityManager
    .unwrap( Session.class )
    .setJdbcBatchSize( 10 );

12.2. Session batching

The following example shows an anti-pattern for batch inserts.

Example 431. Naive way to insert 100 000 entities with Hibernate

EntityManager entityManager = null;
EntityTransaction txn = null;
try {
    entityManager = entityManagerFactory().createEntityManager();
    txn = entityManager.getTransaction();
    txn.begin();
    for ( int i = 0; i < 100_000; i++ ) {
        Person Person = new Person( String.format( "Person %d", i ) );
        entityManager.persist( Person );
    }
    txn.commit();
} catch (RuntimeException e) {
    if ( txn != null && txn.isActive()) txn.rollback();
        throw e;
} finally {
    if (entityManager != null) {
        entityManager.close();
    }
}

There are several problems associated with this example:

Hibernate caches all the newly inserted Customer instances in the session-level cache, so, when the transaction ends, 100 000 entities are managed by the persistence context. If the maximum memory allocated to the JVM is rather low, this example could fail with an OutOfMemoryException. The Java 1.8 JVM allocated either 1/4 of available RAM or 1Gb, which can easily accommodate 100 000 objects on the heap.
long-running transactions can deplete a connection pool so other transactions don’t get a chance to proceed.
JDBC batching is not enabled by default, so every insert statement requires a database roundtrip. To enable JDBC batching, set the hibernate.jdbc.batch_size property to an integer between 10 and 50.

Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.

12.2.1. Batch inserts

When you make new objects persistent, employ methods flush() and clear() to the session regularly, to control the size of the first-level cache.

Example 432. Flushing and clearing the Session

EntityManager entityManager = null;
EntityTransaction txn = null;
try {
    entityManager = entityManagerFactory().createEntityManager();
    txn = entityManager.getTransaction();
    txn.begin();
    int batchSize = 25;
    for ( int i = 0; i < entityCount; i++ ) {
        if ( i > 0 && i % batchSize == 0 ) {
            //flush a batch of inserts and release memory
            entityManager.flush();
            entityManager.clear();
        }
        Person Person = new Person( String.format( "Person %d", i ) );
        entityManager.persist( Person );
    }
    txn.commit();
} catch (RuntimeException e) {
    if ( txn != null && txn.isActive()) txn.rollback();
        throw e;
} finally {
    if (entityManager != null) {
        entityManager.close();
    }
}

12.2.2. Session scroll

When you retrieve and update data, flush() and clear() the session regularly. In addition, use method scroll() to take advantage of server-side cursors for queries that return many rows of data.

Example 433. Using scroll()

EntityManager entityManager = null;
EntityTransaction txn = null;
ScrollableResults scrollableResults = null;
try {
    entityManager = entityManagerFactory().createEntityManager();
    txn = entityManager.getTransaction();
    txn.begin();
    int batchSize = 25;
    Session session = entityManager.unwrap( Session.class );
    scrollableResults = session
        .createQuery( "select p from Person p" )
        .setCacheMode( CacheMode.IGNORE )
        .scroll( ScrollMode.FORWARD_ONLY );
    int count = 0;
    while ( scrollableResults.next() ) {
        Person Person = (Person) scrollableResults.get( 0 );
        processPerson(Person);
        if ( ++count % batchSize == 0 ) {
            //flush a batch of updates and release memory:
            entityManager.flush();
            entityManager.clear();
        }
    }
    txn.commit();
} catch (RuntimeException e) {
    if ( txn != null && txn.isActive()) txn.rollback();
        throw e;
} finally {
    if (scrollableResults != null) {
        scrollableResults.close();
    }
    if (entityManager != null) {
        entityManager.close();
    }
}

If left unclosed by the application, Hibernate will automatically close the underlying resources (e.g. ResultSet and PreparedStatement) used internally by the ScrollableResults when the current transaction is ended (either commit or rollback).

However, it is good practice to close the ScrollableResults explicitly.

12.2.3. StatelessSession

StatelessSession is a command-oriented API provided by Hibernate. Use it to stream data to and from the database in the form of detached objects. A StatelessSession has no persistence context associated with it and does not provide many of the higher-level lifecycle semantics.

Some of the things not provided by a StatelessSession include:

a first-level cache
interaction with any second-level or query cache
transactional write-behind or automatic dirty checking

Limitations of StatelessSession:

Operations performed using a stateless session never cascade to associated instances.
Collections are ignored by a stateless session.
Lazy loading of associations is not supported.
Operations performed via a stateless session bypass Hibernate’s event model and interceptors.
Due to the lack of a first-level cache, Stateless sessions are vulnerable to data aliasing effects.
A stateless session is a lower-level abstraction that is much closer to the underlying JDBC.

Example 434. Using a StatelessSession

StatelessSession statelessSession = null;
Transaction txn = null;
ScrollableResults scrollableResults = null;
try {
    SessionFactory sessionFactory = entityManagerFactory().unwrap( SessionFactory.class );
    statelessSession = sessionFactory.openStatelessSession();
    txn = statelessSession.getTransaction();
    txn.begin();
    scrollableResults = statelessSession
        .createQuery( "select p from Person p" )
        .scroll(ScrollMode.FORWARD_ONLY);
    while ( scrollableResults.next() ) {
        Person Person = (Person) scrollableResults.get( 0 );
        processPerson(Person);
        statelessSession.update( Person );
    }
    txn.commit();
} catch (RuntimeException e) {
    if ( txn != null && txn.getStatus() == TransactionStatus.ACTIVE) txn.rollback();
        throw e;
} finally {
    if (scrollableResults != null) {
        scrollableResults.close();
    }
    if (statelessSession != null) {
        statelessSession.close();
    }
}

The Customer instances returned by the query are immediately detached. They are never associated with any persistence context.

The insert(), update(), and delete() operations defined by the StatelessSession interface operate directly on database rows. They cause the corresponding SQL operations to be executed immediately. They have different semantics from the save(), saveOrUpdate(), and delete() operations defined by the Session interface.

12.3. Hibernate Query Language for DML

DML, or Data Manipulation Language, refers to SQL statements such as INSERT, UPDATE, and DELETE. Hibernate provides methods for bulk SQL-style DML statement execution, in the form of Hibernate Query Language (HQL).

12.3.1. HQL/JPQL for UPDATE and DELETE

Both the Hibernate native Query Language and JPQL (Java Persistence Query Language) provide support for bulk UPDATE and DELETE.

Example 435. Pseudo-syntax for UPDATE and DELETE statements using HQL

UPDATE FROM EntityName e WHERE e.name = ?
DELETE FROM EntityName e WHERE e.name = ?

Although the FROM and WHERE clauses are optional, it is good practice to declare them explicitly.

The FROM clause can only refer to a single entity, which can be aliased. If the entity name is aliased, any property references must be qualified using that alias. If the entity name is not aliased, then it is illegal for any property references to be qualified.

Joins, either implicit or explicit, are prohibited in a bulk HQL query. You can use sub-queries in the WHERE clause, and the sub-queries themselves can contain joins.

Example 436. Executing a JPQL UPDATE, using the Query.executeUpdate()

int updatedEntities = entityManager.createQuery(
    "update Person p " +
    "set p.name = :newName " +
    "where p.name = :oldName" )
.setParameter( "oldName", oldName )
.setParameter( "newName", newName )
.executeUpdate();

Example 437. Executing an HQL UPDATE, using the Query.executeUpdate()

int updatedEntities = session.createQuery(
    "update Person " +
    "set name = :newName " +
    "where name = :oldName" )
.setParameter( "oldName", oldName )
.setParameter( "newName", newName )
.executeUpdate();

In keeping with the EJB3 specification, HQL UPDATE statements, by default, do not effect the version or the timestamp property values for the affected entities. You can use a versioned update to force Hibernate to reset the version or timestamp property values, by adding the VERSIONED keyword after the UPDATE keyword.

Example 438. Updating the version of timestamp

int updatedEntities = session.createQuery(
    "update versioned Person " +
    "set name = :newName " +
    "where name = :oldName" )
.setParameter( "oldName", oldName )
.setParameter( "newName", newName )
.executeUpdate();

If you use the VERSIONED statement, you cannot use custom version types that implement the org.hibernate.usertype.UserVersionType.

This feature is only available in HQL since it’s not standardized by JPA.

Example 439. A JPQL DELETE statement

int deletedEntities = entityManager.createQuery(
    "delete Person p " +
    "where p.name = :name" )
.setParameter( "name", name )
.executeUpdate();

Example 440. An HQL DELETE statement

int deletedEntities = session.createQuery(
    "delete Person " +
    "where name = :name" )
.setParameter( "name", name )
.executeUpdate();

Method Query.executeUpdate() returns an int value, which indicates the number of entities affected by the operation. This may or may not correlate to the number of rows affected in the database. A JPQL/HQL bulk operation might result in multiple SQL statements being executed, such as for joined-subclass. In the example of joined-subclass, a DELETE against one of the subclasses may actually result in deletes in the tables underlying the join, or further down the inheritance hierarchy.

12.3.2. HQL syntax for INSERT

Example 441. Pseudo-syntax for INSERT statements

INSERT INTO EntityName
    properties_list
SELECT properties_list
FROM ...

Only the INSERT INTO … SELECT … form is supported. You cannot specify explicit values to insert.

The properties_list is analogous to the column specification in the SQL INSERT statement. For entities involved in mapped inheritance, you can only use properties directly defined on that given class-level in the properties_list. Superclass properties are not allowed and subclass properties are irrelevant. In other words, INSERT statements are inherently non-polymorphic.

The SELECT statement can be any valid HQL select query, but the return types must match the types expected by the INSERT. Hibernate verifies the return types during query compilation, instead of expecting the database to check it. Problems might result from Hibernate types which are equivalent, rather than equal. One such example is a mismatch between a property defined as an org.hibernate.type.DateType and a property defined as an org.hibernate.type.TimestampType, even though the database may not make a distinction, or may be capable of handling the conversion.

If id property is not specified in the properties_list, Hibernate generates a value automatically. Automatic generation is only available if you use ID generators which operate on the database. Otherwise, Hibernate throws an exception during parsing. Available in-database generators are org.hibernate.id.SequenceGenerator and its subclasses, and objects which implement org.hibernate.id.PostInsertIdentifierGenerator.

For properties mapped as either version or timestamp, the insert statement gives you two options. You can either specify the property in the properties_list, in which case its value is taken from the corresponding select expressions or omit it from the properties_list, in which case the seed value defined by the org.hibernate.type.VersionType is used.

Example 442. HQL INSERT statement

int insertedEntities = session.createQuery(
    "insert into Partner (id, name) " +
    "select p.id, p.name " +
    "from Person p ")
.executeUpdate();

This section is only a brief overview of HQL. For more information, see HQL.

12.3.3. Bulk-id strategies

This article is about the HHH-11262 JIRA issue which now allows the bulk-id strategies to work even when you cannot create temporary tables.

Class diagram

Considering we have the following entities:

The Person entity is the base class of this entity inheritance model, and is mapped as follows:

Example 443. Bulk-id base class entity

@Entity(name = "Person")
@Inheritance(strategy = InheritanceType.JOINED)
public static class Person implements Serializable {
    @Id
    private Integer id;
    @Id
    private String companyName;
    private String name;
    private boolean employed;
    //Getters and setters are omitted for brevity
}

Both the Doctor and Engineer entity classes extend the Person base class:

Example 444. Bulk-id subclass entities

@Entity(name = "Doctor")
public static class Doctor extends Person {
}
@Entity(name = "Engineer")
public static class Engineer extends Person {
    private boolean fellow;
    //Getters and setters are omitted for brevity
}

Inheritance tree bulk processing

Now, when you try to execute a bulk entity delete query:

Example 445. Bulk-id delete query example

int updateCount = session.createQuery(
    "delete from Person where employed = :employed" )
.setParameter( "employed", false )
.executeUpdate();

create temporary table
    HT_Person
(
    id int4 not null,
    companyName varchar(255) not null
)
insert
into
    HT_Person
    select
        p.id as id,
        p.companyName as companyName
    from
        Person p
    where
        p.employed = ?
delete
from
    Engineer
where
    (
        id, companyName
    ) IN (
        select
            id,
            companyName
        from
            HT_Person
    )
delete
from
    Doctor
where
    (
        id, companyName
    ) IN (
        select
            id,
            companyName
        from
            HT_Person
    )
delete
from
    Person
where
    (
        id, companyName
    ) IN (
        select
            id,
            companyName
        from
            HT_Person
    )

HT_Person is a temporary table that Hibernate creates to hold all the entity identifiers that are to be updated or deleted by the bulk id operation. The temporary table can be either global or local, depending on the underlying database capabilities.

Non-temporary table bulk-id strategies

As the HHH-11262 issue describes, there are use cases when the application developer cannot use temporary tables because the database user lacks this privilege.

In this case, we defined several options which you can choose depending on your database capabilities:

InlineIdsInClauseBulkIdStrategy
InlineIdsSubSelectValueListBulkIdStrategy
InlineIdsOrClauseBulkIdStrategy
CteValuesListBulkIdStrategy

`InlineIdsInClauseBulkIdStrategy`

To use this strategy, you need to configure the following configuration property:

<property name="hibernate.hql.bulk_id_strategy"
          value="org.hibernate.hql.spi.id.inline.InlineIdsInClauseBulkIdStrategy"
/>

Now, when running the previous test case, Hibernate generates the following SQL statements:

Example 446. InlineIdsInClauseBulkIdStrategy delete entity query example

select
    p.id as id,
    p.companyName as companyName
from
    Person p
where
    p.employed = ?
delete
from
    Engineer
where
        ( id, companyName )
    in (
        ( 1,'Red Hat USA' ),
        ( 3,'Red Hat USA' ),
        ( 1,'Red Hat Europe' ),
        ( 3,'Red Hat Europe' )
    )
delete
from
    Doctor
where
        ( id, companyName )
    in (
        ( 1,'Red Hat USA' ),
        ( 3,'Red Hat USA' ),
        ( 1,'Red Hat Europe' ),
        ( 3,'Red Hat Europe' )
    )
delete
from
    Person
where
        ( id, companyName )
    in (
        ( 1,'Red Hat USA' ),
        ( 3,'Red Hat USA' ),
        ( 1,'Red Hat Europe' ),
        ( 3,'Red Hat Europe' )
    )

So, the entity identifiers are selected first and used for each particular update or delete statement.

The IN clause row value expression has long been supported by Oracle, PostgreSQL, and nowadays by MySQL 5.7. However, SQL Server 2014 does not support it, so you’ll have to use a different strategy.

`InlineIdsSubSelectValueListBulkIdStrategy`

To use this strategy, you need to configure the following configuration property:

<property name="hibernate.hql.bulk_id_strategy"
          value="org.hibernate.hql.spi.id.inline.InlineIdsSubSelectValueListBulkIdStrategy"
/>

Now, when running the previous test case, Hibernate generates the following SQL statements:

Example 447. InlineIdsSubSelectValueListBulkIdStrategy delete entity query example

select
    p.id as id,
    p.companyName as companyName
from
    Person p
where
    p.employed = ?
delete
from
    Engineer
where
    ( id, companyName ) in (
        select
            id,
            companyName
        from (
        values
            ( 1,'Red Hat USA' ),
            ( 3,'Red Hat USA' ),
            ( 1,'Red Hat Europe' ),
            ( 3,'Red Hat Europe' )
        ) as HT
            (id, companyName)
    )
delete
from
    Doctor
where
    ( id, companyName ) in (
         select
            id,
            companyName
        from (
        values
            ( 1,'Red Hat USA' ),
            ( 3,'Red Hat USA' ),
            ( 1,'Red Hat Europe' ),
            ( 3,'Red Hat Europe' )
        ) as HT
            (id, companyName)
    )
delete
from
    Person
where
    ( id, companyName ) in (
        select
            id,
            companyName
        from (
        values
            ( 1,'Red Hat USA' ),
            ( 3,'Red Hat USA' ),
            ( 1,'Red Hat Europe' ),
            ( 3,'Red Hat Europe' )
        ) as HT
            (id, companyName)
    )

The underlying database must support the VALUES list clause, like PostgreSQL or SQL Server 2008. However, this strategy requires the IN-clause row value expression for composite identifiers, and for this reason, you can only use the InlineIdsSubSelectValueListBulkIdStrategy strategy with PostgreSQL.

`InlineIdsOrClauseBulkIdStrategy`

To use this strategy, you need to configure the following configuration property:

<property name="hibernate.hql.bulk_id_strategy"
          value="org.hibernate.hql.spi.id.inline.InlineIdsOrClauseBulkIdStrategy"
/>

Now, when running the previous test case, Hibernate generates the following SQL statements:

Example 448. InlineIdsOrClauseBulkIdStrategy delete entity query example

select
    p.id as id,
    p.companyName as companyName
from
    Person p
where
    p.employed = ?
delete
from
    Engineer
where
    ( id = 1 and companyName = 'Red Hat USA' )
or  ( id = 3 and companyName = 'Red Hat USA' )
or  ( id = 1 and companyName = 'Red Hat Europe' )
or  ( id = 3 and companyName = 'Red Hat Europe' )
delete
from
    Doctor
where
    ( id = 1 and companyName = 'Red Hat USA' )
or  ( id = 3 and companyName = 'Red Hat USA' )
or  ( id = 1 and companyName = 'Red Hat Europe' )
or  ( id = 3 and companyName = 'Red Hat Europe' )
delete
from
    Person
where
    ( id = 1 and companyName = 'Red Hat USA' )
or  ( id = 3 and companyName = 'Red Hat USA' )
or  ( id = 1 and companyName = 'Red Hat Europe' )
or  ( id = 3 and companyName = 'Red Hat Europe' )

The InlineIdsOrClauseBulkIdStrategy strategy has the advantage of being supported by all the major relational database systems (e.g. Oracle, SQL Server, MySQL, and PostgreSQL).

`CteValuesListBulkIdStrategy`

To use this strategy, you need to configure the following configuration property:

<property name="hibernate.hql.bulk_id_strategy"
          value="org.hibernate.hql.spi.id.inline.CteValuesListBulkIdStrategy"
/>

Now, when running the previous test case, Hibernate generates the following SQL statements:

Example 449. CteValuesListBulkIdStrategy delete entity query example

select
    p.id as id,
    p.companyName as companyName
from
    Person p
where
    p.employed = ?
with HT_Person (id,companyName ) as (
    select id, companyName
    from (
    values
        (?, ?),
        (?, ?),
        (?, ?),
        (?, ?)
    ) as HT (id, companyName) )
delete
from
    Engineer
where
    ( id, companyName ) in (
        select
            id, companyName
        from
            HT_Person
    )
with HT_Person (id,companyName ) as (
    select id, companyName
    from (
    values
        (?, ?),
        (?, ?),
        (?, ?),
        (?, ?)
    ) as HT (id, companyName) )
delete
from
    Doctor
where
    ( id, companyName ) in (
        select
            id, companyName
        from
            HT_Person
    )
with HT_Person (id,companyName ) as (
    select id, companyName
    from (
    values
        (?, ?),
        (?, ?),
        (?, ?),
        (?, ?)
    ) as HT (id, companyName) )
delete
from
    Person
where
    ( id, companyName ) in (
        select
            id, companyName
        from
            HT_Person
    )

The underlying database must support CTE (Common Table Expressions) that can be referenced from non-query statements as well. For instance, PostgreSQL supports this feature since version 9.1 and SQL Server offers support for it since version 2005.

The underlying database must also support the VALUES list clause, like PostgreSQL or SQL Server 2008.

However, this strategy requires the IN-clause row value expression for composite identifiers, so you can only use this strategy with PostgreSQL.

If you can use temporary tables, that’s probably the best choice. However, if you are not allowed to create temporary tables, you must pick one of these four strategies that works with your underlying database. Before making up your mind, you should benchmark which one works best for your current workload. For instance, CTE are optimization fences in PostgreSQL, so make sure you measure before making a decision.

If you’re using Oracle or MySQL 5.7, you can choose either InlineIdsOrClauseBulkIdStrategy or InlineIdsInClauseBulkIdStrategy. For older version of MySQL, then you can only use InlineIdsOrClauseBulkIdStrategy.

If you’re using SQL Server, InlineIdsOrClauseBulkIdStrategy is the only option for you.

If you’re using PostgreSQL, then you have the luxury of choosing any of these four strategies.