12. Batching
12.1. JDBC batching
JDBC offers support for batching together SQL statements that can be represented as a single PreparedStatement. Implementation wise this generally means that drivers will send the batched operation to the server in one call, which can save on network calls to the database. Hibernate can leverage JDBC batching. The following settings control this behavior.
hibernate.jdbc.batch_size
Controls the maximum number of statements Hibernate will batch together before asking the driver to execute the batch. Zero or a negative number disables this feature.
hibernate.jdbc.batch_versioned_data
Some JDBC drivers return incorrect row counts when a batch is executed. If your JDBC driver falls into this category this setting should be set to false
. Otherwise, it is safe to enable this which will allow Hibernate to still batch the DML for versioned entities and still use the returned row counts for optimistic lock checks. Since 5.0, it defaults to true. Previously (versions 3.x and 4.x), it used to be false.
hibernate.jdbc.batch.builder
Names the implementation class used to manage batching capabilities. It is almost never a good idea to switch from Hibernate’s default implementation. But if you wish to, this setting would name the org.hibernate.engine.jdbc.batch.spi.BatchBuilder
implementation to use.
hibernate.order_updates
Forces Hibernate to order SQL updates by the entity type and the primary key value of the items being updated. This allows for more batching to be used. It will also result in fewer transaction deadlocks in highly concurrent systems. Comes with a performance hit, so benchmark before and after to see if this actually helps or hurts your application.
hibernate.order_inserts
Forces Hibernate to order inserts to allow for more batching to be used. Comes with a performance hit, so benchmark before and after to see if this actually helps or hurts your application.
Since version 5.2, Hibernate allows overriding the global JDBC batch size given by the |
Example 430. Hibernate specific JDBC batch size configuration on a per Session
basis
entityManager
.unwrap( Session.class )
.setJdbcBatchSize( 10 );
12.2. Session batching
The following example shows an anti-pattern for batch inserts.
Example 431. Naive way to insert 100 000 entities with Hibernate
EntityManager entityManager = null;
EntityTransaction txn = null;
try {
entityManager = entityManagerFactory().createEntityManager();
txn = entityManager.getTransaction();
txn.begin();
for ( int i = 0; i < 100_000; i++ ) {
Person Person = new Person( String.format( "Person %d", i ) );
entityManager.persist( Person );
}
txn.commit();
} catch (RuntimeException e) {
if ( txn != null && txn.isActive()) txn.rollback();
throw e;
} finally {
if (entityManager != null) {
entityManager.close();
}
}
There are several problems associated with this example:
Hibernate caches all the newly inserted
Customer
instances in the session-level cache, so, when the transaction ends, 100 000 entities are managed by the persistence context. If the maximum memory allocated to the JVM is rather low, this example could fail with anOutOfMemoryException
. The Java 1.8 JVM allocated either 1/4 of available RAM or 1Gb, which can easily accommodate 100 000 objects on the heap.long-running transactions can deplete a connection pool so other transactions don’t get a chance to proceed.
JDBC batching is not enabled by default, so every insert statement requires a database roundtrip. To enable JDBC batching, set the
hibernate.jdbc.batch_size
property to an integer between 10 and 50.
Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator. |
12.2.1. Batch inserts
When you make new objects persistent, employ methods flush()
and clear()
to the session regularly, to control the size of the first-level cache.
Example 432. Flushing and clearing the Session
EntityManager entityManager = null;
EntityTransaction txn = null;
try {
entityManager = entityManagerFactory().createEntityManager();
txn = entityManager.getTransaction();
txn.begin();
int batchSize = 25;
for ( int i = 0; i < entityCount; i++ ) {
if ( i > 0 && i % batchSize == 0 ) {
//flush a batch of inserts and release memory
entityManager.flush();
entityManager.clear();
}
Person Person = new Person( String.format( "Person %d", i ) );
entityManager.persist( Person );
}
txn.commit();
} catch (RuntimeException e) {
if ( txn != null && txn.isActive()) txn.rollback();
throw e;
} finally {
if (entityManager != null) {
entityManager.close();
}
}
12.2.2. Session scroll
When you retrieve and update data, flush()
and clear()
the session regularly. In addition, use method scroll()
to take advantage of server-side cursors for queries that return many rows of data.
Example 433. Using scroll()
EntityManager entityManager = null;
EntityTransaction txn = null;
ScrollableResults scrollableResults = null;
try {
entityManager = entityManagerFactory().createEntityManager();
txn = entityManager.getTransaction();
txn.begin();
int batchSize = 25;
Session session = entityManager.unwrap( Session.class );
scrollableResults = session
.createQuery( "select p from Person p" )
.setCacheMode( CacheMode.IGNORE )
.scroll( ScrollMode.FORWARD_ONLY );
int count = 0;
while ( scrollableResults.next() ) {
Person Person = (Person) scrollableResults.get( 0 );
processPerson(Person);
if ( ++count % batchSize == 0 ) {
//flush a batch of updates and release memory:
entityManager.flush();
entityManager.clear();
}
}
txn.commit();
} catch (RuntimeException e) {
if ( txn != null && txn.isActive()) txn.rollback();
throw e;
} finally {
if (scrollableResults != null) {
scrollableResults.close();
}
if (entityManager != null) {
entityManager.close();
}
}
If left unclosed by the application, Hibernate will automatically close the underlying resources (e.g. However, it is good practice to close the |
12.2.3. StatelessSession
StatelessSession
is a command-oriented API provided by Hibernate. Use it to stream data to and from the database in the form of detached objects. A StatelessSession
has no persistence context associated with it and does not provide many of the higher-level lifecycle semantics.
Some of the things not provided by a StatelessSession
include:
a first-level cache
interaction with any second-level or query cache
transactional write-behind or automatic dirty checking
Limitations of StatelessSession
:
Operations performed using a stateless session never cascade to associated instances.
Collections are ignored by a stateless session.
Lazy loading of associations is not supported.
Operations performed via a stateless session bypass Hibernate’s event model and interceptors.
Due to the lack of a first-level cache, Stateless sessions are vulnerable to data aliasing effects.
A stateless session is a lower-level abstraction that is much closer to the underlying JDBC.
Example 434. Using a StatelessSession
StatelessSession statelessSession = null;
Transaction txn = null;
ScrollableResults scrollableResults = null;
try {
SessionFactory sessionFactory = entityManagerFactory().unwrap( SessionFactory.class );
statelessSession = sessionFactory.openStatelessSession();
txn = statelessSession.getTransaction();
txn.begin();
scrollableResults = statelessSession
.createQuery( "select p from Person p" )
.scroll(ScrollMode.FORWARD_ONLY);
while ( scrollableResults.next() ) {
Person Person = (Person) scrollableResults.get( 0 );
processPerson(Person);
statelessSession.update( Person );
}
txn.commit();
} catch (RuntimeException e) {
if ( txn != null && txn.getStatus() == TransactionStatus.ACTIVE) txn.rollback();
throw e;
} finally {
if (scrollableResults != null) {
scrollableResults.close();
}
if (statelessSession != null) {
statelessSession.close();
}
}
The Customer
instances returned by the query are immediately detached. They are never associated with any persistence context.
The insert()
, update()
, and delete()
operations defined by the StatelessSession
interface operate directly on database rows. They cause the corresponding SQL operations to be executed immediately. They have different semantics from the save()
, saveOrUpdate()
, and delete()
operations defined by the Session
interface.
12.3. Hibernate Query Language for DML
DML, or Data Manipulation Language, refers to SQL statements such as INSERT
, UPDATE
, and DELETE
. Hibernate provides methods for bulk SQL-style DML statement execution, in the form of Hibernate Query Language (HQL).
12.3.1. HQL/JPQL for UPDATE and DELETE
Both the Hibernate native Query Language and JPQL (Java Persistence Query Language) provide support for bulk UPDATE and DELETE.
Example 435. Pseudo-syntax for UPDATE and DELETE statements using HQL
UPDATE FROM EntityName e WHERE e.name = ?
DELETE FROM EntityName e WHERE e.name = ?
Although the |
The FROM
clause can only refer to a single entity, which can be aliased. If the entity name is aliased, any property references must be qualified using that alias. If the entity name is not aliased, then it is illegal for any property references to be qualified.
Joins, either implicit or explicit, are prohibited in a bulk HQL query. You can use sub-queries in the |
Example 436. Executing a JPQL UPDATE
, using the Query.executeUpdate()
int updatedEntities = entityManager.createQuery(
"update Person p " +
"set p.name = :newName " +
"where p.name = :oldName" )
.setParameter( "oldName", oldName )
.setParameter( "newName", newName )
.executeUpdate();
Example 437. Executing an HQL UPDATE
, using the Query.executeUpdate()
int updatedEntities = session.createQuery(
"update Person " +
"set name = :newName " +
"where name = :oldName" )
.setParameter( "oldName", oldName )
.setParameter( "newName", newName )
.executeUpdate();
In keeping with the EJB3 specification, HQL UPDATE
statements, by default, do not effect the version or the timestamp property values for the affected entities. You can use a versioned update to force Hibernate to reset the version or timestamp property values, by adding the VERSIONED
keyword after the UPDATE
keyword.
Example 438. Updating the version of timestamp
int updatedEntities = session.createQuery(
"update versioned Person " +
"set name = :newName " +
"where name = :oldName" )
.setParameter( "oldName", oldName )
.setParameter( "newName", newName )
.executeUpdate();
If you use the This feature is only available in HQL since it’s not standardized by JPA. |
Example 439. A JPQL DELETE
statement
int deletedEntities = entityManager.createQuery(
"delete Person p " +
"where p.name = :name" )
.setParameter( "name", name )
.executeUpdate();
Example 440. An HQL DELETE
statement
int deletedEntities = session.createQuery(
"delete Person " +
"where name = :name" )
.setParameter( "name", name )
.executeUpdate();
Method Query.executeUpdate()
returns an int
value, which indicates the number of entities affected by the operation. This may or may not correlate to the number of rows affected in the database. A JPQL/HQL bulk operation might result in multiple SQL statements being executed, such as for joined-subclass. In the example of joined-subclass, a DELETE
against one of the subclasses may actually result in deletes in the tables underlying the join, or further down the inheritance hierarchy.
12.3.2. HQL syntax for INSERT
Example 441. Pseudo-syntax for INSERT statements
INSERT INTO EntityName
properties_list
SELECT properties_list
FROM ...
Only the INSERT INTO … SELECT …
form is supported. You cannot specify explicit values to insert.
The properties_list
is analogous to the column specification in the SQL
INSERT
statement. For entities involved in mapped inheritance, you can only use properties directly defined on that given class-level in the properties_list
. Superclass properties are not allowed and subclass properties are irrelevant. In other words, INSERT
statements are inherently non-polymorphic.
The SELECT statement can be any valid HQL select query, but the return types must match the types expected by the INSERT. Hibernate verifies the return types during query compilation, instead of expecting the database to check it. Problems might result from Hibernate types which are equivalent, rather than equal. One such example is a mismatch between a property defined as an org.hibernate.type.DateType
and a property defined as an org.hibernate.type.TimestampType
, even though the database may not make a distinction, or may be capable of handling the conversion.
If id property is not specified in the properties_list
, Hibernate generates a value automatically. Automatic generation is only available if you use ID generators which operate on the database. Otherwise, Hibernate throws an exception during parsing. Available in-database generators are org.hibernate.id.SequenceGenerator
and its subclasses, and objects which implement org.hibernate.id.PostInsertIdentifierGenerator
.
For properties mapped as either version or timestamp, the insert statement gives you two options. You can either specify the property in the properties_list, in which case its value is taken from the corresponding select expressions or omit it from the properties_list, in which case the seed value defined by the org.hibernate.type.VersionType is used.
Example 442. HQL INSERT statement
int insertedEntities = session.createQuery(
"insert into Partner (id, name) " +
"select p.id, p.name " +
"from Person p ")
.executeUpdate();
This section is only a brief overview of HQL. For more information, see HQL.
12.3.3. Bulk-id strategies
This article is about the HHH-11262 JIRA issue which now allows the bulk-id strategies to work even when you cannot create temporary tables.
Class diagram
Considering we have the following entities:
The Person
entity is the base class of this entity inheritance model, and is mapped as follows:
Example 443. Bulk-id base class entity
@Entity(name = "Person")
@Inheritance(strategy = InheritanceType.JOINED)
public static class Person implements Serializable {
@Id
private Integer id;
@Id
private String companyName;
private String name;
private boolean employed;
//Getters and setters are omitted for brevity
}
Both the Doctor
and Engineer
entity classes extend the Person
base class:
Example 444. Bulk-id subclass entities
@Entity(name = "Doctor")
public static class Doctor extends Person {
}
@Entity(name = "Engineer")
public static class Engineer extends Person {
private boolean fellow;
//Getters and setters are omitted for brevity
}
Inheritance tree bulk processing
Now, when you try to execute a bulk entity delete query:
Example 445. Bulk-id delete query example
int updateCount = session.createQuery(
"delete from Person where employed = :employed" )
.setParameter( "employed", false )
.executeUpdate();
create temporary table
HT_Person
(
id int4 not null,
companyName varchar(255) not null
)
insert
into
HT_Person
select
p.id as id,
p.companyName as companyName
from
Person p
where
p.employed = ?
delete
from
Engineer
where
(
id, companyName
) IN (
select
id,
companyName
from
HT_Person
)
delete
from
Doctor
where
(
id, companyName
) IN (
select
id,
companyName
from
HT_Person
)
delete
from
Person
where
(
id, companyName
) IN (
select
id,
companyName
from
HT_Person
)
HT_Person
is a temporary table that Hibernate creates to hold all the entity identifiers that are to be updated or deleted by the bulk id operation. The temporary table can be either global or local, depending on the underlying database capabilities.
Non-temporary table bulk-id strategies
As the HHH-11262 issue describes, there are use cases when the application developer cannot use temporary tables because the database user lacks this privilege.
In this case, we defined several options which you can choose depending on your database capabilities:
InlineIdsInClauseBulkIdStrategy
InlineIdsSubSelectValueListBulkIdStrategy
InlineIdsOrClauseBulkIdStrategy
CteValuesListBulkIdStrategy
InlineIdsInClauseBulkIdStrategy
To use this strategy, you need to configure the following configuration property:
<property name="hibernate.hql.bulk_id_strategy"
value="org.hibernate.hql.spi.id.inline.InlineIdsInClauseBulkIdStrategy"
/>
Now, when running the previous test case, Hibernate generates the following SQL statements:
Example 446. InlineIdsInClauseBulkIdStrategy
delete entity query example
select
p.id as id,
p.companyName as companyName
from
Person p
where
p.employed = ?
delete
from
Engineer
where
( id, companyName )
in (
( 1,'Red Hat USA' ),
( 3,'Red Hat USA' ),
( 1,'Red Hat Europe' ),
( 3,'Red Hat Europe' )
)
delete
from
Doctor
where
( id, companyName )
in (
( 1,'Red Hat USA' ),
( 3,'Red Hat USA' ),
( 1,'Red Hat Europe' ),
( 3,'Red Hat Europe' )
)
delete
from
Person
where
( id, companyName )
in (
( 1,'Red Hat USA' ),
( 3,'Red Hat USA' ),
( 1,'Red Hat Europe' ),
( 3,'Red Hat Europe' )
)
So, the entity identifiers are selected first and used for each particular update or delete statement.
The IN clause row value expression has long been supported by Oracle, PostgreSQL, and nowadays by MySQL 5.7. However, SQL Server 2014 does not support it, so you’ll have to use a different strategy. |
InlineIdsSubSelectValueListBulkIdStrategy
To use this strategy, you need to configure the following configuration property:
<property name="hibernate.hql.bulk_id_strategy"
value="org.hibernate.hql.spi.id.inline.InlineIdsSubSelectValueListBulkIdStrategy"
/>
Now, when running the previous test case, Hibernate generates the following SQL statements:
Example 447. InlineIdsSubSelectValueListBulkIdStrategy
delete entity query example
select
p.id as id,
p.companyName as companyName
from
Person p
where
p.employed = ?
delete
from
Engineer
where
( id, companyName ) in (
select
id,
companyName
from (
values
( 1,'Red Hat USA' ),
( 3,'Red Hat USA' ),
( 1,'Red Hat Europe' ),
( 3,'Red Hat Europe' )
) as HT
(id, companyName)
)
delete
from
Doctor
where
( id, companyName ) in (
select
id,
companyName
from (
values
( 1,'Red Hat USA' ),
( 3,'Red Hat USA' ),
( 1,'Red Hat Europe' ),
( 3,'Red Hat Europe' )
) as HT
(id, companyName)
)
delete
from
Person
where
( id, companyName ) in (
select
id,
companyName
from (
values
( 1,'Red Hat USA' ),
( 3,'Red Hat USA' ),
( 1,'Red Hat Europe' ),
( 3,'Red Hat Europe' )
) as HT
(id, companyName)
)
The underlying database must support the |
InlineIdsOrClauseBulkIdStrategy
To use this strategy, you need to configure the following configuration property:
<property name="hibernate.hql.bulk_id_strategy"
value="org.hibernate.hql.spi.id.inline.InlineIdsOrClauseBulkIdStrategy"
/>
Now, when running the previous test case, Hibernate generates the following SQL statements:
Example 448. InlineIdsOrClauseBulkIdStrategy
delete entity query example
select
p.id as id,
p.companyName as companyName
from
Person p
where
p.employed = ?
delete
from
Engineer
where
( id = 1 and companyName = 'Red Hat USA' )
or ( id = 3 and companyName = 'Red Hat USA' )
or ( id = 1 and companyName = 'Red Hat Europe' )
or ( id = 3 and companyName = 'Red Hat Europe' )
delete
from
Doctor
where
( id = 1 and companyName = 'Red Hat USA' )
or ( id = 3 and companyName = 'Red Hat USA' )
or ( id = 1 and companyName = 'Red Hat Europe' )
or ( id = 3 and companyName = 'Red Hat Europe' )
delete
from
Person
where
( id = 1 and companyName = 'Red Hat USA' )
or ( id = 3 and companyName = 'Red Hat USA' )
or ( id = 1 and companyName = 'Red Hat Europe' )
or ( id = 3 and companyName = 'Red Hat Europe' )
The |
CteValuesListBulkIdStrategy
To use this strategy, you need to configure the following configuration property:
<property name="hibernate.hql.bulk_id_strategy"
value="org.hibernate.hql.spi.id.inline.CteValuesListBulkIdStrategy"
/>
Now, when running the previous test case, Hibernate generates the following SQL statements:
Example 449. CteValuesListBulkIdStrategy
delete entity query example
select
p.id as id,
p.companyName as companyName
from
Person p
where
p.employed = ?
with HT_Person (id,companyName ) as (
select id, companyName
from (
values
(?, ?),
(?, ?),
(?, ?),
(?, ?)
) as HT (id, companyName) )
delete
from
Engineer
where
( id, companyName ) in (
select
id, companyName
from
HT_Person
)
with HT_Person (id,companyName ) as (
select id, companyName
from (
values
(?, ?),
(?, ?),
(?, ?),
(?, ?)
) as HT (id, companyName) )
delete
from
Doctor
where
( id, companyName ) in (
select
id, companyName
from
HT_Person
)
with HT_Person (id,companyName ) as (
select id, companyName
from (
values
(?, ?),
(?, ?),
(?, ?),
(?, ?)
) as HT (id, companyName) )
delete
from
Person
where
( id, companyName ) in (
select
id, companyName
from
HT_Person
)
The underlying database must support CTE (Common Table Expressions) that can be referenced from non-query statements as well. For instance, PostgreSQL supports this feature since version 9.1 and SQL Server offers support for it since version 2005. The underlying database must also support the VALUES list clause, like PostgreSQL or SQL Server 2008. However, this strategy requires the IN-clause row value expression for composite identifiers, so you can only use this strategy with PostgreSQL. |
If you can use temporary tables, that’s probably the best choice. However, if you are not allowed to create temporary tables, you must pick one of these four strategies that works with your underlying database. Before making up your mind, you should benchmark which one works best for your current workload. For instance, CTE are optimization fences in PostgreSQL, so make sure you measure before making a decision.
If you’re using Oracle or MySQL 5.7, you can choose either InlineIdsOrClauseBulkIdStrategy
or InlineIdsInClauseBulkIdStrategy
. For older version of MySQL, then you can only use InlineIdsOrClauseBulkIdStrategy
.
If you’re using SQL Server, InlineIdsOrClauseBulkIdStrategy
is the only option for you.
If you’re using PostgreSQL, then you have the luxury of choosing any of these four strategies.