Quarkus - Hibernate Search guide

You have a Hibernate ORM-based application? You want to provide a full-featured full-text search to your users? You’re in the right place.

With this guide, you’ll learn how to synchronize your entities to an Elasticsearch cluster in a heartbeat with Hibernate Search. We will also explore how you can query your Elasticsearch cluster using the Hibernate Search API.

This technology is considered preview.

In preview, backward compatibility and presence in the ecosystem are not guaranteed. Specific improvements might require changes to configuration or APIs, and plans to become stable are under way. Feedback is welcome on our mailing list or as issues in our GitHub issue tracker.

For a full list of possible extension statuses, check our FAQ entry.

This extension is based on a beta version of Hibernate Search. While APIs are quite stable and the code is of production quality and thoroughly tested, some features are still missing, performance might not be optimal and some APIs or configuration properties might change as the extension matures.

Prerequisites

To complete this guide, you need:

less than 20 minutes
an IDE
JDK 1.8+ installed with JAVA_HOME configured appropriately
Apache Maven 3.6.2+
Docker
GraalVM installed if you want to run in native mode

Architecture

The application described in this guide allows you to manage a (simple) library: you manage authors and their books.

The entities are stored in a PostgreSQL database and indexed in an Elasticsearch cluster.

Solution

We recommend that you follow the instructions in the next sections and create the application step by step. However, you can go right to the completed example.

Clone the Git repository: git clone [https://github.com/quarkusio/quarkus-quickstarts.git](https://github.com/quarkusio/quarkus-quickstarts.git), or download an archive.

The solution is located in the hibernate-search-elasticsearch-quickstart directory.

The provided solution contains a few additional elements such as tests and testing infrastructure.

Creating the Maven project

First, we need a new project. Create a new project with the following command:

mvn io.quarkus:quarkus-maven-plugin:1.7.6.Final:create \
    -DprojectGroupId=org.acme \
    -DprojectArtifactId=hibernate-search-elasticsearch-quickstart \
    -DclassName="org.acme.hibernate.search.elasticsearch.LibraryResource" \
    -Dpath="/library" \
    -Dextensions="hibernate-orm-panache, hibernate-search-elasticsearch, resteasy-jsonb, jdbc-postgresql"
cd hibernate-search-elasticsearch-quickstart

This command generates a Maven structure importing the following extensions:

Hibernate ORM with Panache,
the PostgreSQL JDBC driver,
Hibernate Search + Elasticsearch,
RESTEasy and JSON-B.

If you already have your Quarkus project configured, you can add the hibernate-search-elasticsearch extension to your project by running the following command in your project base directory:

./mvnw quarkus:add-extension -Dextensions="hibernate-search-elasticsearch"

This will add the following to your pom.xml:

<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-hibernate-search-elasticsearch</artifactId>
</dependency>

For now, let’s delete the two generated tests LibraryResourceTest and NativeLibraryResourceIT present in src/test/java. If you are interested in how you can test this application, just refer to the solution in the quickstarts Git repository: it contains a lot of tests and the required testing infrastructure.

Creating the bare entities

First, let’s create our Hibernate ORM entities Book and Author in the model subpackage.

package org.acme.hibernate.search.elasticsearch.model;
import java.util.List;
import java.util.Objects;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.OneToMany;
import io.quarkus.hibernate.orm.panache.PanacheEntity;
@Entity
public class Author extends PanacheEntity { // (1)
    public String firstName;
    public String lastName;
    @OneToMany(mappedBy = "author", cascade = CascadeType.ALL, orphanRemoval = true, fetch = FetchType.EAGER) // (2)
    public List<Book> books;
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Author)) {
            return false;
        }
        Author other = (Author) o;
        return Objects.equals(id, other.id);
    }
    @Override
    public int hashCode() {
        return 31;
    }
}

1	We are using Hibernate ORM with Panache, but it is not mandatory.
2	We are loading these elements eagerly so that they are present in the JSON output. In a real world application, you should probably use a DTO approach.

package org.acme.hibernate.search.elasticsearch.model;
import java.util.Objects;
import javax.json.bind.annotation.JsonbTransient;
import javax.persistence.Entity;
import javax.persistence.ManyToOne;
import io.quarkus.hibernate.orm.panache.PanacheEntity;
@Entity
public class Book extends PanacheEntity {
    public String title;
    @ManyToOne
    @JsonbTransient // (1)
    public Author author;
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Book)) {
            return false;
        }
        Book other = (Book) o;
        return Objects.equals(id, other.id);
    }
    @Override
    public int hashCode() {
        return 31;
    }
}

1	We mark this property with `@JsonbTransient` to avoid infinite loops when serializing with JSON-B.

Initializing the REST service

While not everything is set up yet for our REST service, we can initialize it with the standard CRUD operations we will need.

Just copy this content into the LibraryResource file created by the Maven create-project command:

package org.acme.hibernate.search.elasticsearch;
import javax.inject.Inject;
import javax.persistence.EntityManager;
import javax.transaction.Transactional;
import javax.ws.rs.Consumes;
import javax.ws.rs.DELETE;
import javax.ws.rs.POST;
import javax.ws.rs.PUT;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import org.acme.hibernate.search.elasticsearch.model.Author;
import org.acme.hibernate.search.elasticsearch.model.Book;
import org.jboss.resteasy.annotations.jaxrs.FormParam;
import org.jboss.resteasy.annotations.jaxrs.PathParam;
@Path("/library")
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
public class LibraryResource {
    @Inject
    EntityManager em;
    @PUT
    @Path("book")
    @Transactional
    @Consumes(MediaType.APPLICATION_FORM_URLENCODED)
    public void addBook(@FormParam String title, @FormParam Long authorId) {
        Author author = Author.findById(authorId);
        if (author == null) {
            return;
        }
        Book book = new Book();
        book.title = title;
        book.author = author;
        book.persist();
        author.books.add(book);
        author.persist();
    }
    @DELETE
    @Path("book/{id}")
    @Transactional
    public void deleteBook(@PathParam Long id) {
        Book book = Book.findById(id);
        if (book != null) {
            book.author.books.remove(book);
            book.delete();
        }
    }
    @PUT
    @Path("author")
    @Transactional
    @Consumes(MediaType.APPLICATION_FORM_URLENCODED)
    public void addAuthor(@FormParam String firstName, @FormParam String lastName) {
        Author author = new Author();
        author.firstName = firstName;
        author.lastName = lastName;
        author.persist();
    }
    @POST
    @Path("author/{id}")
    @Transactional
    @Consumes(MediaType.APPLICATION_FORM_URLENCODED)
    public void updateAuthor(@PathParam Long id, @FormParam String firstName, @FormParam String lastName) {
        Author author = Author.findById(id);
        if (author == null) {
            return;
        }
        author.firstName = firstName;
        author.lastName = lastName;
        author.persist();
    }
    @DELETE
    @Path("author/{id}")
    @Transactional
    public void deleteAuthor(@PathParam Long id) {
        Author author = Author.findById(id);
        if (author != null) {
            author.delete();
        }
    }
}

Nothing out of the ordinary here: it is just good old Hibernate ORM with Panache operations in a standard JAX-RS service.

In fact, the interesting part is that we will need to add very few elements to make our full text search application work.

Using Hibernate Search annotations

Let’s go back to our entities.

Enabling full text search capabilities for them is as simple as adding a few annotations.

Let’s edit the Book entity again to include this content:

package org.acme.hibernate.search.elasticsearch.model;
import java.util.Objects;
import javax.json.bind.annotation.JsonbTransient;
import javax.persistence.Entity;
import javax.persistence.ManyToOne;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;
import io.quarkus.hibernate.orm.panache.PanacheEntity;
@Entity
@Indexed // (1)
public class Book extends PanacheEntity {
    @FullTextField(analyzer = "english") // (2)
    public String title;
    @ManyToOne
    @JsonbTransient
    public Author author;
    // Preexisting equals()/hashCode() methods
}

1	First, let’s use the `@Indexed` annotation to register our `Book` entity as part of the full text index.
2	The `@FullTextField` annotation declares a field in the index specifically tailored for full text search. In particular, we have to define an analyzer to split and analyze the tokens (~ words) - more on this later.

Now that our books are indexed, we can do the same for the authors.

Open the Author class and include the content below.

Things are quite similar here: we use the @Indexed, @FullTextField and @KeywordField annotations.

There are a few differences/additions though. Let’s check them out.

package org.acme.hibernate.search.elasticsearch.model;
import java.util.List;
import java.util.Objects;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.OneToMany;
import org.hibernate.search.engine.backend.types.Sortable;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.FullTextField;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.IndexedEmbedded;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.KeywordField;
import io.quarkus.hibernate.orm.panache.PanacheEntity;
@Entity
@Indexed
public class Author extends PanacheEntity {
    @FullTextField(analyzer = "name") // (1)
    @KeywordField(name = "firstName_sort", sortable = Sortable.YES, normalizer = "sort") // (2)
    public String firstName;
    @FullTextField(analyzer = "name")
    @KeywordField(name = "lastName_sort", sortable = Sortable.YES, normalizer = "sort")
    public String lastName;
    @OneToMany(mappedBy = "author", cascade = CascadeType.ALL, orphanRemoval = true, fetch = FetchType.EAGER)
    @IndexedEmbedded // (3)
    public List<Book> books;
    // Preexisting equals()/hashCode() methods
}

1	We use a `@FullTextField` similar to what we did for `Book` but you’ll notice that the analyzer is different - more on this later.
2	As you can see, we can define several fields for the same property. Here, we define a `@KeywordField` with a specific name. The main difference is that a keyword field is not tokenized (the string is kept as one single token) but can be normalized (i.e. filtered) - more on this later. This field is marked as sortable as our intention is to use it for sorting our authors.
3	The purpose of `@IndexedEmbedded` is to include the `Book` fields into the `Author` index. In this case, we just use the default configuration: all the fields of the associated `Book` entities are included in the index (i.e. the `title` field). The nice thing with `@IndexedEmbedded` is that it is able to automatically reindex an `Author` if one of its `Book`s has been updated thanks to the bidirectional relation. `@IndexedEmbedded` also supports nested documents (using the `storage = NESTED` attribute) but we don’t need it here. You can also specify the fields you want to include in your parent index using the `includePaths` attribute if you don’t want them all.
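
To picture what `@IndexedEmbedded` contributes, here is a minimal plain-Java sketch of the fields an `Author` ends up with in the index. It is only a conceptual view: the class `IndexedEmbeddedSketch` and the method `authorDocument` are hypothetical names made up for this illustration, and the map is not the actual document layout Hibernate Search sends to Elasticsearch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of the quickstart and not a Hibernate Search API.
public class IndexedEmbeddedSketch {

    // Builds a flattened, conceptual view of the fields indexed for one Author.
    public static Map<String, Object> authorDocument(String firstName, String lastName,
            List<String> bookTitles) {
        Map<String, Object> document = new LinkedHashMap<>();
        document.put("firstName", firstName);
        document.put("lastName", lastName);
        // Each embedded Book contributes its "title" field under the "books." prefix,
        // so the Author can be found by the titles of its books.
        document.put("books.title", new ArrayList<>(bookTitles));
        return document;
    }

    public static void main(String[] args) {
        // prints {firstName=Paul, lastName=Auster, books.title=[Invisible, Sunset Park]}
        System.out.println(authorDocument("Paul", "Auster",
                Arrays.asList("Invisible", "Sunset Park")));
    }
}
```

The `books.title` key mirrors the field path of the `title` field once it is embedded through the `books` property.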

Analyzers and normalizers

Introduction

Analysis is a big part of full text search: it defines how text will be processed when indexing or building search queries.

The role of analyzers is to split the text into tokens (~ words) and filter them (making it all lowercase and removing accents for instance).

Normalizers are a special type of analyzer that keeps the input as a single token. They are especially useful for sorting or indexing keywords.

There are a lot of bundled analyzers, but you can also develop your own for your specific purposes.

You can learn more about the Elasticsearch analysis framework in the Analysis section of the Elasticsearch documentation.

Defining the analyzers used

When we added the Hibernate Search annotations to our entities, we defined the analyzers and normalizers used. Typically:

@FullTextField(analyzer = "english")

@FullTextField(analyzer = "name")

@KeywordField(name = "lastName_sort", sortable = Sortable.YES, normalizer = "sort")

We use:

an analyzer called name for person names,
an analyzer called english for book titles,
a normalizer called sort for our sort fields

but we haven’t set them up yet.

Let’s see how you can do it with Hibernate Search.

Setting up the analyzers

It is an easy task: we just need to create an implementation of ElasticsearchAnalysisConfigurer (and configure Quarkus to use it, more on that later).

To fulfill our requirements, let’s create the following implementation:

package org.acme.hibernate.search.elasticsearch.config;
import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurationContext;
import org.hibernate.search.backend.elasticsearch.analysis.ElasticsearchAnalysisConfigurer;
public class AnalysisConfigurer implements ElasticsearchAnalysisConfigurer {
    @Override
    public void configure(ElasticsearchAnalysisConfigurationContext context) {
        context.analyzer("name").custom() // (1)
                .tokenizer("standard")
                .tokenFilters("asciifolding", "lowercase");
        context.analyzer("english").custom() // (2)
                .tokenizer("standard")
                .tokenFilters("asciifolding", "lowercase", "porter_stem");
        context.normalizer("sort").custom() // (3)
                .tokenFilters("asciifolding", "lowercase");
    }
}

1	This is a simple analyzer separating the words on spaces, replacing any non-ASCII character with its ASCII counterpart (and thus removing accents) and putting everything in lowercase. It is used in our examples for the authors’ names.
2	We are a bit more aggressive with this one and we include some stemming: we will be able to search for `mystery` and get a result even if the indexed input contains `mysteries`. It is definitely too aggressive for person names but it is perfect for the book titles.
3	Here is the normalizer used for sorting. Very similar to our first analyzer, except we don’t tokenize the words as we want one and only one token.
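
To make the difference between an analyzer and a normalizer more tangible, here is a minimal plain-JDK sketch that approximates the `name` analyzer and the `sort` normalizer defined above. It is an approximation under stated assumptions, not what Elasticsearch actually runs: the class and method names are hypothetical, accent removal through `java.text.Normalizer` covers fewer characters than the real `asciifolding` filter, and the regular expression is only a rough stand-in for the `standard` tokenizer. Stemming (the `porter_stem` filter of the `english` analyzer) is left out.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration: a plain-JDK approximation, not Elasticsearch's implementation.
public class AnalysisSketch {

    // Rough stand-in for the "asciifolding" and "lowercase" token filters:
    // decompose accented characters, drop the combining marks, then lowercase.
    public static String foldAndLowercase(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Like the "name" analyzer: the input is split into several filtered tokens.
    public static List<String> analyzeName(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on anything that is neither a letter nor a digit (assumption:
        // close enough to the "standard" tokenizer for simple person names).
        for (String token : foldAndLowercase(text).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Like the "sort" normalizer: same filters, but the input stays a single token.
    public static String normalizeForSort(String text) {
        return foldAndLowercase(text);
    }

    public static void main(String[] args) {
        // \u00ED is "i" with an acute accent, \u00E1 is "a" with an acute accent
        System.out.println(analyzeName("Gabriel Garc\u00EDa M\u00E1rquez"));
        System.out.println(normalizeForSort("Garc\u00EDa M\u00E1rquez"));
    }
}
```

The analyzer output is a list of tokens that can each be matched by a search term, while the normalizer output is one string, which is what a sortable keyword field needs.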

Adding full text capabilities to our REST service

In our existing LibraryResource, we just need to add the following methods (and a few imports):

    @Transactional // (1)
    void onStart(@Observes StartupEvent ev) throws InterruptedException { // (2)
        // only reindex if we imported some content
        if (Book.count() > 0) {
            Search.session(em)
                    .massIndexer()
                    .startAndWait();
        }
    }
    @GET
    @Path("author/search") // (3)
    @Transactional
    public List<Author> searchAuthors(@QueryParam String pattern, // (4)
            @QueryParam Optional<Integer> size) {
        return Search.session(em) // (5)
                .search(Author.class) // (6)
                .where(f ->
                    pattern == null || pattern.trim().isEmpty() ?
                            f.matchAll() : // (7)
                            f.simpleQueryString()
                                .fields("firstName", "lastName", "books.title").matching(pattern) // (8)
                )
                .sort(f -> f.field("lastName_sort").then().field("firstName_sort")) // (9)
                .fetchHits(size.orElse(20)); // (10)
    }
    }

1	Important point: we need a transactional context for these methods.
2	As we will import data into the PostgreSQL database using an SQL script, we need to reindex the data at startup. For this, we use Hibernate Search’s mass indexer, which allows indexing a lot of data efficiently (you can fine-tune it for better performance). All the upcoming updates coming through Hibernate ORM operations will be synchronized automatically to the full text index. If you don’t import data manually in the database, you don’t need that: the mass indexer should then only be used when you change your indexing configuration (adding a new field, changing an analyzer’s configuration…) and you want the new configuration to be applied to your existing entities.
3	This is where the magic begins: just adding the annotations to our entities makes them available for full text search: we can now query the index using the Hibernate Search DSL.
4	Use the `org.jboss.resteasy.annotations.jaxrs.QueryParam` annotation type to avoid repeating the parameter name.
5	First, we get a Hibernate Search session from the injected entity manager.
6	We indicate that we are searching for `Author`s.
7	We create a predicate: if the pattern is empty, we use a `matchAll()` predicate.
8	If we have a valid pattern, we create a `simpleQueryString()` predicate on the `firstName`, `lastName` and `books.title` fields matching our pattern.
9	We define the sort order of our results. Here we sort by last name, then by first name. Note that we use the specific fields we created for sorting.
10	Fetch the `size` top hits, `20` by default. Obviously, paging is also supported.
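
The reason for sorting on the dedicated `lastName_sort` and `firstName_sort` fields rather than on raw values can be sketched in plain Java. This is only an illustration of the idea under stated assumptions: `SortSketch`, `sortKey` and `sortAuthors` are hypothetical names, the sort key only approximates the `sort` normalizer, and the real sorting is performed by Elasticsearch, not by application code.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of sorting on normalized keys; not a Hibernate Search API.
public class SortSketch {

    // Approximation of the "sort" normalizer: strip accents and lowercase.
    public static String sortKey(String value) {
        return Normalizer.normalize(value, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "")
                .toLowerCase(Locale.ROOT);
    }

    // Each author is a {firstName, lastName} pair.
    // Sort by normalized last name, then by normalized first name.
    public static List<String[]> sortAuthors(List<String[]> authors) {
        List<String[]> sorted = new ArrayList<>(authors);
        sorted.sort(Comparator
                .comparing((String[] author) -> sortKey(author[1]))
                .thenComparing(author -> sortKey(author[0])));
        return sorted;
    }

    public static void main(String[] args) {
        List<String[]> authors = Arrays.asList(
                new String[] { "John", "Irving" },
                new String[] { "Paul", "auster" }, // lowercase on purpose
                new String[] { "\u00C9mile", "Zola" });
        for (String[] author : sortAuthors(authors)) {
            System.out.println(author[1] + ", " + author[0]);
        }
    }
}
```

Comparing the raw strings would place `auster` after `Zola`, because uppercase letters sort before lowercase ones; comparing the normalized keys puts it first.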

The Hibernate Search DSL supports a significant subset of the Elasticsearch predicates (match, range, nested, phrase, spatial…). Feel free to explore the DSL using autocompletion.

When that’s not enough, you can always fall back to defining a predicate using JSON directly.

Configuring the application

As usual, we can configure everything in the Quarkus configuration file, application.properties.

Edit src/main/resources/application.properties and add the following configuration:

quarkus.ssl.native=false (1)
quarkus.datasource.db-kind=postgresql (2)
quarkus.datasource.username=quarkus_test
quarkus.datasource.password=quarkus_test
quarkus.datasource.jdbc.url=jdbc:postgresql:quarkus_test
quarkus.hibernate-orm.database.generation=drop-and-create (3)
quarkus.hibernate-orm.sql-load-script=import.sql (4)
quarkus.hibernate-search.elasticsearch.version=7 (5)
quarkus.hibernate-search.elasticsearch.analysis.configurer=org.acme.hibernate.search.elasticsearch.config.AnalysisConfigurer (6)
quarkus.hibernate-search.schema-management.strategy=drop-and-create (7)
quarkus.hibernate-search.elasticsearch.index-defaults.schema-management.required-status=yellow (8)
quarkus.hibernate-search.automatic-indexing.synchronization.strategy=sync (9)

1	We won’t use SSL so we disable it to have a more compact native executable.
2	Let’s create a PostgreSQL datasource.
3	We will drop and recreate the schema every time we start the application.
4	We load some initial data.
5	We need to tell Hibernate Search about the version of Elasticsearch we will use. It is important because there are significant differences between Elasticsearch mapping syntax depending on the version. Since the mapping is created at build time to reduce startup time, Hibernate Search cannot connect to the cluster to automatically detect the version.
6	We point to the custom `AnalysisConfigurer` which defines the configuration of our analyzers and normalizers.
7	Obviously, this is not for production: we drop and recreate the index every time we start the application.
8	We consider the `yellow` status sufficient to proceed after an index is created. This is for testing purposes with the Elasticsearch Docker container. It should not be used in production.
9	This means that we wait for the entities to be searchable before considering a write complete. On a production setup, the `write-sync` default will provide better performance. Using `sync` is especially important when testing as you need the entities to be searchable immediately.

For more information about the Hibernate Search extension configuration please refer to the Configuration Reference.

Creating a frontend

Now let’s add a simple web page to interact with our LibraryResource. Quarkus automatically serves static resources located under the META-INF/resources directory. In the src/main/resources/META-INF/resources directory, overwrite the existing index.html file with the content from this index.html file.

Automatic import script

For the purpose of this demonstration, let’s import an initial dataset.

Let’s create a src/main/resources/import.sql file with the following content:

INSERT INTO author(id, firstname, lastname) VALUES (nextval('hibernate_sequence'), 'John', 'Irving');
INSERT INTO author(id, firstname, lastname) VALUES (nextval('hibernate_sequence'), 'Paul', 'Auster');
INSERT INTO book(id, title, author_id) VALUES (nextval('hibernate_sequence'), 'The World According to Garp', 1);
INSERT INTO book(id, title, author_id) VALUES (nextval('hibernate_sequence'), 'The Hotel New Hampshire', 1);
INSERT INTO book(id, title, author_id) VALUES (nextval('hibernate_sequence'), 'The Cider House Rules', 1);
INSERT INTO book(id, title, author_id) VALUES (nextval('hibernate_sequence'), 'A Prayer for Owen Meany', 1);
INSERT INTO book(id, title, author_id) VALUES (nextval('hibernate_sequence'), 'Last Night in Twisted River', 1);
INSERT INTO book(id, title, author_id) VALUES (nextval('hibernate_sequence'), 'In One Person', 1);
INSERT INTO book(id, title, author_id) VALUES (nextval('hibernate_sequence'), 'Avenue of Mysteries', 1);
INSERT INTO book(id, title, author_id) VALUES (nextval('hibernate_sequence'), 'The New York Trilogy', 2);
INSERT INTO book(id, title, author_id) VALUES (nextval('hibernate_sequence'), 'Mr. Vertigo', 2);
INSERT INTO book(id, title, author_id) VALUES (nextval('hibernate_sequence'), 'The Brooklyn Follies', 2);
INSERT INTO book(id, title, author_id) VALUES (nextval('hibernate_sequence'), 'Invisible', 2);
INSERT INTO book(id, title, author_id) VALUES (nextval('hibernate_sequence'), 'Sunset Park', 2);
INSERT INTO book(id, title, author_id) VALUES (nextval('hibernate_sequence'), '4 3 2 1', 2);

Preparing the infrastructure

We need a PostgreSQL instance and an Elasticsearch cluster.

Let’s use Docker to start one of each:

docker run -it --rm=true --name elasticsearch_quarkus_test -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch-oss:{elasticsearch-version}

docker run --ulimit memlock=-1:-1 -it --rm=true --memory-swappiness=0 --name postgresql_quarkus_test -e POSTGRES_USER=quarkus_test -e POSTGRES_PASSWORD=quarkus_test -e POSTGRES_DB=quarkus_test -p 5432:5432 postgres:11.3

Time to play with your application

You can now interact with your REST service:

start Quarkus with ./mvnw compile quarkus:dev
open a browser to [http://localhost:8080/](http://localhost:8080/)
search for authors or book titles (we initialized some data for you)
create new authors and books and search for them too

As you can see, all your updates are automatically synchronized to the Elasticsearch cluster.

Building a native executable

You can build a native executable with the usual command ./mvnw package -Pnative.

As usual with native executable compilation, this operation consumes a lot of memory.

It might be safer to stop the two containers while you are building the native executable and start them again once you are done.

Running it is as simple as executing ./target/hibernate-search-elasticsearch-quickstart-1.0-SNAPSHOT-runner.

You can then point your browser to [http://localhost:8080/](http://localhost:8080/) and use your application.

The startup is a bit slower than usual: it is mostly due to us dropping and recreating the database schema and the Elasticsearch mapping every time at startup. We also inject some data and execute the mass indexer.

In a real life application, it is obviously something you won’t do at startup.

FAQ

Why Hibernate Search 6 (and not a fully supported version)?

To optimize the Hibernate Search bootstrap for Quarkus, the Hibernate team had to reorganize things a bit (collect the metadata offline, start the Elasticsearch client later…).

This couldn’t be done in the 5.x code base so we decided to go with the in-progress Hibernate Search 6.

Can I really use it?

While Hibernate Search 6 is still at Beta stage, the code is of production quality and can be relied on.

What we can’t guarantee is API stability: there might be API changes along the way to the final release of Hibernate Search 6, and you might have to adapt your code.

If it is not a major issue for you, then sure you can use it.

Why Elasticsearch only?

Hibernate Search supports both a Lucene backend and an Elasticsearch backend.

In the context of Quarkus and to build microservices, we thought the latter would make more sense. Thus we focused our efforts on it.

We don’t have plans to support the Lucene backend in Quarkus for now.

Hibernate Search Configuration Reference

About the Duration format

The format for durations uses the standard java.time.Duration format. You can learn more about it in the Duration#parse() javadoc.

You can also provide duration values starting with a number. In this case, if the value consists only of a number, the converter treats the value as seconds. Otherwise, PT is implicitly prepended to the value to obtain a standard java.time.Duration format.
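
The rule described above can be sketched in a few lines of plain Java. This is an illustration of the described behavior, not the converter Quarkus actually uses: the class `DurationFormatSketch` and its `parse` method are hypothetical names, and edge cases such as negative values are not covered.

```java
import java.time.Duration;

// Hypothetical illustration of the rule described above; not the Quarkus converter itself.
public class DurationFormatSketch {

    public static Duration parse(String value) {
        String trimmed = value.trim();
        if (trimmed.matches("\\d+")) {
            // The value consists only of a number: it is treated as seconds.
            return Duration.ofSeconds(Long.parseLong(trimmed));
        }
        if (!trimmed.isEmpty() && Character.isDigit(trimmed.charAt(0))) {
            // The value starts with a number: "PT" is implicitly prepended,
            // so "2M" is read as "PT2M".
            return Duration.parse("PT" + trimmed);
        }
        // Anything else is expected to be in the standard java.time.Duration format.
        return Duration.parse(trimmed);
    }

    public static void main(String[] args) {
        System.out.println(parse("10"));      // prints PT10S
        System.out.println(parse("2M"));      // prints PT2M
        System.out.println(parse("PT1H30M")); // prints PT1H30M
    }
}
```

A value that fits none of these forms makes `Duration.parse` throw a `DateTimeParseException`.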
