Adding search to a Java application

Netz00
8 min read · Sep 22, 2022


Winter is coming! Yay!

This article demonstrates how to add full-text search with Hibernate Search to an application that uses Java 8+ and Hibernate ORM to store data in a relational DB. Treat it as a trailer: a heavily simplified tour of Hibernate Search. The example project repository covers some more complex use cases, such as custom analyzers, edgeNgram, and larger projections. If you are not going to use Hibernate Search, I recommend reading only the first chapter. Simple examples are skipped because they can already be found in the Hibernate documentation.

The Hibernate Search 6.1.7.Final documentation is the main source for this article; for more information, please check out the official documentation.

Hibernate Search

Adding full-text search to your application is an advanced topic, so the Hibernate implementation comes in handy. Hibernate Search provides a “simple” solution that requires a minimum amount of configuration to achieve complex full-text search capabilities.

The blue part of the following illustration represents an application that uses an ORM to work with a DB and needs full-text queries over some of the data stored in that DB. The red part is the infrastructure that is missing to make full-text queries possible.

Full-text queries can be processed by adding a search engine next to a DB. Hibernate Search supports both Elasticsearch and Lucene search engines.

Adding a search engine introduces redundancy, because data in the DB is now also stored in the search engine (Elasticsearch in this article). With redundancy comes the first problem for Hibernate Search to solve: synchronization.

Hibernate Search offers several solutions for synchronization; the default is automatic sync. Automatic sync repeats every modification made to indexed data in the DB on the search engine, which keeps the two in sync. So when data that is also stored in the search engine is created, updated, or deleted in the DB, the same action is applied to the index. Depending on the use case, it is sometimes better to turn off automatic sync and synchronize, for example, once a day, because syncing on every change can be overkill.
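
To picture the difference between automatic sync and a periodic full sync, here is a minimal plain-Java sketch. It does not use the Hibernate Search API; the class `SyncSketch`, its two maps, and the `automaticSync` flag are hypothetical stand-ins for the DB, the search engine index, and the sync setting.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java illustration only; none of these types belong to Hibernate Search. */
public class SyncSketch {
    // Stand-in for the relational database: id -> text.
    public final Map<Long, String> database = new HashMap<>();
    // Stand-in for the search engine index: id -> indexed text.
    public final Map<Long, String> index = new HashMap<>();
    // When true, every write to the database is repeated on the index.
    public boolean automaticSync = true;

    public void save(long id, String text) {
        database.put(id, text);
        if (automaticSync) {
            index.put(id, text);
        }
    }

    public void delete(long id) {
        database.remove(id);
        if (automaticSync) {
            index.remove(id);
        }
    }

    /** Full rebuild, for example from a nightly job when automatic sync is off. */
    public void reindexAll() {
        index.clear();
        index.putAll(database);
    }

    public static void main(String[] args) {
        SyncSketch app = new SyncSketch();
        app.save(1L, "Thinking in Java");
        System.out.println("indexed right away: " + app.index.containsKey(1L));

        app.automaticSync = false;
        app.save(2L, "Effective Java");
        System.out.println("indexed without sync: " + app.index.containsKey(2L));

        app.reindexAll();
        System.out.println("in sync after rebuild: " + app.index.equals(app.database));
    }
}
```

With automatic sync off, the index stays stale until `reindexAll()` runs, which is the trade-off behind the once-a-day option.
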
With the sync problem solved, how do you control which data is indexed in Elasticsearch? Hibernate Search offers several annotations that can be added to existing entities to configure indexing. These annotations are explained later.

Once data is stored and in sync, it can be searched. There are two ways to search data:

  1. Send a query to Elasticsearch, retrieve only the identifiers of the matching entities, and then load those entities from the DB.
  2. Send a query to Elasticsearch and retrieve the data directly, without asking the DB (requires projections).

The complete explanation can be found in the following video.

Hibernate Search 6 Preview

Spring Boot app with missing full-text search

If you already have an application that lacks full-text search functionality, feel free to skip this section and jump straight to Adding Hibernate search.
For demonstration purposes, a Spring Boot application is used, with Elasticsearch running in a Docker container and Docker Compose for orchestration. All code can be found in the following repository. The Spring Boot framework is optional; any other framework would work too.

The application tries to show how Hibernate Search fits into a classic Spring Boot application, from the controllers to the search engine and all the way back, with some realistic, complex situations.

Adding full-text search

Configuring Elasticsearch container

Elasticsearch Docker Compose configuration

For development purposes, the Elasticsearch cluster is configured as a single node living on the same server as the application. To inspect and work with Elasticsearch, https://elasticvue.com/ can be used.

Also, it is good practice to secure Elasticsearch by enabling security and providing a password for the default user “elastic”. Advanced security options are not included in the free Security functionality. The last option limits the RAM available to Elasticsearch, so it won’t consume 20 GB of RAM.

Adding Hibernate search dependencies

pom.xml

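To add Hibernate Search to an application, two dependencies are required in your POM (for an ORM and Elasticsearch setup like this one, hibernate-search-mapper-orm and hibernate-search-backend-elasticsearch). The latest versions are listed in the official documentation.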

Configuring Hibernate Search

Hibernate Search can be configured in multiple ways; the following image uses the Spring Boot configuration file (application.yml). The configuration is passed to Hibernate ORM and then to Hibernate Search.

application.yml

A single backend is configured (there can be multiple backends), with the address and credentials needed to access the Elasticsearch container. The schema management strategy is set to drop-and-create, which the documentation describes as “A strategy that drops existing indexes and re-creates them and their schema on startup”.
More properties and explanations can be found here.

Indexing entities

After configuring Hibernate Search and before searching, some entities need to be indexed.
To index an entity, annotate its class with the @Indexed(index = "index_name") annotation. The following annotation creates an empty index named idx_comment inside Elasticsearch.

To map entity properties to index fields, the properties also need to be annotated. Multiple annotations on the same entity property are allowed. The following property annotations are explained below: @FullTextField, @KeywordField, @GenericField, and @IndexedEmbedded.

The @FullTextField annotation works only with String properties and configures the field as text. Text is analyzed before indexing and before searching. An analyzer consists of a tokenizer and filters: the tokenizer splits the string into substrings (tokens), which are then processed by the filters. So before indexing, the string “Thinking in Java” is tokenized to [“Thinking”, “in”, “Java”], and then filters can be applied, for example lowercasing all characters or removing stop words. When searching, the same steps are applied to the query. It is also possible to configure different analyzers for indexing and for searching. Finally, if the user searches for “Learning Java”, the query is tokenized to [“Learning”, “Java”], and “Java” matches the stored “Java” from “Thinking in Java”, so “Thinking in Java” is returned as a result. Text fields can’t be used for sorting, but the keyword annotation described below solves that problem.
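
The analysis steps described above can be sketched in plain Java. This is an illustration of the idea only, not the Lucene or Elasticsearch implementation; the class `AnalyzerSketch`, its tiny stop-word list, and the "any shared token is a match" rule are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java illustration only; real analyzers live in the search engine. */
public class AnalyzerSketch {
    // Illustrative stop-word list; real analyzers ship much longer ones.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("in", "the", "a"));

    /** Whitespace tokenizer, then lowercase filter, then stop-word filter. */
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** A document matches when it shares at least one token with the query. */
    public static boolean matches(String indexedText, String query) {
        List<String> indexedTokens = analyze(indexedText);
        for (String queryToken : analyze(query)) {
            if (indexedTokens.contains(queryToken)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Thinking in Java"));
        System.out.println(matches("Thinking in Java", "Learning Java"));
        System.out.println(matches("Thinking in Java", "Learning Python"));
    }
}
```

Because the same `analyze` step runs on both the stored text and the query, “Learning Java” and “Thinking in Java” meet on the shared token “java”.
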

It is possible to build custom analyzers by combining a specific tokenizer with filters. Besides the whitespace tokenizer and the lowercase filter, many others are available here.
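
One filter often used in custom analyzers is the edge n-gram filter mentioned in the introduction, which indexes the prefixes of each token so that partially typed words can match. The following plain-Java sketch shows only the idea; `EdgeNGramSketch` and its parameter names are hypothetical and not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of what an edge n-gram filter emits for one token. */
public class EdgeNGramSketch {
    /**
     * Returns the prefixes of the token whose length lies between
     * minGram and maxGram (inclusive). minGram is expected to be at least 1.
     */
    public static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int length = minGram; length <= longest; length++) {
            grams.add(token.substring(0, length));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prefixes of "java" with lengths 1 to 3.
        System.out.println(edgeNGrams("java", 1, 3));
    }
}
```

If such prefixes are stored at indexing time, a query for “jav” can find a document that contains “java”.
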

Analyzer flow

The @KeywordField annotation works only with String properties and configures the field as a keyword. Only normalizers can be applied to keyword fields (no analyzers). Normalizers are similar to analyzers but do not tokenize. So before indexing, the string “Thinking in Java” can only be normalized and is stored as a single keyword. The search term is normalized as well, so the previous example (“Learning Java”) would not match. This type is useful for sorting. A keyword field and a full-text field can also be combined on the same property.

@GenericField is “A good default choice that will work for every property type with built-in support.” In the example it is used for Date and Long (primary key) properties.

@IndexedEmbedded allows mapping associated elements. It is used to search over the fields of nested objects. For example, if a Student entity has a @ManyToMany association with Course, it is possible to search for Students by Course name. Other association types are also possible (@OneToOne, @OneToMany). The nested object does not need @Indexed unless it should also be indexed on its own, as shown in the following example.
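
To contrast this with the analyzed text field, here is a minimal plain-Java sketch of a lowercase normalizer. It is an illustration only; `NormalizerSketch` and its methods are hypothetical names and not part of Hibernate Search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Plain-Java illustration only; real normalizers live in the search engine. */
public class NormalizerSketch {
    /** Lowercase normalizer: the whole string stays one single term. */
    public static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    /** A keyword matches only when the entire normalized values are equal. */
    public static boolean matches(String indexedValue, String query) {
        return normalize(indexedValue).equals(normalize(query));
    }

    /** Sorting is well defined because every value is exactly one term. */
    public static List<String> sortByKeyword(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing(NormalizerSketch::normalize));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(matches("Thinking in Java", "Learning Java"));
        System.out.println(matches("Thinking in Java", "thinking in java"));
        System.out.println(sortByKeyword(
                Arrays.asList("Thinking in Java", "effective Java", "Clean Code")));
    }
}
```

Since nothing is tokenized, “Learning Java” no longer shares a term with “Thinking in Java”; only a query for the full title, in any letter case, matches.
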

@Indexed(index = "idx_student")
public class Student {
    ...
    @ManyToMany(cascade = CascadeType.ALL, fetch = FetchType.LAZY)
    @JoinTable(
        name = "student_courses",
        joinColumns = @JoinColumn(name = "student_id"),
        inverseJoinColumns = @JoinColumn(name = "course_id"))
    @IndexedEmbedded(name = "courses", includePaths = {"name"})
    private Set<Course> courses = new HashSet<>();
    ...
}

public class Course {
    ...
    @Column(name = "name")
    @KeywordField(name = "name", normalizer = "lowercase", projectable = Projectable.YES)
    private String name;
    ...
}

Hibernate Search also detects at the field level whether an entity needs to be re-indexed. If only non-indexed fields of the Student entity above are updated, it won’t be re-indexed. So choose wisely what should be indexed.

More annotations and explanations can be found here.

MassIndexer

Sometimes the DB and Elasticsearch can get out of sync, for example in edge cases where an I/O exception occurs after the data has been stored in the database. One solution, also used in the example project, is a complete reindexing with the MassIndexer. In the example project this is done by a scheduled job that calls the MassIndexer.

The example project is unlikely to run into out-of-sync issues, so there the MassIndexer mainly serves to repopulate indexes after they have been wiped, for example because the Hibernate Search mapping or some core settings changed, which is useful during development. The MassIndexer re-populates the indexes with data from the database. Otherwise, automatic indexing is used.

Searching

There are two ways of fetching search results demonstrated in the example.

The first is fetching data directly from Elasticsearch by using projections, skipping the DB. This is faster, but each field value must be stored in the index by setting projectable = Projectable.YES when annotating the field, and the implementation is more complex. Also, the results are not clean entities; they are String, Long, or Date values, or a combination of them inside a List, which then need to be combined into some sort of response. In the example project they are mapped to data transfer objects (DTOs) and returned as DTOs with some null-valued fields. For some scenarios, like search-as-you-type, this solution could be optimal.

Projections

The other way is to use the search engine only to find which entities match, not to return their values, and then fetch the entities from the DB. This adds a database round-trip, which slows the process down. But in some scenarios it fits better than the previous approach: search engines are optimized for searching, not for updating, so indexing only the required fields and letting the database handle the rest can improve speed by more than the cost of the database round-trip.
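
The two fetching strategies can be compared side by side in a small plain-Java sketch. It does not use the Hibernate Search API: `FetchStrategiesSketch`, `Comment`, `CommentDto`, the in-memory maps, and the substring match are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Plain-Java illustration only; none of these types belong to Hibernate Search. */
public class FetchStrategiesSketch {

    /** Full entity, as stored in the database. */
    public static class Comment {
        public final long id;
        public final String text;
        public final String author;

        public Comment(long id, String text, String author) {
            this.id = id;
            this.text = text;
            this.author = author;
        }
    }

    /** Response object; fields that are not stored in the index stay null. */
    public static class CommentDto {
        public final Long id;
        public final String text;
        public final String author;

        public CommentDto(Long id, String text, String author) {
            this.id = id;
            this.text = text;
            this.author = author;
        }
    }

    // Stand-in for the database: id -> entity.
    private final Map<Long, Comment> database = new LinkedHashMap<>();
    // Stand-in for the index: only the "text" field is stored there.
    private final Map<Long, String> indexedText = new LinkedHashMap<>();
    // Counts database lookups so the round-trip cost is visible.
    public int databaseReads = 0;

    public void save(Comment comment) {
        database.put(comment.id, comment);
        indexedText.put(comment.id, comment.text);
    }

    /** Classic search: the index yields ids, the database yields entities. */
    public List<Comment> searchEntities(String term) {
        List<Comment> entities = new ArrayList<>();
        for (Map.Entry<Long, String> hit : hits(term)) {
            databaseReads++;
            entities.add(database.get(hit.getKey()));
        }
        return entities;
    }

    /** Projection: values come straight from the index, no database access. */
    public List<CommentDto> searchProjections(String term) {
        List<CommentDto> dtos = new ArrayList<>();
        for (Map.Entry<Long, String> hit : hits(term)) {
            dtos.add(new CommentDto(hit.getKey(), hit.getValue(), null));
        }
        return dtos;
    }

    // Naive case-insensitive substring match standing in for a real query.
    private List<Map.Entry<Long, String>> hits(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Map.Entry<Long, String>> result = new ArrayList<>();
        for (Map.Entry<Long, String> entry : indexedText.entrySet()) {
            if (entry.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FetchStrategiesSketch app = new FetchStrategiesSketch();
        app.save(new Comment(1L, "Hibernate Search is neat", "ana"));
        app.save(new Comment(2L, "Winter is coming", "bob"));

        CommentDto dto = app.searchProjections("search").get(0);
        System.out.println(dto.text + " / author from index: " + dto.author);

        Comment entity = app.searchEntities("search").get(0);
        System.out.println(entity.text + " / author from DB: " + entity.author);
    }
}
```

The projection path never touches the database but can only return what the index stores, which is why the DTO carries a null author; the entity path pays one database read per hit and returns the complete object.
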

Classic search

As next steps, I recommend reading the Hibernate Search documentation and then implementing the search engine integration. Also check out my project for a working example.

THANK YOU
