Querying Cache in Apache Ignite
Apache Ignite is a memory-centric distributed database, caching, and processing platform for transactional, analytical, and streaming workloads, delivering in-memory speeds at petabyte scale.
Ignite provides an in-memory data store where each node in an Ignite cluster by default stores data in RAM. The data is kept in off-heap storage to ensure low latency and consistent access times. The system is multi-model, with the ability to support structured, semi-structured, and unstructured data.
Apache Ignite offers an elegant query API with the following components:
- Predicate-based Scan Query
A scan query is a simple search query used to retrieve data from a cache in a distributed manner. When executed without parameters, a scan query returns all entries from the cache. If a predicate is specified, the scan query returns only the entries that match it; the predicate is evaluated on the remote nodes.
- ANSI-99-compliant SQL Query
Apache Ignite SQL Grid is a distributed data grid in which you can execute ANSI SQL-99-compliant statements (SELECT, UPDATE, INSERT, MERGE, and DELETE queries) to query and modify cached data.
- Lucene Index-based Text Query
Lucene is a full-text search library in Java that makes it easy to add search functionality to an application or website. Ignite supports full-text queries based on the Apache Lucene engine.
- Continuous Query
Continuous queries allow registering a remote filter and a local listener for cache updates. When a cache update passes the filter criteria, the update event is sent to the node that executed the query, and the local listener is notified. This lets us take action whenever the cache is updated in the Ignite cluster.
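To make the split between the remote filter and the local listener concrete, here is a minimal plain-Java model of a continuous query. It is not Ignite's API (in Ignite, the ContinuousQuery class is given a local listener and a remote filter); ContinuousQueryModel and its register and put methods are hypothetical names, and everything runs in a single JVM, so the "remote" and "local" sides are only simulated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BiPredicate;

// Plain-Java model of a continuous query; NOT Ignite's ContinuousQuery API.
// A "remote" filter decides which updates are interesting, and only those
// updates are delivered to the "local" listener of the querying node.
class ContinuousQueryModel<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final List<BiPredicate<K, V>> filters = new ArrayList<>();
    private final List<BiConsumer<K, V>> listeners = new ArrayList<>();

    // Registers a filter/listener pair (hypothetical method name).
    void register(BiPredicate<K, V> remoteFilter, BiConsumer<K, V> localListener) {
        filters.add(remoteFilter);
        listeners.add(localListener);
    }

    // Stores the update, then notifies each listener whose filter accepts it.
    void put(K key, V value) {
        cache.put(key, value);
        for (int i = 0; i < filters.size(); i++) {
            if (filters.get(i).test(key, value)) {
                listeners.get(i).accept(key, value);
            }
        }
    }

    public static void main(String[] args) {
        ContinuousQueryModel<Integer, String> model = new ContinuousQueryModel<>();
        List<String> received = new ArrayList<>();
        // Only updates whose value mentions Barcelona pass the filter.
        model.register((k, v) -> v.contains("Barcelona"), (k, v) -> received.add(k + "=" + v));
        model.put(1, "Messi/Barcelona");
        model.put(2, "Rashford/Manchester United");
        System.out.println(received); // [1=Messi/Barcelona]
    }
}
```

In Ignite, evaluating the filter on the remote nodes means that updates rejected by it are never sent over the network to the listener.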
For SQL queries, Ignite supports in-memory indexing, so indexed data lookups are very fast. If we cache our data in off-heap memory, the query indexes are kept in off-heap memory as well.
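As a rough illustration of why an index speeds up lookups, the sketch below models a sorted secondary index on a team field with a TreeMap and compares it with a full scan of the same data. It is a hypothetical plain-Java model, not how Ignite stores its indexes internally; IndexLookupModel and its methods are illustrative names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java model, NOT Ignite's internal index structures: a sorted
// secondary index on "team", compared with a full scan of the same data.
class IndexLookupModel {
    private final Map<Integer, String> teamById;
    private final TreeMap<String, Set<Integer>> teamIndex = new TreeMap<>();
    int scanned; // entries examined by the most recent scan()

    IndexLookupModel(Map<Integer, String> teamById) {
        this.teamById = new HashMap<>(teamById);
        // Build the index once: team -> sorted set of player ids.
        for (Map.Entry<Integer, String> e : this.teamById.entrySet()) {
            teamIndex.computeIfAbsent(e.getValue(), t -> new TreeSet<>()).add(e.getKey());
        }
    }

    // Without an index, every entry has to be examined.
    Set<Integer> scan(String team) {
        scanned = 0;
        Set<Integer> ids = new TreeSet<>();
        for (Map.Entry<Integer, String> e : teamById.entrySet()) {
            scanned++;
            if (e.getValue().equals(team)) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    // With the index, one O(log n) tree search finds all matching ids.
    Set<Integer> lookup(String team) {
        Set<Integer> ids = teamIndex.get(team);
        return ids == null ? Collections.<Integer>emptySet() : Collections.unmodifiableSet(ids);
    }

    public static void main(String[] args) {
        Map<Integer, String> data = new HashMap<>();
        data.put(1, "Barcelona");
        data.put(2, "Manchester United");
        data.put(3, "Barcelona");
        IndexLookupModel model = new IndexLookupModel(data);
        System.out.println(model.lookup("Barcelona")); // [1, 3]
        System.out.println(model.scan("Barcelona") + " after examining " + model.scanned + " entries");
    }
}
```

The scan examines every entry, while the index lookup is a single tree search; on three entries the difference is invisible, but it grows with the data set.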
In this tutorial, we will learn how to query data stored in an Apache Ignite cache using the query API. Continuous queries are not covered in this tutorial.
We will use the following steps to explore query-based search in an Apache Ignite cache:
- Integrating Apache Ignite Server Node With Spring Boot
- Enabling Caching
- Query and QueryCursor
- Scan Queries
- SQL Queries
- Text Queries
- Testing the Application
Prerequisites
For this tutorial, we are using JDK 1.8 and a Spring Boot 2.5.6 project with the following dependencies:
spring-boot-starter
spring-boot-starter-web
spring-boot-starter-test
Here, we will use Maven for dependency management. Along with the above dependencies, we will also use springdoc-openapi to automatically generate the OpenAPI 3 specification docs for our API. For that, we simply need to add the springdoc-openapi-ui dependency to our pom.xml:
<dependency>
<groupId>org.springdoc</groupId>
<artifactId>springdoc-openapi-ui</artifactId>
<version>1.5.2</version>
</dependency>
1. Integrating Apache Ignite Server Node With Spring Boot
The Apache Ignite server nodes act as containers for data and computations. Once interconnected, the server nodes form a distributed database (or data grid) that stores the data and participates in query processing, compute execution, stream processing, and so on.
Ignite provides an implementation of the JCache (JSR 107) specification. JCache provides a very simple and powerful API for data access. However, the specification omits any details about data distribution and consistency, to give vendors enough freedom in their own implementations. Here, we will use org.apache.ignite.cache.CacheManager to interact with the cache.
We will use Spring XML-based configuration for Ignite.
The code above creates a CacheManager instance using the Ignite configuration XML file whose path is specified in the igniteConfigPath variable.
We also need to add the following Maven dependencies in our Spring Boot app to enable Ignite-based caching.
Here, we are using GridGain Community Edition version 8.8.10. Once the above configuration is in place, we can start the Ignite node in embedded mode, that is, in the same JVM in which the application is running.
2. Enabling Caching
Once the Spring Boot app with an embedded Ignite node is configured, we need to define our cache. Here, we will define a sample Player cache, which we will use for querying purposes. We are using XML-based configuration to define our cache.
Once we have declared our Player cache, we have to define the Player class that it stores.
The above code does the following:
We started off by defining the Player class with some attributes, which we have declared in our cache configuration. We used the @QuerySqlField annotation on the fields that we will use in SQL queries; every field involved in an SQL clause must carry this annotation. We also used the @QueryTextField annotation on the fields that need to be indexed for full-text search using Lucene.
3. Query and QueryCursor
IgniteCache has several query methods, all of which take some subclass of the Query class as an argument. When we query an IgniteCache, it returns a QueryCursor. Query is an abstract class that represents a paginated query to be executed on the distributed cache. We can set the page size for the returned cursor via the Query.setPageSize(...) method (the default is 1024).
QueryCursor represents the query result set and allows transparent page-by-page iteration. Whenever we start iterating over the last page, it automatically requests the next page in the background. When pagination is not needed, we can use the QueryCursor.getAll() method, which fetches the whole query result and stores it in a collection. Calling QueryCursor.getAll() also closes the cursor automatically.
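The page-by-page behaviour can be sketched in plain Java. The PagedCursor class below is a hypothetical model, not Ignite's QueryCursor: it pulls pages from a caller-supplied function (offset, pageSize) -> page, fetches the next page synchronously when the current one runs out (whereas Ignite prefetches in the background), and, like getAll(), drains the remaining results and closes itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Plain-Java model of a paginated cursor, NOT Ignite's QueryCursor. Pages come
// from a caller-supplied function (offset, pageSize) -> page; a page shorter
// than pageSize is treated as the last one.
class PagedCursor<T> implements Iterable<T>, AutoCloseable {
    private final BiFunction<Integer, Integer, List<T>> pageSource;
    private final int pageSize;
    private boolean closed;
    int pagesFetched; // number of pages requested so far

    PagedCursor(BiFunction<Integer, Integer, List<T>> pageSource, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.pageSource = pageSource;
        this.pageSize = pageSize;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private List<T> page = new ArrayList<>();
            private int offset; // offset of the next page to request
            private int pos;    // position inside the current page
            private boolean lastPageSeen;

            @Override
            public boolean hasNext() {
                if (closed) {
                    return false;
                }
                // Fetch the next page only when the current one is used up
                // (synchronously here; Ignite prefetches in the background).
                while (pos >= page.size() && !lastPageSeen) {
                    page = pageSource.apply(offset, pageSize);
                    pagesFetched++;
                    offset += page.size();
                    pos = 0;
                    lastPageSeen = page.size() < pageSize;
                }
                return pos < page.size();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return page.get(pos++);
            }
        };
    }

    // Like QueryCursor.getAll(): drain every remaining result, then close.
    List<T> getAll() {
        List<T> all = new ArrayList<>();
        for (T item : this) {
            all.add(item);
        }
        close();
        return all;
    }

    @Override
    public void close() {
        closed = true;
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            source.add(i);
        }
        BiFunction<Integer, Integer, List<Integer>> pages = (offset, size) -> new ArrayList<>(
                source.subList(Math.min(offset, source.size()), Math.min(offset + size, source.size())));
        try (PagedCursor<Integer> cursor = new PagedCursor<>(pages, 2)) {
            for (Integer value : cursor) {
                System.out.println(value);
            }
            System.out.println("pages fetched: " + cursor.pagesFetched); // pages fetched: 3
        }
    }
}
```

Iterating with a for-each loop keeps memory bounded by roughly one page, whereas getAll() materializes the whole result in a list.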
To explore the search queries, we will create a class called PlayerService, which will contain the various methods required for the search operations.
4. Scan Queries
Apache Ignite’s key-value API is used to store objects in a cache and retrieve values by their keys. The query API, by contrast, lets us query objects using expressions. ScanQuery is one implementation of the Query API; it lets us query the cache in a distributed fashion using a user-defined predicate.
Let us explore the ScanQuery API now.
The above code does the following:
- We started off by creating a public method called scanQuerySearch in line 1 with a search text input, which we will use to search the Player cache.
- In line 3, we retrieved an instance of the Player cache from the Ignite node.
- Then, in line 4, we created an instance of ScanQuery, which takes an IgniteBiPredicate as an argument. IgniteBiPredicate is a functional interface that accepts two parameters and returns a boolean. It is usually used to filter a collection of objects and can also be written as a lambda expression. Here, we use a Java 8 lambda expression to represent the IgniteBiPredicate: k represents the cache key and v the Player value. Our predicate returns true only for the players whose team equals the input text (player.getTeam() equals the input). The result is returned as a QueryCursor, which holds all the matching entries (key-value pairs).
- From line 7 to 14, we iterated through the results and populated a local List with the matching Players.
- In line 15, we returned the filtered list of Players to the caller.
A ScanQuery goes over every cache entry and applies the predicate to it, which may not be the most efficient way to query objects, especially when the data set is large.
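The following plain-Java sketch models what a scan query does with its predicate: every (key, value) entry is passed to a BiPredicate, and a counter inside the predicate shows that all entries are visited. It is a stand-in built on assumptions, not Ignite's ScanQuery (in Ignite, the predicate is an IgniteBiPredicate passed to the ScanQuery constructor and evaluated on the nodes holding the data); ScanQueryModel and its nested Player class are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Plain-Java model of a scan query, NOT Ignite's ScanQuery: the predicate is
// applied to every (key, value) entry, so the cost grows with the cache size.
class ScanQueryModel {
    // Minimal stand-in for the article's Player value class (fields assumed).
    static final class Player {
        final String name;
        final String team;

        Player(String name, String team) {
            this.name = name;
            this.team = team;
        }

        String getTeam() {
            return team;
        }

        @Override
        public String toString() {
            return name + " (" + team + ")";
        }
    }

    // Plays the role of an IgniteBiPredicate<K, V> filter: k is the key, v the value.
    static <K, V> List<V> scan(Map<K, V> cache, BiPredicate<? super K, ? super V> filter) {
        List<V> matches = new ArrayList<>();
        for (Map.Entry<K, V> e : cache.entrySet()) {
            if (filter.test(e.getKey(), e.getValue())) {
                matches.add(e.getValue());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<Integer, Player> cache = new LinkedHashMap<>();
        cache.put(1, new Player("Messi", "Barcelona"));
        cache.put(2, new Player("Rashford", "Manchester United"));
        cache.put(3, new Player("Neymar", "Barcelona"));
        String input = "Barcelona";
        int[] visited = {0}; // counts predicate calls
        List<Player> result = scan(cache, (k, v) -> {
            visited[0]++;
            return v.getTeam().equals(input);
        });
        System.out.println(result); // [Messi (Barcelona), Neymar (Barcelona)]
        System.out.println("entries visited: " + visited[0]); // entries visited: 3
    }
}
```

The counter confirms the point made above: the predicate runs once per entry, whether or not the entry matches.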
5. SQL Queries
The SQL query API supports ANSI-99 SQL queries against caches. It allows SQL joins between colocated entries on the same node as well as between non-colocated entries on different nodes.
To tell Apache Ignite which fields are accessible to SQL queries, we need to define the metadata, and Apache Ignite’s @QuerySqlField annotation does the trick. We can even index a field for faster queries by setting the annotation's index attribute: @QuerySqlField(index = true). We used this annotation in Step 2 for our Player cache to enable SQL queries.
Let us explore the SQL query API now.
The above code does the following:
- We started off by creating a public method called sqlQuerySearch in line 1 with a search text input, which we will use to search the Player cache.
- In line 3, we retrieved an instance of the Player cache from the Ignite node.
- Next, in lines 4 to 5, we extracted the table name from the cache. Once we have the table name, we can form a SQL query that uses it to search the cache.
- In line 6, we created an instance of SqlFieldsQuery. The SqlFieldsQuery class provides another implementation of the Query API, used for executing SQL statements and navigating through the results. It takes a standard SQL query as its constructor parameter and is executed through the IgniteCache.query(SqlFieldsQuery) method, which returns a QueryCursor.
- From line 8 to 9, we extracted the results using the getAll() method. If matching entries exist in the cache, we collect them and return the response to the caller.
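Because the query text is built from the table name, it is worth keeping user input out of the SQL string itself. The hypothetical PlayerSql helper below (not from the original code) validates the table name as a bare, unquoted identifier, since a table name cannot be a bind parameter, and passes the search text as a ? argument. The column names name and team are assumptions based on the Player fields.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not from the article's code or Ignite's API: it builds
// the SQL text and bind arguments for a "team contains text" fields query.
final class PlayerSql {
    // A bare, unquoted SQL identifier (assumption: no schema prefix, no quotes).
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    final String sql;
    final List<Object> params;

    private PlayerSql(String sql, List<Object> params) {
        this.sql = sql;
        this.params = params;
    }

    static PlayerSql teamContains(String tableName, String text) {
        // Table names cannot be bind parameters, so validate before concatenating.
        if (!IDENTIFIER.matcher(tableName).matches()) {
            throw new IllegalArgumentException("Not a plain SQL identifier: " + tableName);
        }
        String sql = "SELECT name, team FROM " + tableName + " WHERE team LIKE ?";
        // The user's text travels as an argument, never as part of the SQL string.
        List<Object> params = Collections.unmodifiableList(Arrays.<Object>asList("%" + text + "%"));
        return new PlayerSql(sql, params);
    }

    public static void main(String[] args) {
        PlayerSql query = PlayerSql.teamContains("PLAYER", "Barcelona");
        System.out.println(query.sql);    // SELECT name, team FROM PLAYER WHERE team LIKE ?
        System.out.println(query.params); // [%Barcelona%]
    }
}
```

In Ignite terms, the result could be handed over roughly as new SqlFieldsQuery(query.sql).setArgs(query.params.toArray()). Note that % and _ inside the search text still act as LIKE wildcards; escaping them is left out of this sketch.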
In replicated mode, SQL joins execute fast because every cluster node holds all the entries. In partitioned mode, however, each node holds only some of the partitions, so the entries to be joined may reside on different nodes; this makes distributed joins complex and expensive. If we plan to use joins on partitioned tables, we should use colocated joins or hash joins.
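Colocation rests on deriving the partition from an affinity key. The simplified model below is not Ignite's RendezvousAffinityFunction: it maps a key's hash onto 1024 partitions (Ignite's default partition count) and spreads partitions round-robin over nodes, a mapping assumed purely for illustration. A hypothetical PlayerKey uses the team id as its affinity key (in Ignite, that field would be marked with @AffinityKeyMapped), so every player of a team lands in the same partition as the team row, and a join between them can run locally.

```java
import java.util.Objects;

// Simplified model of affinity colocation, NOT Ignite's RendezvousAffinityFunction:
// the partition is derived only from the affinity key, so all records that
// share an affinity key land in the same partition (and therefore on the same node).
final class AffinityModel {
    static final int PARTITIONS = 1024; // Ignite's default partition count

    // Hypothetical composite cache key for a player; the team id is the affinity
    // key (in Ignite, that field would be annotated with @AffinityKeyMapped).
    static final class PlayerKey {
        final int playerId;
        final int teamId;

        PlayerKey(int playerId, int teamId) {
            this.playerId = playerId;
            this.teamId = teamId;
        }

        Object affinityKey() {
            return teamId;
        }
    }

    static int partitionFor(Object affinityKey) {
        // floorMod keeps the partition non-negative even for negative hash codes.
        return Math.floorMod(Objects.hashCode(affinityKey), PARTITIONS);
    }

    // Hypothetical partition-to-node mapping: round-robin over the nodes.
    static int nodeFor(int partition, int nodeCount) {
        return partition % nodeCount;
    }

    public static void main(String[] args) {
        int nodes = 3;
        int teamPartition = partitionFor(1); // the team row is keyed by team id 1
        PlayerKey[] players = {new PlayerKey(10, 1), new PlayerKey(11, 1)};
        for (PlayerKey p : players) {
            int partition = partitionFor(p.affinityKey());
            System.out.println("player " + p.playerId + " -> node " + nodeFor(partition, nodes)
                    + ", colocated with team: " + (partition == teamPartition));
        }
    }
}
```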
6. Text Queries
The TextQuery API allows us to run a full-text search on stored objects in the cache. The TextQuery works on Lucene indexes. Elasticsearch and Apache Solr also use Lucene for indexing text.
Here too, we need to define metadata to tell Apache Ignite which fields are enabled for text search. The @QueryTextField annotation marks a field for full-text indexing and search. In addition, our cache configuration needs to enable indexing by setting the indexed types (setIndexedTypes). We already did this in Step 2 during our cache configuration.
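Full-text search relies on an inverted index. The plain-Java sketch below is a simplified stand-in for the Lucene index that Ignite maintains for @QueryTextField fields, not Lucene itself: each text field value is lower-cased and split into terms, and every term points to the keys whose fields contain it. TextIndexModel and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java model of a full-text (inverted) index, NOT Lucene: every term of
// every indexed text field points to the cache keys whose fields contain it.
class TextIndexModel {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index the given text field values (e.g. name and team) under one cache key.
    void index(int key, String... textFields) {
        for (String value : textFields) {
            // Lower-case and split on anything that is not a letter or digit.
            for (String term : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(key);
                }
            }
        }
    }

    // Keys whose indexed fields contain the single search term.
    Set<Integer> search(String term) {
        Set<Integer> keys = postings.get(term.toLowerCase(Locale.ROOT));
        return keys == null ? Collections.<Integer>emptySet() : Collections.unmodifiableSet(keys);
    }

    public static void main(String[] args) {
        TextIndexModel index = new TextIndexModel();
        index.index(1, "Messi", "Barcelona");
        index.index(2, "Neymar", "Barcelona");
        index.index(3, "Rashford", "Manchester United");
        System.out.println(index.search("United"));    // [3]
        System.out.println(index.search("barcelona")); // [1, 2]
    }
}
```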
Let us explore the TextQuery API now.
The above code does the following:
- In line 1, we created a method called textSearch, which takes a text as input and searches for it in the Player cache. It searches the Player fields annotated with @QueryTextField.
- Then, in line 3, we retrieved an instance of the Player cache from the Ignite node.
- In line 4, we created an instance of TextQuery. Once we have the TextQuery and IgniteCache instances, we call the searchPlayers method to search for the text in the cache.
- From line 25 to 29, we iterated through the results and populated a local List with the matching Players.
- In line 30, we returned the filtered list of Players to the caller.
- Similar to textSearch, we also created a method called fuzzySearch, from line 8 to 13, to perform a fuzzy search on the Player cache. Fuzzy search matches terms that are similar, not only identical, to the search term. Let’s say we want to fetch players whose team name is ‘Barcelona’, but the search text is ‘Barcenola’. We just need to add a ‘~’ at the end of the search string and pass it to the TextQuery constructor. The fuzzy search finds a match between ‘nola’ and ‘lona’ and returns all ‘Barcelona’ players.
- We can also perform a fuzzy search on specific fields. For this, we added another method called fuzzySearchOnSpecificField, from line 15 to 20, which takes fieldName and text as input.
- In the previous methods, textSearch and fuzzySearch, Ignite looked at all the text indexes (name and team) to find data. However, we can also ask Ignite to look into a specific index, as we do in line 18: the input text is then searched only in the field provided by the user. For example, if we are interested in name = ‘Neymar’, the TextQuery text can be written as name:"Neymar".
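The fuzzy and field-specific behaviour can be modelled in a few lines of plain Java. This sketch is not Lucene: it uses plain Levenshtein distance with at most two edits (Lucene's term~ syntax also defaults to a maximum of two edits, though Lucene additionally counts a transposition as one edit), and it only understands single-word queries of the form term, term~, field:term, or field:"term". FuzzyFieldSearchModel and its helpers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java model of two TextQuery features, NOT Lucene itself:
//   "term~"      matches words within MAX_EDITS edits (fuzzy search),
//   "field:term" restricts matching to a single field.
class FuzzyFieldSearchModel {
    static final int MAX_EDITS = 2;

    // Classic Levenshtein distance: insertions, deletions and substitutions.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // fields maps a field name to its value, e.g. name -> Messi, team -> Barcelona.
    static boolean matches(Map<String, String> fields, String query) {
        String field = null;
        String term = query;
        int colon = query.indexOf(':');
        if (colon > 0) { // "field:term" form
            field = query.substring(0, colon);
            term = query.substring(colon + 1);
        }
        boolean fuzzy = term.endsWith("~");
        if (fuzzy) {
            term = term.substring(0, term.length() - 1);
        }
        term = term.replace("\"", "").toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (field != null && !field.equals(e.getKey())) {
                continue; // field-specific query: skip other fields
            }
            for (String word : e.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                boolean hit = fuzzy ? editDistance(word, term) <= MAX_EDITS : word.equals(term);
                if (hit) {
                    return true;
                }
            }
        }
        return false;
    }

    static Map<String, String> player(String name, String team) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", name);
        fields.put("team", team);
        return fields;
    }

    public static void main(String[] args) {
        List<Map<String, String>> players = new ArrayList<>();
        players.add(player("Messi", "Barcelona"));
        players.add(player("Neymar", "Barcelona"));
        players.add(player("Rashford", "Manchester United"));
        for (String query : new String[] {"Barcenola~", "name:\"Neymar\"", "team:Messi"}) {
            List<String> hits = new ArrayList<>();
            for (Map<String, String> p : players) {
                if (matches(p, query)) {
                    hits.add(p.get("name"));
                }
            }
            System.out.println(query + " -> " + hits);
        }
    }
}
```

With these rules, Barcenola~ matches both Barcelona players (two substitutions away from "barcelona"), name:"Neymar" matches only Neymar, and team:Messi matches nobody because Messi appears only in the name field.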
It is advisable to use indexes for querying entries. However, an index takes up space of its own and may slow down data modification, because every time we modify an entry, the index must be updated as well.
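To see the write-side cost, the hypothetical IndexedCacheModel below keeps a sorted secondary index on team next to the key-value map. It is a plain-Java illustration, not Ignite's index implementation: every put that changes the indexed value removes the old index entry and adds a new one, which is the extra work an index adds to each modification.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java model, NOT Ignite's index implementation: a key-value map plus a
// secondary index on "team" that must be kept in sync on every write.
class IndexedCacheModel {
    private final Map<Integer, String> teamById = new HashMap<>();
    private final TreeMap<String, Set<Integer>> teamIndex = new TreeMap<>();
    int indexWrites; // index entries added or removed so far

    void put(int id, String team) {
        Objects.requireNonNull(team, "team");
        String old = teamById.put(id, team);
        if (team.equals(old)) {
            return; // indexed value unchanged: no index maintenance needed
        }
        if (old != null) {
            // The old index entry must be removed before the new one is added.
            Set<Integer> ids = teamIndex.get(old);
            ids.remove(id);
            if (ids.isEmpty()) {
                teamIndex.remove(old);
            }
            indexWrites++;
        }
        teamIndex.computeIfAbsent(team, t -> new TreeSet<>()).add(id);
        indexWrites++;
    }

    Set<Integer> byTeam(String team) {
        Set<Integer> ids = teamIndex.get(team);
        return ids == null ? Collections.<Integer>emptySet() : Collections.unmodifiableSet(ids);
    }

    public static void main(String[] args) {
        IndexedCacheModel cache = new IndexedCacheModel();
        cache.put(1, "Barcelona");
        cache.put(2, "Barcelona");
        cache.put(1, "Paris Saint-Germain"); // moves id 1 to another index bucket
        System.out.println(cache.byTeam("Barcelona") + " " + cache.byTeam("Paris Saint-Germain"));
        System.out.println("index writes: " + cache.indexWrites); // index writes: 4
    }
}
```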
7. Testing the Application
So far, we have written most of the code required to perform various kinds of searches in the Ignite cache. Now let us write some code to populate the cache at startup. The PlayerService class contains the various methods required to explore the Query API in Ignite; let us also add the method below to populate the cache.
The above code populates the Player cache with player names and teams, which we can use for testing purposes.
After populating our cache with some sample values, we added a controller to let users perform the various search operations.
The above code does the following:
It exposes an API that can be used to test the various features of the Ignite Query API discussed so far. It uses the PlayerService class to perform the search operations.
Once we build the project using mvn clean package, we can start the application with the java -jar target/ignite-poc*.jar command.
Once the application has started successfully, we can use Swagger UI to test it. When running locally, Swagger UI is available at http://localhost:9051/swagger-ui.html.
7.1 Search Player Cache with team name as input using SQLQuery API
Here, we are using a SQL query to search for all the Players whose team contains the given text: “Barcelona”
7.2 Search Player Cache with team name or player name that matches the text input using TextQuery API
Here, we are using TextQuery to search all the Players where either name or team contains the given text: “United”
7.3 Search Player Cache with team name that matches the text input using ScanQuery API
Here, we are using ScanQuery to search all the Players whose team contains the given text: “Barcelona”
7.4 Search Player Cache with team name or player name that matches the text input using TextQuery with Fuzzy search enabled
Here, we are using a fuzzy TextQuery to search for all the Players where either name or team contains the given text: “Barcenona”
7.5 Search Player Cache with input field name and text using TextQuery API
Here, we are using a fuzzy TextQuery to search for all Players whose field named by fieldName contains the given text: “Messi”
If you would like to refer to the full code, do check:
Conclusion
Ignite’s cache is a great feature for performing search operations on key-value pairs. When the Query API is used properly, its performance is on par with key-value lookups. If there is a need for key-value search along with SQL query support, Apache Ignite is one of the best choices currently available.