Querying Cache in Apache Ignite

Suman Das
Published in Crux Intelligence
Dec 13, 2021 (11 min read)
Cache Queries

Apache Ignite is a memory-centric distributed database, caching, and processing platform for transactional, analytical, and streaming workloads delivering in-memory speeds at the petabyte scale.

Ignite provides an in-memory data store where each node in an Ignite cluster by default stores data in RAM. The data is kept in off-heap storage to ensure low latency and consistent access times. The system is multi-model, with the ability to support structured, semi-structured, and unstructured data.

Apache Ignite offers an elegant query API with the following components:

  • Predicate-based Scan Query
    A scan query is a simple search query used to retrieve data from a cache in a distributed manner. When executed without parameters, a scan query returns all entries from the cache. If a predicate is specified, only the entries that match it are returned. The predicate is evaluated on the remote nodes.
  • ANSI SQL-99-compliant SQL Query
    Apache Ignite SQL Grid is a distributed data grid where you can execute ANSI SQL-99-compliant statements (SELECT, UPDATE, INSERT, MERGE, and DELETE queries) to manipulate a cache.
  • Lucene Index-based Text Query
    Lucene is a full-text search library in Java which makes it easy to add search functionality to an application or website. Ignite supports full-text queries based on the Apache Lucene engine.
  • Continuous query
    Continuous queries allow registering a remote filter and a local listener for cache updates. When a cache update passes the filter criteria, the update event is sent to the node that executed the query, and the local listener is notified. This lets us take action whenever the cache is updated in the Ignite cluster.

For SQL queries, Ignite supports in-memory indexing, so data lookups are extremely fast. If we are caching our data in off-heap memory, then the query indexes are kept in off-heap memory as well.

In this tutorial, we will learn how to query the data stored in the Cache of Apache Ignite using query API. We will not cover Continuous Query in this tutorial.

We will use the following steps to explore Query-based search in Apache Ignite Cache:

  1. Integrating Apache Ignite Server Node With Spring Boot
  2. Enabling Caching
  3. Query and QueryCursor
  4. Scan Queries
  5. SQL Queries
  6. Text Queries
  7. Testing the Application

Prerequisites

For this tutorial, we are using JDK 1.8 and a Spring Boot 2.5.6 project with the following dependencies:

spring-boot-starter
spring-boot-starter-web
spring-boot-starter-test

Here, we will use Maven for dependency management. Along with the above dependencies, we will also use springdoc-openapi to automatically generate the OpenAPI 3 specification docs for our API. For that, we simply need to add the springdoc-openapi-ui dependency to our pom.xml:

<dependency>
    <groupId>org.springdoc</groupId>
    <artifactId>springdoc-openapi-ui</artifactId>
    <version>1.5.2</version>
</dependency>

1. Integrating Apache Ignite Server Node With Spring Boot

The Apache Ignite server nodes act as containers for data and computations. Once interconnected, the server nodes will represent a distributed database (or data grid) that stores the data, participates in queries processing, compute execution, stream processing, and so on.

Ignite provides an implementation of JCache (JSR 107) specification. JCache provides a very simple and powerful API for data access. However, the specification omits any details about data distribution and consistency to allow vendors enough freedom for their own implementations. Here, we will use org.apache.ignite.cache.CacheManager to interact with the Cache.

We will use Spring XML-based configuration for Ignite.

IgniteCacheManagerConfiguration

The above code creates a CacheManager instance using the Ignite configuration XML file path specified in the igniteConfigPath variable.

We also need to add the following Maven dependencies in our Spring Boot app to enable Ignite-based caching.

ignite-configuration.xml

Here, we are using GridGain Community Edition 8.8.10. Once we are done with the above configuration, we can start the Ignite node in embedded mode, that is, in the same JVM where the application is running.

2. Enabling Caching

Now that we have the Spring Boot app configured with an embedded Ignite node, we need to define our cache. Here, we will define a sample Player cache, which we will use for querying purposes. We are using XML-based configuration to define our cache.

ignite-cache-configuration.xml

Once we have declared our Player cache, we have to define the Player class that it stores.

Player-Cache

The above code does the following:

We started off by defining the Player class with some attributes, which we have declared in our cache configuration. We have used the annotation @QuerySqlField on fields, which we will use for SQL queries. All fields which need to be involved in SQL clauses must have this annotation. We have also used the annotation @QueryTextField on fields that need to be indexed for full-text search using Lucene.

3. Query and QueryCursor

IgniteCache has several query methods, all of which take some subclass of the Query class and return a QueryCursor. The abstract Query class represents a paginated query to be executed on the distributed cache. We can set the page size for the returned cursor via the Query.setPageSize(...) method (the default is 1024).

The QueryCursor class represents the query result set and allows for transparent page-by-page iteration. Whenever we start iterating over the last fetched page, it automatically requests the next page in the background. When pagination is not needed, we can use the QueryCursor.getAll() method, which fetches the whole query result and stores it in a collection. Calling QueryCursor.getAll() also closes the cursor automatically; otherwise, we should close the cursor ourselves once we are done with it.
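
To make the idea of page-by-page iteration concrete, here is a minimal plain-Java sketch of a lazily paginated cursor. It is only a toy model and not Ignite's implementation: the class PagedCursorSketch, the pageLoader function, and the over(...) helper are hypothetical names introduced for illustration, and pages are served from an in-memory list rather than from remote nodes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

/** Toy model of a paginated cursor. This is NOT Ignite's QueryCursor implementation. */
final class PagedCursorSketch<T> implements Iterable<T> {
    // Hypothetical page source: returns page number n, or an empty list when exhausted.
    private final IntFunction<List<T>> pageLoader;
    private int pagesFetched = 0;

    PagedCursorSketch(IntFunction<List<T>> pageLoader) {
        this.pageLoader = pageLoader;
    }

    /** Number of page requests issued so far (including the final empty page). */
    int pagesFetched() {
        return pagesFetched;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private List<T> page = new ArrayList<>();
            private int pos = 0;
            private int nextPage = 0;
            private boolean exhausted = false;

            @Override
            public boolean hasNext() {
                // Request the next page only when the current one is used up.
                while (pos >= page.size() && !exhausted) {
                    page = pageLoader.apply(nextPage++);
                    pagesFetched++;
                    pos = 0;
                    exhausted = page.isEmpty();
                }
                return pos < page.size();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return page.get(pos++);
            }
        };
    }

    /** Drains every page into one list, in the spirit of getAll(). */
    List<T> getAll() {
        List<T> all = new ArrayList<>();
        for (T item : this) {
            all.add(item);
        }
        return all;
    }

    /** Helper that serves fixed-size pages from an in-memory list. */
    static <T> PagedCursorSketch<T> over(List<T> data, int pageSize) {
        return new PagedCursorSketch<>(n -> {
            int from = Math.min(n * pageSize, data.size());
            int to = Math.min(from + pageSize, data.size());
            return data.subList(from, to);
        });
    }
}
```

In this model no page is requested until iteration starts, and a smaller page size trades more round trips for less memory held at once, which is the same trade-off that Query.setPageSize(...) exposes.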

To explore the search queries we will create a class called PlayerService. This will contain various methods required for the search operation.

4. Scan Queries

Apache Ignite’s key-value API is used to store objects in a cache and retrieve values using keys. The query API, by contrast, lets us query objects using expressions. ScanQuery is one of the implementations of the Query API, and it allows us to query the cache in a distributed manner using a user-defined predicate.

Let us explore the ScanQuery API now.

ScanQuery

The above code does the following:

  • We started off by creating a public method called scanQuerySearch in line 1 with search text input, which we will use to search in Player cache.
  • In line 3, we retrieved an instance of the Player cache from Ignite node.
  • Then we created an instance of ScanQuery in line 4, which takes an IgniteBiPredicate as an argument. IgniteBiPredicate is a functional interface that accepts two parameters and returns a boolean. It is usually used to filter a collection of objects and can also be written as a lambda expression. Here, we use a Java 8 lambda expression to represent the IgniteBiPredicate: k is the cache key and v is the Player value. Our IgniteBiPredicate returns true only for entries where player.getTeam() equals the input text. The result is returned as a QueryCursor, which holds all qualifying entries (key-value pairs).
  • From line 7 to 14, we iterated through the results and populated a local List with the matching Players.
  • In line 15, we return the filtered list of Players to the caller.

The ScanQuery goes over each cache entry and applies the predicate, which may not always be the most efficient way to query objects, especially if the data size is large.
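
The filtering logic described above can be sketched in plain Java without an Ignite node. This is a stand-in and not the listing from this tutorial: it uses java.util.function.BiPredicate and a Map in place of IgniteBiPredicate and IgniteCache, and the Player stub, the sample entries, and the method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

final class ScanQuerySketch {
    /** Minimal stand-in for the tutorial's Player value class. */
    static final class Player {
        final String name;
        final String team;

        Player(String name, String team) {
            this.name = name;
            this.team = team;
        }
    }

    /** Full scan: every entry is tested against the predicate; there is no index. */
    static List<Player> scan(Map<Long, Player> cache, BiPredicate<Long, Player> filter) {
        List<Player> matches = new ArrayList<>();
        for (Map.Entry<Long, Player> entry : cache.entrySet()) {
            if (filter.test(entry.getKey(), entry.getValue())) {
                matches.add(entry.getValue());
            }
        }
        return matches;
    }

    /** Same (key, value) -> boolean lambda shape that a ScanQuery predicate has. */
    static List<Player> scanQuerySearch(Map<Long, Player> cache, String team) {
        return scan(cache, (k, v) -> v.team.equals(team));
    }

    /** Illustrative sample data, not the tutorial's actual data set. */
    static Map<Long, Player> sampleCache() {
        Map<Long, Player> cache = new LinkedHashMap<>();
        cache.put(1L, new Player("Messi", "Barcelona"));
        cache.put(2L, new Player("Pique", "Barcelona"));
        cache.put(3L, new Player("Rashford", "Manchester United"));
        return cache;
    }
}
```

Because the loop touches every entry, the cost grows linearly with the cache size, which is why a scan is a poor fit for large data sets when an indexed SQL or text query can answer the same question.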

5. SQL Queries

The SQLQuery API supports ANSI SQL-99 queries against caches. This API allows SQL joins against colocated entries on the same node as well as against non-colocated entries distributed across nodes.

To tell Apache Ignite which fields are accessible for SQL queries, we need to define the metadata. Apache Ignite’s @QuerySqlField annotation does the trick. We can even index the fields for a faster query by setting the annotation value: @QuerySqlField(index=true). We have used this annotation in Step 2 for our Player cache to enable SQL queries.

Let us explore the SQLQuery API now.

SqlQuery

The above code does the following:

  • We started off by creating a public method called sqlQuerySearch in line 1 with search text input, which we will use to search in Player cache.
  • In line 3, we retrieved an instance of the Player cache from Ignite node.
  • Next, we extracted the table name from the cache in lines 4 to 5. Once we have the table name, we can form a SQL query that uses it to search the cache.
  • We created an instance of SqlFieldsQuery in line 6.
    The SqlFieldsQuery class provides another implementation of the Query API. It is used for executing SQL statements and navigating through the results. It accepts a standard SQL query as its constructor parameter and executes it. SqlFieldsQuery is executed through the IgniteCache.query(SqlFieldsQuery) method, which returns a QueryCursor.
  • From line 8 to 9, we extracted the results using the getAll() method. If matching entries exist in the cache, we collect the results and return the response to the caller.

In replicated mode, SQL joins execute fast because all entries are replicated on all cluster nodes. In partitioned mode, however, the entries to be joined may reside on different nodes (a given node holds only some of the primary and backup partitions), so distributed joins become complex and expensive. If we are planning to use joins on partitioned tables, then we should use colocated joins or hash joins.
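
The effect of colocation on joins can be illustrated with a small plain-Java model. This is only a sketch under simplifying assumptions: the three-node cluster, the modulo-based nodeFor function, and the sample ids are all hypothetical, and real Ignite affinity functions are more elaborate. The model counts how many player rows sit on the same node as the team row they join with, which is what a join needs in order to avoid moving data between nodes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ColocationSketch {
    static final int NODES = 3; // assumed cluster size for this toy model

    /** Toy affinity function: maps an affinity key to a node index. */
    static int nodeFor(Object affinityKey) {
        return Math.floorMod(affinityKey.hashCode(), NODES);
    }

    /**
     * Counts players stored on the same node as their team row.
     * playerToTeam maps playerId to teamId. When colocated is true, the player
     * is placed by its teamId (the affinity key); otherwise by its own playerId.
     */
    static int locallyJoinableRows(Map<Integer, Integer> playerToTeam, boolean colocated) {
        int joinable = 0;
        for (Map.Entry<Integer, Integer> row : playerToTeam.entrySet()) {
            int teamNode = nodeFor(row.getValue());
            int playerNode = colocated ? nodeFor(row.getValue()) : nodeFor(row.getKey());
            if (playerNode == teamNode) {
                joinable++;
            }
        }
        return joinable;
    }

    /** Illustrative data: six players spread over two teams (ids 10 and 11). */
    static Map<Integer, Integer> samplePlayers() {
        Map<Integer, Integer> playerToTeam = new LinkedHashMap<>();
        playerToTeam.put(1, 10);
        playerToTeam.put(2, 10);
        playerToTeam.put(3, 11);
        playerToTeam.put(4, 11);
        playerToTeam.put(5, 10);
        playerToTeam.put(6, 11);
        return playerToTeam;
    }
}
```

With the sample data, placing players by team id makes all six rows joinable locally, while placing them by player id leaves only one row next to its team, so the remaining rows would have to be fetched from other nodes during the join.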

6. Text Queries

The TextQuery API allows us to run a full-text search on stored objects in the cache. The TextQuery works on Lucene indexes. Elasticsearch and Apache Solr also use Lucene for indexing text.

Here too, we need to define the metadata to tell Apache Ignite which fields are to be enabled for text search. The @QueryTextField annotation enables indexing along with text search. Also, our cache configuration needs to enable indexing by setting the indexed types (setIndexedTypes). We have already done this in Step 2 during our cache configuration.

Let us explore the TextQuery API now.

TextQuery

The above code does the following:

  • In line 1, we created a method called textSearch, which takes a text as input and then searches for that text in the Player cache. It searches the fields of the Player cache that are annotated with @QueryTextField.
  • Then in line 3, we retrieved an instance of the Player cache from the Ignite node.
  • We created an instance of TextQuery in line 4. Once we have an instance of TextQuery and IgniteCache, we call the method searchPlayers to search for the text in the cache.
  • From line 25 to 29, we iterated through the results and populated a local List with the matching Players.
  • In line 30, we return the filtered list of Players to the caller.
  • Similar to textSearch, we also created a method called fuzzySearch from line 8 to 13 to perform a fuzzy search on the Player cache. A fuzzy search matches terms that are similar, but not necessarily identical, to the search term. Let’s say we want to fetch players where the team name is ‘Barcelona’ or ‘Barcenola’. Then we just need to add a ‘~’ at the end of our search string and pass it to the TextQuery constructor. When we search for the text ‘Barcenola’, the fuzzy search will find a match between nola and lona and return all ‘Barcelona’ players.
  • We can also perform a fuzzy search on specific fields. For this reason, we added another method called fuzzySearchOnSpecificField from line 15 to 20, which takes fieldName and text as input.
  • In the previous methods, textSearch and fuzzySearch, Ignite looked at all the text indexes, name and team, to find data. However, we can also ask Ignite to look into a specific index, as we can see in line 18. In this method, the input text is searched only in the field provided by the user. For example, if we are interested in name = ‘Neymar’, then the TextQuery can be configured as name:"Neymar".
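
The query strings described above follow Lucene's query syntax, and fuzzy matching in Lucene is based on edit distance. The plain-Java sketch below shows how such strings could be assembled and why ‘Barcenola’ is close to ‘Barcelona’. The class and method names are hypothetical, the exact strings built in this tutorial's listing may differ, and the number of edits a bare ‘~’ tolerates depends on the Lucene version bundled with Ignite (recent Lucene versions allow up to two).

```java
final class TextQuerySketch {
    /** Lucene fuzzy syntax: a trailing '~' asks for approximate matches. */
    static String fuzzy(String text) {
        return text + "~";
    }

    /** Lucene field syntax: 'field:term' restricts the search to one indexed field. */
    static String fuzzyOnField(String fieldName, String text) {
        return fieldName + ":" + text + "~";
    }

    /** Classic Levenshtein distance: minimum number of single-character edits. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }
}
```

Here editDistance("Barcenola", "Barcelona") is 2, because the letters n and l are swapped at two non-adjacent positions, so two substitutions are needed. A string such as fuzzy("Barcenola") would then be passed as the search text to the TextQuery constructor.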

It is advisable to use indexes for querying entries. However, one drawback is that an index itself takes up space and may slow down data modification, because every time we modify an entry, the index needs to be updated as well.
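
The trade-off can be seen in a small plain-Java model of a cache with one secondary index. This is a sketch for illustration only and says nothing about how Ignite stores its indexes internally: the class IndexedCacheSketch and its two maps are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class IndexedCacheSketch {
    // Primary storage: playerId -> team.
    private final Map<Long, String> teamByPlayerId = new HashMap<>();
    // Secondary index: team -> ids of the players in that team (extra memory).
    private final Map<String, Set<Long>> idsByTeam = new HashMap<>();

    /** Every write has to update the index too (extra work per modification). */
    void put(long playerId, String team) {
        String oldTeam = teamByPlayerId.put(playerId, team);
        if (oldTeam != null) {
            Set<Long> oldIds = idsByTeam.get(oldTeam);
            oldIds.remove(playerId);
            if (oldIds.isEmpty()) {
                idsByTeam.remove(oldTeam);
            }
        }
        idsByTeam.computeIfAbsent(team, t -> new TreeSet<>()).add(playerId);
    }

    /** Index lookup: no scan over all entries is needed. */
    Set<Long> findByTeam(String team) {
        return idsByTeam.getOrDefault(team, Collections.emptySet());
    }
}
```

Lookups by team no longer depend on the total number of entries, but each put now performs additional bookkeeping and the second map consumes additional memory.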

7. Testing the Application

So far, we have written most of the code required to perform various kinds of searches in the Ignite cache. Now let us write some code to populate the cache at startup. The PlayerService class contains the various methods required to explore the Query API in Ignite. Let us also add the method below to populate the cache.

PopulateCache

The above code populates the Player Cache with Player name and Team. We can use these values for testing purposes.

After populating our cache with some sample values, we added a Controller to enable users to perform various search operations.

PlayerController

The above code does the following:

It exposes an API that can be used to test the various features of the Ignite Query API that we have discussed so far. It uses the PlayerService class to perform the search operations.

Once we have built the project using mvn clean package, we can start the application using the java -jar target/ignite-poc*.jar command.

After the application has started successfully, we can use the Swagger-UI to test the application. Locally, the Swagger-UI comes up at http://localhost:9051/swagger-ui.html.

Swagger-UI

7.1 Search Player Cache with team name as input using SQLQuery API

Here, we are using SQLQuery to search all the Players whose team contains the given text: “Barcelona”

SQLQuery

7.2 Search Player Cache with team name or player name that matches the text input using TextQuery API

Here, we are using TextQuery to search all the Players where either name or team contains the given text: “United”

TextQuery

7.3 Search Player Cache with team name that matches the text input using ScanQuery API

Here, we are using ScanQuery to search all the Players whose team contains the given text: “Barcelona”

ScanQuery

7.4 Search Player Cache with team name or player name that matches the text input using TextQuery with Fuzzy search enabled

Here, we are using Fuzzy TextQuery to search all the Players where either name or team contains the given text: “Barcenona”

Fuzzy-Search

7.5 Search Player Cache with input field name and text using TextQuery API

Here, we are using Fuzzy TextQuery to search all Players where the field given by fieldName contains the given text: “Messi”

TextQuery-with-field

If you would like to refer to the full code, do check:

Conclusion

The cache Query API is a great feature provided by Ignite for searching data stored as key-value pairs. If the Query API is used properly, with suitable indexes, then query performance can be on par with key-value lookups. If there is a need for key-value search along with SQL query support, then Apache Ignite is one of the best choices currently available.
