How to use Apache Ignite as Cassandra cache layer

It was last year, at the 2016 Cassandra Summit in San Jose, that I first heard of Apache Ignite:
Apache Ignite is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.
It has plenty of features: I dare you to look at the project homepage and understand all of them at first glance! After some random reading you will probably stumble on the feature that was the reason I met Ignite at the summit.
Ignite can act as a read-through and write-through (optionally, also write-behind) cache for Apache Cassandra.
I gave it a try some days after the summit without much success. Maybe I did not try that hard: the integration with Cassandra was a relatively new feature, and I was doing this just out of curiosity. Fortunately, a recent project I was assigned to gave me the opportunity to evaluate it more thoroughly.
In this article I am going to describe how to set up Ignite to work as a cache layer for Cassandra. I am going to provide configuration examples of data mappings between the two systems, and some (very simple) code to access the data with the native API and the REST client. I recommend having a look at the documentation, which is quite exhaustive. Another great source of information worth mentioning is “High Performance in-memory computing with Apache Ignite” by Shamim Ahmed Bhuiyan, Michael Zheludkov and Timur Isachenko.
Before proceeding I would like to thank everyone involved in the project mailing list, who provided precious help in the process. The article has gone through several refactorings (it was originally designed to also include ODBC and JDBC examples, later removed to keep it compact), so I apologize in advance for mistakes and discrepancies.
Article roadmap
- The use case
- Ignite installation
- Data model
- Examples
- Final considerations
The use case
I’ve been asked to design a RESTful API backed by Cassandra, which also requires several counters to track billing information user-wide. Being a write-optimized NoSQL store, Cassandra doesn’t like the read-before-write paradigm required to effectively use counters the way I needed. More precisely, it’s actually considered an anti-pattern to read something before updating it. One of the solutions is to use a cache layer to store the counters. What separates Ignite from the two most popular cache systems (Redis and Memcached, to my knowledge) is the built-in integration with Cassandra as a persistence store for the cached data. Also, a quite unique Ignite feature is the possibility to access cached data using SQL syntax.
Apart from this specific use case, you may want to use Ignite even if your data model does not force your Cassandra cluster into workloads it was not designed for. Caching capabilities in Cassandra are implemented with the row cache, which is quite ineffective in most cases and is in fact limited to read-intensive workloads (where 95% of total operations are reads).
Installation
The examples provided in this article were tested using Ignite 2.0 (the latest at the time of writing), Java JDK 8, Ubuntu 16.04, and Cassandra via Docker image (the actual version is 3.9, but since no special feature is used, any recent version will do the job).
There are basically two ways to use Apache Ignite:
- Download the binaries, configure it using Spring beans (xml files), and execute it from the shell;
- Integrate it in a maven project using the required dependencies, configure it via code, execute it as a Java app.
In this article I’ll be using the second option. In any case, it is important to know that Ignite is built in a modular fashion: each main functionality is implemented in its own module, but keep in mind that not all modules are enabled by default! If you are using the binary distribution, you will find them in the $IGNITE_HOME/libs/optional folder. To activate a module, simply move the corresponding folder into the libs directory. If instead, as you will see, you build Ignite inside your own maven project, simply add the associated module dependency to the pom.
The first thing to do is to set up a new maven project in your favorite IDE and add the required maven dependencies. Notice that we also added a plugin which is required to produce a single executable jar:
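The original pom listing did not survive the publishing process, so here is a sketch of the dependencies section it described (the artifact versions are assumed from the Ignite 2.0 release; the single-jar plugin mentioned above, typically maven-shade-plugin, is omitted for brevity):

```xml
<dependencies>
    <!-- Core Ignite functionality -->
    <dependency>
        <groupId>org.apache.ignite</groupId>
        <artifactId>ignite-core</artifactId>
        <version>2.0.0</version>
    </dependency>
    <!-- Cassandra persistent store integration -->
    <dependency>
        <groupId>org.apache.ignite</groupId>
        <artifactId>ignite-cassandra-store</artifactId>
        <version>2.0.0</version>
    </dependency>
    <!-- Needed to load the xml persistence settings -->
    <dependency>
        <groupId>org.apache.ignite</groupId>
        <artifactId>ignite-spring</artifactId>
        <version>2.0.0</version>
    </dependency>
    <!-- Enables the REST API used in the first example -->
    <dependency>
        <groupId>org.apache.ignite</groupId>
        <artifactId>ignite-rest-http</artifactId>
        <version>2.0.0</version>
    </dependency>
</dependencies>
```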
Next step is to create a dedicated keyspace in our Cassandra container that will hold the tables we are going to need to persist Ignite’s data:
root@riccardo-U36SG:/home/riccardo# docker exec -it cass-3.9-0 cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> create keyspace ignite with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

Notice that we set a replication_factor of one, since we are going to work with a one-node Cassandra cluster (does it make sense to still call it a cluster?).
Finally, let’s write some code! We will configure Ignite to work with Cassandra as persistent store and use it through all the examples:
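The original listing was lost in the publishing process; what follows is a minimal sketch of such a server, reconstructed from the description below (the class name, the properties file path and the cache name are my own placeholders, not necessarily the ones used in the original project):

```java
package example;

import java.io.FileInputStream;
import java.util.Properties;

import org.apache.ignite.Ignition;
import org.apache.ignite.cache.store.cassandra.CassandraCacheStoreFactory;
import org.apache.ignite.cache.store.cassandra.datasource.DataSource;
import org.apache.ignite.cache.store.cassandra.persistence.KeyValuePersistenceSettings;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.springframework.core.io.FileSystemResource;

public class IgniteServer {
    public static void main(String[] args) throws Exception {
        // Read Cassandra seeds and persistence settings path from the property file
        Properties props = new Properties();
        props.load(new FileInputStream("./configuration/app.properties"));

        // Cassandra connection settings
        DataSource ds = new DataSource();
        ds.setContactPoints(props.getProperty("cassandra.contactPoints"));

        // Key-value mapping rules, loaded from the persistence settings xml
        KeyValuePersistenceSettings persistence = new KeyValuePersistenceSettings(
            new FileSystemResource(props.getProperty("ignite.persistenceSettings")));

        // Cache store factory gluing Ignite and Cassandra together
        CassandraCacheStoreFactory<String, String> storeFactory = new CassandraCacheStoreFactory<>();
        storeFactory.setDataSource(ds);
        storeFactory.setPersistenceSettings(persistence);

        CacheConfiguration<String, String> cacheCfg = new CacheConfiguration<>("ignite-cassandra-test");
        cacheCfg.setReadThrough(true);        // cache misses fall back to Cassandra
        cacheCfg.setWriteThrough(true);       // cache writes are persisted to Cassandra
        cacheCfg.setWriteBehindEnabled(true); // ...but batched, see below
        cacheCfg.setCacheStoreFactory(storeFactory);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCacheConfiguration(cacheCfg);

        Ignition.start(cfg);
    }
}
```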
The code is easily readable. Configuration settings like the Cassandra seeds and the persistence settings file path are stored in a property file which is read at the very beginning of the program. During the examples we are going to change the ignite.persistenceSettings property value to meet different persistence configurations.
# Cassandra contact points
cassandra.contactPoints=172.17.0.2
# Persistence settings xml path
ignite.persistenceSettings=./configuration/persistence_settings.xml

One aspect worth mentioning is that we configured Ignite to work in write-behind mode: normally, a cache write involves putting data in memory and writing the same data into the persistence source, so there is a 1-to-1 mapping between cache writes and persistence writes. With write-behind mode, Ignite instead batches the writes and executes them regularly at the specified frequency. This is aimed at limiting the communication overhead between Ignite and the persistent store, and it really makes a lot of sense if the data being written changes rapidly.
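In code, write-behind is tuned on the cache configuration. A fragment sketching the relevant calls (the flush values shown are Ignite's defaults, used here purely for illustration):

```java
// Fragment: tuning write-behind on an org.apache.ignite.configuration.CacheConfiguration
ccfg.setWriteThrough(true);               // write-behind builds on top of write-through
ccfg.setWriteBehindEnabled(true);         // batch persistence writes instead of 1-to-1
ccfg.setWriteBehindFlushFrequency(5000);  // flush the batch every 5 seconds at most...
ccfg.setWriteBehindFlushSize(10240);      // ...or earlier, once this many entries accumulate
```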
Data Model
Data stored into Apache Ignite is in the form of key-value pairs. Therefore, since we are going to persist that data into Cassandra, we need to define how the key and value entities will be mapped into Cassandra table columns. We have several options:
- Primitive strategy: primitive Java types used for our key-value pairs will be mapped to Cassandra’s native types;
- BLOB: store values in an opaque fashion into Cassandra using BLOB fields;
- POJO: define custom Java Classes to be used as keys or values for the data we are going to cache.
I am going to provide two examples showing the POJO and primitive strategy. Data will be accessed using the Ignite REST client, and the Native API.
Example 1: primitive strategy, REST client
The first example will also be the simplest one: we’ll be using the primitive strategy, with our keys and values being of type String. Hence, we are first going to define the Cassandra table (using pretty creative column names) that will persist them:
cqlsh> create table ignite.primitive_rest(key text primary key, value text);

The persistence settings will be loaded by Ignite from an xml file:
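The xml itself was lost in the publishing process; following the format documented for Ignite's Cassandra integration, a persistence settings file matching the table above would look roughly like this:

```xml
<persistence keyspace="ignite" table="primitive_rest">
    <!-- PRIMITIVE strategy: map Java Strings straight to the text columns -->
    <keyPersistence class="java.lang.String" strategy="PRIMITIVE" column="key"/>
    <valuePersistence class="java.lang.String" strategy="PRIMITIVE" column="value"/>
</persistence>
```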
The keyspace and table names can be configured in the root element attributes. Ignite will create the table if not present in the keyspace. More complex mappings are of course supported as we will see. I encourage you to visit the dedicated section of the Ignite’s documentation, in particular the one with example xmls, containing a complete description of the available options in the comments. However, one huge drawback of accessing Ignite using the REST API is that we will be limited only to String types.
Now we can start Ignite (directly from the IDE, or by building the final JAR) and try to interact with the cache using simple REST calls (off-topic: Postman is a GREAT tool to help you manage your APIs and API calls in general):
Put an entry in the cache
http://localhost:8080/ignite?cmd=put&key=keytest&val=valuetest&cacheName=ignite-cassandra-test

Response:

{
  "successStatus": 0,
  "affinityNodeId": "51eb4d99-609a-4689-a031-48c5e04fade6",
  "sessionToken": null,
  "response": true,
  "error": null
}

Get an entry from the cache
http://localhost:8080/ignite?cmd=get&key=keytest&cacheName=ignite-cassandra-test

Response:

{
  "successStatus": 0,
  "affinityNodeId": "51eb4d99-609a-4689-a031-48c5e04fade6",
  "sessionToken": null,
  "response": "valuetest",
  "error": null
}

Finally, let's query Cassandra to verify that cache entries are being persisted into the db:
cqlsh> select * from ignite.primitive_rest;

 key     | value
---------+-----------
 keytest | valuetest

(1 rows)
Example 2: POJO strategy, Native API
This example will be slightly more complex, but not that much. It will involve coding the native API client, the POJO class and some more configuration.
POJO Class: CustomCounter
Let’s create the Java class that will model our cache value entries in Ignite. In the use case mentioned earlier the requirement was to store counters, so we will define a simple class called CustomCounter with two int fields, counterOne and counterTwo, which will be the counters associated with a particular user. Our class is required to implement the Serializable interface. Getter and setter methods will be needed too. Cache keys will be simple Strings, representing the usernames the counters are associated with.
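The class listing itself did not survive the publishing process; a minimal version matching the description above would be:

```java
import java.io.Serializable;

public class CustomCounter implements Serializable {
    private static final long serialVersionUID = 1L;

    private int counterOne;
    private int counterTwo;

    public CustomCounter() { } // no-arg constructor for (de)serialization

    public int getCounterOne() { return counterOne; }

    public void setCounterOne(int counterOne) { this.counterOne = counterOne; }

    public int getCounterTwo() { return counterTwo; }

    public void setCounterTwo(int counterTwo) { this.counterTwo = counterTwo; }
}
```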
Persistence configuration
We will configure the CustomCounter class to also be the structure of our values in the persistent store (Cassandra), using the POJO strategy. We will instead use the primitive strategy to map the String cache keys.
Persistence configuration can be achieved using the following xml:
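The xml was lost in the publishing process; following the same persistence settings format as the first example, it would look roughly like this (the fully qualified class name is a placeholder for wherever CustomCounter lives in your project):

```xml
<persistence keyspace="ignite" table="custom_counter">
    <!-- String usernames as keys, mapped to the username column -->
    <keyPersistence class="java.lang.String" strategy="PRIMITIVE" column="username"/>
    <!-- POJO strategy: class fields are mapped to matching lowercase columns -->
    <valuePersistence class="example.CustomCounter" strategy="POJO"/>
</persistence>
```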
Since we specified the CustomCounter class as the value type, Ignite will search for Cassandra columns matching the lowercase attribute names defined in the class. You can also specify the mappings one by one. Again, a great source of information regarding the configuration options can be found in the configuration examples.
We are now going to create the associated Cassandra table:
cqlsh:ignite> create table custom_counter(
          ...     username text primary key,
          ...     counterone int,
          ...     countertwo int);

Ignite client
The final piece of the puzzle is the Ignite Client we are going to use to interact with the cache:
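The client listing was lost in the publishing process; a minimal sketch based on the walkthrough below (the class name and cache name are my own placeholders, and the username/counter values simply mirror the cqlsh output further down):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class IgniteClientApp {
    public static void main(String[] args) {
        // Join the cluster as a client node (no data is stored on this node)
        Ignition.setClientMode(true);

        try (Ignite ignite = Ignition.start()) {
            IgniteCache<String, CustomCounter> cache = ignite.cache("ignite-cassandra-test");

            CustomCounter counter = new CustomCounter();
            counter.setCounterOne(10);
            counter.setCounterTwo(10);

            cache.put("riccamini", counter); // written through to Cassandra
            CustomCounter cached = cache.get("riccamini"); // served from memory
            System.out.println(cached.getCounterOne() + " " + cached.getCounterTwo());
        }
    }
}
```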
Ignite implements a discovery mechanism based on multicast IP, which allows zero-configuration clusters to be deployed immediately in the same network. If you started the Ignite server and the Ignite client, you should be able to see the client count and server count jump to one in the logs.
Now let’s check the data persisted into the associated Cassandra table:
cqlsh> select * from ignite.custom_counter;

 username  | counterone | countertwo
-----------+------------+------------
 riccamini |         10 |         10

(1 rows)

cqlsh>
We can also test Ignite’s read-through capabilities: restart the Ignite server process, and this time ask for the same cache key without writing it first. Ignite should trigger a Cassandra read after the cache miss, and hand you back the data previously inserted.
Final considerations
Apache Ignite is a really powerful tool which is worth exploring if you are approaching the ecosystem of Big Data solutions. It offers an in-memory platform for transacting and computing with data at scale, and it promises to speed up Hadoop-based computations. Also, for those interested in Machine Learning applications, it implements shared RDDs for Spark clusters.
In this article we have seen how it can be used to overcome a common issue with Cassandra, and the various options it offers. My hope is that this reading will help those approaching this tool for the first time. Also, feel free to share your ideas and suggestions on the topic if you have any!
Thanks for reading!
