Persistence Options for Apache Ignite

Valentin Kulichenko
4 min read · Apr 28, 2020



The backbone of Apache Ignite is its distributed in-memory storage. By putting data into memory and partitioning it across multiple computers, Ignite achieves high performance and unlimited scalability.

In-memory storage is lightning-fast but not durable. If one of the computers in a cluster crashes, you lose all the entries stored on that computer. Ignite lets you keep backup copies to reduce the probability of data loss, but backups do not reduce it to zero.

The only way to fully protect data integrity is to persist the data.

Ignite follows a multi-tier approach that allows for several persistence options. In summary, there are three types of deployment:

  1. In-memory only. The persistence layer is optional in Ignite, so you can run without it.
  2. In-memory layer integrated with an external disk-based database via read/write-through mechanisms.
  3. In-memory layer with Ignite’s Native Persistence.

The in-memory-only option is straightforward. It is mainly useful for cases that do not require long-term durability of the data, such as simple caching or Web Session Clustering.

The latter two options can be a little confusing, however. Why is there a choice in the first place? Which type of persistence should you use and when?

Let’s investigate.

Ignite integrates with external databases through a pluggable CacheStore interface.
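The real interface lives in `org.apache.ignite.cache.store` and extends JCache's `CacheLoader` and `CacheWriter`. Below is a simplified, self-contained sketch of the contract with a toy implementation; method names are modeled on the actual API, but exceptions, annotations, and Ignite types are omitted so the example compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Simplified sketch of Ignite's CacheStore contract. The real interface
// (org.apache.ignite.cache.store.CacheStore) extends JCache's
// CacheLoader/CacheWriter; names here mirror that API.
interface SimpleCacheStore<K, V> {
    V load(K key);                                        // read-through: fetch a missing entry
    void write(K key, V val);                             // write-through: propagate an update
    void delete(K key);                                   // write-through: propagate a removal
    void loadCache(BiConsumer<K, V> clo, Object... args); // initial bulk preload
}

// A toy implementation backed by an in-memory map standing in for a database.
class MapBackedStore implements SimpleCacheStore<Integer, String> {
    private final Map<Integer, String> db = new HashMap<>();

    @Override public String load(Integer key) { return db.get(key); }
    @Override public void write(Integer key, String val) { db.put(key, val); }
    @Override public void delete(Integer key) { db.remove(key); }
    @Override public void loadCache(BiConsumer<Integer, String> clo, Object... args) {
        db.forEach(clo); // push every "database" row into the cache
    }
}
```

In a real deployment you would implement the actual Ignite interface against your database's driver and register it in the cache configuration.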

By implementing this interface, you tell Ignite how to interact with your database: how to perform the initial preload, read-through reads, and write-through updates. Ignite then takes care of maintaining consistency between the cache and the database.

This way, you can integrate with virtually any kind of data store, whether it’s a relational database, a NoSQL database, a document database, a mainframe, or anything else.

It is also worth mentioning that Ignite ships with out-of-the-box implementations for JDBC-compliant relational databases, as well as for Apache Cassandra. If you use one of these databases for persistence, you don’t have to write any integration code.

The approach is very flexible but comes with several limitations:

  • Read-through works only for key-value access. Other APIs, such as scan queries or SQL queries, require all the data to be in memory before you invoke them. Therefore, in many cases, you have to preload the data manually before making it available to applications. Preloading often takes a significant amount of time, which lengthens potential downtime.
  • The in-memory layer accelerates reads but not writes. Every update has to be propagated to the underlying database, which still limits scalability and performance.

The intention behind the Native Persistence Store is to mitigate the issues described above.

It is based on a page-based memory architecture in which every page is guaranteed to be stored on disk and can also be cached in memory. During a read or a write, Ignite automatically detects whether the required pages are currently in memory and fetches them from disk if needed. This behavior is entirely transparent to the application and does not depend on which APIs it uses.
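Enabling this mode is a small configuration change. The fragment below assumes the Ignite 2.x Java API (`DataStorageConfiguration` and per-region `setPersistenceEnabled`); it is a config sketch rather than a runnable program, since it requires the ignite-core dependency and a cluster to start.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Config fragment: enable Native Persistence for the default data region.
IgniteConfiguration cfg = new IgniteConfiguration();

DataStorageConfiguration storageCfg = new DataStorageConfiguration();
storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
cfg.setDataStorageConfiguration(storageCfg);

Ignite ignite = Ignition.start(cfg);

// With persistence enabled, a cluster starts in an inactive state and must be
// activated before the caches can be used.
ignite.cluster().active(true);
```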

As a result, you get instantaneous restarts.

If a cluster crashes or restarts for any other reason, there is no need to wait for preloading anymore. Data becomes available right away, drastically reducing downtime windows. While the application reads and updates the data, Ignite lazily fetches it into memory for improved performance.

Furthermore, thanks to how the Native Persistence is designed, it also allows you to have only a part of the data in memory. Imagine that you have a huge dataset that is measured in terabytes or even in petabytes. Typically, you will only need a relatively small subset that is frequently accessed, and therefore requires higher performance and scalability characteristics. Managing this type of data distribution with an external database is possible, but challenging. With the Native Persistence, it’s effortless — Ignite uses LRU policies to keep the most critical data in memory, while other data remains available for historical analytics and other purposes.
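The hot/cold split described above can be illustrated with a few lines of plain Java. This is not Ignite code, just a toy LRU cache built on `LinkedHashMap`'s access-order mode: frequently touched entries stay in "memory" while cold ones are evicted (in Ignite, evicted pages simply remain on disk).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a tiny LRU cache mimicking the idea behind keeping the
// hot subset of a large dataset in RAM. Least recently used entries are
// evicted once the capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: iteration order tracks recency
        this.capacity = capacity;
    }

    @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry
    }
}
```

Reading an entry refreshes its position, so the working set the application actually touches is what survives in memory.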

Finally, the Native Persistence is far superior in terms of performance and scalability of updates. Ignite distributes data on disk the same way it distributes data in memory. Every update goes into memory first and then to the local write-ahead log on disk. Since there are no synchronous network or random-access disk operations involved, you can achieve much lower latency and much higher throughput.
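The write path above can be sketched in a few lines. This is a toy model, not Ignite internals: a list stands in for the append-only log file, and recovery replays the log in order. The real WAL uses binary records and fsyncs according to the configured WAL mode, but the key property is the same: the durable write on the hot path is a sequential append, not a random-access update.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of a write-ahead-log store: every update lands in memory
// first and is then appended to the log. Appends are sequential, which is why
// this pattern avoids random-access disk I/O on the write path.
class WalStore {
    private final Map<String, String> memory = new HashMap<>();
    private final List<String> wal = new ArrayList<>(); // stands in for an append-only file

    void put(String key, String val) {
        memory.put(key, val);              // 1. update the in-memory page
        wal.add("PUT " + key + "=" + val); // 2. append the change to the log
    }

    // After a crash, the state is rebuilt by replaying the log in order.
    Map<String, String> recover() {
        Map<String, String> state = new HashMap<>();
        for (String rec : wal) {
            String[] kv = rec.substring(4).split("=", 2);
            state.put(kv[0], kv[1]);
        }
        return state;
    }
}
```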

Generally speaking, the Native Persistence provides much tighter integration with Ignite’s in-memory storage than any external database can. With the Native Persistence, Ignite has full knowledge of where the data is stored, which helps it achieve better performance and scalability without compromising the consistency, durability, and integrity of the data.

The only tradeoff is that you have to retire your existing storage if you already have one. For some mission-critical applications, you might want to take a more gradual approach and stick with your battle-tested database for a while. This is where an integration via the CacheStore comes in handy.

That said, the rule of thumb is quite simple:

  • If you want to provide better performance and higher scalability to an existing application, and you are not ready to rip and replace your legacy database — keep it and deploy Ignite on top.
  • If you are about to develop a brand new application that requires scalability from the get-go — the Native Persistence is the superior choice.
