Apache Ignite and CAP Theorem

Valentin Kulichenko
7 min read · Apr 15, 2020


Apache Ignite is a distributed data store (among other things). Naturally, many of its users want to understand how it works with respect to the CAP theorem.

The CAP theorem describes three characteristics that any distributed system has:

  1. Consistency — whether a system guarantees to provide the latest written value to a reader. Let’s say you have an integer variable stored in your database, which you update with the value “1”, then with the value “2”, and then attempt to read. Your database is consistent if it can only return either “2” or an error. Any other result (“1”, an empty value, etc.) would violate consistency.
  2. Availability — whether a system guarantees to process any read request without an error. In our previous example, the database is available if it answers with “2”, “1”, or an empty value, but never with an error.
  3. Partition tolerance — whether a system guarantees to continue operating in the event of network partitioning, or any other network outage. The system is partition tolerant if messages being dropped or delayed do not prevent it from functioning.

The theorem then states that it’s impossible to provide all three guarantees simultaneously. At any point in time, one of them has to be sacrificed.

In practice, we can assume that any distributed system has to be partition tolerant. Network glitches and outages are inevitable, so we have to be able to continue running despite them. This leaves us with a choice between consistency and availability.

As a result, the CAP theorem divides all distributed systems into two major classes: CP systems, which favor consistency over availability, and AP systems, which do the opposite.

Let’s go through an example. Imagine a trivial distributed system with a single server node and a client connected via the network. The network temporarily goes down, and the client tries to read a value, which the server can’t provide at this time. How should the system behave?

There are only two feasible alternatives:

  1. Throw an error. In this case, the system remains consistent but is no longer available. That’s a CP system.
  2. Do not throw an error, and return “some” value. The system remains available but does not guarantee consistency anymore. That’s an AP system.

So, it all comes down to this intuitive tradeoff between consistency and availability. The CAP theorem formalizes this tradeoff and demonstrates that it is intrinsic to the nature of distributed systems — there is no way around it.

Is Apache Ignite CP or AP?

Technically, you can configure Apache Ignite to be either one, but its overall design leans towards consistency. Ignite’s internal mechanisms, such as two-phase commit and multi-version concurrency control, help to provide strong consistency. The majority of Ignite users rely on this, so going forward, let’s assume that Ignite is a CP system.
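As a rough illustration of what that consistency machinery looks like to an application, here is a minimal sketch of a transactional cache accessed through Ignite’s Java API (the cache name and key/value types are arbitrary, not taken from any particular deployment):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

public class ConsistentCacheSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // TRANSACTIONAL atomicity enables ACID transactions backed by two-phase commit.
            CacheConfiguration<Integer, Integer> cfg =
                new CacheConfiguration<Integer, Integer>("counters")
                    .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

            IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache(cfg);

            // Both updates become visible atomically when the transaction commits.
            try (Transaction tx = ignite.transactions().txStart(
                    TransactionConcurrency.PESSIMISTIC,
                    TransactionIsolation.REPEATABLE_READ)) {
                cache.put(1, 1);
                cache.put(1, 2);
                tx.commit();
            }

            // Reads the latest committed value: 2.
            System.out.println(cache.get(1));
        }
    }
}
```

With the TRANSACTIONAL atomicity mode, updates made inside a committed transaction are applied to primary and backup copies as a unit, so a subsequent read observes the latest committed value rather than a stale one.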

Based on the CAP theorem, we can conclude that Ignite — being a CP system — can’t guarantee full availability in case there are network issues. However, nothing stops us from increasing availability as much as possible.

The availability of Ignite depends on both external and internal factors.

External factors are the ones that Ignite doesn’t control: reliability of the network and other hardware, operating systems, virtualization software, etc. Setting up a proper infrastructure can go a long way — by simply reducing the probability of a hardware or software failure, you can drastically increase the overall availability of the system over time.

Internal factors include different features and configuration options that Ignite provides for higher availability. Let’s see what those features are and how you can use them to increase the availability of your Ignite deployments.

Backup Copies

Like any distributed system, Ignite consists of multiple nodes connected via the network. Therefore, at any time, one of the nodes can become inaccessible. If that happens, all the data stored in that node’s memory is lost. If someone then wants to read this data, they will get an error.

You can protect yourself from losing the data in this scenario by configuring backup copies. If you set “1” as the value for the backups parameter, Ignite will store two copies of every entry, and these copies will always end up on different nodes. Therefore, losing a single node will not cause any data loss, and the system will remain fully available.

The more backups you have, the higher the availability. However, keep in mind that every additional backup makes an update more expensive. There is a tradeoff between availability and the efficiency of updates, so choose the number of backups wisely, based on your requirements and data access patterns.
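For reference, here is a minimal sketch of configuring one backup copy through the Java API (the cache name and types are arbitrary; the same backups property can also be set in XML configuration):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class BackupsSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<Integer, String> cfg =
                new CacheConfiguration<Integer, String>("myCache")
                    // PARTITIONED mode splits the data set across the cluster nodes.
                    .setCacheMode(CacheMode.PARTITIONED)
                    // One backup: two copies of every entry, always on different nodes,
                    // so losing a single node does not lose data.
                    .setBackups(1);

            ignite.getOrCreateCache(cfg).put(1, "value");
        }
    }
}
```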

Rack (or Availability Zone) Awareness

Let’s take a look at a more complicated deployment, where a cluster is stretched across multiple racks of servers or availability zones in a cloud. For example, consider a cluster of six nodes spread across two availability zones.

The first three nodes reside in availability zone #1, and the other three are in availability zone #2. Also, let’s assume that the cluster is configured to have one backup copy (i.e., there are two replicas of every entry).

What will happen if a whole zone becomes unavailable, and the client can’t access three out of six nodes? Since Ignite distributes backups randomly, some of the entries will inevitably have both replicas in the same zone. Therefore, some of the data will be lost.

To solve this, Ignite allows the fine-tuning of the backup distribution. Essentially, you can assign a node attribute identifying the availability zone for every node, and then assign backups based on this attribute. As a result, you will get a guarantee that different replicas of the same entry always end up in different availability zones. Therefore, every availability zone contains the whole data set, and data loss doesn’t happen in case one of the zones becomes unavailable.
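One way to wire this up through the Java API is sketched below. The AVAILABILITY_ZONE attribute name and the zone values are arbitrary placeholders; the idea is to tag every node with its zone and plug an attribute-based backup filter into the cache’s affinity function:

```java
import java.util.Collections;

import org.apache.ignite.Ignition;
import org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class RackAwarenessSketch {
    public static void main(String[] args) {
        // Tag this node with the zone it runs in ("ZONE_1" on the first three nodes,
        // "ZONE_2" on the other three).
        IgniteConfiguration nodeCfg = new IgniteConfiguration()
            .setUserAttributes(Collections.singletonMap("AVAILABILITY_ZONE", "ZONE_1"));

        // The backup filter keeps replicas of the same entry on nodes with different
        // AVAILABILITY_ZONE values, so each zone ends up holding the whole data set.
        RendezvousAffinityFunction affinity = new RendezvousAffinityFunction();
        affinity.setAffinityBackupFilter(
            new ClusterNodeAttributeAffinityBackupFilter("AVAILABILITY_ZONE"));

        CacheConfiguration<Integer, String> cacheCfg =
            new CacheConfiguration<Integer, String>("myCache")
                .setBackups(1)
                .setAffinity(affinity);

        nodeCfg.setCacheConfiguration(cacheCfg);

        Ignition.start(nodeCfg);
    }
}
```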

Persistence

So far, we’ve looked at different ways of preserving the availability in case of partial cluster failure. But what if all the nodes in the cluster go down at the same time?

All the in-memory data will go away, regardless of how many backup copies we configure, and how the replicas are distributed. To prevent the loss, we need to persist the data.

With Ignite, you can choose between two options: 3rd-party database integration and native persistence. The former is typically used to run Ignite on top of legacy data sources. The latter turns Ignite into a durable system of record that can replace an existing database or be used for greenfield applications.

From the availability perspective, any type of persistence helps you survive a cluster shutdown. However, with a 3rd-party store, you most likely need to preload the data into memory before using it. So in case of a crash, you not only need to restart the cluster, but you also have to wait for the data to preload. Preloading can take a significant amount of time, during which the system is unavailable.

One of the advantages of native persistence is that it can serve the data without preloading it into memory. The data becomes available right away, significantly reducing the downtime, and therefore improving the availability characteristics of the system.
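Here is a minimal sketch of enabling native persistence through the Java API (default data region, default storage settings). Note that a cluster with native persistence starts in the inactive state and has to be activated explicitly:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class NativePersistenceSketch {
    public static void main(String[] args) {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // Persist the default data region to disk. After a restart, the data is
        // served from disk right away, without preloading everything into memory.
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(storageCfg);

        Ignite ignite = Ignition.start(cfg);

        // Activate the cluster once all the expected nodes have joined.
        ignite.cluster().active(true);
    }
}
```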

Replication

Another way to reduce the downtime in the case of a cluster crash is to continuously replicate the data into a second cluster located in a different geographical location.

Initially, the application connects to the primary cluster. Any updates that occur in this cluster are replicated to the secondary cluster and are applied there as well. Should the primary cluster go down, the application can quickly switch to the secondary cluster, with almost zero downtime.

There are multiple possible solutions that you can use to enable replication on the Ignite cluster, all with different RPO and RTO characteristics. A couple of options are:

  1. Queue-based replication (e.g., using Kafka). Create a continuous query that listens for updates on the primary cluster and pushes them to a queue. Then use one of the streamers on the secondary cluster to apply the updates there (see the sketch after this list).
  2. Data Center Replication feature provided as a part of GridGain Enterprise Edition. This is a commercial 3rd party solution that is not part of Apache Ignite, but you may still want to consider it for mission-critical production deployments.
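The following sketch illustrates the Ignite side of the first option. The queue itself is out of scope here: publishToQueue is a hypothetical placeholder for a Kafka producer call, and the cache name is arbitrary:

```java
import javax.cache.event.CacheEntryEvent;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.cache.query.ContinuousQuery;

public class QueueBasedReplicationSketch {
    /** Primary cluster: listen for updates and push them to a queue (e.g., a Kafka topic). */
    static void captureUpdates(Ignite primaryIgnite) {
        IgniteCache<Integer, String> cache = primaryIgnite.cache("myCache");

        ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();
        qry.setLocalListener(events -> {
            for (CacheEntryEvent<? extends Integer, ? extends String> e : events)
                publishToQueue(e.getKey(), e.getValue()); // hypothetical producer call
        });

        cache.query(qry); // keeps listening until the returned cursor is closed
    }

    /** Secondary cluster: apply an update consumed from the queue via a data streamer. */
    static void applyUpdate(Ignite secondaryIgnite, Integer key, String value) {
        // In a real pipeline you would keep one streamer open and feed it batches.
        try (IgniteDataStreamer<Integer, String> streamer =
                 secondaryIgnite.dataStreamer("myCache")) {
            streamer.allowOverwrite(true); // newer values must replace existing entries
            streamer.addData(key, value);
        }
    }

    private static void publishToQueue(Integer key, String value) {
        // Placeholder for the actual queue/Kafka producer logic.
    }
}
```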

To summarize, for different types of outages, you can apply various techniques to make sure that availability is not lost should those outages occur.

  • Backup copies keep the system available if a single node becomes unavailable.
  • Rack awareness configuration prevents data loss if a whole rack of servers or an entire availability zone becomes unavailable.
  • Persistence and replication help in cases of full cluster unavailability.

Combining these features, deploying the clusters on reliable hardware and networks, and keeping the OS and virtualization software up to date all help to reduce the downtime of your Ignite deployments and to increase availability.

Since Apache Ignite is a CP system, you can’t achieve 100% availability, but you can push it high enough that you have to measure it in “nines”.

The CAP theorem tells us that it’s impossible to create a CAP system. But if the system satisfies all the requirements for being CAP 99.999% of the time, can we call it “effectively CAP”? Let’s discuss 🙂
