Improving Application Availability: Redundancy and Persistence

Mario Bittencourt
SSENSE-TECH
Sep 27, 2024

Continuing on our road through application availability, let’s expand on what we started in the first part of this series. In our previous article, we explored availability, the famous five 9s, and why it’s important to segment your application according to the criticality of its constituent parts.

We also briefly discussed redundancy as the first step toward improving your application’s availability. Let’s continue from here, focusing first on the underlying principles, and then presenting specific technology examples that can be used.

The series structure is:

  • Basics of Availability
  • Redundancy and Persistence (this article)
  • Graceful Degradation and Asynchronous Processing
  • Disaster Recovery
  • Active-Active Patterns

Redundancy

In its simplest form, redundancy is achieved by having more than one instance of a given resource, be it compute, persistence, or anything in between. If one instance is no longer available, you can quickly divert traffic to any of its redundant siblings.

The example below illustrates two web servers: a main one and its replica. Initially, traffic is directed exclusively to the main server. Upon detecting a failure, traffic is diverted to the replica.

Figure 1. Basic redundancy achieved by routing the traffic to the working server.

A common solution that’s used for this situation is known as a reverse proxy or load balancer.

Figure 2. Using a load balancer to spread the traffic among servers.

The client ultimately sends requests to the IP of the proxy/load balancer, which has a list of available servers for that particular traffic pattern and directs the traffic to one of them.

It’s important to note that although availability and scalability are two separate subjects, availability solutions are commonly used as part of scalability ones. For example, a load balancer can also spread traffic across all available servers, with no concept of a main server.

Common implementations can be configured to periodically probe a health endpoint on each server, detect when one is no longer available, and dynamically update the list accordingly.

Figure 3. Dynamic update of available/healthy servers in a load balancer.
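To make the health-check loop concrete, here is a minimal Python sketch of what a load balancer effectively does: probe a health endpoint on each backend and round-robin only over the ones that answer. The backend addresses and the /health path are assumptions for illustration, not a specific product's behavior.

```python
import itertools
import urllib.request

BACKENDS = ["http://10.0.0.10:8080", "http://10.0.0.11:8080"]  # placeholder servers
_counter = itertools.count()

def healthy_backends(timeout=1.0):
    """Probe each backend's /health endpoint and keep only those answering 200."""
    alive = []
    for base in BACKENDS:
        try:
            with urllib.request.urlopen(base + "/health", timeout=timeout) as resp:
                if resp.status == 200:
                    alive.append(base)
        except OSError:
            pass  # unreachable, timed out, or returned an error: treat as down
    return alive

def pick_backend():
    """Round-robin over whatever is currently healthy."""
    pool = healthy_backends()
    if not pool:
        raise RuntimeError("no healthy backends available")
    return pool[next(_counter) % len(pool)]
```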

If you are using Kubernetes, the same process takes place as pods are created and destroyed, with requests routed only among the healthy ones.

Figure 4. Similar healthy routing happens with K8S as well.
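In Kubernetes, this routing decision is driven by readiness probes. Below is a minimal sketch that declares one using the official Python client; the container name, image, and /health path are placeholders.

```python
from kubernetes import client

# Readiness probe: the kubelet calls /health periodically and, after enough
# consecutive failures, the pod is removed from the Service's endpoints.
container = client.V1Container(
    name="web",
    image="example/web:1.0",  # placeholder image
    ports=[client.V1ContainerPort(container_port=8080)],
    readiness_probe=client.V1Probe(
        http_get=client.V1HTTPGetAction(path="/health", port=8080),
        period_seconds=5,      # how often to probe
        failure_threshold=3,   # failures tolerated before traffic is withheld
    ),
)
```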

It’s worth mentioning that having a redundant resource is not without issues or costs. One of the main challenges of having a replica of an application is ensuring consistent behavior regardless of which instance serves the request.

For example, imagine you send a request to a web server; it processes the request, sends back a response, and then goes down shortly after replying. Your next request will be routed to a different instance. Will it behave the same way as the original?

Figure 5. If subsequent requests depend on which instance served them, we will be in trouble.

This is much simpler if your application has no persistence and the entire context is provided with each request. In this case, your main concern is ensuring that, as you deploy your application, all replicas handle requests the same way, which means keeping versions backward compatible while a rollout is in progress.

Figure 6. With replicas, keeping them all on the same version of the application is both a challenge and a necessity.

It is important to highlight that sometimes, even if your application does not have external persistence, it may use ephemeral local storage, such as memory, to store context related to the requests.

Figure 7. Sticky sessions are problematic for redundancy.

Applications that rely on these request “sessions” are prone to issues, as any information stored in a session is unavailable to the other instances if the original one fails.
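A common way to sidestep this, not shown in the figure, is to keep session data in a shared store instead of instance memory, so any replica can serve the next request. The sketch below uses redis-py; the Redis host and key naming are assumptions.

```python
import json
import redis

store = redis.Redis(host="sessions.internal.example", port=6379)  # placeholder host

def save_session(session_id, data, ttl_seconds=1800):
    # Any instance can write the session; the TTL keeps abandoned sessions from piling up.
    store.set("session:" + session_id, json.dumps(data), ex=ttl_seconds)

def load_session(session_id):
    # Any instance can read it back, so a failover does not lose the user's context.
    raw = store.get("session:" + session_id)
    return json.loads(raw) if raw else {}
```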

For this reason, it is generally easier to provide redundancy for stateless services. Whether it is a new server, virtual machine, container, pod, or function-as-a-service, you spin up a new one, add it to your load balancer’s routing, and off you go.

But what about when the state needs to be persisted?

Persistence

Most applications consist of two main components:

  • The code that defines the logic to be executed
  • The data that is used, collected, or manipulated as the code runs

In the vast majority of applications, the code changes way less often than the data. That is why redundancy for persistence has its own unique needs and solutions.

Figure 8. A single write is all it takes to make replicas diverge!

After the first modification, our redundant server becomes outdated. In the example, a subsequent request served by the replica would not find the modified information, because it exists only on the original server.

To address this, we need to ensure that the information is available to all persistence servers.

Figure 9. The solution is to copy the updates from one server to other(s).

Most persistence solutions implement this synchronization in one of two forms: synchronously or asynchronously.

In the synchronous version, the data is only considered saved once it has been persisted on N of the servers, where 1 < N <= total_servers.

Figure 10. Synchronous only considers a success when the information has been persisted elsewhere.

This is usually the more complex method, as it needs to answer questions such as: what if the first write succeeded but the second failed? Should the first one be reversed, retried, or should the failed server be replaced with a new one?

Additionally, due to the non-zero latency between servers, the overall write latency increases as more replicas are required to confirm before a success can be returned.
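The toy sketch below illustrates the contract of the synchronous approach: the caller only gets a success once the required number of servers has acknowledged the write. The in-memory dictionaries stand in for real database nodes.

```python
# Primary plus one replica, modeled as plain dictionaries standing in for real nodes.
servers = [{}, {}]

def synchronous_write(key, value, required_acks=2):
    acks = 0
    for server in servers:
        server[key] = value  # a real node write can fail; a dict write always succeeds
        acks += 1
    if acks < required_acks:
        # This is where the hard questions live: retry the failed node, replace it,
        # or roll back the copies that did succeed?
        raise RuntimeError("not enough replicas acknowledged the write")
    return "saved"  # only now is success reported to the caller
```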

In the asynchronous version, the data is considered saved as soon as it is persisted on the server that received the request. It is then replicated asynchronously to the other servers.

Figure 11. Asynchronous returns as soon as it saves the information locally.

While this method may be considerably simpler than the synchronous version, it comes with trade-offs that must be taken into account:

  • You may lose data

Since the copy is done asynchronously, if the first server has issues before the new changes are propagated, any remaining server that is promoted to receive new requests will not have those changes.

While never desirable, some applications may be more tolerant of this, while for others it would be a major problem.

Figure 12. We may lose data if the original server dies before the copy takes place.
  • If you split reads among replicas, you may serve stale data

A common scalability strategy is to distribute read requests among the servers to avoid overloading the write server. However, in doing so, the delay between a write and its propagation may lead to the read-after-write issue, where your application modifies the state of an entity, immediately reads that same entity to perform additional operations, and does not see its own change.

Figure 13. Eventually consistent as the information takes time to be reflected in all servers.
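The following toy example shows the read-after-write problem in miniature: a write acknowledged by the primary is not yet visible on the replica, so a read routed there returns stale data until the asynchronous copy catches up.

```python
# Primary and replica modeled as dictionaries; replication is "delayed" simply
# by not having run yet.
primary = {"stock": 10}
replica = {"stock": 10}

primary["stock"] = 9        # write acknowledged by the primary immediately
print(replica["stock"])     # 10 -> a read served by the replica is stale

replica.update(primary)     # the asynchronous copy eventually runs
print(replica["stock"])     # 9 -> the replica has caught up
```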

Let’s look at some of the features provided by common persistence solutions from AWS.

AWS Solutions

AWS provides a wide range of managed services that offer persistence. I will cover a subset of the available services, focusing on their redundancy aspects.

Some of them will be revisited from different angles in future articles as we progress on our availability journey.

Availability Zones

The term “availability zone” will come up often, so let’s start with a simple definition. AWS infrastructure is grouped into regions, each covering a specific geographic area. Each region is broken down into availability zones (AZs), each consisting of one or more discrete data centers; the zones are physically separated from one another yet connected through low-latency links.

While you can read more about regions and availability zones here, the key points we will focus on are:

  • They are independent, meaning that any failures in one AZ won’t affect the others.
  • They are physically separated, reducing the chances of a major problem affecting all zones simultaneously.
  • They are close enough to each other to enable low-latency communication between them, with a latency that is negligible for most applications.

Relational Database Service (RDS)

With RDS, you can select the database engine you want, including options such as MySQL, PostgreSQL, Microsoft SQL Server and Oracle, and have the ability to choose the compute power (CPUs, Memory, I/O) and storage needed. It is not possible to cover all options for each engine here, so consider the examples below using PostgreSQL as the engine.

With RDS you can have up to two replicas located in separate availability zones, giving you one primary (writer) instance and up to two standby (reader) instances.

The communication between the primary and the standby instances is synchronous, to guarantee that no data is lost.

You get a dedicated reader endpoint that can help with read latency, while writes are directed to the primary. If the primary is no longer fit to receive writes, a failover takes place and one of the standby instances is promoted to be the new primary, typically within 35–60 seconds. During this window, write attempts are expected to fail.

By choosing this approach, you can achieve the redundancy needed, with expected uptimes of 99.95%.
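In application code, this usually translates into sending writes to the writer endpoint, reads to the reader endpoint, and retrying writes during the failover window. A minimal sketch using psycopg2 is shown below; the endpoint names, credentials, and retry values are placeholders, not RDS defaults.

```python
import time
import psycopg2

WRITER = "mydb-writer.example.rds.amazonaws.com"  # placeholder writer endpoint
READER = "mydb-reader.example.rds.amazonaws.com"  # placeholder reader endpoint

def connect(host):
    return psycopg2.connect(host=host, dbname="app", user="app", password="secret")

def write_with_retry(sql, params, attempts=5, delay_seconds=5):
    """Send writes to the primary; during a failover the connection fails,
    so wait and retry until the promoted standby starts accepting writes."""
    for _ in range(attempts):
        try:
            conn = connect(WRITER)
            try:
                with conn, conn.cursor() as cur:  # commits on success, rolls back on error
                    cur.execute(sql, params)
                return
            finally:
                conn.close()
        except psycopg2.OperationalError:
            time.sleep(delay_seconds)  # failover likely in progress
    raise RuntimeError("write failed after retries")

def read(sql, params):
    """Reads can go to the reader endpoint, accepting slightly stale data."""
    conn = connect(READER)
    try:
        with conn, conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
    finally:
        conn.close()
```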

Aurora

Aside from RDS, Aurora is another service that provides some interesting capabilities:

  • It expands the Multi-AZ replicas from 2 to up to 15.
  • It automatically copies data across six storage nodes spread over three AZs of the same region, even if you do not have any read replicas.

Having more replicas gives you increased read capacity but also more options when promoting a read replica to become a new primary. The lag between the primary and the replica is usually less than 100 ms, which means there is still a chance of an outdated read after a write.

Even if you decide not to have a replica, the fact that copies of your data are automatically stored in different AZs helps prevent data loss, albeit at the expense of a longer recovery time, since a new primary has to be recreated in the event of a failure of the existing one.

DynamoDB

Departing from the traditional SQL-based offerings, DynamoDB offers a persistence model where the information is spread across partitions, with a dual consistency approach.

A write operation first saves the updated data to a persistence node. The data is then synchronously copied to a second persistence node. Only at this point is the operation confirmed to the caller.

There is an asynchronous process that copies it from the second persistence node to a third one.

Figure 14. A mixed mode of synchronous + asynchronous copy in DynamoDB.

This means your data is redundantly persisted on three nodes, each located in a separate AZ. At the same time, you do not need to wait for all three nodes to save before the operation returns, which helps keep latency low.

When retrieving data you have two choices: eventually consistent and strongly consistent.

If you opt for eventually consistent reads, your operation may be directed to any of the three nodes. If it happens to hit the asynchronously copied one, there is a chance the information you retrieve will be outdated compared to the main node.

In contrast, strongly consistent reads are directed only to the main node.
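With the DynamoDB API, this choice is made per request through the ConsistentRead flag. Below is a minimal boto3 sketch; the table name and key are assumptions.

```python
import boto3

table = boto3.resource("dynamodb").Table("orders")  # placeholder table name

# Default read: eventually consistent, may be served by the asynchronously
# updated node and return slightly stale data.
eventual = table.get_item(Key={"order_id": "123"})

# Strongly consistent read: reflects every acknowledged write, at the cost of
# higher latency and double the read capacity consumption.
strong = table.get_item(Key={"order_id": "123"}, ConsistentRead=True)
```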

The Plot Thickens

A key to availability is redundancy. Having more than one resource capable of serving our requests gives us a contingency for when the main resource fails.

This means adding redundancy to both the compute and persistence aspects of your application. A common way to handle the compute aspect is to use some sort of load balancer or reverse proxy that hides the real servers from the client and redirects traffic to a healthy one.

If your application is stateless, this is much easier, since subsequent requests are independent and can be served by different instances.

Persistence, on the other hand, is inherently more complex, as it hosts data that mutates over time, where request N affects request N+1.

On the AWS front, many managed solutions offer facilities that handle the redundancy behind the scenes. This is where we first encounter our trade-offs between the safety of having multiple copies — geographically dispersed — and issues of eventual consistency or higher latency due to a synchronous requirement.

But our exploration is not over. In our next article, we will discuss additional practices at your disposal to improve the availability of your application before delving into more costly solutions, such as active-active multi-region architecture.

Editorial reviews by Catherine Heim & Sam-Nicolai Johnston.

Want to work with us? Click here to see all open positions at SSENSE!
