System Design #A1:- Consistency in Distributed Systems: A Fundamental Challenge

Anshumaan Tiwari
Javarevisited
Published in
6 min readDec 29, 2023

--

What is a Distributed System?

A distributed system is a collection of independent computers (nodes) that collaborate to achieve a shared objective. These nodes are interconnected through a network and operate as a unified system. In a distributed system, each node possesses its own memory and processing capabilities, and they communicate with one another to coordinate and Evaluate their activities. The primary aim of a distributed system is to enhance performance, reliability, and scalability compared to a centralized system.

What is data consistency?

in simple terms when multiple users or processes access and update data concurrently, the system must ensure that the data remains accurate and reflects a well-defined state.

means if any update occurs in a database the database should always reflest the latest updated data.

Types of Data Consistency?

In each of the 3 topics i have inserted some images please try to understand the images as these topics are difficult to understand just with the help of text.

Linearizable Consistency

Understanding Distributed Systems: A distributed system is a group of independent computers that appear to its users as a single computer system. They are connected through a network and communicate with each other to perform tasks.

Transactions in Distributed Systems: In a distributed system, multiple processes can access and modify shared data simultaneously. To maintain consistency, these operations are grouped into transactions. A transaction is a sequence of one or more operations (like read, write, update) that forms a logical unit of work.

Concurrency Issues: When multiple transactions are executed concurrently, it can lead to concurrency control issues. For example, two transactions might try to update the same piece of data at the same time, leading to inconsistent states.

Linearizability: To solve these issues, we use a consistency model called linearizability. Linearizability ensures that the result of executing a sequence of operations is the same as if those operations were executed sequentially, one after another, on a single processor.

How Linearizability Works: Imagine a line of people waiting to buy tickets for a concert. Each person represents a transaction. The ticket seller (the system) gives out tickets one by one, in the order the people arrived. Even though people might arrive and leave at different times, the seller ensures that everyone gets a fair chance to buy a ticket. Similarly, in a distributed system, linearizability ensures that all transactions are processed in a fair and consistent manner.

Real-World Example: Consider a bank account system where multiple transactions (deposits, withdrawals) can occur concurrently. Linearizability ensures that even though these transactions may happen at the same time, the final balance of the account will be correct as if all transactions happened one after the other.

Problems

High latency

Low Availability

This can occur if a transaction freezes which will eventually freeze the entire application

To implement use

Single thread queue

RAFT(Raft Consensus Algorithm)

Eventual Consistency

How Eventual Consistency Works: Imagine a group of friends sharing a story. Each friend hears the story from someone else and then passes it along to the next friend. Over time, everyone in the group hears the same version of the story, even though the story was initially told differently. Similarly, in a distributed system, eventual consistency ensures that all nodes will eventually have the same view of the data, even though they might see different values at different times.

Let’s consider a scenario of a distributed database with eventual consistency using a simple example:

Imagine an e-commerce application with a distributed product catalog. The catalog is stored across multiple nodes for improved performance and fault tolerance.

Initial State:

Product X is priced at $50.

Nodes A, B, and C have replicas of the catalog.

Update Request:

A user on Node A updates the price of Product X to $55.

Asynchronous Updates:

Due to the nature of eventual consistency, Nodes B and C are not immediately updated with the new price. Each node operates independently.

Temporary Inconsistency:

For a brief period, Node A reflects the updated price of $55, while Nodes B and C still show the old price of $50. This temporary inconsistency is acceptable in the eventual consistency model.

Convergence Over Time:

As time progresses and no new updates occur, the system works to converge towards consistency. Nodes B and C eventually receive and apply the update, bringing the price of Product X to $55 on all nodes.

Eventual consistency is coupled with:

Below 3 subtopics are advance topic :-

Read-Your-Writes Consistency: This is a stronger form of consistency compared to eventual consistency. In a read-your-writes system, a client reading a piece of data will always see the value that was written by the same client, assuming there are no new writes to that data by other clients. This is stronger than eventual consistency because it provides immediate consistency for the client that performed the write. However, other clients may see stale data until the changes propagate throughout the system.

Monolithic Read Consistency: In a monolithic read system, a read operation returns the most recent write to the data. This is stronger than eventual consistency because it guarantees that all reads will see the latest write. However, it can lead to higher latency because the system must wait for all updates to propagate before a read can return.

Monolithic Write Consistency: In a monolithic write system, a write operation is not considered complete until all replicas of the data have been updated. This is the strongest form of consistency, providing the highest level of data integrity. However, it can lead to lower availability and performance because the system cannot accept new writes while waiting for all replicas to update.

Casual Consistency

Operations for same key are extracted sequentially.

Causal Relationship: Causal consistency revolves around the cause-and-effect relationship between operations. If one operation logically depends on the outcome of another, the database ensures that this causal link is maintained.

Operations Order: When operations are performed in a distributed environment, the order in which they are executed matters. Causal consistency focuses on preserving the order of causally related operations.

Logical Dependency: If, for example, you have an operation A that causes another operation B, causal consistency ensures that all nodes in the distributed system acknowledge the causal relationship. This acknowledgment allows for a consistent view of the database state.

Flexible Ordering: Unlike strong consistency, which enforces a global order for all operations, causal consistency allows for more flexibility. Operations that are not causally related can be executed in different orders on different nodes without violating the causal dependencies.

Example is shown in the image.

Final ordering of consistency

Linearizable > Causal > Eventual

Be Limitless

--

--

Anshumaan Tiwari
Javarevisited

Software Developer having a little passion for technical content writing