Inside Distributed Systems
Instead of spending time on the formal definition of distributed systems and what the term means, let’s dive directly into their internals.
I am dividing this article into 4 chapters (parts), as it’s a really wide topic.
Every idea in a distributed system (which possibly covers the majority of systems nowadays) revolves around the following goals:
- Scalability → My distributed system should be able to scale up and down easily.
- Fault tolerance → If some of my machines (servers, computers etc.) go down, the system should still work fine.
- Data availability → My data is available at all times.
- Consistency → (many of these systems are eventually consistent, and this trade-off is widely accepted)
- Low latency → multiple nodes, a lot of data, and still a quick response.
To make the rest easier to follow, let’s briefly define some of the terms used (in simple words).
Strong consistency: Once I write data, it must be propagated to every place it is supposed to be written, and only then is the response returned. If I read the same data after receiving the response, I must get the value I just wrote.
Weak consistency: Once I write data, the response is returned right away and the data is propagated asynchronously to every place it is supposed to go (say, other servers). If I read the same data after receiving the response, I may or may not get the value I just wrote.
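To make the difference concrete, here is a minimal sketch in Python. Everything in it (the `Cluster` class, `write_strong`, `write_weak`, `propagate`) is a made-up name for illustration, and the "asynchronous" part is simulated with a simple pending list instead of real networking.

```python
class Cluster:
    """A toy cluster: each replica is just an in-memory dict (an assumption)."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica_index, key, value) not yet applied

    def write_strong(self, key, value):
        # Strong consistency: update every replica, then acknowledge.
        for replica in self.replicas:
            replica[key] = value
        return "ok"

    def write_weak(self, key, value):
        # Weak consistency: update one replica, acknowledge immediately,
        # and leave the remaining replicas to be updated later.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))
        return "ok"

    def propagate(self):
        # Stands in for the background replication that happens "eventually".
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()

    def read(self, key, replica_index):
        # Returns None if this replica has not seen the key yet.
        return self.replicas[replica_index].get(key)
```

After `write_weak("b", 2)`, a read from replica 0 returns the new value, but a read from replica 1 returns nothing until `propagate()` has run. After `write_strong`, every replica returns the new value right away.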
Availability: Even if one or more of my servers go down, I can still read and write data, and the server failure is hidden from me.
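One simple way to picture "the failure is hidden from me" is a read that quietly moves on to the next copy when a server is down. This is only a sketch: `NodeDown`, `make_replica` and `available_read` are hypothetical names, and a real system would also deal with timeouts, retries and stale copies.

```python
class NodeDown(Exception):
    """Raised by a (hypothetical) replica that cannot be reached."""


def make_replica(data, up=True):
    # Returns a read function that behaves like a server holding `data`.
    def read(key):
        if not up:
            raise NodeDown("replica unreachable")
        return data[key]
    return read


def available_read(key, replicas):
    # Try each replica in turn; the caller never sees individual failures.
    for read in replicas:
        try:
            return read(key)
        except NodeDown:
            continue
    # Only when every copy is unreachable does the failure surface.
    raise NodeDown("no replica reachable")
```

With one replica down and one up, `available_read` still returns the value, and the caller cannot tell that the first server failed.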
Partitioning: Dividing data into smaller blocks and storing each block, along with copies of it, in multiple locations, which improves availability.
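As a rough sketch of that idea, the snippet below cuts some data into fixed-size blocks and picks the nodes that should hold each block and its copies. The function names, the use of SHA-256 and the "next nodes in the list" placement rule are all assumptions made for illustration, not how any particular database does it.

```python
import hashlib
from typing import List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    # Cut the data into consecutive blocks; the last one may be shorter.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_block(block_id: int, nodes: List[str], copies: int) -> List[str]:
    # Hash the block id to choose a starting node, then take the following
    # nodes (wrapping around) so that each copy lands on a different node.
    digest = hashlib.sha256(str(block_id).encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    copies = min(copies, len(nodes))
    return [nodes[(start + k) % len(nodes)] for k in range(copies)]
```

For example, `split_into_blocks(b"abcdefghij", 4)` gives three blocks, and `place_block(0, ["n1", "n2", "n3"], 2)` returns two distinct node names for the first block. If one of those nodes goes down, the block is still reachable on the other.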
Consistency ( C )
Availability ( A )
Partition tolerance ( P )
The CAP theorem describes the behavior of a distributed system in terms of its consistency, availability and partition tolerance. Note that the P in CAP is partition tolerance: the system keeps working even when a network failure splits its nodes into groups that cannot talk to each other. That is related to, but not the same as, the data partitioning defined above. A distributed system can be consistent and highly available. Or it can be highly available and partition tolerant. Or it can be partition tolerant and consistent. In practice, no distributed system can deliver all three at once (some systems claim to get “close” to all 3, but none fully can).
If you think about it, it makes sense! It’s really hard to provide all 3 at the same time. If a system is highly available and has to keep working while some of its nodes cannot reach each other, then providing strong consistency is really hard.
To make this clear, let’s take the example of a distributed database. Assume it has a peer-to-peer structure, with multiple nodes forming the system (for high availability). If I write some data to any of the servers, the data is replicated to multiple locations (servers) in the system. Now think about storing a piece of data, replicating it to multiple locations, and returning a response if and only if all the writes and replications have propagated successfully (strong consistency).
The question is: what if some of the nodes holding a copy of this data are not available? (Well, strong consistency requires that all the writes be propagated before the response is returned.)
And what latency would this process incur? The situation gets out of hand when the system spans multiple datacenters. (Well, I cannot sit around all day waiting for a response before continuing. And suppose a million users write at the same time… ha ha haaaaa)
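Both questions can be put into a few lines of Python. This is a toy model, not a real replication protocol: `Node` and `strong_write` are invented names, the latency numbers in the usage note are invented too, and the replicas are assumed to be contacted in parallel, so the write waits for the slowest one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    up: bool
    latency_ms: float  # assumed round-trip time to reach this node


def strong_write(nodes: List[Node]) -> Tuple[bool, Optional[float]]:
    """Return (acknowledged, wait_ms) for a write that needs every replica."""
    # If any replica is unreachable, the write cannot be acknowledged.
    if not all(node.up for node in nodes):
        return False, None
    # Replicas are contacted in parallel (an assumption), so the caller
    # waits as long as the slowest replica takes to confirm.
    return True, max(node.latency_ms for node in nodes)
```

Take three replicas with made-up round trips of 2 ms (local), 5 ms (same datacenter) and 150 ms (remote datacenter). While all of them are up, `strong_write` acknowledges after 150 ms, so the far-away replica sets the pace for every single write. Mark any one of them as down and the write is not acknowledged at all.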
For the rest, think about it and let me know if anything is unclear :P
Next, we will look at other aspects of a distributed system, such as hashing, writes/reads and fault tolerance. Stay tuned :)