Distributed Computing Concepts
A Brief Introduction
Preface
It would be rather nice if computers never failed, networks were reliable and all operations were always cleanly and gracefully executed, never leaving the system in a state of ambiguity. Unfortunately, this cannot be; we have to contend with the bleak harshness of reality, which is particularly exacerbated when dealing with distributed systems. Designing distributed systems is hard largely because of the inherent lack of atomicity: where a centralised system fails as a whole, a distributed system fails in a piecemeal fashion, potentially leaving it in an inconsistent state. Such inconsistency is notoriously difficult to detect, mask and repair.
My line of work frequently entails lengthy discussions on distributed systems with software engineers and architects. I’ve also authored several peer-reviewed papers on distributed algorithms, consensus and concurrency control theory. This article is a mish-mash of some of my previous work, which I’ve been asked to repost here by a good colleague.
Liveness and Safety
In the realm of distributed systems, as in concurrent computing, there are two fundamental properties of interest: liveness and safety.