Distributed Snapshots
In my last series on Clocks & Version Vectors, I discussed the problems of establishing order and causality in a distributed system, and ways to solve them. One real-world problem that needs both concepts is distributed snapshots!
Problem Statement
Point-in-time snapshots are critical for capturing a “consistent” state of a system. That state can be restored if the system's state is ever lost, which makes your system fault tolerant.
Taking a snapshot of one particular server is easy. You define a cut-off time and at…