Global Snapshot, Chandy Lamport Algorithm & Consistent Cut

Sruthi Sree Kumar
Big Data Processing
3 min readMay 17, 2020

A Global Snapshot or a global state consists of local states of each process in the distributed system along with the in-transit messages on communication channels. The snapshot problem in a distributed system tries to determine the “current” snapshot of the global state without stopping the system. A Global snapshot is important, in many scenarios such as:

  • determine safety points for a distributed database
  • deadlock detection
  • termination of an algorithm
  • garbage collection of objects
  • to find out the current load of a distributed system

Two major problems in determining a global snapshot are:

  1. One cannot catch all processes at the same time
  2. Messages that are on the way cannot be seen

The classical algorithm that is used to determine a global snapshot in a distributed system is the Chandy-Lamport Global Snapshot Algorithm, 1985.

The assumptions of the algorithm are as follows:

  • Neither channels nor processes fail, communication is reliable
  • Channels are unidirectional and provide FIFO-ordered delivery
  • There is a communication path between any two processes in the system
  • Any process may initiate the snapshot algorithm
  • The snapshot algorithm does not interfere with the normal execution of the processes
  • Each process in the system records its local state and the state of its incoming channels
  • Global state is collected in a distributed manner

The snapshot algorithm works using marker messages. The marker message is a special control message and not an application message. It does not interfere with application messages. The algorithm has three phases: Initiation, Propagation and Termination and works as follows:

1. Initiating a Snapshot

  • Process Pi initiates the snapshot.
  • The initiator, Pi records its own state and creates the marker message.
  • Pi then forwards the marker message to all other processes using the using N-1 outbound channels Cij(j = 1 to N & j != i).
  • Pi starts recording all incoming messages from channels Cji (j = 1 to N & j != i).

2. Propagating a Snapshot

If a process Pj receives a Marker message on an incoming channel Ckj, and if this is the first Marker that Pj is seeing:

  • Pj records its own state and mark Ckj as empty.
  • Pj then forwards the marker message to all other processes using the N-1 outbound channels.
  • Pj starts recording all incoming messages from channels Clj for l not equal to j or k.

If the process Pj has already seen a Marker message:

  • Mark the state of channel Ckj as all the messages that have arrived on it since the recording was turned on for Ckj.

3. Terminating a Snapshot

The algorithm terminates when:

  • All processes have received a marker (which implies that process state has been recorded)
  • All processes have received a marker on all the N-1 incoming channels (which implies that state of all channels has been recorded)
  • Later, if needed, the central server collects all these partial states to build the global snapshot.

Consistent Cut

A cut in a space-time diagram is a line joining an arbitrary point on each process line that slices the space‐time diagram into a PAST and a FUTURE. A consistent global state corresponds to a cut in which every message received in the PAST of the cut was sent in the PAST of that cut. Such a cut is known as a consistent cut. A consistent cut obeys causality and any run of the Chandy-Lamport Global Snapshot algorithm creates a consistent cut.

Fig 1: Consistent vs Inconsistent Cut

References:

[1] https://www.coursera.org/learn/cloud-computing/lecture/hndGi/1-2-global-snapshot-algorithm

[2] https://en.wikipedia.org/wiki/Chandy%E2%80%93Lamport_algorithm

--

--