Global Snapshot, Chandy Lamport Algorithm & Consistent Cut
A Global Snapshot or a global state consists of local states of each process in the distributed system along with the in-transit messages on communication channels. The snapshot problem in a distributed system tries to determine the “current” snapshot of the global state without stopping the system. A Global snapshot is important, in many scenarios such as:
- determine safety points for a distributed database
- deadlock detection
- termination of an algorithm
- garbage collection of objects
- to find out the current load of a distributed system
Two major problems in determining a global snapshot are:
- One cannot catch all processes at the same time
- Messages that are on the way cannot be seen
The classical algorithm that is used to determine a global snapshot in a distributed system is the Chandy-Lamport Global Snapshot Algorithm, 1985.
The assumptions of the algorithm are as follows:
- Neither channels nor processes fail, communication is reliable
- Channels are unidirectional and provide FIFO-ordered delivery
- There is a communication path between any two processes in the system
- Any process may initiate the snapshot algorithm
- The snapshot algorithm does not interfere with the normal execution of the processes
- Each process in the system records its local state and the state of its incoming channels
- Global state is collected in a distributed manner
The snapshot algorithm works using marker messages. The marker message is a special control message and not an application message. It does not interfere with application messages. The algorithm has three phases: Initiation, Propagation and Termination and works as follows:
1. Initiating a Snapshot
- Process Pi initiates the snapshot.
- The initiator, Pi records its own state and creates the marker message.
- Pi then forwards the marker message to all other processes using the using N-1 outbound channels Cij(j = 1 to N & j != i).
- Pi starts recording all incoming messages from channels Cji (j = 1 to N & j != i).
2. Propagating a Snapshot
If a process Pj receives a Marker message on an incoming channel Ckj, and if this is the first Marker that Pj is seeing:
- Pj records its own state and mark Ckj as empty.
- Pj then forwards the marker message to all other processes using the N-1 outbound channels.
- Pj starts recording all incoming messages from channels Clj for l not equal to j or k.
If the process Pj has already seen a Marker message:
- Mark the state of channel Ckj as all the messages that have arrived on it since the recording was turned on for Ckj.
3. Terminating a Snapshot
The algorithm terminates when:
- All processes have received a marker (which implies that process state has been recorded)
- All processes have received a marker on all the N-1 incoming channels (which implies that state of all channels has been recorded)
- Later, if needed, the central server collects all these partial states to build the global snapshot.
Consistent Cut
A cut in a space-time diagram is a line joining an arbitrary point on each process line that slices the space‐time diagram into a PAST and a FUTURE. A consistent global state corresponds to a cut in which every message received in the PAST of the cut was sent in the PAST of that cut. Such a cut is known as a consistent cut. A consistent cut obeys causality and any run of the Chandy-Lamport Global Snapshot algorithm creates a consistent cut.
References:
[1] https://www.coursera.org/learn/cloud-computing/lecture/hndGi/1-2-global-snapshot-algorithm
[2] https://en.wikipedia.org/wiki/Chandy%E2%80%93Lamport_algorithm