Gossip Protocol in distributed systems

Prateek Gupta
Nerd For Tech
Published in
4 min readApr 23, 2022

--

Problem

In a distributed systems scenarios, the most basic problem that one can think of are :
1. System state : How are system nodes aware about the other system nodes state, whether they are dead or alive.
2. Communication : How can system nodes interact with other nodes.

For a system that wish to use a distributed system, the problems above are immaterial and all it cares about is the end result. But for a distributed system, this is a core problem for evaluating the final result be it a database, search engine, object storage, a load balancer or maybe something else.

To address the problems mentioned, there can be two possible solutions :
1. A centralised system which manages the system state, for example zookeeper in kafka. The problem with this is single point of failure. But the finer side is this setup is inclined to CP side of CAP theorem providing higher consistency guarantee.
2. A peer-to-peer solution to track system state. This solution is inclined to AP side of CAP theorem providing eventual consistency. But the solution is highly scalable and more resilient. Gossip protocol based algorithms fall under this category and is famously used in cassandra.

In a distributed system you need at least two independent sources of information to mark a node down. It’s not enough to simply say because your node can’t contact another node that the other node is down. It’s quite possible that your node is broken and the other node is fine. But if other nodes in the system also see that other node is dead then you can with some confidence conclude that that node is dead. Many complex hard to debug bugs are hidden here. How do you know what other nodes are seeing? Via a gossip protocol exchanging this kind of reachability data.

Gossip protocol

Gossip is a peer-to-peer communication protocol in which nodes periodically exchange state information about themselves and about other nodes they know about.

Gossip protocol based algorithms are one of the most robust and scalable algorithms out there for strongly eventual consistent membership list, fault detection, and can piggy back any information on top of the gossip messages.

Gossip is about letting each node send the latest information it happens to have to some set of other nodes eventually spreading that information throughout the network. It’s a way for nodes to build a global map from limited local interactions.

Gossip protocols are at best eventually consistent, but aren’t even necessarily that. And if there’s a partition, nodes in sub-partitions will still happily gossip with each other. So a request that hits one side of a partition can get totally different answers than if it hits another side.

High level details about the protocol

  • In the cluster, every member keeps a list of a subset of known members and their addresses and some metadata.
  • Periodically, each member updates its list of neighbors’ heartbeat counters based the data emitted by different nodes and sends the updated information to some further neighbours.
  • Upon receipt of such a gossip message, a node merges the list in the message with its own list and adopts the max heartbeat counter for each member.
  • As long as heartbeat counter keeps on increasing for a node, it is guaranteed to be healthy and is considered dead if heartbeat hasn’t increased for more than some threshold time period.
  • It seems somewhat obvious that you would transmit node properties to other nodes. Stats like load average, free memory, etc. would allow a local node to decide where to send work, for example. If a node is idle send it work (as long as everyone doesn’t send it work at the same time). This local decision making angle is the key to scale. There’s no centralized controller. Local nodes make local decisions based on local data. This can scale as far as the gossip protocol can scale.

Ending notes

This post has tried to provided a high level understanding of the gossip protocol. Since there are multiple implementation of the protocol, its not possible to explain each of the variant. The way to go further in the topic is to pick up a specific algorithm and go deep in that. A few relevant links have been provided but the list is huge and is dependent on the specific software using their implementation of the protocol.

Hope this helps. Let me know in the comments if you find any discrepancy in the post. Thanks for reading!!!

--

--

Prateek Gupta
Nerd For Tech

Lead Software Engineer from India. Fancy learning and building new stuff. Making a habit to share knowledge via blogs