Exploring Liveness of Avalanche

Maksym Zavershynskyi
May 28, 2019


… or any consensus algorithm that relies on an unstable equilibrium.

In this post we explore the fundamental issue with consensus algorithms that rely on an unstable equilibrium, using the Avalanche family as an example. We mainly focus on Slush, Snowflake, and Snowball, the algorithms that are the stepping stones to Avalanche itself. We do not explore Avalanche here because it also involves economic incentives.

Slush, Snowflake, and Snowball are consensus algorithms that allow a group of nodes connected over a network to collectively decide on one of two colors: red or blue. The goal is for every node to commit to the same color. These algorithms are claimed to have both probabilistic safety and probabilistic liveness, meaning: A) all nodes commit to the same color with probability close to 1; B) all nodes commit within a fixed number of communication rounds with probability close to 1.

A certain fraction of the nodes, up to 20%, is allowed to be adversarial: such nodes try to prevent the honest nodes from agreeing on a single color.

Unstable Equilibrium in Slush

Slush is the simplest of the three algorithms. It is not designed to tolerate adversarial nodes, but it converges very quickly even when initially 50% of the nodes vote for red and the other 50% vote for blue.

We start with 1000 nodes voting for red and 1000 nodes voting for blue. With each round the system becomes more and more unstable and finally converges to all nodes voting for one color, in this case red. We advance the round counter once every node has performed a single sampling query.

The biggest advantage of Slush is its simplicity. Each node simply samples k other nodes and asks for their colors. If there are more than alpha * k votes for the other color, the node changes its vote. One synchronous round can be sketched in Python as follows (our illustration of the description above, not the repository's actual code):
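```python
import random

def slush_round(nodes, k, alpha):
    """One synchronous Slush round over a list of node colors."""
    snapshot = list(nodes)  # colors as seen at the start of the round
    for i in range(len(nodes)):
        # sample k peers uniformly at random, excluding the node itself
        peers = random.sample([j for j in range(len(nodes)) if j != i], k)
        red = sum(1 for p in peers if snapshot[p] == "red")
        # flip only if the *other* color got more than alpha * k votes
        if nodes[i] == "blue" and red > alpha * k:
            nodes[i] = "red"
        elif nodes[i] == "red" and k - red > alpha * k:
            nodes[i] = "blue"

# despite starting from a perfect 50/50 split, the loop converges
nodes = ["red"] * 1000 + ["blue"] * 1000
while len(set(nodes)) > 1:
    slush_round(nodes, k=10, alpha=0.5)  # alpha = 0.5: simple majority
```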

Slush creates an unstable equilibrium around the 50/50 split: the further the system drifts from it, say toward red, the higher the chance that some blue node samples more than alpha * k red votes and changes its color to red, pushing the system further in the same direction.

Another classic example of unstable equilibrium is balancing an inverted pendulum, which demonstrates the following:

Just because an equilibrium is unstable does not mean it cannot be controlled.

Unstable equilibrium of an inverted pendulum is being controlled by a Reinforcement Learning algorithm.

In the case of Slush, the unstable equilibrium can be controlled by adversarial nodes, which can keep the honest nodes from ever fully converging to a single color:

17% of the nodes are adversaries that keep honest nodes from converging to a single color. The regions outside the dotted bars are outside the adversaries’ control — where the system converges exponentially fast.

In our case, the adversaries do not need Reinforcement Learning to control the system (even though we experimented with that too); a simple strategy is enough, see the code here. Adversarial nodes are allowed to know the distribution of red and blue colors among the honest nodes, and as soon as the system drifts outside the 45%-55% region they collectively vote for the minority color, which pushes the system back.
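A sketch of that naive strategy, assuming the adversaries can observe the exact honest split (the function name and interface here are ours, not the repository's):

```python
def adversarial_votes(honest_colors, num_adversaries):
    """Naive strategy: once the honest red/blue split drifts outside the
    45%-55% band, all adversaries vote for the minority color to push it back."""
    red_fraction = honest_colors.count("red") / len(honest_colors)
    if red_fraction > 0.55:       # red is pulling ahead: reinforce blue
        return ["blue"] * num_adversaries
    if red_fraction < 0.45:       # blue is pulling ahead: reinforce red
        return ["red"] * num_adversaries
    half = num_adversaries // 2   # near 50/50: split votes to stay neutral
    return ["red"] * half + ["blue"] * (num_adversaries - half)
```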

Slush, however, is not intended to work with adversarial nodes. Snowball, on the other hand, claims to have liveness as long as fewer than 20% of the nodes are adversarial.

Unstable Equilibrium in Snowflake and Snowball

Snowflake and Snowball are designed to work in the presence of adversarial nodes. Snowflake addresses safety by adding a check that makes a node stop once it has sampled peers beta times in a row, with each sample containing more than alpha * k votes for the same color. Snowball additionally addresses liveness by making the unstable equilibrium even more unstable. Specifically, each node in Snowball keeps a counter d for each color, counting how many samples contained more than alpha * k votes for that color, and the node only changes its vote once the counter of the other color becomes larger. This trick essentially adds inertia to the system, because a single unfortunate sample is no longer enough for a node to change its color.
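Compressed into per-node state, the logic described above might look as follows (a sketch only; the field names are ours and the paper's pseudocode differs in details):

```python
class SnowballNode:
    def __init__(self, color, k, alpha, beta):
        self.color = color                  # current preference
        self.k, self.alpha, self.beta = k, alpha, beta
        self.d = {"red": 0, "blue": 0}      # Snowball confidence counters
        self.last_success = color           # color of the last alpha-majority sample
        self.consecutive = 0                # Snowflake streak toward beta
        self.decided = False

    def on_sample(self, sample):
        """Process the colors returned by one k-peer query."""
        if self.decided:
            return
        for c in ("red", "blue"):
            if sample.count(c) > self.alpha * self.k:
                self.d[c] += 1                      # count the alpha-majority for c
                if self.d[c] > self.d[self.color]:  # inertia: flip only when overtaken
                    self.color = c
                if c == self.last_success:
                    self.consecutive += 1           # extend the streak
                else:
                    self.last_success, self.consecutive = c, 1
                if self.consecutive >= self.beta:   # Snowflake check: commit
                    self.decided = True
                return
        self.consecutive = 0  # no alpha-majority in this sample: reset the streak
```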

Additional degrees of instability, however, do not make an unstable equilibrium fundamentally uncontrollable. The classic example of a complex unstable equilibrium is the double inverted pendulum, and as we can see, it too can be controlled:

Complex unstable equilibrium being controlled. Taken from Portland State University BSP Lab.

Snowball can also be controlled by fewer than 20% adversarial nodes using the same strategy we used for Slush:

17% of the nodes are adversaries that keep honest nodes from converging.

Fixing Liveness

While we saw that Snowball in its current form does not have liveness, it is interesting to ask whether any hotfix is even possible as long as Snowball depends on an unstable equilibrium to break the symmetry.

In the original paper, the system is described as a Markov chain with discrete-time state transitions, with state s_i corresponding to i honest nodes voting for red.

In the absence of adversaries, the transition probabilities can easily be computed either empirically or through analytical bounds:

  • p(i) — the probability of transitioning from s_i to s_(i+1);
  • q(i) — the probability of transitioning from s_i to s_(i-1);
  • r(i) — the probability of remaining in state s_i.

They are also well-defined as long as there are no adversaries. The relations behind the paper's figure can be sketched as follows (a reconstruction from the definitions above, not the paper's exact notation):
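```latex
% A reconstruction: a birth-death chain on states s_0, ..., s_n,
% where i counts the honest nodes currently voting for red.
\[
  p(i) + q(i) + r(i) = 1, \qquad 0 \le i \le n,
\]
\[
  q(0) = p(n) = 0, \qquad r(0) = r(n) = 1
  \quad \text{(all-blue and all-red are absorbing)}.
\]
```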

Unfortunately, neither the original nor the updated Avalanche paper presents a formal analysis of Snowball's liveness in the presence of adversaries; both resort to verbal arguments. In fact, it would be extremely hard to do, if it is possible at all. There are several problems.

First, adversaries control a large portion of the states: within a band of states around the 50/50 split, whose width grows with b, the number of adversarial nodes, they can influence the transition probabilities p(i) and q(i). This happens simply because there is a chance that an honest node samples several adversaries in each round, which in turn can sway the node's vote.

Additionally, depending on the specific algorithm, adversaries have a certain flexibility in choosing what the values of p and q actually are. If the nodes start from a 50/50 split, this region can be assumed to be fully controlled by the adversaries (we denoted it with dotted lines in the animations above).

Second, p and q are generally not constants and cannot be easily bounded, especially inside the region that is under the adversaries' control. While the Avalanche papers present an extensive analysis of the convergence time, it mostly covers Slush and the case when the system is outside the adversary-controlled region. In fact, analyzing Snowball would be extremely hard even if we assumed the adversaries follow a fixed strategy, because p and q depend not only on i, the fraction of honest nodes voting for red, but also on the internal counters d, which are different for each node and change with time.

Even more so, adversaries can choose a different strategy at different points in time. While one might verbally argue that adversaries gain nothing from changing their macro-strategy over time, making this argument formal is extremely hard, and so the probabilities p, q, and r depend on: i, the fraction of honest nodes voting for red; d, all the counters of all nodes; and t, the given discrete moment in time.

The best one could hope for is to run a simulation many times with some adversarial strategy and measure the distribution of the convergence times, if the system converges at all, because even the standard method of estimating the convergence time (sketched below as the usual first-step recurrence, reconstructed from the definitions above):
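```latex
% The usual first-step recurrence for expected hitting times
% (a reconstruction; the original formula's indexing may differ):
\[
  \mathbb{E}[U_i] \;=\; 1 \;+\; p(i)\,\mathbb{E}[U_{i+1}]
                      \;+\; q(i)\,\mathbb{E}[U_{i-1}]
                      \;+\; r(i)\,\mathbb{E}[U_i]
\]
```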

E[U_i] is the expected time to state i.

is not feasible, since inside the controlled region p, q, and r are parameterized by more than just i.

The above difficulties are inherent not only to Snowball but to any algorithm that relies on an unstable equilibrium to break the symmetry: to prove liveness, one needs to argue that the equilibrium is so unstable that no adversarial behavior (naive, an RL model, or any other) can stabilize it indefinitely, or for an impractically long time.

References

See https://github.com/nearmax/avalanche for minimal implementations of Slush, Snowflake, and Snowball. The repository includes the naive adversarial strategy that one can use to break liveness, and it produces the visualizations used in this post.

The original Avalanche paper.

The updated Avalanche paper.

The analysis of the Avalanche paper by Alex Skidanov.

Discussion of the safety in the Avalanche paper.

Credits

Thanks to Alex Skidanov and Marcelo Fornet for discussing Avalanche and implementing the adversarial strategy. Special thanks to Marcelo for implementing Reinforcement Learning adversarial strategies here: https://github.com/SkidanovAlex/snowball/tree/master/snowball/learning

Near Protocol runs the Whiteboard Series, a video podcast in which we dive deep into other blockchain technologies with their core developers and researchers, with episodes covering Ethereum, Polkadot, Cosmos, and many other protocols.

Join Near’s Discord, where we discuss the tech, economics, governance, and other aspects of building Near.
