Perlin’s Implementation of Avalanche

A Code Review

Michael Graczyk
OpenToken
Jun 28, 2018


On 5/16/2018, an anonymously authored paper, shared on IPFS, began circulating on social media. The paper describes a “new family of leaderless Byzantine fault tolerance protocols”, including a practical algorithm called Avalanche.

Although the paper’s author(s) remain anonymous, a startup called Perlin announced plans to build a decentralized computation platform using Avalanche. Perlin attracted vast attention from the cryptocurrency and ICO investment community, in part due to excitement around Avalanche and its perceived potential.

We at OpenToken found Avalanche fascinating and novel, so we were naturally drawn to Perlin and its potential. Fortunately, we were able to contact Perlin and receive access to their code, including their implementation of the Avalanche consensus mechanism.

We reviewed Perlin’s codebase¹ and implementation of Avalanche, and thought it would be helpful to explain in detail how Perlin’s code implements the algorithm described in the paper. We also provide some commentary on the code, the algorithm, and remaining work.

Any comments or bugs mentioned in this post were communicated to Perlin before publishing. Perlin is already making changes to address them. We would also like to make it clear that the code in this post is Perlin’s initial prototype, and will be extensively rewritten and improved before mainnet launch.

Background

The paper begins with a short description of its assumed model of the world, then explains a series of increasingly sophisticated algorithms. The first three algorithms, Slush, Snowflake, and Snowball, are included as teaching examples so that the final algorithm, Avalanche, will be more understandable to the reader. Only the final DAG-based Avalanche algorithm is efficient enough to be practical, so the evaluation and Perlin’s code are based exclusively on it.

This code review focuses on Perlin’s implementation rather than the Avalanche algorithm itself. We will assume that the reader has a rough idea of how the algorithm works, but we will explain details to put our commentary in context.

Perlin’s Avalanche implementation (PA) follows the paper closely. In this post, we elide irrelevant details by replacing them with a comment like // ... . Additional comments by OpenToken include // [OT].

Data Structures

PA uses protocol buffers to encode state both for processing and transmission. 𝒯 is Node.Transactions, 𝒬 is Node.Queried, 𝒫_T is Node.Conflicts[T], and the chits c_{u,T} are, of course, Node.Chits.

In addition to node state, the transactions, conflict sets, and inter-node requests are all protocol buffer messages. Many distributed systems (especially at Google) manage state in this way, because it makes serialization and message passing extremely simple. As a result, PA’s high level architecture is a collection of simple, concurrent event loops.
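To make the symbol-to-field mapping concrete, here is a minimal plain-Go sketch of the node state described above. The field names follow the post's mapping (Node.Transactions, Node.Queried, Node.Conflicts, Node.Chits), but the concrete types are our illustration, not Perlin's generated protobuf code.

```go
package main

import "fmt"

// Illustrative transaction record; PA's actual message has more fields.
type Transaction struct {
	Id      string
	Utxo    uint64
	Parents []string
}

// Illustrative conflict set 𝒫_T: all known transactions spending one UTXO,
// plus the currently preferred one.
type ConflictSet struct {
	Transactions map[string]bool
	Preferred    string
}

// Node state, mapped to the paper's symbols as the post describes.
type Node struct {
	Transactions map[string]*Transaction // 𝒯
	Queried      map[string]bool         // 𝒬
	Conflicts    map[uint64]*ConflictSet // 𝒫_T, keyed by UTXO
	Chits        map[string]bool         // c_{u,T}
}

// NewNode initializes every map so lookups never hit a nil map.
func NewNode() *Node {
	return &Node{
		Transactions: map[string]*Transaction{},
		Queried:      map[string]bool{},
		Conflicts:    map[uint64]*ConflictSet{},
		Chits:        map[string]bool{},
	}
}

func main() {
	n := NewNode()
	fmt.Println(len(n.Transactions)) // 0: a fresh node knows no transactions
}
```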

Top Level: The Event Loop

The last few paragraphs of Section 2 describe PA at a high level. Each node is an event loop that stalls (sleeps) until it receives some work to do.

PA currently uses protoactor-go for boilerplate related to RPC event loops. The Receive function simply processes all messages received by the node, doing work where necessary. Perlin is currently building their own networking/RPC stack called Noise, which will simplify the design and potentially replace protoactor-go as the framework in which Avalanche is implemented.

When the received message is a newly submitted transaction, the event loop executes the code in Figure 5. When the message is a query from another node requesting information about a particular transaction, the event loop follows Figure 6.
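The dispatch described above can be sketched as a type switch over incoming messages. The message type names below are illustrative placeholders, not Perlin's actual protobuf identifiers, and the handlers are stubbed to show only the routing.

```go
package main

import "fmt"

// Hypothetical message types standing in for PA's protobuf messages.
type CreateTx struct{ Payload string }
type Query struct{ TxId string }

// Receive routes each message the way the post describes: newly submitted
// transactions follow Figure 5, peer queries follow Figure 6.
func Receive(msg interface{}) string {
	switch m := msg.(type) {
	case *CreateTx:
		return fmt.Sprintf("onTransaction(%s)", m.Payload) // Figure 5 path
	case *Query:
		return fmt.Sprintf("onQuery(%s)", m.TxId) // Figure 6 path
	default:
		return "ignored"
	}
}

func main() {
	fmt.Println(Receive(&CreateTx{Payload: "pay-alice"}))
	fmt.Println(Receive(&Query{TxId: "tx1"}))
}
```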

New Transactions

Transaction creation is straightforward, with the code following the paper almost exactly.

PA handles transaction generation in the node event loop, with helper functions named to match those in Figure 5.

Transaction generation involves choosing a unique UTXO, selecting the transaction’s parents, then processing the transaction with onReceiveTx, which updates the tracked confidence values transitively.

We note that as a simplification, PA generates UTXOs from a single monotonically increasing integer counter. The counter is shared between nodes running in a single process, so any simulation run with nodes separated over a network will generate transactions with overlapping, hence conflicting, UTXOs. This could be avoided for simple evaluations by assigning a unique integer id to each node and computing utxo := atomic.AddUint64(&messages.LastUTXO, NUM_NODES), initializing with LastUTXO := NODE_ID.
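The suggested fix can be sketched as follows: striding the counter by the number of nodes keeps each node's UTXOs in a disjoint residue class. NODE_ID and NUM_NODES are illustrative constants, not identifiers from PA.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Illustrative deployment parameters; in practice each node would be
// configured with its own id out of the total node count.
const (
	NUM_NODES = 4
	NODE_ID   = 1
)

// Initialize the counter with the node's own id so the residue class
// NODE_ID mod NUM_NODES belongs to this node alone.
var lastUTXO uint64 = NODE_ID

// nextUTXO advances the counter by NUM_NODES, so two nodes with
// different ids can never produce the same value.
func nextUTXO() uint64 {
	return atomic.AddUint64(&lastUTXO, NUM_NODES)
}

func main() {
	fmt.Println(nextUTXO(), nextUTXO()) // 5 9: both ≡ 1 (mod 4)
}
```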

Parent Selection

Once the transaction has been properly formed with a unique UTXO, the transaction is embedded in the node’s DAG via parent selection.

As described in the Parent Selection discussion of Section 4, nodes could select a correct set of parents by sampling uniformly from all strongly preferred transactions (we’ll explain that term later). Such a selection would be tremendously inefficient, because the DAG of transactions would become wide and difficult to process quickly. Instead, the paper recommends that implementations follow Figure 19, which favors recent transactions that are likely to commit.

The result of parent selection is the “least” set of transactions for which the node has some confidence, and knows of no conflict. By “least”, we mean transactions which do not have children that also satisfy these conditions. This “leastness” constraint enables an efficient breadth-first search described in the paper as a “retreating search” toward the genesis block. However, PA does not use the efficient algorithm, instead searching every transaction.

PA also does not appear to sample from the set of eligible parents, instead selecting every eligible parent for every transaction. The lack of sampling may cause the DAG to become undesirably dense and lead to liveness failures, as described in Section 4. This is not a major issue because the code could easily be modified to include sampling and an efficient search. According to Perlin, such improvements are scheduled already.

Updating State for a New Transaction

Once a transaction’s parents have been selected, it is signed by the node and merged into its own state, just as if it had been received from a peer. The transaction is gossiped to other nodes through an anti-entropy process running in the background (not shown in the code above).

Queries about Existing Transactions

When a node sees a transaction for the first time, it queries a set of peers asking if they strongly prefer that transaction. Although not described explicitly in the paper or implemented in the code, a production implementation would also likely query nodes about transactions that have remained unconfirmed past some timeout. When a node receives a query, it updates its state and responds according to the algorithm in Figure 6.

PA implements onQuery directly in the event loop.

Notice that onQuery reuses onReceiveTx, just as indicated in the paper.

In PA’s implementation, the set 𝒯 is n.Transactions, a mapping from transaction id to transaction object. The conflict set 𝒫_T is n.Conflicts[utxo], and is newly created if it does not exist. Otherwise T is added to 𝒫_T.
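The conflict-set bookkeeping just described can be sketched as below: look up 𝒫_T by UTXO, create it with T as the preferred transaction if it does not exist, and otherwise simply add T to the existing set. Types and names are illustrative, not PA's.

```go
package main

import "fmt"

// Illustrative conflict set: the transactions spending one UTXO, plus
// the one this node currently prefers.
type ConflictSet struct {
	Transactions map[string]bool
	Preferred    string
}

// addToConflictSet mirrors the logic described in the post: the first
// transaction seen for a UTXO becomes preferred; later conflicting
// spends are merely recorded.
func addToConflictSet(conflicts map[uint64]*ConflictSet, utxo uint64, txId string) {
	cs, ok := conflicts[utxo]
	if !ok {
		cs = &ConflictSet{Transactions: map[string]bool{}, Preferred: txId}
		conflicts[utxo] = cs
	}
	cs.Transactions[txId] = true
}

func main() {
	conflicts := map[uint64]*ConflictSet{}
	addToConflictSet(conflicts, 7, "tx1")
	addToConflictSet(conflicts, 7, "tx2") // conflicting spend of UTXO 7
	fmt.Println(conflicts[7].Preferred, len(conflicts[7].Transactions)) // tx1 2
}
```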

After updating its own state, the node responds to the query by echoing the transaction id and indicating whether the node “strongly prefers” the transaction. We will describe “strongly preferred” in the next section, once we’ve described consensus.

Avalanche Consensus

The core consensus protocol runs in a second concurrent loop, processing every unqueried transaction then sleeping for 50 milliseconds. The code closely follows the algorithm given in Figure 4.

Nodes in Avalanche attempt to issue queries once for every transaction. They select an unqueried transaction, query a random sample of K peers, then update the confidence values for that transaction and all of its ancestors.

Nodes respond to these queries by indicating whether or not they strongly prefer a particular transaction. A transaction is preferred by a node if it is the preferred transaction of its conflict set:

n.Conflicts[utxo].Preferred == transaction

A transaction is strongly preferred if it and all of its ancestors are preferred:
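The two predicates can be sketched recursively, which makes the "and all of its ancestors" clause explicit. The types here are our illustration; preferredOf plays the role of n.Conflicts[utxo].Preferred.

```go
package main

import "fmt"

// Illustrative transaction: its id, the UTXO it spends, and its parents.
type Tx struct {
	Id      string
	Utxo    uint64
	Parents []*Tx
}

// isPreferred: T heads its conflict set, i.e. the post's
// n.Conflicts[utxo].Preferred == transaction check.
func isPreferred(preferredOf map[uint64]string, t *Tx) bool {
	return preferredOf[t.Utxo] == t.Id
}

// isStronglyPreferred: T and every ancestor of T is preferred.
func isStronglyPreferred(preferredOf map[uint64]string, t *Tx) bool {
	if !isPreferred(preferredOf, t) {
		return false
	}
	for _, p := range t.Parents {
		if !isStronglyPreferred(preferredOf, p) {
			return false
		}
	}
	return true
}

func main() {
	parent := &Tx{Id: "p", Utxo: 1}
	child := &Tx{Id: "c", Utxo: 2, Parents: []*Tx{parent}}
	preferredOf := map[uint64]string{1: "p", 2: "c"}
	fmt.Println(isStronglyPreferred(preferredOf, child)) // true
	preferredOf[1] = "other" // parent loses its conflict set...
	fmt.Println(isStronglyPreferred(preferredOf, child)) // ...so: false
}
```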

The query loop itself corresponds nicely with Figure 4.

PA’s implementation uses K=1 and Alpha=0.75, so the integer threshold int(Alpha*K) truncates to 0 and the loop always continues. This appears to be a bug, because the confidence should not be updated if no peers respond. It could be fixed by removing the cast to int and comparing against Alpha*K directly.

The code for QueryNodesForTx asynchronously queries K nodes with a 3 second timeout, returning the number of nodes that respond with a strong preference for the transaction. It consists mostly of boilerplate so we exclude it.

Updating Confidence

The real meat of consensus happens inside the query loop of Figure 4. When enough peers respond with a strong preference for a transaction tx, the querying node updates confidence and preference values for tx and all of its ancestors. tx will later be revisited when its descendants are queried for the first time. In this way, confidence in tx is built up over time as new transactions have their parents selected to be tx or a descendant of tx.

The code only appears to update preferences for the parents of the transaction tx, whereas it should update the preferences for all ancestors of tx. As a result, the DAG constructed by PA will be missing preference settings for most transactions, since they lose the opportunity to become preferred as new transactions build up “lower” in the DAG. This bug could be fixed by replacing the loop over parents with any topological graph traversal (DFS or BFS).
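A sketch of that fix: walk every ancestor of tx (BFS here) rather than just its direct parents, applying the per-transaction update at each step. The callback stands in for PA's confidence and preference bookkeeping, which we elide.

```go
package main

import "fmt"

// Illustrative transaction node in the DAG.
type Tx struct {
	Id      string
	Parents []*Tx
}

// forEachAncestor visits every ancestor of tx exactly once, breadth
// first, calling update on each. The seen set handles the diamond
// shapes a DAG permits.
func forEachAncestor(tx *Tx, update func(*Tx)) {
	seen := map[string]bool{tx.Id: true}
	queue := append([]*Tx{}, tx.Parents...)
	for len(queue) > 0 {
		t := queue[0]
		queue = queue[1:]
		if seen[t.Id] {
			continue
		}
		seen[t.Id] = true
		update(t)
		queue = append(queue, t.Parents...)
	}
}

func main() {
	g := &Tx{Id: "genesis"}
	a := &Tx{Id: "a", Parents: []*Tx{g}}
	b := &Tx{Id: "b", Parents: []*Tx{g}}
	tx := &Tx{Id: "tx", Parents: []*Tx{a, b}}
	visited := []string{}
	forEachAncestor(tx, func(t *Tx) { visited = append(visited, t.Id) })
	fmt.Println(visited) // [a b genesis]: genesis reached once despite two paths
}
```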

Checking for Confirmation: IsAccepted

No transaction processing system would be complete without a mechanism for clients to learn when a transaction has been accepted or finalized. Avalanche offers probabilistic guarantees on finality, which are parameterized using two βs.

PA implements this recursive definition explicitly. The function does not appear to be accessible from any external API, but in practice clients would query isAccepted to learn whether to rely on the transaction’s finality. For example, an exchange would wait until its Perlin node returns isAccepted(tx) -> true before crediting a depositing account for tx. The paper and code differ slightly in how the two commitment modes are related, but the paper appears to have a typo in the parenthesis placement. The code requires, we believe correctly, that a transaction is accepted only if all its parents are accepted.
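A sketch of that recursive structure, with the parenthesization we believe the code gets right: all parents must be accepted, and then either the safe-early-commit rule (sole member of its conflict set with confidence at least β₁) or the consecutive-success rule (β₂ successful queries in a row) applies. Field names and the β values are placeholders, not PA's.

```go
package main

import "fmt"

// Illustrative acceptance state for a transaction.
type Tx struct {
	Parents         []*Tx
	SoleInConflict  bool // no known conflicting spend of its UTXO
	Confidence      int  // chit-derived confidence
	ConsecutiveSucc int  // consecutive successful queries
}

// Placeholder thresholds for the two βs the post mentions.
const (
	Beta1 = 11
	Beta2 = 150
)

// isAccepted: every parent accepted, and one of the two commitment
// modes satisfied for the transaction itself.
func isAccepted(t *Tx) bool {
	for _, p := range t.Parents {
		if !isAccepted(p) {
			return false
		}
	}
	return (t.SoleInConflict && t.Confidence >= Beta1) || t.ConsecutiveSucc >= Beta2
}

func main() {
	parent := &Tx{SoleInConflict: true, Confidence: Beta1}
	child := &Tx{Parents: []*Tx{parent}, SoleInConflict: true, Confidence: Beta1}
	fmt.Println(isAccepted(child)) // true: child and parent both early-commit
}
```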

Conclusion

Perlin has announced their intentions to build a decentralized computing platform using the Avalanche consensus mechanism. Their code is clean and organized, following the paper closely. The code is barebones and not yet production ready, but it implements the algorithm described in the paper.

There were a few bugs in correctness, and a few places where complicated and inefficient loops could be replaced by simpler, more efficient uses of data structures. These are the sorts of issues that are easy to fix with locally contained changes.

The overall quality of the code is significantly better than other blockchain projects we have reviewed (Tron, IOTA, Hashgraph², for example). Its quality and sophistication are comparable to typical code at a “big five” tech company.

We believe Perlin’s Avalanche implementation provides evidence of their technical capability, and suggests that Perlin has the potential to deliver a practical decentralized computing platform.

Footnotes

  1. perlin-go commit 7e0f2c93e2a95a0b2ab9de43ff986b9c741a2591
  2. Hashgraph has a closed source SDK, but it includes debugging symbols and hence was easily decompiled for review. Prototype code aside, it is overall a good project.
