Blockchain Research Newsletter #0: Parallel Chains and Graphene

Jan 8, 2019

by Mikerah and John Adler

This is the first post in our newsletter focused on summarizing research papers in the blockchain space. Every 2 weeks to 18 months, we will summarize papers that we suspect a lot of people have wanted to read but have no time to, whether due to the length of the papers, the steep learning curve, or both. These papers will focus on different aspects of blockchain technology such as scalability, privacy, and consensus mechanisms.

Since Satoshi Nakamoto published the Bitcoin whitepaper in 2008, there has been an explosion of papers published in the burgeoning field of blockchain technology. It is very hard to keep on top of the latest advances in the field, as they require deep knowledge of distributed systems, cryptography, economics, and computer science. Most people (developers, other researchers, project managers, etc.) want to understand these papers at a high enough level to build on top of them, implement them in practice, or simply understand the landscape of blockchain technology.

In this first post, we will go through Parallel Chains, a method to increase blockchain scalability released by IOHK, a company founded by Charles Hoskinson, who is well known for his work on Ethereum (and eventually Ethereum Classic) and Cardano. We will also cover Graphene, a method to improve block propagation in Bitcoin, co-authored by, among others, Gavin Andresen, an early Bitcoin developer.

Parallel Chains

It is widely known that blockchains suffer from a scalability problem. Notoriously, Bitcoin can process on the order of 7 transactions per second, while its main competitor, Ethereum, can process 10–15 transactions per second. When these numbers are compared to major payment providers like Visa, we realize we have a long way to go before blockchains can reach that kind of transaction throughput. Parallel Chains, set forth by the IOHK research team, aims to provide a method for scaling both PoW- and PoS-based blockchain protocols. The team also sets forth a formal model for analyzing transaction throughput. First, we will go through their proposed model for analyzing transaction throughput, and then we will summarize the parallel chains construction for both the PoW and PoS settings.

Model for transaction throughput

Assume that each party in the protocol has the same amount of bandwidth and that there is a semi-adaptive adversary that can delay messages sent across the network. Note that these delays are bounded: the adversary cannot delay messages indefinitely. We also assume that we are operating in a peer-to-peer gossip network. We define a slot to be the time interval in which a block of size b gets propagated to the network, given the bandwidth of each party. Parties can announce messages to others at a rate of one block per slot. If a party tries to announce more than one block in a given slot, the extra blocks are queued locally and announced in later slots, one per slot. Upon delivery to other parties, blocks are stored in the recipient’s inbox and can be retrieved at a maximum rate of m blocks per slot, where m is called the bandwidth-asymmetry parameter of the model. For our purposes, we will set m = 1. In addition to delaying messages, an adversary can also announce messages to different parties. Due to the gossiping nature of the network, all parties in the network will eventually receive these messages. This does not affect how many messages can be sent in a slot; such messages are queued as usual on the recipient’s end.
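
To make the slot and queueing semantics concrete, here is a minimal Python sketch of one party’s announcement and retrieval rules. The Party class and its method names are our own illustration, not from the paper, and adversarial message delays are elided.

```python
from collections import deque

class Party:
    """Sketch of one party's bandwidth limits in the model: announce at
    most one block per slot, queue any extras locally, and retrieve at
    most m blocks per slot from the inbox."""

    def __init__(self):
        self.outbox = deque()  # blocks waiting to be announced
        self.inbox = deque()   # delivered blocks awaiting retrieval

    def announce(self, block):
        # Announcing more than one block per slot queues the extras.
        self.outbox.append(block)

    def tick_send(self, deliver):
        # Once per slot: send at most one queued block. `deliver`
        # abstracts the gossip network; adversarial delays are elided.
        if self.outbox:
            deliver(self.outbox.popleft())

    def tick_receive(self, m=1):
        # Once per slot: retrieve at most m blocks (the text sets m = 1).
        return [self.inbox.popleft() for _ in range(min(m, len(self.inbox)))]

alice, bob = Party(), Party()
alice.announce("b1")
alice.announce("b2")               # queued behind b1
alice.tick_send(bob.inbox.append)  # slot 1: only b1 goes out
print(bob.tick_receive())          # ['b1']
```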

Parallel Chains in Proof of Stake setting

The presentation of Parallel Chains in the PoS setting assumes that Ouroboros Praos is the underlying PoS algorithm. Ouroboros Praos works in the same model as presented earlier. Thus, it is secure in the semi-synchronous setting with fully adaptive adversaries.

In Ouroboros Praos, all parties must stake some amount of tokens in order to participate in securing the blockchain. In each slot, a block gets created by a single party (the leader). The probability that a particular party gets selected as the leader for that slot is proportional to their stake in the protocol. Each party checks whether they were elected as slot leader by locally computing a verifiable random function (VRF) using their private key, with the slot number and the epoch randomness as inputs. A VRF is a pseudo-random function for which anyone can publicly verify the correctness of the output. A series of slots is called an epoch. The epoch randomness is a hash of VRF outputs of blocks from the first two-thirds of the previous epoch. Once elected as leader, a party has the right to produce blocks for that slot. Due to the nature of the slot election, a particular slot can have zero, one, or multiple slot leaders. Blocks are signed using a cryptographic construct called key-evolving signatures, a type of signature that mitigates the damage caused by key exposure by limiting the validity of signing keys to short time periods; in Ouroboros Praos, such keys are valid only for a given slot. Parties in the protocol collect valid blocks and update their state to reflect the longest chain without too many forks. For more information on Ouroboros Praos, you can read the paper here.
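
As an illustration, here is a hedged Python sketch of the local leader-election check. An HMAC stands in for the VRF, which is a real simplification: a genuine VRF output can be verified publicly against the leader’s public key, while an HMAC cannot. The stake-weighted threshold follows the 1 - (1 - f)^(stake fraction) form used in Praos, with an illustrative active-slot coefficient f.

```python
import hashlib
import hmac

def is_slot_leader(secret_key: bytes, slot: int, epoch_randomness: bytes,
                   my_stake: float, total_stake: float) -> bool:
    """Sketch of the Praos-style local leader check. An HMAC stands in
    for the VRF; unlike a real VRF, an HMAC output cannot be publicly
    verified against the party's public key."""
    msg = slot.to_bytes(8, "big") + epoch_randomness
    out = hmac.new(secret_key, msg, hashlib.sha256).digest()
    y = int.from_bytes(out, "big") / 2**256  # map the output to [0, 1)
    f = 0.05  # illustrative active-slot coefficient
    # Stake-weighted threshold: election probability grows with stake.
    threshold = 1 - (1 - f) ** (my_stake / total_stake)
    return y < threshold

# A party holding 10% of the stake checks slot 42 of the current epoch.
print(is_slot_leader(b"my-secret", 42, b"epoch-randomness", 10.0, 100.0))
```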

Now, we go over the Parallel Chains composition in the Proof of Stake setting. Roughly, Parallel Chains in the PoS setting involves running m copies of the slightly modified Ouroboros protocol in parallel such that all of these copies maintain a joint blockchain.

Each party acts as in the usual Ouroboros protocol by retrieving the keys required for participating in the protocol. Then, locally, each party initializes all m chains using either the genesis block (if it is the first round) or chains received from the environment. Note that the paper does not clarify what the environment is; in the rest of this explanation, we will assume that the environment is the peer-to-peer gossip network. Transactions from the different blockchains get merged by first collecting all the transactions across all of the blockchains, then sorting them by slot index, by the chain containing the block, and by their position within the block. After this sorting, the list is pruned to get rid of invalid transactions. In order to retrieve the finalized transactions across all the chains, a party computes a measure called t-finalize, the latest slot up to which every chain has been finalized. In other words, for each chain, find the largest slot index whose block is at least k blocks deep, for some number k; t-finalize is the smallest of these values across the chains. Then, any party can retrieve the state by merging the transactions as explained previously. Slot leaders are determined as in the usual Ouroboros Praos protocol, where the epoch randomness is obtained by hashing together VRF outputs from previous blocks, as in the single-chain protocol. Slot leaders simply create blocks for whichever chain they were selected for.
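
Under our reading of the paper, the merge and t-finalize computations can be sketched as follows in Python. The data layout (a chain as a list of blocks, each with a slot number and an ordered transaction list) and the is_valid placeholder are our own assumptions for illustration.

```python
def is_valid(tx, merged_so_far):
    # Placeholder validity check; the real rules come from the ledger.
    return tx not in merged_so_far

def merge_transactions(chains):
    """chains: list of m chains; a chain is a list of blocks; a block is
    a dict with a 'slot' number and an ordered list of 'txs'."""
    entries = []
    for chain_idx, chain in enumerate(chains):
        for block in chain:
            for pos, tx in enumerate(block["txs"]):
                entries.append((block["slot"], chain_idx, pos, tx))
    entries.sort()  # lexicographic: slot, then chain, then position
    merged = []
    for _, _, _, tx in entries:
        if is_valid(tx, merged):  # prune invalid transactions
            merged.append(tx)
    return merged

def t_finalize(chains, k):
    """For each chain, take the slot of its deepest block that is at
    least k blocks from the tip; t-finalize is the minimum across chains."""
    deepest = []
    for chain in chains:
        finalized = chain[:len(chain) - k]
        deepest.append(finalized[-1]["slot"] if finalized else -1)
    return min(deepest)

chains = [
    [{"slot": 0, "txs": ["a"]}, {"slot": 2, "txs": ["b"]}],
    [{"slot": 1, "txs": ["c"]}, {"slot": 3, "txs": ["d"]}],
]
print(merge_transactions(chains))  # ['a', 'c', 'b', 'd']
print(t_finalize(chains, k=1))     # 0: min of the k-deep slots 0 and 1
```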

Parallel Chains in Proof of Work setting

A key distinction between Proof-of-Work chains and Proof-of-Stake chains is the complete lack of a synchronized clock in the former. Miners can choose whatever timestamp they want for the blocks they mine, and block times are completely irregular, unlike in PoS chains. Due to this, the formulation of Parallel Chains in a PoW setting is different.

At a high level, the “cryptographic puzzle” used in Proof-of-Work can be modeled as a random oracle (RO), where each participant asks the oracle “do I have the right to produce the next block,” with the oracle answering “yes” to one user and “no” to the rest. For Parallel Chains with transaction sharding, this “puzzle” is reworked to be interpreted as a 1-of-m RO, for m shard chains. In other words, the RO grants not the right to produce a block on a singular chain, but rather the right to produce a block on exactly one of the m shard chains. Miners therefore mine on all shard chains simultaneously, and when they find a valid PoW they broadcast the resulting block, which is only valid on a single shard chain.
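
Here is a minimal sketch of this 1-of-m interpretation: a single stream of hashing attempts, where a winning hash both satisfies the difficulty target and pseudo-randomly selects the one shard chain the block is valid on. The header bytes, the target value, and the use of the hash modulo m are illustrative assumptions, not the paper’s exact construction.

```python
import hashlib

def try_mine(header: bytes, nonce: int, m: int, target: int):
    """One PoW attempt under the 1-of-m random-oracle interpretation:
    either it fails, or it yields a block valid on exactly one of the
    m shard chains, chosen pseudo-randomly by the hash itself."""
    h = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    value = int.from_bytes(h, "big")
    if value >= target:
        return None       # not a valid proof-of-work
    return value % m      # the single shard chain this block is valid on

# Miners mine on all shard chains simultaneously: one stream of attempts.
target = 2**245  # illustrative difficulty
for nonce in range(1_000_000):
    shard = try_mine(b"block-header-bytes", nonce, m=8, target=target)
    if shard is not None:
        print(f"nonce {nonce} produces a block for shard chain {shard}")
        break
```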

An issue with this construction is that it relies on miners validating all shard chains, as only transaction sharding is used (i.e., transactions are segregated to a single shard chain), rather than state sharding. Given that all miners still need to validate all transactions, as they do in a traditional blockchain, it is not obvious how this Parallel Chains construction results in increased scalability.

Graphene

Graphene is a method to increase the speed of block propagation in the Bitcoin network by using invertible bloom look-up tables (IBLTs) and bloom filters. It is currently in the process of being deployed on the Bitcoin Cash network. First, we will give an overview of both data structures and then we will go into the details of Graphene’s inner workings.

Bloom Filters

Bloom filters are a probabilistic data structure for testing set membership. They are heavily used in cryptocurrencies such as Ethereum. A bloom filter can prove that an item is definitely not in a set (i.e., there are no false negatives), and its parameters can be adjusted to minimize the rate of false positives (i.e., when we ask “is this in the set?”, the strongest positive answer is “possibly in the set”).

More formally, a bloom filter consists of a bit array B of m bits and k hash functions. All of the bits of B are initially set to 0.


In order to add an item to B, hash the item with each of the k hash functions. The results give the positions in the bit array B to set to 1.

Adding “geeks” to B. (Image Credits: GeeksForGeeks)
Adding “nerd” to B. (Image Credits: GeeksForGeeks)

In order to check if an item is in B, compute the k hashes with the item as input. If the bit at any of the resulting positions is 0, then the item is guaranteed not to be in B. Otherwise, we cannot conclude with certainty that the item is in B, but depending on how we have chosen k and m, there is a reasonable probability that the queried item really is in B.


Checking if “cat” is in B. Since every position computed for “cat” holds a 1, we cannot conclude that “cat” is not in B (obviously, we know that “cat” isn’t in B, since we never added it).
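
To make this concrete, here is a minimal Python sketch of a bloom filter matching the description above. The class name, parameters, and the trick of deriving the k positions by salting a single SHA-256 hash are our own illustration, not from the Graphene paper.

```python
import hashlib

class BloomFilter:
    """A minimal bloom filter sketch: m bits and k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m          # number of bits in the array B
        self.k = k          # number of hash functions
        self.bits = [0] * m

    def _positions(self, item: str):
        # Derive k pseudo-independent positions by salting one hash.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item: str):
        # Set the bit at each of the k hashed positions to 1.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def maybe_contains(self, item: str) -> bool:
        # If any position is 0, the item is definitely not in the set;
        # otherwise it is *possibly* in the set (false positives allowed).
        return all(self.bits[pos] for pos in self._positions(item))

b = BloomFilter(m=32, k=3)
b.add("geeks")
b.add("nerd")
print(b.maybe_contains("geeks"))  # True: no false negatives
print(b.maybe_contains("cat"))    # False, or (rarely) a false positive
```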

There are other variations of bloom filters that you can read more about here.

Invertible Bloom Look-up Tables (IBLTs)

Invertible bloom look-up tables (IBLTs) are a generalization of bloom filters to look-up tables. They give us a probabilistic way to store key-value pairs in a look-up table. They also have the nice property that if you have two lists that differ by at most 15%, then, using an IBLT of each list, you can recover all of the differing elements. Thus, the size of an IBLT depends on the size of the symmetric difference between the two lists, not on the sizes of the lists themselves.


Invertible bloom look-up tables work as follows:

Let I denote an invertible bloom look-up table. Each cell in the table contains the following: 1) a count of the number of items added to the cell, 2) the sum of the keys of the items added to the cell, and 3) the sum of the values of the items added to the cell. The word “sum” in this case simply denotes an accumulated value (in practice, an XOR), not necessarily a traditional arithmetic sum. In the case of mempool synchronization, the key-value pairs are transaction IDs (hashes) and transaction data. All of the cells of the IBLT are initialized to 0.

To insert a key-value pair (k, v) into I, hash the key k with each of our several hash functions. This yields a pseudo-random set of cells. In each of those cells, increment the counter, XOR the key sum with k, and XOR the value sum with v.

To remove a key-value pair from I, do the inverse of the insert operation: in the same cells, decrement the counter and XOR out the key and value.

To determine if a key-value pair is in I (and, more importantly, to determine the value associated with a given key, i.e., the transaction data for a transaction ID), hash the key k with each of our hash functions. If any of the resulting cells has a key sum equal to k and a count of 1, we know from any sequence of the above operations that the value sum in that cell must be equal to the original value of the desired key-value pair. If all of the cells have a count of 0, then the key k is definitely not in I. If no cell matches either condition, then k may be in I, but we have no way of being sure or of reconstructing its value.
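
Here is a minimal Python sketch of these operations, assuming integer keys and values and XOR as the “sum”. The class and method names are our own, and each of the k cells is drawn from a disjoint region of the table; real IBLT implementations also carry a per-cell checksum to make the “pure cell” test robust, which we omit.

```python
import hashlib

class IBLT:
    """Minimal IBLT sketch: each cell holds a count, an XOR-sum of keys,
    and an XOR-sum of values (keys and values are integers here)."""

    def __init__(self, n_cells: int, k: int = 3):
        self.n = n_cells
        self.k = k
        self.count = [0] * n_cells
        self.key_sum = [0] * n_cells
        self.val_sum = [0] * n_cells

    def _cells(self, key: int):
        # k cell indices, one from each of k disjoint regions of the table.
        region = self.n // self.k
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield i * region + int.from_bytes(digest[:8], "big") % region

    def insert(self, key: int, value: int):
        for c in self._cells(key):
            self.count[c] += 1
            self.key_sum[c] ^= key
            self.val_sum[c] ^= value

    def remove(self, key: int, value: int):
        # The inverse of insert: decrement and XOR out.
        for c in self._cells(key):
            self.count[c] -= 1
            self.key_sum[c] ^= key
            self.val_sum[c] ^= value

    def get(self, key: int):
        cells = list(self._cells(key))
        if all(self.count[c] == 0 for c in cells):
            return None  # definitely not present
        for c in cells:
            if self.count[c] == 1 and self.key_sum[c] == key:
                return self.val_sum[c]  # a "pure" cell reveals the value
        return None  # inconclusive: may be present, but unrecoverable

t = IBLT(n_cells=16)
t.insert(0xAB, 0x1234)   # e.g. (transaction ID, transaction data)
print(hex(t.get(0xAB)))  # 0x1234
```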

If you want to dig into the details about this interesting data structure, you can read the original paper here.

Graphene Protocol

The gist is that Graphene minimizes the amount of data that needs to be passed between peers in the Bitcoin network by encoding a block’s transactions in a bloom filter and an IBLT. It relies on the assumption that any two peers in the Bitcoin network have extremely similar transactions in their local mempools (for non-malicious nodes, this is empirically true). The bloom filter is used to shrink the symmetric difference between the transactions in the block and those in the receiver’s mempool, and the IBLT is used to recover from the bloom filter’s errors.

Let Alice be the sender of a block and Bob the receiver. Bob requests the block from Alice and sends along the number of transactions in his mempool. Alice creates an IBLT I and a bloom filter B containing the transactions included in the block, along with the appropriate Bitcoin header fields. Bob creates an IBLT I’ from the transactions in his mempool that pass through B. Then, Bob computes the symmetric difference between I and I’ to recover all of the block’s transactions. To ensure that all the transactions in I can be recovered, Alice needs to set a reasonable false positive rate: the false positive rate of B depends on the number of transactions in Bob’s mempool, the number of transactions in the block, and the expected difference between the two sets of transactions.
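
To tie the pieces together, here is a hedged sketch of the exchange, reusing the BloomFilter and IBLT sketches from above and adding a simplified peeling routine for the IBLT difference. The transaction IDs, table sizes, and the weak purity check are illustrative only; a production Graphene implementation negotiates these parameters and uses checksummed IBLT cells.

```python
def decode_difference(a: IBLT, b: IBLT):
    """Peel the cell-wise difference of two IBLTs to recover the symmetric
    difference of their key sets (simplified: no per-cell checksum)."""
    diff = IBLT(a.n, a.k)
    for c in range(a.n):
        diff.count[c] = a.count[c] - b.count[c]
        diff.key_sum[c] = a.key_sum[c] ^ b.key_sum[c]
        diff.val_sum[c] = a.val_sum[c] ^ b.val_sum[c]
    only_a, only_b = [], []
    progress = True
    while progress:
        progress = False
        for c in range(diff.n):
            if abs(diff.count[c]) != 1:
                continue
            key, val = diff.key_sum[c], diff.val_sum[c]
            if c not in set(diff._cells(key)):
                continue  # weak purity check; real IBLTs use a checksum
            if diff.count[c] == 1:
                only_a.append((key, val))
                diff.remove(key, val)  # peel the entry out of the difference
            else:
                only_b.append((key, val))
                diff.insert(key, val)
            progress = True
    return only_a, only_b

# Alice: build B and I from the block's transactions (txids as integers).
block_txids = {101, 102, 103}
B = BloomFilter(m=64, k=3)
I = IBLT(n_cells=24)
for txid in block_txids:
    B.add(str(txid))
    I.insert(txid, 0)  # value elided; Bob already has the tx data locally

# Bob: pass the mempool through B and build I' from the survivors.
bob_mempool = {100, 101, 102, 103, 104}
candidates = {t for t in bob_mempool if B.maybe_contains(str(t))}
I_prime = IBLT(n_cells=24)
for txid in candidates:
    I_prime.insert(txid, 0)

# Peeling I against I' corrects B's false positives (extra) and surfaces
# any block transactions Bob's mempool was missing (missing).
missing, extra = decode_difference(I, I_prime)
block = (candidates - {k for k, _ in extra}) | {k for k, _ in missing}
print(sorted(block))  # [101, 102, 103]
```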

If you want to read both papers in-depth, you can read the Parallel Chains paper and Graphene paper. If you like what you have read, you can subscribe to the newsletter.
