Block propagation data from Bitcoin Cash’s stress test

TL;DR: During the stress test, blocks propagated through the non-China mainnet at around 300–1000 kB/s. This is pretty slow, and would cause problems with orphan rates if block sizes were frequently larger than 8 MB unless we improve our block propagation algorithms.

Introduction

On September 1st, 2018, Bitcoin Cash users performed a stress test in which they intentionally spammed the network in order to see how well our systems could handle the load.

But what use is a stress test unless you collect data? Shortly before the test, I asked people to configure their nodes to log extra data and to install NTP for more accurate timestamps. After the test, I requested users not concerned with privacy to send me their debug.log files so I could analyze the data for block propagation information. It was all rather last minute, but even so, I got data from 12 nodes.

Block propagation delays are important to track and to minimize because if they get too large, they can compromise the security of Bitcoin. The reason is orphan rate inequality. Bitcoin’s proof of work system was designed to ensure that every hash performed by a miner has the same expected revenue, regardless of whether the miner is small or large. That is usually true, but when block propagation delays get too high, this property breaks. No rational miner or pool will intentionally orphan their own blocks, and they will always receive their own blocks with a delay of zero. Consequently, a pool with 35% of the hashrate will instantly have 30% more support for their blocks than a pool with only 5% of the network hashrate, and will see a 30% lower block orphan rate as a result. This results in more revenue for larger pools compared to smaller ones when orphan rates are high, and will encourage all miners to join large pools. Ultimately, this can result in the network being composed of one or two superpools which will have enough hashpower to easily perform 51% attacks. Even if these pools continue to act in good faith, this still compromises Bitcoin’s security, as it means that a hacker or disgruntled employee of that pool could compromise the security of the entire system.

In order to avoid runaway pool centralization, I suggest using a target of 1% for the advantage of a 35% pool over a 5% pool. This 1% figure encompasses the typical variation in fees between pools; differences smaller than this are unlikely to pose a problem. A 1% advantage would occur when orphan rates are about 3.3%, which would happen when the average block propagation delay is 20 seconds. It is my opinion that prolonged operation with propagation delays substantially higher than this — say, 40 seconds — would result in unsafe mining incentives that could result in runaway pool centralization.
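The relationship between delay, orphan rate, and pool advantage can be reproduced with a short calculation. This is a sketch under assumptions that the text does not state explicitly: block discovery is modeled as a Poisson process with a 600-second mean interval, so the orphan rate for a delay t is 1 - exp(-t/600), and the larger pool's advantage is approximated as the orphan rate multiplied by the hashrate difference (35% - 5% = 30%). The function names are illustrative.

```python
import math

BLOCK_INTERVAL_S = 600.0  # target mean time between blocks (assumed model parameter)

def orphan_rate(delay_s, interval_s=BLOCK_INTERVAL_S):
    """Chance that a competing block is found during the propagation delay."""
    return 1.0 - math.exp(-delay_s / interval_s)

def large_pool_advantage(delay_s, large_share=0.35, small_share=0.05):
    """Approximate revenue edge of the larger pool: orphan rate times hashrate gap."""
    return orphan_rate(delay_s) * (large_share - small_share)

for delay in (20, 40):
    print(f"{delay:>3} s delay: orphan rate {orphan_rate(delay):.1%}, "
          f"advantage {large_pool_advantage(delay):.2%}")
```

For a 20-second delay this gives an orphan rate near 3.3% and an advantage near 1%, in line with the target above. Doubling the delay to 40 seconds roughly doubles both.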

With that in mind, I collected and analyzed data from the stress test to see how we measured up.

Methods

I analyzed this dataset to see when each block first arrived and was verified by each node in order to get an idea of how quickly blocks of different sizes propagated through the network. Specifically, I calculated how long it took from the time at which the block was first received by any node in our network until each other node received that block.

The dataset has several limitations. First, we only had 12 nodes, and none of our nodes were large miners, so the propagation delays we observed are missing the first few hops of the propagation process, and are probably an underestimate of the total propagation delay by a factor of roughly 2. I did not adjust my data with this factor, so readers should keep in mind that the full-network delay is likely to be about twice as high as the numbers reported here, and the speeds 1/2 as fast. Second, all but one of our nodes were non-mining amateur nodes, and were not necessarily performance-optimized. Third, most of our nodes did not successfully configure NTP in advance, causing the timestamps they recorded to have an offset compared to the true time. Fourth, none of the nodes in our dataset were located in China, so we have no information in this dataset about how the Great Firewall of China affects performance.

Although the general lack of mining nodes might seem like a fatal flaw at first glance, mining nodes and non-mining nodes run the same software, and if performance limitations are due to the algorithms and code instead of the hardware, performance will be similar between the two. As we will see later, it is indeed the case that our performance is mainly limited by code, not hardware, except for abnormally slow hardware.

I corrected for the timestamp offsets by subtracting the median delay for each node from that node’s timestamps. For example, if a node reported receiving 50% of its blocks five seconds after the first node in our network, my analysis assumes that this was caused by a clock inaccuracy and would subtract 5 seconds from all of that node’s timestamps.
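The delay calculation and the median-based clock correction can be sketched as follows. This is a minimal illustration, not the script used for the analysis: the input layout (a dictionary of block hash to per-node arrival times) and the function names are assumptions made for the example.

```python
from statistics import median

def propagation_delays(arrivals):
    """arrivals: {block_hash: {node: unix_time_received}}.
    Returns the same shape, with each time replaced by the delay since the
    block's earliest arrival at any node in the dataset."""
    delays = {}
    for block, by_node in arrivals.items():
        first = min(by_node.values())
        delays[block] = {node: t - first for node, t in by_node.items()}
    return delays

def correct_clock_offsets(delays):
    """Subtract each node's median delay, treating it as that node's clock offset."""
    per_node = {}
    for by_node in delays.values():
        for node, d in by_node.items():
            per_node.setdefault(node, []).append(d)
    offsets = {node: median(ds) for node, ds in per_node.items()}
    return {block: {node: d - offsets[node] for node, d in by_node.items()}
            for block, by_node in delays.items()}

# Hypothetical data: node n2's clock runs about 5 seconds ahead of n1's.
arrivals = {
    "a": {"n1": 100.0, "n2": 105.5},
    "b": {"n1": 700.0, "n2": 705.0},
    "c": {"n1": 1300.0, "n2": 1304.0},
}
corrected = correct_clock_offsets(propagation_delays(arrivals))
print(corrected)
```

One consequence of this correction is that a node's delay for about half of its blocks becomes zero or negative, so the corrected values are best read as relative differences between blocks rather than absolute delays.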

There were a few issues with the dataset. One of the nodes appears to have had networking issues and reported frequent “PROCESSMESSAGE: INVALID MESSAGESTART” errors, and furthermore had inconsistent performance, so I excluded its data from the analysis. Two other nodes were excluded for having timestamp issues that were not easily correctable.

It was observed by Gregory Maxwell that Bitcoin ABC and SV had some code inherited from Bitcoin Core that artificially limited them to only propagate 7–14 transactions per second. This bug is easy to fix, but it means that the only large blocks in our dataset were ones that came with a long delay after the previous one. Consequently, our sample size for very large blocks (> 15 MB) is quite small.

Results

The full data can be viewed at the link below. Each of the middle columns corresponds to one node, and shows the delay (in seconds) before each block was received by that node.

https://docs.google.com/spreadsheets/d/1DJZX2gG6I4sE4FLRcuwR1A-R96lXwLjUDl7rUZVHmS8/edit?usp=sharing

If you prefer a plaintext (tab-separated value) version, you can find that here:

http://toom.im/files/stressdata.txt

The full dataset can be downloaded here:

http://toom.im/files/stresstest.zip

Block verification is much faster than block propagation. Most blocks were verified in less than 1 second by all nodes, and no block took longer than 4.2 seconds to verify even on the slowest node in our dataset. The fastest node in our dataset was able to verify a 21.4 MB block in 1964 ms.


2018-09-02 12:30:37 - Connect 95861 transactions: 427.79ms
2018-09-02 12:30:37 - Verify 98461 txins: 801.92ms
2018-09-02 12:30:38 - Connect postprocess: 1056.31ms [10.03s]
2018-09-02 12:30:38 - Connect block: 1963.69ms [27.94s]
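Timing lines like those above can be pulled out of a debug.log with a short script. This is a sketch that assumes the line format matches the excerpt exactly; the regular expression and the function name are illustrative and are not part of any Bitcoin Cash client.

```python
import re

# timestamp, " - ", label up to the first colon, then a millisecond figure
BENCH_RE = re.compile(r"^(\S+ \S+) - ([^:]+): (\d+(?:\.\d+)?)ms")

def parse_bench_lines(lines):
    """Return (timestamp, label, milliseconds) for each line that matches."""
    rows = []
    for line in lines:
        m = BENCH_RE.match(line.strip())
        if m:
            rows.append((m.group(1), m.group(2), float(m.group(3))))
    return rows

sample = [
    "2018-09-02 12:30:37 - Connect 95861 transactions: 427.79ms",
    "2018-09-02 12:30:37 - Verify 98461 txins: 801.92ms",
    "2018-09-02 12:30:38 - Connect postprocess: 1056.31ms [10.03s]",
    "2018-09-02 12:30:38 - Connect block: 1963.69ms [27.94s]",
]
rows = parse_bench_lines(sample)
for ts, label, ms in rows:
    print(f"{ts}  {label:<28} {ms:9.2f} ms")
```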

As block verification did not seem to be a significant bottleneck, I did not perform further analysis on the topic. As a ballpark figure, block validation appears to take up 5% to 10% of the total block propagation time.

Several blocks took more than 100 seconds to reach all nodes. These anomalous delays were more common with larger blocks, like 00000000000000000069c3ad902b27621493eea7c3504d9a7a279b04c9dad565 (15.2 MB).

Block propagation speeds throughout our network averaged less than 1.1 MB/s of uncompressed block size for all block size tiers. Larger blocks generally made more effective use of available bandwidth, except for the largest tier (> 16 MB) which had a small sample size. As both Compact Blocks (ABC, SV, and XT) and Xthin (BU and XT) achieve compression ratios near 25x, this means that the average effective network bandwidth during propagation was 25x lower, and below 44 kB/s for even the best block size tier. (Note: this 44 kB/s represents the network bandwidth along one propagation path; as each node will be simultaneously uploading to multiple peers in parallel, the peak utilization of their network pipe will be higher than this.) These numbers are about 30% lower than the 1.6 MB/s and 62 kB/s seen during the Gigablock Testnet Initiative. This difference may be because the nodes we were measuring from were less powerful and less well-connected than the nodes in the GTI. The difference may also be because mainnet nodes have more peers than the GTI nodes did, and so their available bandwidth was spread out more thinly.

Average speed for blocks below 900 kB:       73 kB/s
Average speed for blocks of 900-1900 kB:    470 kB/s
Average speed for blocks of 1900-3900 kB:   760 kB/s
Average speed for blocks of 3900-7900 kB:  1010 kB/s
Average speed for blocks of 7900-15900 kB: 1023 kB/s
Average speed for blocks above 15900 kB:    662 kB/s
Average speed for all blocks above 900 kB:  717 kB/s
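To make the step from uncompressed speed to on-the-wire bandwidth explicit, the per-tier figures can be divided by the compression ratio. This is a small sketch that takes the tier speeds from the list above and assumes a flat 25x ratio for every tier, which is a simplification; the names are illustrative.

```python
# Uncompressed propagation speed per block size tier, in kB/s (from the list above).
TIER_SPEEDS_KBPS = {
    "below 900 kB": 73,
    "900-1900 kB": 470,
    "1900-3900 kB": 760,
    "3900-7900 kB": 1010,
    "7900-15900 kB": 1023,
    "above 15900 kB": 662,
}
COMPRESSION_RATIO = 25  # approximate ratio for Compact Blocks and Xthin

def wire_bandwidth_kbps(uncompressed_kbps, ratio=COMPRESSION_RATIO):
    """Bandwidth actually used along one propagation path."""
    return uncompressed_kbps / ratio

for tier, speed in TIER_SPEEDS_KBPS.items():
    print(f"{tier:>15}: {speed:>5} kB/s uncompressed, "
          f"{wire_bandwidth_kbps(speed):5.1f} kB/s on the wire")
```

Even the fastest tier comes out at roughly 41 kB/s on the wire, under the 44 kB/s ceiling implied by the 1.1 MB/s upper bound.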

Propagation performance was broadly similar between the five nodes running ABC+SV (Compact Blocks) and the three nodes running BU (Xthin and an unreliable alpha version of Graphene). Propagation performance was about 2.5x worse for the two Bitcoin XT (Compact Blocks + Xthin) nodes. However, I caution readers not to judge XT harshly based on this, as it is unlikely to be an indicator of XT’s true performance. One of the XT nodes was running in a VM with only 1 virtual CPU for most of the experiment (the slowest in our dataset), and the other XT node had its bandwidth capped at 25 Mbps down/5 Mbps up (the slowest in our dataset).

Average speed for XT for blocks above 900 kB:      924.548 kB/s
Average speed for BU for blocks above 900 kB:     2518.830 kB/s
Average speed for ABC+SV for blocks above 900 kB: 2313.313 kB/s

One block was orphaned during the stress test. At around 6:53pm UTC, ViaBTC mined a 6029 kB block with hash 000000000000000000e28e540ea0819e83377f201802fd38fde4e7a2f4bda192 at height 546012. This block was slow to propagate through the network. Less than 60 seconds later, BTC.TOP mined a competing 51 kB block at height 546012 with hash 0000000000000000002d5be4e56f448a247088ee91e31f788d9a6ad5411f71b6. This created an orphan race. BTC.TOP’s block out-propagated ViaBTC’s block, and was the first to arrive at all of the nodes in our network. However, ViaBTC had enough hashrate and luck to be able to mine a second block after five minutes at height 546013 on top of their earlier block, ending the orphan race in ViaBTC’s favor. Even though the orphan race was caused by ViaBTC choosing to make a large block which was slow to propagate to BTC.TOP, and even though BTC.TOP’s block was the first to arrive at all other nodes, ViaBTC was able to win the race by sheer force of hashrate. This is exactly the scenario that can cause hashrate centralization if it occurs frequently.

Mark Lundeberg observed that in this situation, a rational selfish miner would borrow hashrate from their BTC pool to mine BCH until the orphan race is resolved. This appears to be a selfish mining technique which has not previously been published, and would increase the effectiveness of all selfish mining variants. We have no way of knowing whether ViaBTC actually did this or not, but they certainly have enough hashrate that they could have done so.

Conclusions

Our block propagation protocols need work. If block propagation for large blocks happens at 1023 kB/s, then our 20 second recommended limit would be reached with 19.5 MB blocks. Since we’re only measuring a subset of the propagation process, the true block propagation speed is lower. If it’s 2x slower, then blocks consistently larger than 10 MB would result in orphan rates above 3.3% and undesirable pool centralization incentives. Unfortunately, 128 MB blocks are not at all reasonable at this point in time. They would likely take around 250 seconds to propagate, which would result in orphan rates of about 34% and a 7–10% overall revenue advantage for large pools like CoinGeek.
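The arithmetic behind these estimates can be sketched as follows. The orphan-rate formula is an assumption (Poisson block arrivals with a 600-second mean interval), as is the 2x full-network slowdown carried over from the Methods section; the function names are illustrative.

```python
import math

def propagation_delay_s(block_kb, speed_kbps):
    """Time to propagate a block of the given size at a given speed."""
    return block_kb / speed_kbps

def max_block_kb(delay_budget_s, speed_kbps):
    """Largest block that fits inside a propagation delay budget."""
    return delay_budget_s * speed_kbps

def orphan_rate(delay_s, interval_s=600.0):
    """Chance of a competing block appearing during the delay (Poisson assumption)."""
    return 1.0 - math.exp(-delay_s / interval_s)

measured_kbps = 1023.0                 # best tier observed in this dataset
full_network_kbps = measured_kbps / 2  # assumed 2x slower across the whole network

print("20 s budget, measured speed: ", max_block_kb(20, measured_kbps), "kB")
print("20 s budget, full network:   ", max_block_kb(20, full_network_kbps), "kB")

delay_128mb = propagation_delay_s(128_000, full_network_kbps)
print(f"128 MB block: {delay_128mb:.0f} s, orphan rate {orphan_rate(delay_128mb):.0%}")
```

At the measured speed the 20-second budget corresponds to 20,460 kB, which is about 19.5 MB if a megabyte is counted as 2^20 bytes. At half that speed the budget is about 10 MB, and a 128 MB block takes roughly 250 seconds, for an orphan rate near 34%. Multiplying that orphan rate by the 30% hashrate gap between a 35% pool and a 5% pool gives about 10%, the upper end of the range quoted above.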

Fortunately, we have a lot of upgrades in the pipeline that should help with block propagation.

Graphene without CTOR appears to be capable of achieving 100x compression, which is about 4x better than we are currently getting with Compact Blocks and Xthin. 86% of the data that Graphene without CTOR needs is for transaction order information, so if we fork in CTOR, Graphene will be able to get ~700x compression, or 28x better than what we currently have. However, the current prototype of Graphene is unreliable, and cannot transmit all blocks successfully. It remains to be seen whether Graphene can be made reliable enough to make very large blocks feasible.

Another technique which can help is Xthinner, a protocol I recently designed with LTOR in mind. Xthinner gets about 120x compression with CTOR, or about 56x compression without. While not as efficient as Graphene, Xthinner is much simpler and more reliable, and can serve as a fallback option for when Graphene transmission fails. In my next Medium post I will explain how Xthinner works in more detail.
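The compression figures in the last two paragraphs can be checked and compared with a few lines of arithmetic. This is a sketch using only the ratios quoted in the text; the 21.4 MB example block is the one mentioned in the Results section, and the function names are illustrative.

```python
def ratio_without_order_data(ratio_with_order, order_fraction):
    """Compression ratio once the share of the encoding spent on
    transaction order information is no longer needed."""
    return ratio_with_order / (1.0 - order_fraction)

# Graphene: 100x today, with 86% of its data describing transaction order.
graphene_ctor = ratio_without_order_data(100, 0.86)
print(f"Graphene with CTOR: about {graphene_ctor:.0f}x")

RATIOS = {
    "Compact Blocks / Xthin": 25,
    "Xthinner without CTOR": 56,
    "Graphene without CTOR": 100,
    "Xthinner with CTOR": 120,
    "Graphene with CTOR": 700,
}

def encoded_kb(block_kb, ratio):
    """Approximate size on the wire for a block at a given compression ratio."""
    return block_kb / ratio

for name, ratio in RATIOS.items():
    print(f"{name:<24} {encoded_kb(21_400, ratio):7.1f} kB for a 21.4 MB block")
```

The computed ratio of about 714x agrees with the rounded ~700x figure, and 700 / 25 gives the 28x improvement over the current protocols.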

In addition to techniques which provide better compression, we can also make use of more of our available bandwidth. We’re only getting about 44 kB/s per TCP connection right now, even on machines with 100 Mbps connections. While this might sound weird to non-network engineers, this poor bandwidth utilization is a well-known effect of TCP’s congestion control algorithm on high-latency, high-bandwidth connections with significant packet loss. We can avoid this issue by switching to a UDP-based protocol, preferably with forward error correction. Matt Corallo’s Bitcoin FIBRE is one such protocol. Although FIBRE relies on a trusted centralized network of relay servers, it does perform quite well. UDP+FEC could also be applied to Xthinner or Graphene to boost their effectiveness.
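To see why a single TCP connection can be this slow on a fast link, one commonly used rule of thumb is the Mathis et al. approximation, which bounds steady-state throughput by (MSS / RTT) * (C / sqrt(p)), where p is the packet loss rate and C is about 1.22. This formula is not taken from the stress test data, and the segment size, round-trip times, and loss rates below are hypothetical values chosen only for illustration.

```python
import math

def mathis_throughput_bytes_per_s(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Approximate upper bound on steady-state TCP throughput (Mathis et al.)."""
    return (mss_bytes / rtt_s) * (c / math.sqrt(loss_rate))

MSS = 1460  # bytes, a typical Ethernet-sized TCP segment (assumed)

for rtt_ms, loss in [(50, 0.001), (150, 0.01), (250, 0.02)]:
    bps = mathis_throughput_bytes_per_s(MSS, rtt_ms / 1000.0, loss)
    print(f"RTT {rtt_ms:>3} ms, loss {loss:.1%}: about {bps / 1000:7.0f} kB/s")
```

Note that the link capacity does not appear in the formula at all: once latency and loss dominate, a 100 Mbps connection and a 1 Gbps connection are limited to the same per-connection throughput. A UDP-based protocol with forward error correction is not subject to this bound.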

Much later, when we’re dealing with multi-GB blocks, we might start to saturate our network pipes, and we might need to add multicast UDP, switch to a protocol like Blocktorrent, or simply pay for larger network pipes.

One thing is clear from all of this: We need more data, and we need better data. We need to make sure that we regularly collect performance data on our network, and use that data to guide our decisions on what to spend time optimizing. Our community has spent a lot of time recently discussing the impact of CTOR on block validation, and that time has been largely wasted. Block validation speed may be important at some time in the distant future, but for now and the near future, the main bottlenecks are AcceptToMemoryPool performance and block propagation.

If we can get more accurate data in real time, that will make the process of optimizing performance much easier. To that end, my brother and I are planning to add code (or offer patches) to Bitcoin Cash clients so that node operators who choose to can submit block propagation data and statistics in real time to a central server, where the data will be displayed live via an interface based on our 2014 Blocksize Olympics visualization. This would serve a similar purpose to what ethstats.net does for the Ethereum community. This should give pool operators and other performance-minded entities a much better picture of how well their systems are performing and what they can optimize.

Like this article? Want to translate it into Chinese, Russian, or another language? Please contact me! — Jonathan