This is only concerned with scaling on layer 1 of the network. Second layer scaling solutions is another topic entirely.
Scaling, one of the more popular words in the greater blockchain buzzword bingo that has a lot of people demanding it without many understanding the intricacies involved in successfully implementing a lasting solution. Scaling is not as simple as increasing block size and quickening block generation. Transactions have weight, require computation, need to be communicated through all nodes and the blockchains they are included in require storage in their ever growing state. Scalability is therefore heavily dependent on how the specific blockchain handles transactions and node communication.
In general, one of the core concepts of cryptocurrency, trustless verification, causes the most trouble for scaling. In all blockchain protocols, each node stores all states with account balances, contract code, and entire transaction history, in order to trustlessly verify a transaction’s validity for each transaction in the network. This provides a large amount of security but limits scalability to not being able to process more transactions than any single node in the network can. This is the biggest variable in setting the ceiling for transaction volume at ~3–7 transactions per second for Bitcoin or 7–15 TPS for Ethereum. Visa processes ~2,000 transactions per second and an IoT network with an estimated 50 billion devices could produce several times that. That’s a large gap.
So if we want to expand into the future and create the real world usable networks that we all envision them to be, we have to find other ways to scale. Second layer solutions that stretch each of those transactions to the settlement of larger groups of transactions, like Lightning, are a valid path and one that even Satoshi conceived for Bitcoin.* But more can be done to the first layer to establish it as more able to scale with higher throughput to push out the need for more extreme first or second layer solutions or make them a natural evolution within the network.
Defining the Problem
We are going to talk about the problem in terms of Bitcoin pre-SegWit as this complicates things and turns the discussion from easy to understand block size to less easy, block weight
What prevents a network from processing more transactions? There are two types of constraints: physical resource constraints and software constraints. Data communication is limited by the speed of light, bandwidth determines how much data can be sent in time, CPUs limit how much processing can be accomplished, blockchains grow and after all of that, the network has to remain secure and resilient against attack.
At the very base of the problem, we have network latency, how long it takes for data to travel. This is dependent on the size of the transaction but in most global networks this is about 2–3 seconds. Currently, transactions are about 500 bytes for Bitcoin and 150 bytes for Ethereum so if we want to compete with Visa the network must be able to communicate at 8 Megabits per second. This is relatively common Internet speeds for today.
At this theoretical rate of transactions that means the nodes in the network must be able to process 500 bytes x 2,000 tps = 1 MB amount of transactions per second. Processing a transaction involves hashing and ECDSA signature verifications. RIPEMD-160 and SHA256 (both hash algorithms) run at about 100 megabytes per second, so 2,000 transactions could be processed in about 10 milliseconds, so fast enough that we don’t need to worry about it.
A Bitcoin node does do other things above verifying transactions, but this process takes up the majority of the processing power required to run a node.
Ethereum is a little different in that it doesn’t have a traditional block size but a per block gas limit. Gas is the payment unit used to pay for the computations required to process the transaction. This is a consistently reevaluated value based on the current processing, storage and bandwidth conditions of the network. This is important because not every transaction is just used for transferring an asset but may be used to signal a contract or carry data to it which requires the network to do some level of extra computation.
The only resource concern then that increasing transaction rate brings is increased requirements for node storage which may lead to a centralization of nodes based on higher cost of equipment. If the rate of transactions increases by way of block size increase or block generation rate, the blockchain would grow at a corresponding rate. Currently, the Bitcoin blockchain grows at a max rate of about 1 GB a week (max rate of about 4 GB for SegWit enabled Bitcoin). If the block size is doubled, that rate would be 2 GB a week, tripled it would be 3 GB a week and so on. (Imagine if Bitcoin Cash, with its 32 MB blocks, were filled. The blockchain would grow at a rate of 4.5 GB per day.) If the block production rate increases, this would have the same effect. This poses a node centralization risk if this is done too early putting the node storage requirements out of financial reach for common people. So while it is something to keep in mind in terms of the implementation timing of certain scaling techniques in relation to technology cost, it is not a base concern.
As we saw above, all the operations of processing a transaction, from the propagating through the network, to the hashing and verifying, do not pose any real physical resource constraints on transaction throughput in our current networks. What does currently stop us from going faster is the size of the transactions, the size of a block (how many transactions you can fit in the block), how often a block gets added to the chain and the mechanism by which the nodes collaborate to add transactions to the chain.
Anatomy of a Transaction
Bitcoin and Ethereum transactions are fundamentally different but the data within is relatively similar. In a UTXO framework like Bitcoin, the transaction has two sections, the input data, and the output data. In an account based network like Ethereum, the transaction is not dependent on proving you possess the private key to spend an output but in having enough balance in your wallet for that transaction. This makes Ethereum transactions quite a bit smaller than Bitcoin transactions and fairly consistent in their size. Bitcoin transaction size is heavily dependent on the number of inputs and outputs included in the transaction.
In the most simplified Bitcoin transaction, pay-to-public-key-hash, where you are sending as an output the entirety of an input with no change address required, there are two main chunks of data, the input and the output. For each of those we will need the signature, the public key, the previous unspent output, the new output public key and the amount (with some little tidbits here or there). This looks like this (though not this nice):
<Unspent TX Output>
19 76 a9 14 dd6cce9f255a8cc17bda8ba0373df8e861cb866e 88 ac<Sequence>
<Tx Amount in Little Endian>
19 76 a9 14 a2fd2e039a86dbcf0e1a664729e09e8007f89510 88 ac<Lock Time>
<Hash Code Type>
and without the nice titles and spacing:
01000000 01 be66e10da854e7aea9338c1f91cd489768d1d6d7189f586d7a3613f2a24d5396 00000000 19 76 a9 14 dd6cce9f255a8cc17bda8ba0373df8e861cb866e 88 ac ffffffff 01 23ce010000000000 19 76 a9 14 a2fd2e039a86dbcf0e1a664729e09e8007f89510 88 ac 00000000 01000000
Once all of that is set, we have to sign the transaction that shows that we own the address of the output included in this transaction. This consists of a DER encoded signature generated with the private key and the corresponding DER encoded public key. We then use this to replace the
<scriptSig:scriptPubKey> portion of the input, and completes our transaction for .00128307 BTC:
be66e10da854e7aea9338c1f91cd489768d1d6d7189f586d7a3613f2a24d5396 000000008c 49 3046022100cf4d7571dd47a4d47f5cb767d54d6702530a3555726b27b6ac56117f5e7808fe0221008cbb42233bb04d7f28a715cf7c938e238afde90207e9d103dd9018e12cb7180e
01 41 042daa93315eebbe2cb9b5c3505df4c6fb6caca8b756786098567550d4820c09db988fe9997d049d687292f815ccd6e7fb5c1b1a91137999818d17c73d0f80aef9ffffffff
19 76 a9 14 a2fd2e039a86dbcf0e1a664729e09e8007f89510 88 ac
01000000 01 be66e10da854e7aea9338c1f91cd489768d1d6d7189f586d7a3613f2a24d5396 00000000 8c 49 3046022100cf4d7571dd47a4d47f5cb767d54d6702530a3555726b27b6ac56117f5e7808fe0221008cbb42233bb04d7f28a715cf7c938e238afde90207e9d103dd9018e 2cb7180e 01 41 042daa93315eebbe2cb9b5c3505df4c6fb6caca8b756786098567550d4820c09db988fe9997d049d687292f815ccd6e7fb5c1b1a91137999818d17c73d0f80aef9 ffffffff 01 23ce010000000000 19 76 a9 14 a2fd2e039a86dbcf0e1a664729e09e8007f89510 88 ac 00000000
You can see that once the transaction is signed, the signature is about 65% of the transaction data. This would be a good place to work in order to reduce transaction weight. This is in fact what SegWit does, it segregates the witness (signature) from the transaction data, reducing the transaction weight by 65%, by moving it to a new merkle tree in the coinbase transaction. This along with changing how the block size is evaluated, effectively quadrupled the size of the block.
This transaction format also grows rapidly with the number of inputs and outputs. If the transaction requires two inputs (to cover the amount of BTC sent) and two outputs (one for a change address), as is most common, the transaction size doubles. Larger transactions of larger sums including a large amount of smaller inputs would then become quite heavy. In fact, this is an effective way to attack the network and clog blocks. This was, arguably, what occurred during the scaling debate where Bitcoin Cash tried to overthrow Bitcoin for the top spot. Transactions were sent for relatively small amounts containing many inputs and therefore creating transactions so heavy the number of transactions that could be included in a block was significantly reduced. In several instances, this took the average number of transactions in a block from over 2,000 to under 100. This is currently still an issue but one that has a proposed solution.
Schnorr signatures is a proposed change to Bitcoin’s current Elliptic Curve Digital Signature Algorithm that essentially allows the combination of keys to provide an aggregate signature for many inputs. This would reduce the need to provide a signature for every input to a combined signature for all inputs, thereby reducing the weight of the transaction and nullifying the attack vector.
Ethereum transactions are much simpler because of it’s account/balance framework. In Ethereum, you do not have to provide inputs to supply your output, you just have to provide:
Then sign it by using your private key. As you can see, since the nodes maintain a state database of accounts and their balances, transactions are quite a bit lighter and of consistent size, assuming they aren’t passing a significant amount of data for a contract execution.
So, we are fairly limited to how light we can make transactions. An irreducible amount of data has to be passed and there is only so much compressing of the signature that can be done. We therefore only have two more recourses to increase the throughput of transactions through the network, increasing the size of blocks and how often they are generated.
Block Size — Is Bigger, Better?
Increasing the block size is the next logical step in increasing the number of transactions through the network. Doing so is straightforward in its implementation but it does have some side effects.
The most obvious of issues that arise from increasing the block size is the increased storage requirements for maintaining a full node. In the case of Bitcoin Cash, which recently increased their block size to 32 MB, the blockchain will grow at a rate of 4.5 GB per day (assuming all blocks are full, which they are not, they average 50 kB). This could push the costs required to maintain a full node outside that of a dedicated hobbyist, further centralizing a network. Again, this concern is one of timing with the cost of technology and less of a concern in security.
What is of significant concern is forks caused by block distribution time. In typical PoW frameworks, where computers compete to produce valid blocks, all nodes build upon the latest valid block in the chain. Once the newly formed block is created, it needs to be distributed through the node network and the time it takes for that block to be communicated can become an issue as blocks get bigger.
The impact of this was investigated by Sompolinsky et al. (2013). They found that the average block propagation time was 2 seconds plus .08 seconds per kilobyte in the block (this has since been improved to .008 seconds per kilobyte due to transaction propagation improvements) [Fig. 1]. This would mean for a full 32 MB block, it would take 258 seconds (4.3 minutes, nearly half the block time) for that block to propagate to the edge of the network, meaning the nodes nearer the center of the latest block discovery would have a tremendous head start in generating the next block, further centralizing an already less-than-decentralized network.
They also found as a side effect of this that as the blocks got bigger, and the time it takes new blocks to propagate through the network increases, as does the probability of a node producing a block before it learns of this new block. When this happens, the network forks by having two blocks at the same block height. To the nodes in the network, there is no one chain longer than the other, so they coexist, building transaction histories of their own. The amount of time it takes to rectify this split depends on the amount of computing power dedicated to each fork. The first to become the longest chain (and therefore has the most hash power behind it) becomes the main chain, thereby discarding all blocks that don’t fit into the winning chain’s transaction history. These unaccepted but valid blocks are called several names: stale, orphans or uncles. This may be one block, or it may be many, depending on how long the contentious fork has existed. This also means that any transactions in those blocks and any block rewards for mining them are also rolled back as they no longer exist in the accepted history. You can see why this is a problem. Imagine being transferred bitcoin and watch it get confirmed, only for that transaction to disappear.
This has a direct impact on the security of the blockchain itself. 51% attacks require the attacker to have more than half the network’s hash power in order to successfully overtake the longest chain and initiate a double spend but this assumes the propagation time of the latest valid block to be negligible compared to block time. If block size becomes too big, the time it takes to propagate more than 50% of the node network becomes comparable to the block time itself, the probability of another alternate chain replacing it becomes higher, thus impacting the finality probability of each confirmation. 6 confirmations will no longer be enough to probabilistically guarantee finality.
So as the blocks grow in size, so does the propagation time and thus the stale creation rate in the network, creating forks, potentially rolling back transactions and increasing the risk of a double spend attack from increasingly weaker attackers.
To mitigate these issues, instead of having one large block every 10 minutes, why don’t we speed this up and transmit the same amount of data per time in smaller, more manageable chunks by speeding up the block time?
Block Time — What About Faster?
It turns out, speeding it up isn’t a simple solution either. As the PoW block generation process is sped up, the time in which these valid blocks are transmitted through the network becomes relevant, even for smaller blocks. The impact is exactly the same as increasing block size.
Using the same logic from above, each second elapsed before a node in the network learns of the latest block, the chances that node produces a stale block increase proportional to the time elapsed and the block time. For Bitcoin, with its 10 minute (600 second) block time, if it takes your node 5 seconds to learn of the new block then the probability your node would produce a block that is not canonical to that new block during that time increases by 5/600 or 0.8%. A small increase, sure, but it is non-zero. In the current Bitcoin network (post-SegWit), with a full block (~4 MB) it would take roughly 35 seconds for that block to propagate the network, so the probability the nodes on the edge of the network will produce a stale block increases by 35/600 or 5.8%. No longer negligible and in fact, these forks are happening all the time, as shown by Decker and Wattenhofer in their 2013 study in information propagation in the Bitcoin network [Fig. 2].
So making the block time faster is much like making blocks bigger, increasing centralization and decreasing security, impacting finality of transactions and overall reducing the security of the network, opening it up to double spend attacks by increasingly weak attackers.
The Core of the Issue
So while we might be able to squeeze out a little more throughput by manipulating these variables, this will only ever get us so far, and nowhere near Visa level outputs.
Ethereum is a good example of this, with it’s 15 second block time and 90 kB (adaptable) max block size. That is about 3.6 MB of blocks every 10 minutes, about the same as current SegWit Bitcoin! And even this only allows for about 15–30 transactions per second.
Wait wait wait…
So how then is Ethereum able to get by with it’s 15 second block time? Their stale rate must be through the roof! Well, it is. Using the technique above, 15 second block time with an average block size of 25 kB would mean that you would have a max propagation time of about 3 seconds which would mean that the network has a stale generation rate of about 0.2 stale blocks created per block accepted (about 1200 stale blocks per day! That would be a lot of wasted energy). This is in line with the data we currently see in the network also. Fortunately, they thought about that when designing it and made it a feature to make their blockchain more secure. They implemented a novel technique for dealing with decreased security and centralization risks that come with this block generation speed. “Greedy Heaviest Observed SubTree” (GHOST) was proposed by Sompolinsky et al. (2013) when investigating possible scaling techniques to use in Bitcoin. It changes the definition of the main chain from being that of the longest chain of valid blocks to being the one with the most stale descendants (heaviest) of the block’s ancestor as a way to calculate which blocks have the largest total proof-of-work backing it. This fixes the problems associated with stale generation and forking, rewarding the inclusion of them in the blockchain and not wasting the energy used in creating those stale blocks.
And here we come to the absolute core of the issue, the single weak point in the whole framework. Throughout this article there has been one main theme. We can manipulate transaction size, block size and block time, but who processes it? The node. At the end of the day, only one node generates any given block. As long as every node independently verifies the entirety of all transactions in the network, the network can only process as many transactions as one node is capable of. In a trustless and decentralized network, that number will always fall short of the goal.
So what now? The only way to scale to this level of transaction throughput is to divide the burden of verifying transactions in the network between nodes. In PoW networks, there is no way to separate the network.† As long as there is some probability that the chain will switch over to another alternate chain, reverting transactions, every node has to verify everything. If there is no chance transactions of a certain age will be reverted, there is no reason to have nodes maintain a complete history, just a state of accounts.
As we saw in Part 1 of this series, the only way to accomplish this is by using some form of Practical Byzantine Fault Tolerance or PoS as these frameworks enable explicit finality, opening up the possibility of having nodes do less work to verify transactions.
So in this sense, practical global scaling is a function of finality. This is why PoS is the future of blockchain networks.
†The only way to do this is in PoW networks is to have side chains that utilize the main chain as the settlement layer, turning one main chain transaction into the verification of many transactions. This is what Lightning Network does for Bitcoin. The side chain verifies batches of transactions then sends the pre-verified batch as a single transaction to be included in a block. This limitation is because of the probabilistic nature of finality in PoW networks.