Bitcoin white paper explained (PART 3/3)

optimizations and conclusions

This is a continuation of Part 2 where we analyzed the original bitcoin paper until 7. Reclaiming disk space. Let’s take it from there.


7. Reclaiming disk space

Each transaction size varies depending on the number of Inputs/Outputs. A good rule of thumb for transaction size calculation is:

n_inputs*180B + n_outputs*34B + 10B + (+-n_inputs)

According to tradeblock.com the average transaction size since 2014 is around 520 Bytes. There are around 277 Million transactions for that period. Excluding the size of the blocks it would mean that storing all these transactions would occupy ~144 GB. In reality they occupy ~150GB. Quite a lot if you want to interact within your smartphone!

The block header with no transactions occupies ~80 Bytes. The number of blocks for the same period is ~240,000 which occupies a total of 19MB. Such size sounds much more competitive.

How could we get rid of transactions while preserving correctness?

If we are able to reduce all transactions (Tx0 to Tx3) to a single hash (root hash) we would save the mentioned disk space without breaking the blocks hashes.

For that, we must be able to relate all transactions to each other, which we can do efficiently through a Merkle Tree.

Merkle tree

If you are familiar with binary trees this concept is trivial. The leaf nodes (Hash0, Hash1, …) are a hash function applied to some data (in this case Tx0, Tx1, etc…). Upper nodes (Hash01, Hash23) are just a hash function applied to the concatenation of their respective children hashes. The Root Hash is the upper-most hash which is included in the block header and ensures what transactions are present. Why don’t we just concatenate all hashes one after the other and produce the root hash out of that string? Well, if we wanted to validate if a transaction is part of a block which has many transactions, a Merkle Tree would allow us to do it in logarithmic cost compared to getting all hashes.

Can we verify/do payments without a copy of the transactions?

8. Simplified Payment Verification

Supposing we have a thin client which doesn’t have access to the transactions but only block headers, we must find a way to validate that a specific transaction is valid.

Such a node can get the longest proof-of-work chain at the moment (from several thick nodes) and request the Merkle Tree branch where the transaction is supposed to be.

Once the node knows the transaction is part of the chain and that there are blocks after it he can conclude it was a valid transaction as it was accepted by other thick nodes.

As long as we are asking honest nodes this scheme will work. If that’s not the case Satoshi proposes an alerting system whenever an invalid block is detected by some node.

Check a thin client (bitcoinj) security model here

9. Combining and Splitting value

As described in Part 1 a transaction isn’t referring to 1 input — 1 output exclusively, there might be more. A simple example is when Bob has 1 bitcoin and sends 0.5 to Alice. That transaction has an input of 1 bitcoin and two outputs of 0.5 each (one for Alice and the change for Bob, otherwise he would loose it as transaction fee). When a transaction depends on many transactions and those depend on others we don’t have to worry about traversing the hole tree as the latest transaction outputs are an aggregate of how much coins are left.

10. Privacy

Privacy model in Bitcoin

Traditional banking achieves privacy by limiting access to what transactions are going on and the parties involved. As we have mentioned, bitcoin has to announce publicly every transaction so this scheme isn’t possible anymore.

Privacy can be achieved by keeping public keys anonymous and/or using addresses instead. A good practice is to use a new key pair / address for each transaction to make it harder to link coin movement to a common owner. Still there is a risk for multiple input transactions as you could think that the coins are coming from the same source, and when revealing one address you could infer the owner of the others.

11. Calculations

If we suppose an attacker could get more power than the honest nodes he can alter the chain. He can’t alter it in any way he wants, as honest nodes wouldn’t accept an invalid transaction/block (such as sending other people money to himself or creating money out of thin air).

The only option is to alter the outputs of his latest transactions or revert them (the bigger the chain after the transaction the more proof of work is required to generate the longest valid chain).

What are the numbers? The math behind the results shows that the probability of the attacker catching up decreases exponentially the more blocks are confirmed:

P = probability of the attacker to catch up (0.1%)

q = probability of the attacker finding next block (10%, 15%, …)

z = number of block confirmations

These numbers tell us the that the more CPU power an attacker has (q) the more confirmations we have to wait (5, 8, …) to know that the probability of the attacker catching up with the chain will be < 0.1%.

This sounds like the chance for an attack is pretty low given the constantly growing size of nodes within the network but remains a risk for newly created PoW based chains.

Check bitcoin weaknesses if you want an extended overview of what could go wrong.

12. Conclusion

The paper proposes an electronic transactions system that relies on distrust. Ownership is proven by digital signatures while double-spending is mitigated through the PoW-based P2P network.

All rules and incentives are enforced within the network consensus. Bad actors are penalized while honest ones are rewarded.


Bitcoin is based on distrust although most users still rely on 3rd party services such as Coinbase. It is a distributed network based on inefficiency, which limits the transactions per second that can go through. High transaction fees slow down adoption. Many limitations surround BTC.

Even so it has given birth to more than 100 new chains, a hole new set of use cases, software paradigms and challenges. No one could have seen that we would have videoconferences over 4G while TCP/IP was being developed.

Clap if this helped and follow me on Twitter @sgerov!