I have always found a lot of value in reading and recapping the main ideas presented in foundational papers. This post covers the original Bitcoin paper. (Readers who want to go deeper: I highly recommend reading Merkle’s original paper on protocols for public-key cryptosystems and Adam Back’s Hashcash for proof-of-work details.)
Money has traditionally been issued by a central authority. This helps ensure that money is valid and also that it is not double spent. When someone conducts a cash transaction, money changes hands and cannot be double-spent by the person who hands out the cash. In an online world, this burden falls on a central authority, such as a bank, that ensures that a user can spend a given amount only once. If I have $10 to spend, then I can spend it only once. This is a straightforward problem to solve if you have a centralized database in a bank.
But then there is a centralized system that holds too much cash/power and can charge a fee for every transaction. The intermediary can also charge for foreign transactions, enforce limits on the size of transactions, etc. Since certain transactions may need to be reversed, more trust needs to be introduced in the system, which means clients exposing more information about themselves to the intermediary.
This is the primary motivation of the Bitcoin network: can a network of nodes that do not trust each other support peer-to-peer electronic cash transactions without relying on a central authority? The main problem to solve is how to avoid double spending of the same money/coin. In an online world, a malicious agent M can spend the same money with both A and B, as long as A and B are unaware of each other’s transaction with M.
Let’s define what such an electronic coin might look like. Let’s say all owners have public and private keys. Each coin is basically a chain of digital signatures; let’s call each link a Transaction. Every time a coin changes ownership, the previous owner (owner1) takes a hash of the last transaction in the chain together with the public key of the next owner (owner2), signs that hash with its private key, and appends the result to the list of transactions on this coin.
Transaction(new) = Sign(Owner1-Private-Key, Hash(Last-Transaction-On-This-Coin, Owner2-Public-Key))
Other nodes in the system can then verify that the coin was transferred to the new owner by checking the digital signature against owner1’s public key.
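To make the chain-of-signatures idea concrete, here is a minimal sketch using only the Python standard library. Since the standard library has no public-key signatures, it swaps in a Lamport one-time signature built from SHA-256; this is an assumption made purely for illustration (Bitcoin itself uses ECDSA). All function names (`keygen`, `transfer`, `verify_transfer`) are hypothetical.

```python
import hashlib
import secrets

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def keygen():
    """Lamport one-time key pair: 256 pairs of secrets (private) and their hashes (public)."""
    priv = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pub = [(sha256(a), sha256(b)) for a, b in priv]
    return priv, pub

def digest_bits(message: bytes):
    """The 256 bits of SHA-256(message) as a list of 0/1 integers."""
    return [(byte >> shift) & 1 for byte in sha256(message) for shift in range(8)]

def sign(priv, message: bytes):
    # Reveal one secret per digest bit (stand-in for a real signature scheme).
    return [priv[i][bit] for i, bit in enumerate(digest_bits(message))]

def verify(pub, message: bytes, signature) -> bool:
    bits = digest_bits(message)
    return len(signature) == 256 and all(
        sha256(sig) == pub[i][bit]
        for i, (sig, bit) in enumerate(zip(signature, bits))
    )

def encode_pub(pub) -> bytes:
    return b"".join(a + b for a, b in pub)

def transfer(prev_tx: bytes, owner_priv, next_owner_pub):
    """The current owner signs Hash(previous transaction, next owner's public key)."""
    message = sha256(prev_tx + encode_pub(next_owner_pub))
    return {"prev": prev_tx, "next_owner": next_owner_pub,
            "sig": sign(owner_priv, message)}

def verify_transfer(tx, owner_pub) -> bool:
    """Anyone holding the previous owner's public key can check the transfer."""
    message = sha256(tx["prev"] + encode_pub(tx["next_owner"]))
    return verify(owner_pub, message, tx["sig"])
```

A transfer created with owner1’s private key verifies against owner1’s public key only, and changing the previous transaction or the recipient invalidates the signature. Note that nothing here prevents owner1 from signing two different transfers of the same coin, which is exactly the double-spend question.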
Question of double-spend
How will Owner2 know that Owner1 didn’t perform the same transaction with OwnerX before this transaction? In a centralized system, this was easy. All transactions get recorded in one place and hence the system can verify whether this is an attempt to double-spend.
In a decentralized system, the only way to know about this is to be aware of all the transactions that happened on this coin before this transaction. So, the nodes in the bitcoin network need to announce each transaction (i.e. transfer of ownership) and agree on the order in which these transactions took place. Then owner2 can be confident that it was the first one to receive this coin from owner1.
Building blocks of the solution
What would be useful to avoid double-spend is a timestamping server. A timestamping server can say that Coin 1 existed at a certain time and was transferred to another owner at another time. If this data were available to everyone accepting the coin, they could go back and confirm that there has not been an earlier spend of this coin by its current owner. First we can look at a centralized timestamp server and then move to a decentralized one.
Centralized Timestamp Server
Let’s start by assuming that there is a centralized server that can digitally timestamp a given data block. Here timestamping means: a data block is assigned a digital timestamp that is unmodifiable and verifiable in the future. A way to do this is to create a hash of (the data, timestamp) and sign it using the key of the timestamp server. Anyone can then verify the data and the timestamp later on. Obviously this relies on a centralized timestamp server and its integrity.
One way to enforce some integrity on the centralized timestamp server is for it to publish these hashes in a newspaper or some other public forum at regular intervals. This ensures that the timestamp server can’t collude with a client to modify the digital timestamp of an existing document without readers of the newspaper noticing. (Obviously this doesn’t work easily in a digital environment.)
Another way to make modification of hashes harder is to create a chain of digital timestamps, where each timestamp includes the previous timestamp in its hash. So even if someone decides to change one particular timestamp, they would need to change the whole chain from there on. Here is a simple diagram of that:
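The chaining idea can also be sketched in a few lines of code. This is only an illustration under simple assumptions: SHA-256 as the hash, an all-zero starting hash, and hypothetical helper names (`link_hash`, `build_chain`, `first_broken_link`). It leaves out the server’s signature entirely.

```python
import hashlib

GENESIS = "0" * 64  # assumed placeholder for "no previous timestamp"

def link_hash(prev_hash: str, timestamp: int, data: bytes) -> str:
    """Hash of (previous hash, timestamp, data); the '|' separators avoid ambiguity."""
    h = hashlib.sha256()
    h.update(prev_hash.encode())
    h.update(b"|" + str(timestamp).encode() + b"|")
    h.update(data)
    return h.hexdigest()

def build_chain(items):
    """items is a list of (timestamp, data); each entry commits to the one before it."""
    chain, prev = [], GENESIS
    for timestamp, data in items:
        current = link_hash(prev, timestamp, data)
        chain.append({"prev": prev, "timestamp": timestamp,
                      "data": data, "hash": current})
        prev = current
    return chain

def first_broken_link(chain):
    """Index of the first inconsistent entry, or None if the chain checks out."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        recomputed = link_hash(entry["prev"], entry["timestamp"], entry["data"])
        if entry["prev"] != prev or recomputed != entry["hash"]:
            return i
        prev = entry["hash"]
    return None
```

If someone edits the data of one entry, that entry stops verifying. If they also recompute that entry’s hash, the next entry’s back-pointer no longer matches, so every later entry has to be redone as well.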
Decentralized timestamp server
The last two approaches lead us closer to a decentralized solution.
- Nodes in the network announce transactions to the whole network
- Each new timestamped block is built upon the previous one, which makes a targeted modification harder
- We can also make the production of a digital timestamp harder by introducing proof of work which takes a long time to produce and minimal time to verify, thus making modification of blocks even harder.
- A simple proof of work function is to produce a hash value that has its most significant ‘w’ bits set to zero. As ‘w’ increases, the work required grows exponentially: each extra zero bit doubles the expected number of attempts.
- Since a one-way hash is very hard to reverse without brute force, the only way to generate such a proof of work is to take a nonce and keep incrementing it until the hash of the data combined with the nonce has its most significant ‘w’ bits set to zero. The loop would be something like:
compute the hash of (data, nonce).
check if the most significant w bits are 0.
If yes, we have produced a valid proof of work. Otherwise increment the nonce and go back to the first step.
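The nonce search described above can be sketched as follows. This is a minimal illustration, assuming SHA-256 and an 8-byte big-endian nonce appended to the data; the function names are hypothetical and the real Bitcoin block header format is not modeled.

```python
import hashlib

def has_leading_zero_bits(digest: bytes, w: int) -> bool:
    """True if the most significant w bits of the digest are all zero."""
    value = int.from_bytes(digest, "big")
    return value >> (len(digest) * 8 - w) == 0

def pow_hash(data: bytes, nonce: int) -> bytes:
    return hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()

def proof_of_work(data: bytes, w: int) -> int:
    """Brute-force search: increment the nonce until the hash meets the target."""
    nonce = 0
    while not has_leading_zero_bits(pow_hash(data, nonce), w):
        nonce += 1
    return nonce

def verify_pow(data: bytes, nonce: int, w: int) -> bool:
    """Verification costs a single hash, no matter how long the search took."""
    return has_leading_zero_bits(pow_hash(data, nonce), w)
```

With a small `w` such as 12 the search takes a few thousand hashes on average. The asymmetry is the point: producing the nonce takes about 2^w attempts, checking it takes one.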
The bitcoin blockchain looks something like this:
Putting it all together
So what we have so far is:
- A bunch of transactions that pertain to coin transfers.
- Then we have a mechanism to club these transactions together into a block — using a Proof of Work mechanism
- Then these blocks are chained together to form a long list — essentially making a ledger.
Now let’s see how the network operates
- Transactions are announced publicly and received by all nodes
- Each node collects these transactions into a block and works on finding a PoW for its block. This is the process of starting the digital timestamping.
- The node that finds a PoW first (effectively a random node) proposes its block and the PoW to the network
- Nodes in the network verify the PoW and then accept the block only if all the transactions in it are valid and not spent already. This is how order is enforced on blocks/transactions.
- Nodes then create the next block by using the hash of this block. This essentially announces the acceptance of the previous block that was sent out.
Double spend scenario explained
Another issue that still needs to be solved is how do we know that we are not using an already spent coin. Here is a figure explaining that scenario. A creates two transactions and then transfers the same coin C to both B and D.
Going by the general principle, nodes will trust the longer chain. Assuming the majority of nodes are controlled by honest workers, the honest chain will grow the longest and an attacker won’t be able to control a large portion of the blockchain.
In this scenario, there is no clear-cut winner. Both blockchains are of the same length. Hence one way to protect against this is for the owners B and D to wait for the chain to grow. Both can wait for more than x confirmations after their block was added. Assuming x is large enough, it becomes exponentially unlikely for both chains to maintain the same length. One chain will grow longer and the second chain (and its orphan blocks) can be invalidated. One key point here is that, from the network’s perspective, there is no real right or wrong transaction here. It is really a matter of having a consistent order of transactions in the network.
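The longest-chain rule and the idea of waiting for confirmations can be sketched roughly as follows. Chains are modeled as plain lists of block ids, and chain length stands in for accumulated proof of work; both are simplifying assumptions, and the function names are hypothetical.

```python
def choose_chain(current, candidates):
    """Keep the current chain unless a strictly longer one shows up.

    On a tie the node stays on the chain it saw first, so two forks of
    equal length can coexist until one of them grows.
    """
    best = current
    for chain in candidates:
        if len(chain) > len(best):
            best = chain
    return best

def confirmations(chain, block_id):
    """Number of blocks in the chain from block_id onward (0 if absent)."""
    if block_id not in chain:
        return 0
    return len(chain) - chain.index(block_id)
```

For example, with forks `["g", "b1"]` and `["g", "d1"]` a node keeps whichever it saw first. Once a block `"d2"` is added on top of `"d1"`, every node switches to that fork, and the transaction in `"b1"` drops to zero confirmations.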
Incentives to run nodes in the network
There are two major incentives to keep the network running.
- Each miner (node owner) gets to include a special transaction when it creates and proposes a block that ends up in the final long chain. This is the first transaction in the block; it creates a new coin owned by the creator of the block and essentially bootstraps the creation of coins.
- Once all coins have been minted, transaction fees will come into play prominently. The sender of a coin leaves a fee for the node that includes the transaction in a block: if the output value of a transaction is less than its input value, the difference is the fee. So the balance of each transaction in a block gets reflected as: Input value = Output value + transaction fees
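The fee relation can be written down directly. A small sketch, assuming values are integers in the smallest currency unit and using hypothetical function names:

```python
def transaction_fee(input_values, output_values):
    """Fee = total input value - total output value (must not be negative)."""
    fee = sum(input_values) - sum(output_values)
    if fee < 0:
        raise ValueError("outputs exceed inputs")
    return fee

def block_fees(transactions):
    """Total fees collected by the node that includes these transactions.

    Each transaction is an (input_values, output_values) pair.
    """
    return sum(transaction_fee(ins, outs) for ins, outs in transactions)
```

For instance, a transaction with inputs of 50 and 30 and outputs of 70 and 9 leaves a fee of 1 for whoever includes it in a block.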
Merkle trees for efficient disk storage and membership
Due to the decentralized nature of bitcoin, every node needs to keep the ledger of all the transactions and blocks. The log would be super big if all transactions were kept around all the time. Hence the network uses a Merkle tree structure to arrange the transactions in a given block. Let’s see what a Merkle tree looks like and then we can see how it can be utilized. Here is a Merkle tree from the original paper:
As one can see, there are 8 transactions, 1 to 8, represented using leaf nodes y1, y2 and so on. All the intermediate nodes are hashes of their child nodes. If someone wants to check the membership/validity of y5, all they need are the hashes on the path from the root H(1,8,Y) through H(5,8,Y) and H(5,6,Y) down to H(5,5,Y), together with the sibling hash at each level: H(1,4,Y), H(7,8,Y) and H(6,6,Y). They do not need the whole tree, i.e. all the transactions from 1 to 8.
Another nice property of this tree is that a modification of any node has to propagate all the way up to the root, redoing all the hashes along the way. So it is not trivial to modify H(6,6,Y): the attacker would then need to modify H(5,6,Y), then H(5,8,Y), and then the root node.
Using these properties of Merkle trees, once a transaction has been reinforced by multiple blocks after it, we can remove some of the branches of the tree. Eventually a lot of branches can be removed and all that remains is the root hash in a block. This reduces the disk usage significantly.
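Both properties, compact membership checks and changes propagating to the root, can be tried out with a small sketch. It assumes single SHA-256 and a power-of-two number of leaves, which is a simplification (Bitcoin’s own tree uses double SHA-256 and handles odd counts); the function names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """All tree levels, leaf hashes first and the root level last.

    Assumes len(leaves) is a power of two.
    """
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def merkle_proof(levels, index):
    """Sibling hashes from the leaf up to (not including) the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1                       # neighbour within the pair
        proof.append((level[sibling], sibling < index))
        index //= 2                               # move to the parent
    return proof

def verify_proof(leaf: bytes, proof, root: bytes) -> bool:
    """Recompute the path to the root using only the leaf and its siblings."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

With 8 leaves, the proof for the fifth leaf (index 4) contains just 3 hashes, corresponding to H(6,6,Y), H(7,8,Y) and H(1,4,Y) in the notation above. Changing any leaf changes the root, so a stale proof no longer verifies.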
How to confirm whether the transaction is valid
The bitcoin ledger is represented as a state machine and not as a database of accounts. So every new transaction takes as its input the output of some previous transaction. Let’s say A is using coins that were given to it in transaction y8, and A wants to give them to B in the current transaction, y15. To verify that the output of y8 wasn’t spent before y15, we need to go back and traverse from y15 to y8 and confirm the validity. Again Merkle trees come in handy for membership and we don’t have to check every transaction in between.
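A rough sketch of this check: each transaction lists the earlier outputs it consumes, and a new transaction is acceptable only if every output it references exists and has not been consumed already. The dictionary layout and the name `is_valid_spend` are assumptions for illustration; signatures and amounts are not checked here.

```python
def is_valid_spend(ledger, new_tx):
    """Check that every input of new_tx refers to an existing, unspent output.

    ledger is an ordered list of accepted transactions. Each transaction is a
    dict with an "id", a list of "outputs" (values) and a list of "inputs",
    where an input is a (transaction id, output index) pair.
    """
    known = {tx["id"]: tx for tx in ledger}
    spent = {outpoint for tx in ledger for outpoint in tx["inputs"]}
    for tx_id, index in new_tx["inputs"]:
        if tx_id not in known or index >= len(known[tx_id]["outputs"]):
            return False          # refers to an output that does not exist
        if (tx_id, index) in spent:
            return False          # that output was already consumed
    return True
```

In the example from the text, y15 spending output 0 of y8 is valid as long as no accepted transaction between y8 and y15 has already consumed that output. Once y15 is in the ledger, a second transaction spending the same output is rejected.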
Probability of dishonest chain surviving
In the network, what an attacker will most likely try to do is to double spend and use the second spend to get back the money that was previously spent. (Stealing money from someone else means getting hold of their private keys. Creating money out of thin air is not possible without proof of work.) In such cases, the attacker can only make the dishonest chain the longest chain by controlling the majority of the network (remember that a random node in the network gets to add the next block). If the majority is controlled by honest nodes, then the probability that the dishonest chain ever catches up drops exponentially with ‘z’, the number of blocks by which it trails the honest chain.
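The Bitcoin paper quantifies this with a short C routine; below is a Python port of that calculation as a sketch. Here `q` is the probability that the attacker finds the next block, `p = 1 - q` is the honest probability, and `z` is the number of blocks the recipient waits for. The attacker’s progress during that wait is modeled as a Poisson distribution with mean `z * q / p`, and from a deficit of `n` blocks the chance of ever catching up is `(q / p) ** n`.

```python
import math

def attacker_success_probability(q: float, z: int) -> float:
    """Probability that an attacker with share q ever overtakes z confirmations."""
    p = 1.0 - q
    if q >= p:
        return 1.0  # with a majority, the attacker always catches up eventually
    lam = z * (q / p)  # expected attacker blocks while the honest chain adds z
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        # Subtract the cases where the attacker made k blocks and never catches up.
        total -= poisson * (1 - (q / p) ** (z - k))
    return total
```

For `q = 0.1`, one confirmation leaves the attacker roughly a 20% chance of success, and the value keeps shrinking as `z` grows, which is why recipients wait for several confirmations.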
Genius is really in simplicity: this paper references only 8 other papers, which is astounding given the impact that it has had. I found it hard to visualize the bitcoin ledger without the traditional owner-account-value view. But the state-transition view explained in some other places made it much easier to read the paper. This is another paper where distributed consensus forms the foundation of a much bigger technology (but in an adversarial scenario). This paper also references Lamport’s paper on distributed consensus(covered also here). The idea of using Proof of Work to land on a probabilistic solution for the Byzantine Generals problem is also very neat!