Blockchains Demystified

14 min readSep 10, 2018

It is summer 2018, and there has been much hype about bitcoin, blockchain and cryptocurrencies for a while.

While bitcoins and cryptocurrency are interesting to many, I was primarily intrigued by the blockchain concept, and the impact it is having on the fields of distributed systems, computer science, and in databases.

In this document, I have attempted to do two things, (a) explain as “simply” as possible the various concepts of blockchains with examples, and (b) illustrate the challenges and opportunities in the space of blockchains.

Introduction

We consider the original bitcoin paper “Bitcoin: A Peer-to-Peer … “ by Satoshi Nakamoto [1] as one of the seminal papers published in the area of computer science. While the paper describes the concepts very elegantly, and crisply, it is a difficult read, especially for folks who are not formally trained in the areas of security, cryptography and computer science. It is not surprising that it will take a few reads to get the essence, if not a deep understanding, of the concepts.

Let us motivate bitcoin and blockchain using an scenario, which we will extend it as we go along in the text.

Scenario 1
Alice has a account with Acme Bank in Palo Alto. The account has a non-zero balance. Bob also has an account with Acme Bank in Palo Alto. Acme Bank in Palo Alto has a computer system which maintains the current balance of every account in that bank office. It also maintains a ledger or log of “approved” or committed transactions that have occured in the bank since inception.

In Scenario 1, if Alice decides to transfer $5 to Bob, the computer system will first check if Alice has the required amount as balance, in this case $5, and then complete the transaction.

Completing the transaction would be

to record the transaction permanently in the ledger,
then subtracting the $5 amount from Alice’s current balance,
while adding the same amount to Bob’s.

So far so good. This represents a very simple use case.

Digital Signatures

Each transaction in the ledger should be in-corruptible. That means no one in the bank or otherwise should be able to modify the transaction once committed. Also, someone other than Alice should not be able to originate a transaction on her behalf. How can that be achieved?

Scenario 2
When Alice decides to give $5 to Bob, she writes a check with her signature which represents the transaction. In this case, the uncorrupted check verifies that Alice intends to give $5 to Bob, and the unique signature of Alice ensures the identity of the originator.

It is important to note that, when the transaction is represented via a set of bits stored in a ledger, the data can be easily changed by an adversary.

So to solve this problem, we use a “digital signature”. Alice generates a digital signature, with her “private” code or key (something that is only known to Alice), on the whole contents of the transaction, which contains the name of Bob and the amount of $5. Any office location can verify or validate the signature using the “public” key of Alice, which is known to all locations.

This process has the following properties:

Because the private code is known only to Alice, none other than Alice can produce the exact signature.
If someone tries to change the contents of the transaction without Alice’s private key, with some rogue private key, then the signature verification with the public key of Alice would fail. Thus, any office can quickly verify and refuse to honor the check.

This makes the transaction, recorded on behalf of Alice into the ledger, immutable unless Alice has had her private key stolen.

Double Spending

Scenario 3
Let us say Alice decides to give Bob AND Carol $5 at almost the same time. However, say Alice has only $7 as her current balance. You can quickly see that only one of the transactions can be honored by Acme Bank, to prevent what is known as the double spending (overdraw) problem.

With a single computer system (or node as we start calling it) in the Acme Bank office of Palo Alto, this scenario can be addressed in many ways. In the simplest form, the node could decide to process the transactions in the order in which they arrive. Or, the bank could decide to toss a coin and randomly pick the order of the transactions. Or, could use some other mechanism to decide.

Irrespective of the conflict resolution choice, only one of the transactions to either Bob or Carol would be completed. And the other would fail, and would not be recorded in the ledger. This will allow us to maintain the integrity of the balances of the accounts of Alice, Bob and Carol.

Consensus

Consensus is agreement among all the participating entities about a decision or transaction. In our examples so far, there has been only one computer system or node. And in such cases, reaching consensus is simple, as there is only one participating entity.

Scenario 4
Let us say that Bob operates his account from the Mountain View location of Acme Bank (and not from Palo Alto). This is an scenario of a partitioned or sharded system, since the accounts stored and managed in the two offices are mutually exclusive. (The Bitcoin system is NOT this example, so bear with me)

As you can see in above scenario, if Alice wants to send $5 to Bob, we need to “coordinate” the transaction between two computer systems or nodes. How would we do that?

Various schemes have been proposed to coordinate consensus for distributed transactions. Here we explain the simple steps in a protocol called “2-phase commit” [2].

There is a “central coordinator or leader” who sends the transaction to all participating nodes. And, asks for their vote.
Each node does a local check for double spending or any other constraints, and respond with a “yay” or “nay” decision to complete the transaction.
If all nodes agree to complete, the coordinator sends the final message to the nodes to complete the transaction.

How do we choose a coordinator or leader? Acme Bank could set up a separate node that acts as the coordinator for these types of transfers, or one of the offices, either Palo Alto or Mountain View could act as the coordinator. There are various mechanisms for choosing the leader and reaching consensus. And, a large body of work to address challenges in these protocols, which include cases such as messages being lost or nodes (including the coordinator) failing.

Replicated State

In scenario 4, we saw that Acme Bank had the accounts data partitioned across their various offices. The union of the account states across the offices would provide for the complete view of ALL accounts that are managed by Acme Bank across their offices.

Scenario 5
Rather than a partitioned view, Acme decides to maintain one centralized database of all the accounts on a central computer system. As you can imagine, such a scheme would require all money transfer transactions to converge at this centralized database and thus limit scale and throughput for Acme. But more importantly, this scheme exposes a single central point of vulnerability to security. An evil Bob could hack into the bank’s central computer system and update the accounts and ledger to reflect a large balance in his account.
Scenario 6
Instead, let’s say that Acme Bank decides to deploy a full replica of the computer system with ALL accounts in every office location. This scheme seems to have two advantages. An evil Bob from scenario 5 would need to change every copy which may prove to be hard. Additionally, every office can now process and verify double spend locally. However, from scenario 3, Alice could potentially submit her request to transfer $5 to Bob and $5 to Carol, almost simultaneously, at two separate offices of the bank. Thus, double spending the $7 that Alice has with Acme Bank.

As you see from scenario 6, we still need a consensus protocol across the various offices of Acme Bank to ensure that Alice is not double spending. A simple scheme would be:

Elect a leader or coordinator for the transaction
Coordinator selects a set of nodes that need to agree/promise to commit the transaction
Coordinator collects votes from selected set
If quorum of selected nodes agree, then coordinator asks all nodes to commit
When selected nodes commit, they broadcast to all nodes in the network

Paxos [3] and other quorum based protocols are proposed solution to solve this.

Peer-to-Peer

So far we have seen that Alice and Bob both have accounts with Acme Bank, and in scenario 4 we had them operating in two offices of the same bank in close physical proximity.

Scenario 7
Imagine the scenario where Alice and Bob are in two ends of the United States, and are banking with two separate banking corporations. The banking system in the United States, with the backing of the Federal Reserve, enables Alice to send $5 to Bob. There is typically a “trusted” central intermediate authority, and based on the protocols established it takes a “few” days for the transaction to complete. Similarly, say Bob banks with Globelex Bank from Switzerland, our transfer of money from Alice to Bob would now cross bank, state, and country boundaries.

Traditionally, each boundary is governed by a central authority that acts as a trusted intermediary. And, each intermediary will add to the transaction overhead, as cost and time. To overcome this drawback, one option is to treat all computer systems of banks and offices anywhere in the world as peers, each of equal authority. That is any peer can be elected a leader.

The challenge is (a) how do we bring agreement and consensus, and (b) how do we bring trust, between these peers.

Byzantine failures [4]

In simplistic distributed systems, it is typically considered that the entities that are participating are behaving “rationally”. In the real world that is not true. In un-trusted peer-to-peer networks we have to assume that the nodes will behave irrationally.

Scenario 8
Consider the various ways in which an office location, compromised by an adversary (bank thieves, hackers or even governments), could participate in a consensus protocol:
- willfully decline responding to agreement requests of transactions
- lie about the agreement (e.g. say yay, but do a nay)
- manipulate or change the transaction while recording it (e.g. Alice gives $2 to Bob instead of $5)
- go back in time and modify previously recorded transactions

All these cases need to be addressed by a consensus protocol that works in a decentralized trustless peer-to-peer network.

Blockchain Ledger

Lets propose a consensus mechanism to the history of transactions across the various banks and locations/offices.

Scenario 9
Continuing scenario 6, let us establish a process at the two offices of Acme Bank. As the various transactions get submitted in the office, they get recorded on a temporary page (or block) of the ledger. While these transactions are locally verified in the office as not double spent (scenario 1), there is a possibility that temporary proposed blocks from two offices could have conflicting double spent transactions.
Scenario 10
Let us establish additionally that for each page (block) of the ledger, we randomly pick an office location that can be the leader/decider of the page. This leader broadcasts the proposed block with its locally verified transactions to all offices, who then locally commit the same transactions. (And, will potentially undo any conflicting double spend transactions they have seen locally). The global shared ledger book is the same across the offices, and made of proposed pages coming from various the offices.

As you can see, from above examples, it is possible to construct a shared globally consistent ledger with blocks of transactions. However, there are a couple of challenges that need to be addressed.

First, the leader needs to securely sign (formally via a hash) the proposed block, which includes data from all the transactions in the block, so that any honest office can verify to ensure that no transaction entries on this block has been manipulated. Second, in order to bind the ledger pages together, in each block we include the hash of the previous block. This forms a chain of “legitimate” blocks, or blockchain, representing a shared global ledger, that has the same sequence at any bank office. If a dishonest adversary (e.g. bank thieves, employees, hackers or anyone else) decided to replace a page, then the chain would break and would not be accepted by the peers.

How do we select the leader in a peer-to-peer network, without a central authority? And, at the same time address trust.

Proof of Work (PoW)

Scenario 11
Let us devise a puzzle, among all computer systems of the participating Acme Bank and Globelex Bank. Approximately within a 10 minute window, the office locations have to compute the block hash with a predetermined prefix of zeros. And, the block should include 3 pieces of information in addition to the proposed block of transactions, viz. the office identity (public key), the previous block hash from the chain, and a counter value used to find the solution to the puzzle.

The winner (whoever solves the puzzle first) gets two rewards. First, their ledger block is chosen as the next page of the ledger. And second, employees at that office location gets a small monetary bonus, which incentivizes them to solve the puzzle.

While finding the solution to the puzzle is hard, the process of validating the proof-of-work is simple. First, there must be a predetermined prefix of zeros in the hash generated by the winner. Second, any office location, getting the proposed block from the winner, can use the contents of the block (transactions, previous block hash, identity of winner, and counter) to compute the block hash easily. If it does not match the one sent across in the block, we know the block can be rejected.

Advantages of PoW

Let us summarize the unique properties of the proof-of-work proposal for building a global shared blockchain ledger.

1. Decentralized Peer-to-peer

The scheme does not have a single central authority or trusted entity. The proposed blocks in the ledger are created by the nodes in the peer-to-peer network without any central coordination. And, there is no predetermined or fixed membership for nodes in the network. Nodes can leave and join the network at will, and use the global blockchain ledger to catch up.

2. Leader Election

The puzzle is of the same complexity to all nodes, so there is fairness in leader election. The puzzle is unique per node (because it includes the identity of the office), hence introduces randomness in selection. Finally, the prefix of zeros is chosen such that every 10 minutes some node is likely to win, so there is no deadlock.

3. Immutable Block

The block hash ensures that no transactions in the block can be manipulated or changed after the block has been proposed. Changing a completed transaction in the finalized block would require re-computing the proof-of-work for that block, which will take resources and time.

4. Immutable Blockchain Ledger

As each block hash includes the hash of the previous block, manipulating a transaction in the past, would require an adversary to do the proof-of-work for ALL blocks in the chain beyond the time, the given block with the transaction was created. If most of the offices in Acme Bank are honest, and not compromised, it will be computationally impossible for the adversary to recreate the longest chain and have all the honest nodes accept the new chain.

5. Incentivized network

The nodes in the network are incentivized to participate in the game, and propose blocks, on account of the bonus reward that the winner receives for each block.

Challenges with PoW

The proof-of-work proposal has a bunch of challenges. These challenges in my opinion will hinder the widespread acceptance of blockchains as a viable platform for global decentralized applications.

1. Computation Cost

The solving of the puzzle of finding the block hash with a predefined prefix of zeros is computationally hard. It requires CPU power, and the incentive of the proposed bonus must be larger than the cost of the computation required to win. Moore’s law dictates that it will be cheaper and easier to compute the puzzle over time. So a PoW based system will need to evolve and adjust the difficulty of the puzzle over time, thus increasing the loss of opportunity cost of computation.

2. Wasted Work

Consider a network of 1000 office nodes. Given a sufficiently hard puzzle, it is highly likely that only one of the 1000 will win every 10 mins. And since the leader election is uniformly random, it is likely that each node will win once every week (1000*10/(60*24)). So the whole network is wasting 999 weeks of computation in order to support a global blockchain ledger. As per research [5], bitcoin is at annualized rate of 75Twh (that is Tera Watt Hour) energy consumption in 2018.

3. Forks

The leader/winner is supposed to broadcast the winning block to all the nodes in the network (in the actual bitcoin system it goes via multiple hops.) This propagation may take some time in the network. Imagine sending a fax to thousands of office locations. During this propagation time, it is possible that other nodes can find a similar solution to the puzzle, and assuming they are the winner start broadcasting their winning blocks. This could lead to forks in the blockchain. Over time a majority of the nodes would accept one (the longest) branch in the blockchain, thus invalidating the fork over time. Reducing the 10min challenge window, could lead to more winners and more forks potentially.

A side effect of forks is that, transactions that are on the losing branch of the fork, could be rolled back at a later time, when the branch loses on the chain. These transactions if not double spent would be re-submitted to the network. Events such as network partitions could exasperate this effect and lead to widespread invalidation of transactions.

4. Transaction “Finalization” Delay

Writing the next “permanent” block into the blockchain can take up to 10 mins in the network. So this means transactions cannot be truly committed until that time. Even after that time, there may be forks in the chain that may need to be resolved. Today, bitcoin is configured to wait for 6 blocks (one hour) to be sure that transactions are final. However, there could be extended conditions, like network partitions, that could last for more than one hour. So a transaction in today’s implementation of bitcoin’s blockchain is never definitely final.

5. No Fixed Resolution of Double Spends

If the winning block contains transactions that invalidate local transaction due to double spend, then the local node will need to rollback that local transaction. Which transaction wins in the double spend conflict is totally random, due to the random nature of leader election and picking the permanent block in the blockchain.

6. Poor Scale

Numerous studies have been published [8] on the possible scale and limitations seen in systems including Bitcoin, Ethereum, and other private blockchain implementations. And, the studies see scale in the few thousand of transactions per second. However, these studies are still nascent, and there is rapid improvements being proposed in the field. In practice, on the bitcoin network, today we see about 3 transactions per second on average [7]. And, the Ethereum network does approximately 7 transactions per second on average [6] with peaks of about 10 tps.

Conclusion

To conclude this blog, we have used examples to motivate blockchains. We have shown how blockchains enable decentralized peer-to-peer networks achieve consensus where trust is limited, to build an immutable database of transactions. We also see that blockchains, in its original form proposed by Satoshi Nakamoto, have numerous challenges. These challenges are opportunities for the next wave of research in the area of distributed computing, cryptography and databases.

In the next blog we will explore Dfinity, which in my opinion is one of the more elegant proposals for solving the challenges of proof-of-work while holding some of the key properties.

References

[link] Bitcoin: A Peer-to-Peer Electronic Cash System, by Satoshi Nakamoto, October 31, 2008
[link] Two-phase Commit Protocol in Wikipedia
[link] The Part-Time Parliament, by Leslie Lamport
[link] The Byzantine Generals Problem, by Leslie Lamport, Robert Shostak, and Marshall Pease
[link] Bitcoin’s insane energy consumption, explained. Dec 2017
[link] EtherScan dashboard
[link] BlockChain dashboard
[link] BLOCKBENCH: A Framework for Analyzing Private Blockchains, Tien Tuan Anh Dinh et al, March 2017