To Understand Blockchains, You Should Understand Cryptographic Hashes First

Bitcoin is first and foremost a child of cryptography

For me, the key to grasping blockchains is understanding cryptographic hashes. I think a lot of us normies make the mistake of conceiving of blockchains as basically lists of anonymous transactions distributed across decentralized networks. While this is indeed what a single blockchain is, what sometimes gets lost is that the foundation of blockchain technology lies not in decentralization, anonymity, or even distributed ledgers, but in cryptography itself. While this may be obvious to anyone with a basic understanding of computer science, for most of us it is entirely new.

Ultra Brief History of Digital Money

Bitcoin is a new take on previous experiments with digital money. In the 1990s, this was a hot, albeit speculative, topic. Even Alan Greenspan said during a speech in 1996,

“We could envisage proposals in the near future for the issuers of electronic payment obligations, such as stored-value cards or ‘digital cash,’ to set up specialized issuing corporations with strong balance sheets and public credit ratings.”

Establishment use of digital currency was thus on the table well before Bitcoin. To liberate digital currency from the establishment, another wrinkle was needed. That wrinkle was cryptography.

At the same time as Greenspan’s speech, the Cypherpunks were already experimenting with digital currencies with the express intention of subverting banks. Their experiments, all of which existed before Bitcoin, included Adam Back’s hashcash, Nick Szabo’s bit gold, Wei Dai’s b-money, and Hal Finney’s RPOWs. All of these harnessed the power of cryptographic hash functions and together they compose the giant shoulders upon which Bitcoin stands today.

What Are Cryptographic Hash Functions?

A cryptographic hash function takes data and essentially translates it into a string of letters and numbers. Ever use a URL shortener like bitly or tinyURL? It’s sort of like that. You put in something long, and something short comes out that represents the longer thing. Except with cryptographic hash functions, the input doesn’t need to be long. It can be extremely short (e.g. the word “Dog”) or enormously long (e.g. the entire text of A Tale of Two Cities), and you will still get an output string of uniform length that is, for all practical purposes, unique to that input. Also unlike URL shorteners, the hash functions involved in Bitcoin only work one way. While the same data will always result in the same hash, it is practically impossible to reconstitute your original data from the hash it produces.

So data goes into a hash function, the function runs, and a string of letters and numbers is produced (you can try it for yourself here). That string is called a hash. In the Bitcoin blockchain, hashes are 256 bits, which is 64 characters when written in hexadecimal (the digits 0 through 9 and the letters a through f).

It may seem impossible that a nearly unlimited amount of data can be translated consistently into a string of only 64 characters, but this is miraculously how cryptographic hash functions work. Entire books full of text can be translated into a single string of 64 numbers and letters. Every time you put the same data in, you will get the identical hash. Strictly speaking, that hash is not guaranteed to be unique, since there are only 2^256 possible outputs and far more possible inputs. But 2^256 is so large that nobody has ever found two different inputs with the same SHA-256 hash.

For example

The hash of:

“It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way — in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only.” is:

df0a199c7fef0a53d9a4144bc9122441b94510c13faf424ca26b65aa5035048f

While the hash of “Dog” is:

cd6357efdd966de8c0cb2f876cc89ec74ce35f0968e11743987084bd42fb8944
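
If you want to poke at this yourself, here is a minimal sketch using Python’s standard hashlib module. The helper name sha256_hex is my own, and I am assuming the text is encoded as UTF-8. Keep in mind that the exact hash depends on every byte of the input (quotation marks, spaces, and line breaks included), so don’t be surprised if your output differs from the strings above.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of text (UTF-8 encoded) as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

short_hash = sha256_hex("Dog")
# A deliberately huge input: the same phrase repeated 1,000 times.
long_hash = sha256_hex("It was the best of times, it was the worst of times" * 1000)

print(len(short_hash), len(long_hash))  # 64 64: the length never changes
print(sha256_hex("Dog") == short_hash)  # True: same input, same hash
print(sha256_hex("dog") == short_hash)  # False: one changed letter, different hash
```

Notice that there is no function here for going backwards from a hash to the original text. That is the one-way property at work.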

How Cryptographic Hash Functions Work

There are different types of cryptographic hash functions and each works differently. The above hash function is SHA-256, the hash function utilized by Bitcoin. It works by chopping the input into 512-bit chunks and scrambling each one through 64 rounds of bit-level operations: shifts, rotations, XORs, and additions. (Elliptic curves do show up in Bitcoin, but in its digital signatures, not in its hashing.) Don’t worry too much about this. The bottom line is that cryptographic hash functions are motherfucking magic and you will never fully understand them unless you are a mathematician.

How Hash Functions are Used in Blockchains

In order for the blockchain to work, it has to update itself. Like a bank, it has to keep up-to-date records of all transactions and the assets (e.g. bitcoins) that each member of the network has. It is in updating transactional information that any authenticating system opens itself up to attack. A bank mitigates this risk by having a rigid centralized hierarchy that guarantees authenticity at its own peril. So how does the blockchain manage to update itself while staying decentralized? It uses a cryptographic hash probability game called “Proof of Work.”

Cryptography Ensures Consensus

In order to continue functioning, a blockchain must create new blocks. Since blockchains are decentralized systems, new blocks must be created not by a single authenticating entity, but by the network as a whole. In order to decide what the new block looks like, the network must arrive at a consensus. To achieve this consensus, miners propose blocks, the blocks are verified, and, ultimately, the network accepts a single block to serve as the next portion of the ledger. However, many, many miners propose competing blocks that would all qualify as valid. So how is a specific block chosen to be the next in the chain?

Computers compete in a hash game. It’s very simple. Basically, to win the game, a mining computer has to guess a number, called a “nonce,” which, in combination with the data in the proposed block (including the hash of the previous block, which ties it to everything that came before), outputs a hash that meets a certain requirement when run through the SHA-256 hash function.

You know the hackers in the movies that use a program that guesses a million passwords a minute in order to hack a computer? It’s sort of like that. All the miners in the world simultaneously run a similar guessing program, searching for a nonce which, when added to their block data and run through the SHA-256 hash function, produces a hash that meets the requirement the protocol has set as the “solution” to the problem. The computers are not working together. Each miner is racing on its own candidate block, and there is no shortcut: the only way to find a winning nonce is to keep guessing. The first computer to find a valid nonce wins the right to create the next block, and is rewarded 12.5 bitcoins, worth about $50,000 at the time of writing.
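
Here is a toy version of the guessing game, just to make it concrete. It is a sketch under heavy simplifying assumptions, not how a real miner works: the function name mine, the made-up block data, and the “starts with four zeroes” rule are all mine. Real Bitcoin hashes a compact block header twice with SHA-256 and compares the result against a numeric target.

```python
import hashlib

def mine(block_data: str, zero_prefix: str = "0000"):
    """Try nonces 0, 1, 2, ... until the hash starts with zero_prefix."""
    nonce = 0
    while True:
        guess = f"{block_data}{nonce}".encode("utf-8")
        digest = hashlib.sha256(guess).hexdigest()
        if digest.startswith(zero_prefix):
            return nonce, digest  # found a winning nonce
        nonce += 1  # wrong guess, try the next number

# Hypothetical block contents, invented for illustration.
nonce, digest = mine("previous-hash|alice pays bob 1 BTC")
print(nonce, digest)
```

On an ordinary laptop this finishes almost instantly, because four leading zeroes only takes tens of thousands of guesses on average. The real network demands far more.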

This ensures consensus and also prevents attackers from manipulating the system. Since changing the input even slightly results in a completely different hash, a winning nonce only works for the exact block data it was found with, and that data includes the hash of the previously verified block. If inaccurate or fraudulent previous records are input, the hashes no longer match and all of the guessing work has to be redone. In this way, cryptography lets a blockchain protect its records without relying on any human-verifying bank.
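
To see why tampering doesn’t pay, here is a small sketch in the same toy spirit. The names block_hash and find_nonce, the three-zero rule, and the transactions are all hypothetical. It finds a winning nonce for an honest record, then reuses that nonce on a doctored record.

```python
import hashlib

def block_hash(prev_hash: str, data: str, nonce: int) -> str:
    """Hash a toy block made of the previous hash, some data, and a nonce."""
    return hashlib.sha256(f"{prev_hash}|{data}|{nonce}".encode("utf-8")).hexdigest()

def find_nonce(prev_hash: str, data: str, prefix: str = "000") -> int:
    """Guess nonces until the toy block's hash starts with prefix."""
    nonce = 0
    while not block_hash(prev_hash, data, nonce).startswith(prefix):
        nonce += 1
    return nonce

genesis = "0" * 64          # stand-in for the previous block's hash
data = "alice pays bob 1"   # the honest record
nonce = find_nonce(genesis, data)

honest = block_hash(genesis, data, nonce)
forged = block_hash(genesis, "alice pays bob 100", nonce)  # doctored record, same nonce

print(honest.startswith("000"))  # True: the nonce was found for exactly this data
print(forged.startswith("000"))  # very unlikely to be True (about 1 chance in 4,096)
```

The attacker would have to start guessing all over again for the doctored block, and again for every block that comes after it.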

However, as more and more people have dedicated more and more computer power to mining bitcoins over the years, a new problem has arisen. Innovators have built high-powered computers designed only to mine bitcoins (known as ASICs). These computers can make many more guesses per second than an average computer, making them able to arrive at correct guesses much faster, and thus mine many more coins.

The problem with this is that, as fewer and fewer people can afford mining technology, the potential for centralization increases. Blocks are created and bitcoins are mined as soon as the solution to the next block is found. Thus, someone with resources could simply build a more powerful mining machine than anyone else, and they could mine a huge percentage of the remaining bitcoins faster than anyone else simply by making more guesses in less time.

The 10 Minute Solution

Satoshi baked in a mechanism to mitigate this problem, and it also relies on the power of hash functions. Here’s how I like to think about his solution:

Imagine that your hash function outputs animals instead of strings of numbers and letters. There is an equal chance that given a random data input, the hash function will output an elephant or a monkey. Taken at random, your data is no more likely to translate to one animal than it is any other.

However, now imagine that you put certain criteria on what sort of animal your hash function must translate your data into in order to count. This affects the probability that any given data will be hashed into an animal that satisfies your criteria. The data “abc123,” for example, is more likely to output any animal (in fact it’s 100% certain in this example) than it is to output a biped, because many, many more potential outputs qualify as an animal than qualify as a biped. It’s even less likely to output a monkey.

The way the Bitcoin blockchain works is that the guessing game self-adjusts so that, on average, the whole network together finds a winning guess only about once every 10 minutes, no matter how powerful the guessing computers are. (The difficulty is recalculated every 2,016 blocks, which is roughly every two weeks.) This means that guessing a hash and mining a block today is much more difficult than it was when Bitcoin debuted in 2009, because there are so many hyper powerful computers on the network. Indeed, it is practically impossible to mine a bitcoin today with an ordinary laptop, whereas in the early days of Bitcoin everyone on the network was using a standard computer.
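
The self-adjustment can be sketched in a few lines. This is a simplified illustration, not Bitcoin’s actual code: the function name retarget and the floating-point “difficulty” number are my own assumptions, and the real protocol does this arithmetic on an integer target. The 10 minute goal, the 2,016 block interval, and the cap of a factor of 4 per adjustment are real Bitcoin parameters.

```python
TARGET_BLOCK_SECONDS = 10 * 60  # one block every 10 minutes, on average
RETARGET_INTERVAL = 2016        # blocks between difficulty adjustments

def retarget(old_difficulty: float, actual_seconds: float) -> float:
    """Scale difficulty by how fast the last 2,016 blocks actually arrived."""
    expected_seconds = TARGET_BLOCK_SECONDS * RETARGET_INTERVAL
    ratio = expected_seconds / actual_seconds
    # Limit any single adjustment to a factor of 4 in either direction.
    ratio = max(0.25, min(4.0, ratio))
    return old_difficulty * ratio

# Blocks came every 5 minutes (miners got faster): difficulty doubles.
print(retarget(1000.0, 2016 * 5 * 60))   # 2000.0
# Blocks came every 20 minutes (miners dropped out): difficulty halves.
print(retarget(1000.0, 2016 * 20 * 60))  # 500.0
```

So faster computers don’t produce blocks any faster in the long run. They just make the game harder for everyone.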

So how does the Bitcoin protocol ensure that the guessing game gets hard enough to require approximately 10 minutes of guessing time, even by extremely powerful mining computers? Remember the monkey example. The more specific criteria put on the output of a hash function, the more guesses will have to be made in order to receive that more specific output. Instead of the “right” answer being an animal (easy), biped (hard), or monkey (harder), increasingly particular criteria are put on the “right” hash string that must be guessed in order to win the game. Specifically, the game gets harder by requiring the right hash to have a certain number of zeroes at the beginning. (More precisely, the hash, read as a number, has to fall below a target value, which works out to roughly the same thing.)
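
You can watch the game get harder with a quick experiment. Again, this is only a sketch: the function name guesses_needed is hypothetical, and I am using “abc123” plus a counter as stand-in data. Since each hex character has 16 possible values, you would expect about 16 guesses on average for one leading zero, 256 for two, and so on.

```python
import hashlib

def guesses_needed(data: str, zeros: int) -> int:
    """Count guesses until SHA-256(data + nonce) starts with `zeros` zeroes."""
    prefix = "0" * zeros
    nonce = 0
    while not hashlib.sha256(f"{data}{nonce}".encode("utf-8")).hexdigest().startswith(prefix):
        nonce += 1
    return nonce + 1  # nonce counts from 0, so add 1 for the number of guesses

for zeros in range(1, 5):
    expected = 16 ** zeros  # average number of guesses for this many zeroes
    print(zeros, expected, guesses_needed("abc123", zeros))
```

The actual counts bounce around the averages because every guess is a fresh roll of the dice, but the trend is unmistakable: every extra zero costs roughly 16 times more work.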

Think about it this way. If I ask you to pick a random three digit number (anything from 000 to 999) in order to receive a chocolate, you will win every time if any three digit number counts as correct, but only one time in ten if the correct number has to begin with a 0, and only one time in a hundred if it has to begin with 00. Every extra zero I demand makes a winning guess ten times rarer. Hashes work the same way, except that each character has 16 possible values instead of 10, so every extra leading zero makes a winning hash 16 times rarer. (This is not the same math as the Birthday Paradox, the fact that if you have only 23 random people in a room, there is a 50% chance two of them will have the same birthday. That describes how likely it is that any two random values collide, not how hard it is to hit a particular target.)
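
For the skeptical, the chocolate odds can be counted directly. This sketch assumes a “three digit number” means any string from 000 to 999, each equally likely, which is my reading of the example. The helper name chance is hypothetical.

```python
from fractions import Fraction

# Every possible three digit guess, written with leading zeroes: "000" ... "999".
candidates = [f"{n:03d}" for n in range(1000)]

def chance(prefix: str) -> Fraction:
    """Fraction of all three digit strings that begin with prefix."""
    matching = [c for c in candidates if c.startswith(prefix)]
    return Fraction(len(matching), len(candidates))

print(chance(""))    # 1      any number wins
print(chance("0"))   # 1/10   must begin with 0
print(chance("00"))  # 1/100  must begin with 00
```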

Unfortunately, by making the number harder and harder to guess, the Bitcoin blockchain has inevitably evolved to exclude ordinary folks from mining. But this is a fair price to pay for decentralization. Without this self-adjusting difficulty, one wealthy company could theoretically create an extremely powerful computer and mine all remaining bitcoins almost instantly.

I should note that I only (barely) understand how the Bitcoin blockchain works. Other blockchains could use cryptography entirely differently than I have described here. For example, I don’t know if Proof of Stake, allegedly a more efficient improvement on Proof of Work, uses the same system.

However, since Bitcoin is the model for all other blockchains, understanding the Bitcoin blockchain is the most important step in grasping the world of cryptocurrencies. While understanding many other aspects of blockchain is essential, conceptualizing how it uses cryptography is perhaps the most important piece of the puzzle.

NOTE: This article has been syndicated on my Urbit (click here). This is a lot more significant than it might appear. Here’s why:

On my Urbit, this article is published independently and appears via a fully decentralized peer to peer network. That means that when you’re reading it there, there is no central server (a la Facebook, Twitter, Medium), no centralized content management system (a la Wordpress), and no middlemen editors/managers/marketers (a la New York Times) involved. There is no third party providing or exploiting the connection between you and me, outside of Urbit’s distributed network of private cloud computers operated by ordinary single users. There is of course an ISP (e.g. Time Warner) involved and a web host (DigitalOcean in this case), but both of those parties are passive, and provide the connection without any ability to own, monitor, or exploit the content therein.

Think of publishing on an Urbit like publishing on a blockchain: we are connected by a passive, non-human, decentralized network instead of an interested, human, centralized third party. Unless the way the internet works changes drastically, no institution currently possesses the ability to censor our connection.

Decentralized publishing platforms will hopefully soon begin implementing Bitcoin/cryptocurrency-based micropayment systems, meaning that you will be able to pay me something like 1 cent per article, cutting out middlemen altogether.