Bitcoin white paper explained (PART 1/3)
a technical overview
Doesn’t 2008 seem like a distant memory? The noise around blockchain can easily push us either to dismiss it on principle or to jump blindly into investing time or money in a world where you have to choose among many competing options. How could you make an educated decision without understanding how it works? You would be taking other people’s statements for granted and could easily be misled one way or the other.
I’m starting with a detailed explanation of the Bitcoin white paper, the first of its kind, because later developments are easier to understand in chronological order. PART 1 covers sections 1 to 3 of the original paper, stopping right before section 4, Proof of Work.
Satoshi Nakamoto’s identity is unknown. He published the paper in 2008 and released the first client through SourceForge in 2009. Satoshi handed development over to Gavin Andresen in 2010, and at the time of writing Bitcoin is maintained by more than 500 contributors.
Physical cash can change hands without a third party. Doing the same digitally requires a mediator (e.g. a bank), which has implications:
- Transaction costs rise because banks can’t avoid mediating disputes, which limits the minimum practical transaction size
- Because transactions are reversible, merchants need more information about their customers in order to trust them
What is needed is a digital payment system in which trust comes from incentives, probability and computation instead of a bank. Such a system cuts transaction costs and makes reversing a transaction computationally impractical, which protects sellers from fraud, while buyers can still be protected by routine escrow mechanisms.
In the following sections we will see how that can be achieved.
Reminder: a centralized mediator decides which transactions go through, whether funds get frozen and how secure the system is.
The system is secure as long as honest nodes collectively control more CPU power than any cooperating group of attacker nodes (see Part 2/4 for more on this).
The first thing to understand is that owning bitcoins isn’t like having a dollar in your pocket or bank account. Balances are computed from transactions that are chained to each other. If you send money to your brother and your neighbor sends money to his sister, both transactions end up in the same shared chain. How much you own is defined by the transactions that sent you coins which you haven’t spent yet. A digital wallet simply adds those amounts up to show you a balance.
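To make the idea of a computed balance concrete, here is a minimal sketch of what a wallet does: it sums every output paying one of its addresses that no later input has consumed. The `Tx` layout, the `(txid, index)` outpoint tuples and the addresses ("me", "brother") are hypothetical simplifications, not Bitcoin’s real data format.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Outpoint = Tuple[str, int]  # (txid, output index): identifies one output of one transaction


@dataclass
class Tx:
    txid: str
    inputs: List[Outpoint] = field(default_factory=list)          # outputs this tx spends
    outputs: List[Tuple[str, int]] = field(default_factory=list)  # (address, amount in satoshis)


def balance(chain: List[Tx], my_addresses: Set[str]) -> int:
    """Sum of outputs paying my addresses that no input in the chain has spent."""
    spent = {outpoint for tx in chain for outpoint in tx.inputs}
    total = 0
    for tx in chain:
        for index, (address, amount) in enumerate(tx.outputs):
            if address in my_addresses and (tx.txid, index) not in spent:
                total += amount
    return total


# Hypothetical history: "me" receives two payments, then spends the first one,
# sending part to "brother" and getting the rest back as change.
chain = [
    Tx("a1", outputs=[("me", 50_000)]),
    Tx("b2", outputs=[("me", 20_000)]),
    Tx("c3", inputs=[("a1", 0)], outputs=[("brother", 30_000), ("me", 19_000)]),
]
print(balance(chain, {"me"}))  # 39000
```

The 50,000 from "a1" no longer counts because "c3" consumed it; what remains is the untouched 20,000 plus the 19,000 of change.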
You need to grasp a couple of concepts before you can understand how transactions work.
Cryptographic hash function: you can picture a hash function as a black box that takes a string as input (such as “Hello Bob”) and returns a fixed-size, random-looking string (such as “98b0f4b363af4aceb81bc42fd81117e1”). To be usable for cryptography, a hash function must have certain properties:
- Same input always returns same output
- It’s quick to compute
- You can’t recover “Hello Bob” from “98b0f4b363af4aceb81bc42fd81117e1” without brute force (trial and error)
- A small change in the input will change the output a lot
- It’s infeasible to find two different inputs that produce the same output
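Python’s standard `hashlib` module is enough to observe these properties. The sketch below uses SHA-256, the hash function Bitcoin relies on; its digests are 64 hex characters long, so the shorter example digest above presumably came from a different function (the text doesn’t say which).

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex-encoded SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = sha256_hex("Hello Bob")
b = sha256_hex("Hello Bob")
c = sha256_hex("Hello Bob!")  # one extra character

print(a == b)  # True: same input, same output
print(len(a), len(sha256_hex("a much longer input " * 100)))  # 64 64: fixed size
# How many of the 64 hex characters changed after a one-character edit
print(sum(x != y for x, y in zip(a, c)))
```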
Asymmetric cryptography: it allows us to communicate securely over an insecure channel. A typical use case is when Bob wants to send Alice a message that only she can read. Another use case is verifying who sent a certain message. This is the one we are interested in (did Bob really send that message?).
A private/public key pair is generated for every entity (Bob, a computer, a bitcoin address, …), which can then sign documents much like you physically sign a car rental contract. Bob can use his private key to sign (generate a signature for) a document containing “Best car contract ever”, and anyone can use Bob’s public key to verify that he (the owner of the private key) actually signed it. No one could have produced that signature without Bob’s private key, which proves he is the author. Check here if you want to get a feeling for what ECDSA keys and signatures look like.
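For a feel of how signing and verifying fit together, here is a deliberately minimal, pure-Python ECDSA sketch over secp256k1, the curve Bitcoin uses. It is an illustration only: it hashes the message once with SHA-256 for simplicity (Bitcoin hashes transaction data with double SHA-256), it is not constant-time, and it should never be used for real keys.

```python
import hashlib
import secrets

# secp256k1 domain parameters: y^2 = x^3 + 7 over the integers mod P
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:  # point doubling
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def scalar_mult(k, point):
    """Compute k * point with double-and-add."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def message_hash(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def generate_keypair():
    private_key = secrets.randbelow(N - 1) + 1
    return private_key, scalar_mult(private_key, G)  # public key = private_key * G


def sign(private_key: int, message: bytes):
    z = message_hash(message)
    while True:
        k = secrets.randbelow(N - 1) + 1  # fresh random nonce for every signature
        r = scalar_mult(k, G)[0] % N
        s = pow(k, -1, N) * (z + r * private_key) % N
        if r and s:
            return r, s


def verify(public_key, message: bytes, signature) -> bool:
    r, s = signature
    if not (0 < r < N and 0 < s < N):
        return False
    w = pow(s, -1, N)
    point = point_add(scalar_mult(message_hash(message) * w % N, G),
                      scalar_mult(r * w % N, public_key))
    return point is not None and point[0] % N == r


bob_private, bob_public = generate_keypair()
signature = sign(bob_private, b"Best car contract ever")
print(verify(bob_public, b"Best car contract ever", signature))   # True
print(verify(bob_public, b"Worst car contract ever", signature))  # False
```

Only the private key can produce a signature that passes `verify`, while verifying needs nothing but the public key, the message and the signature.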
We already mentioned Bob and an address, but how are the two tied to each other?
Bitcoin addresses: the key point is that a bitcoin address is derived by applying several hash functions to an ECDSA public key. Since anyone can generate their own unique public/private key pair, anyone can also generate their own unique bitcoin addresses, as many as they like.
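Here is a sketch of that derivation for classic pay-to-public-key-hash addresses. The public key is hashed into 20 bytes (hash160 = RIPEMD-160 of SHA-256), a version byte (0x00 on mainnet) is prepended, a 4-byte checksum from a double SHA-256 is appended, and the result is Base58-encoded. `hashlib.new("ripemd160")` only works when the underlying OpenSSL build provides RIPEMD-160, so the usage line starts from a ready-made hash: the one in the first “asm” field of the transaction dissected later in this article.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def hash160(public_key: bytes) -> bytes:
    """RIPEMD-160(SHA-256(pubkey)); may fail if OpenSSL lacks RIPEMD-160."""
    return hashlib.new("ripemd160", hashlib.sha256(public_key).digest()).digest()


def base58check(payload: bytes) -> str:
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + checksum
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    # Each leading zero byte is written as a literal "1"
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def p2pkh_address(pubkey_hash: bytes) -> str:
    """Mainnet P2PKH address from a 20-byte public-key hash (version byte 0x00)."""
    return base58check(b"\x00" + pubkey_hash)


print(p2pkh_address(bytes.fromhex("1d49a050b1e965f59301f304bc9914378044364e")))
```

The 0x00 version byte is why these addresses start with “1”. Because only the hash appears in the address, the public key itself stays hidden until the owner spends.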
Now that we have an idea of hashes and asymmetric cryptography, and know that Bob can have multiple bitcoin addresses, let’s see how everything works together to form transactions within the blockchain (transactions are grouped into blocks that are chained together, hence the name).
A transaction is a transfer of Bitcoin value from one or more inputs to one or more outputs. Let’s first explain the figure taken from the paper and then take a look at what inputs and outputs look like.
The figure shows a snapshot of two consecutive transactions (simplified to one input and one output each) within the chain. The box inside each Tx (transaction) shows how that Tx is signed:
Let’s suppose we are Owner 1 (generating the transaction on the right). We take the public key of the person we are sending bitcoin to (“Owner 2’s Public Key”) together with the previous Tx (the line coming from the left Tx), hash them, and sign that hash with “Owner 1’s Private Key”.
We sign a hash rather than the raw data because ECDSA works on a fixed-size number: hashing first lets us sign data of any size with the same small amount of work.
We have generated a Tx, but how do we know we are entitled to spend the bitcoin that the previous Tx sent to us (this part isn’t shown in the figure)? That’s exactly why “Owner 0” included our public key in what he signed, just as we did for Owner 2. He signed a declaration that we are entitled to spend a certain amount, and we can verify it with ECDSA given the original Tx, its signature and “Owner 0’s Public Key”. The “Verify” line in the figure initially confused me (“Why am I verifying what I just signed?”), but it simply shows that anyone can check “Owner 1’s Signature” against “Owner 1’s Public Key”, which was recorded in the previous Tx. Repeating that check for every transaction along the chain proves the chain of ownership.
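The chain of ownership from the figure can be sketched in a few lines. To stay self-contained with only the standard library, this sketch swaps ECDSA for Lamport one-time signatures, a simpler hash-based scheme that Bitcoin does not use; the dictionary layout of a transaction and the all-zero starting hash are also hypothetical. Each transaction hashes the previous transaction together with the next owner’s public key, the current owner signs that hash, and anyone can walk the chain checking each signature against the public key recorded in the transaction before it.

```python
import hashlib
import secrets


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


# --- Lamport one-time signatures (stand-in for ECDSA in this sketch) ---
def keygen():
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(H(a), H(b)) for a, b in private]
    return private, public


def digest_bits(message: bytes):
    return [(byte >> (7 - i)) & 1 for byte in H(message) for i in range(8)]


def sign(private, message: bytes):
    # Reveal one secret from each pair, chosen by the matching bit of the hash
    return [pair[bit] for pair, bit in zip(private, digest_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    return len(signature) == 256 and all(
        H(secret) == pair[bit]
        for secret, pair, bit in zip(signature, public, digest_bits(message)))


def encode_public(public) -> bytes:
    return b"".join(a + b for a, b in public)


# --- Chain of ownership, as in the white paper's figure ---
def make_tx(prev_tx_hash: bytes, next_owner_public, owner_private):
    tx_hash = H(prev_tx_hash + encode_public(next_owner_public))
    return {"prev": prev_tx_hash, "next_owner": next_owner_public,
            "hash": tx_hash, "signature": sign(owner_private, tx_hash)}


def verify_chain(first_signer_public, txs) -> bool:
    signer_public, prev_hash = first_signer_public, None
    for tx in txs:
        if prev_hash is not None and tx["prev"] != prev_hash:
            return False  # not linked to the previous transaction
        if tx["hash"] != H(tx["prev"] + encode_public(tx["next_owner"])):
            return False  # contents don't match the signed hash
        if not verify(signer_public, tx["hash"], tx["signature"]):
            return False  # not signed by the key the previous tx paid
        signer_public, prev_hash = tx["next_owner"], tx["hash"]
    return True


owner0_priv, owner0_pub = keygen()
owner1_priv, owner1_pub = keygen()
owner2_priv, owner2_pub = keygen()

tx1 = make_tx(bytes(32), owner1_pub, owner0_priv)       # Owner 0 pays Owner 1
tx2 = make_tx(tx1["hash"], owner2_pub, owner1_priv)     # Owner 1 pays Owner 2
forged = make_tx(tx1["hash"], owner2_pub, owner2_priv)  # Owner 2 tries to sign for Owner 1

print(verify_chain(owner0_pub, [tx1, tx2]))     # True
print(verify_chain(owner0_pub, [tx1, forged]))  # False
```

A Lamport key must only ever sign once, which is why every owner here signs a single transaction.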
So if we are sending bitcoins to public keys, why do we need bitcoin addresses, you might ask. Addresses add an extra layer of security on top of public keys. As we saw, an address is a hash of the public key, so if ECDSA were ever compromised (someone could derive the private key from the public key and pretend to be you), bitcoins we haven’t spent yet would still be safe: people would only see the address the money was sent to, not the actual public key. Only when we spend those bitcoins, revealing our public key in the new transaction, would we be exposed. That’s also why it’s common practice (wallets do this for us) to use a new address, with its own key pair, for each transaction.
Let’s deconstruct a real transaction to see what it looks like. To do this, I’ll go to blockexplorer and pick a transaction from the latest block at the time of writing (you can also view its content in JSON format). I’ll continue with the JSON format, but the same information shows up in the browser (find a field-by-field explanation below). The two output scripts (“asm” fields) of that JSON read:

```
...
"asm":"OP_DUP OP_HASH160 1d49a050b1e965f59301f304bc9914378044364e OP_EQUALVERIFY OP_CHECKSIG",
...
"asm":"OP_DUP OP_HASH160 f362e8796d04713dda1796b3c609d4b7cd325187 OP_EQUALVERIFY OP_CHECKSIG",
...
```
First we find a “txid”, the transaction id. It is a hash (double SHA-256) of the content of the transaction itself. Then we find two collections, “vin” and “vout”: the inputs and outputs mentioned earlier. Remember that each input specifies which output of an older transaction it consumes bitcoins from.
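Once you have the serialized transaction bytes, computing a txid takes one line: apply SHA-256 twice, then, by convention, display the 32 bytes in reverse order, which is how explorers show them. For SegWit transactions the txid is computed over the serialization without the witness data. The raw bytes below are a hypothetical placeholder, not the transaction from the explorer, so the printed id won’t match any real transaction.

```python
import hashlib


def txid(raw_tx: bytes) -> str:
    """Double SHA-256 of the serialized transaction, displayed byte-reversed."""
    digest = hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()
    return digest[::-1].hex()


raw_tx = bytes.fromhex("0100000000")  # hypothetical placeholder, not a valid transaction
print(txid(raw_tx))
```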
Each input from “vin” has the following:
- A “txid”, which is the hash of the previous transaction that sent bitcoins to us.
- A “vout”, which denotes which output we pick from that transaction
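- A “scriptSig”, which contains the signature (explained above) together with the public key that unlocks the output being spent
- An “addr”, the address that owned the output being spent, i.e. where these bitcoins come from (for convenience)
- A “value”, which is how many bitcoins this input contributes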
Each output in “vout” has:
- A “value”, which is the amount sent to that output
- An “n”, the index of this output within the transaction (later transactions will reference it, just as our inputs do with “vout” now)
- A “scriptPubKey”, which contains the recipient (“address”), the script that determines who can spend the amount (“asm”) and the script type (“pubkeyhash”). This video helped me better understand what’s going on in this Forth-like script.
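To see how the “asm” script and the “scriptSig” work together, here is a minimal stack-machine sketch that handles only the five opcodes of a pay-to-public-key-hash script. The scriptSig pushes a signature and a public key; the output script then duplicates the key, hashes it, checks it against the hash stored in the script and finally checks the signature. Hashing and signature checking are passed in as functions. The stand-ins in the usage example (`toy_hash160`, `toy_check_sig`) are hypothetical and only make the stack mechanics runnable, and the final truth test is simplified compared with Bitcoin’s real rules.

```python
import hashlib
from typing import Callable, List, Union


def run_p2pkh(script_sig: List[bytes], script_pubkey: List[Union[str, bytes]],
              hash160: Callable[[bytes], bytes],
              check_sig: Callable[[bytes, bytes], bool]) -> bool:
    stack = list(script_sig)  # [signature, public key]
    for op in script_pubkey:
        if isinstance(op, bytes):
            stack.append(op)                    # data push: the 20-byte hash
        elif op == "OP_DUP":
            stack.append(stack[-1])             # copy the public key
        elif op == "OP_HASH160":
            stack.append(hash160(stack.pop()))  # replace the copy by its hash
        elif op == "OP_EQUALVERIFY":
            if stack.pop() != stack.pop():      # hashes must match, or fail now
                return False
        elif op == "OP_CHECKSIG":
            public_key, signature = stack.pop(), stack.pop()
            stack.append(b"\x01" if check_sig(signature, public_key) else b"")
        else:
            raise ValueError(f"unsupported opcode {op}")
    # Simplified truth test: succeed if the top item is non-empty
    return bool(stack) and stack[-1] != b""


def toy_hash160(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()[:20]  # hypothetical stand-in, NOT RIPEMD-160(SHA-256)


def toy_check_sig(signature: bytes, public_key: bytes) -> bool:
    return signature == b"signed:" + public_key  # hypothetical stand-in, NOT ECDSA


alice_key = b"alice-public-key"
locking = ["OP_DUP", "OP_HASH160", toy_hash160(alice_key), "OP_EQUALVERIFY", "OP_CHECKSIG"]
print(run_p2pkh([b"signed:" + alice_key, alice_key], locking, toy_hash160, toy_check_sig))  # True
print(run_p2pkh([b"signed:mallory", b"mallory"], locking, toy_hash160, toy_check_sig))      # False
```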
Finally, we have “valueIn”, “valueOut” and “fees”, which summarize the total input and output values and the fee that goes to the miner who includes the transaction in a block.
In conclusion, we have 0.0468234 + 0.0016471 = 0.0484705 incoming bitcoins (from addresses 1HKqcNrf3NPuz4s2MdoAzpYYfjYvbbsxZf + 1CMzyZjERPYvecNcn2GDxpHCLqPCwAst3c), of which we send 0.0384705 to 13froCnxWczJNiJXYLQQikWygvyMFXqVUJ and 0.00775226 to 1PBugsUY1N3TvikrtDpQBBYuMBFoQWTHXi. We send a total of 0.04622276, which is less than the input total, leaving 0.00224774 as a fee.
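The arithmetic above can be double-checked in integer satoshis (1 BTC = 100,000,000 satoshis), which is how Bitcoin represents amounts and which avoids floating-point rounding. The dictionaries only mimic a few of the explorer’s JSON fields, with the output address flattened out of “scriptPubKey” for brevity; they are a simplified assumption, not the full JSON.

```python
from decimal import Decimal

SATS_PER_BTC = 100_000_000


def to_sats(btc: str) -> int:
    """Convert a BTC amount given as a string to an exact integer number of satoshis."""
    return int(Decimal(btc) * SATS_PER_BTC)


# Only the fields needed for the fee (simplified, not the full explorer JSON)
vin = [
    {"addr": "1HKqcNrf3NPuz4s2MdoAzpYYfjYvbbsxZf", "value": "0.0468234"},
    {"addr": "1CMzyZjERPYvecNcn2GDxpHCLqPCwAst3c", "value": "0.0016471"},
]
vout = [
    {"address": "13froCnxWczJNiJXYLQQikWygvyMFXqVUJ", "value": "0.0384705"},
    {"address": "1PBugsUY1N3TvikrtDpQBBYuMBFoQWTHXi", "value": "0.00775226"},
]

value_in = sum(to_sats(i["value"]) for i in vin)
value_out = sum(to_sats(o["value"]) for o in vout)
fee = value_in - value_out  # whatever the outputs don't claim goes to the miner
print(value_in, value_out, fee)  # 4847050 4622276 224774
```

224,774 satoshis is the 0.00224774 BTC fee mentioned above.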
As already explained, each address belongs to whoever holds the private key corresponding to the public key the address was derived from. In this specific case, a possible scenario is that person A (who owns the two input addresses) sends 0.0384705 to person B (who owns the first output). The second output belongs to person A as well: it’s the change he gets back from the operation. Had the transaction not included the second output, that amount would have gone to the miner as a fee.
Nothing described so far stops someone from creating two transactions that spend the same output (double spending), unless there is a single chain that someone checks. If that chain is centralized, we have to trust that entity not to allow double spending or alter the chain, which is no different from the current situation with banks.
We consider the earliest transaction to be the valid one, but without a centralized entity every transaction must be publicly announced, and participants need to agree on a single history of the order in which transactions were received.
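The “earliest transaction wins” rule is easy to express for a single node that sees transactions in some order: remember which outputs (txid plus output index) have been spent and reject any later transaction that consumes one of them again. The transaction format and ids here are hypothetical, and the sketch deliberately sidesteps the hard part, getting every participant to agree on the same order.

```python
from typing import Dict, List, Set, Tuple

Outpoint = Tuple[str, int]  # (txid, output index)


def accept_in_order(transactions: List[Dict]) -> List[str]:
    """Return the ids of transactions accepted under a first-seen rule."""
    spent: Set[Outpoint] = set()
    accepted = []
    for tx in transactions:
        inputs = set(tx["inputs"])
        if len(inputs) != len(tx["inputs"]) or inputs & spent:
            continue  # spends one output twice, or an already spent output: rejected
        spent |= inputs
        accepted.append(tx["txid"])
    return accepted


incoming = [
    {"txid": "pay-brother", "inputs": [("coin", 0)]},
    {"txid": "pay-neighbor", "inputs": [("coin", 0)]},  # double spend of the same output
    {"txid": "pay-sister", "inputs": [("coin", 1)]},
]
print(accept_in_order(incoming))  # ['pay-brother', 'pay-sister']
```

Feeding the same transactions in a different order accepts a different one of the two conflicting payments, which is exactly why participants must agree on the order.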
3. Timestamp server
A timestamp records when an event occurred as a sequence of characters. On UNIX systems it’s common to use the number of seconds elapsed since January 1st 1970 UTC, so 04/15/2018 @ 11:56am (UTC) becomes 1523793381.
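Python’s standard `datetime` module can confirm the conversion in both directions. The timestamp also carries seconds (21), which the minute-level date in the text leaves out.

```python
from datetime import datetime, timezone

print(datetime.fromtimestamp(1523793381, tz=timezone.utc))  # 2018-04-15 11:56:21+00:00
print(int(datetime(2018, 4, 15, 11, 56, 21, tzinfo=timezone.utc).timestamp()))  # 1523793381
```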
If we include this value when hashing the transactions/blocks, the hash proves that the data existed at that moment: the same data with a different timestamp would produce a different hash. That way no one can maliciously pretend that things happened in a different order. And since each hash also includes the previous one, every new entry reinforces the ordering of the ones before it.
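Here is a minimal sketch of such a timestamp chain. Each entry hashes its data, a UNIX timestamp and the previous entry’s hash, so altering an earlier entry or its time breaks the chain from that point on. The entry layout, the JSON serialization and the all-zero starting hash are hypothetical choices for illustration; real Bitcoin blocks use a fixed binary header format. On its own, this doesn’t stop someone from recomputing every hash after a change; preventing that in a decentralized way is the job of Proof of Work.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # hypothetical "previous hash" for the first entry


def entry_hash(data: str, timestamp: int, prev_hash: str) -> str:
    payload = json.dumps({"data": data, "timestamp": timestamp, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(items):
    """items: list of (data, timestamp) pairs, in order."""
    chain, prev = [], GENESIS_PREV
    for data, timestamp in items:
        h = entry_hash(data, timestamp, prev)
        chain.append({"data": data, "timestamp": timestamp, "prev": prev, "hash": h})
        prev = h
    return chain


def is_valid(chain) -> bool:
    prev = GENESIS_PREV
    for entry in chain:
        if entry["prev"] != prev:
            return False  # link to the previous entry is broken
        if entry["hash"] != entry_hash(entry["data"], entry["timestamp"], prev):
            return False  # data or timestamp no longer matches its hash
        prev = entry["hash"]
    return True


chain = build_chain([("tx batch A", 1523793381), ("tx batch B", 1523793981)])
print(is_valid(chain))  # True
chain[0]["timestamp"] = 1523790000  # pretend batch A happened earlier
print(is_valid(chain))  # False
```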
To do this in a decentralized manner we have to introduce Proof of Work, but I believe this article is long enough for now. Please check Part 2/4.
Clap if this helped and follow me on Twitter @sgerov!