Bitcoin and Ethereum seed phrases explained for non-technical people
Free of scary cryptography
It’s redundant to say that the blockchain industry is changing at a breathtaking pace, which means I tend to have a just-in-time (Justin time?) learning philosophy for most things technical. My main passion at the moment is Ethereum development, and so I only know as much as I need to about the nitty-gritty of cryptography. Once I’m ready to publish something real on the mainnet, I’ll properly wrap my brain around cryptographic security. Until then, however, some things have become so commonplace that they can’t be avoided or put off anymore. For those of you who use MetaMask and other HD wallets, you’ve probably become familiar with the seed phrase/seed mnemonic/passphrase/backup phrase structure of private key generation. In the early days, generating a new wallet meant writing down a long list of numbers and letters. Then, almost suddenly, that all changed and the world moved over to using a phrase of 12 common English words that looks something like:
dove lumber quote board young robust kit invite plastic regular skull history
So where did private and public keys go, and is it possible to get a broad understanding of why things are like this without descending into terrifying maths and computer science? The answer is yes. I’ve taken a brief pause from my furious buidling to quickly spit this article out because it seems unfair to have something so broadly used only be intellectually accessible to the likes of Alan Turing and Vitalik Buterin.
TL;DR This article is already a tl;dr of BIP39. Sorry.
Terms you should be familiar with: algorithm (see here for a lengthy example of what algorithms are), Ethereum, Bitcoin and public-private key cryptography.
Disclaimer: This isn’t complete and perfect. It’s just enough to give you an intuitive grasp. There are also details I’m not 100% sure of, and I say as much. Corrections are welcome, but please submit them as private comments; otherwise they won’t make sense to future readers if they no longer apply.
If you’re still hazy on what private-public key pairs are, use this analogy: a private key is your Gmail password. A public key is your email address. Everyone can see your email address, and they need it to send you mail. When you send mail, they will see that it came from your email address. However, only you know your Gmail password. This gives you the power to send emails and read emails sent to you. Email addresses and passwords exist in pairs like this (email@example.com;123456), where each email address can only have one corresponding password and vice versa. Why refer to it as a pair? I’ve noticed that the more mathematically inclined people are, the more excited they get at ordering things into groups, sets, pairs, rings, etc. I’m not sure why, but it seems to make them happy. So whenever you hear a technical person use the jargon “public-private key pair”, think “email address and its corresponding password.”
Bitcoin has Bitcoin Improvement Proposals (BIPs), which anyone from the public can submit for review before they undergo a lengthy acceptance or rejection process. In Ethereum, the equivalent is the Ethereum Improvement Proposal (EIP), and the application-level standards among them are commonly known as Ethereum Requests for Comments (ERCs). A well-known example, ERC20, is the standard interface for tokens and is numbered 20 after the proposal that introduced it. Most BIPs and ERCs fail to get anywhere, but BIP39 did succeed, and it explains how the seed phrase mnemonic works. The Ethereum community has a history of borrowing good ideas from Bitcoin, and as such the standard now applies in both.
So let’s dive into how BIP39 works:
You start with a dictionary of 2048 words. The first 4 letters of each word must be unique. That is, no other word in the dictionary can start with those same 4 letters. This allows you to store a seed phrase that looks like
sexy doge manbearpig house
as
sexy doge manb hous
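The shortening trick can be sketched in a few lines of Python. This is only an illustration: the six-word `WORDS` list and the function names `shorten` and `expand` are made up for this example and are not part of BIP39.

```python
# Toy dictionary: a hypothetical stand-in for the real 2048-word list.
WORDS = ["sexy", "doge", "manbearpig", "house", "abacus", "zoology"]

# Map each 4-letter prefix back to its full word.
PREFIXES = {word[:4]: word for word in WORDS}

# The trick only works if no two words share their first 4 letters.
assert len(PREFIXES) == len(WORDS)


def shorten(phrase):
    """Keep only the first 4 letters of every word."""
    return " ".join(word[:4] for word in phrase.split())


def expand(short_phrase):
    """Look each 4-letter prefix up to recover the full word."""
    return " ".join(PREFIXES[prefix] for prefix in short_phrase.split())


short = shorten("sexy doge manbearpig house")
print(short)          # sexy doge manb hous
print(expand(short))  # sexy doge manbearpig house
```

If two words in the list shared their first 4 letters, the lookup table would silently lose one of them, which is why the uniqueness check sits right after the table is built.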
The dictionary is ordered and indexed like
1 | Abacus
2 | Alabaster
3 | Albania
...
2048 | Zoology
By convention the world uses the same dictionary. This just makes compatibility between wallets easier to maintain but if you want to be all Satoshi about your security, you can make your own unique list and write wallets that cater to that list if you like.
The next step is to have a random number generator which spits out numbers between 1 and 2048. This is fairly routine in computer science, and the geniuses that be chose a reliably random and well-known algorithm for the job.
So now you generate 12 numbers, which correspond to your 12 words using the dictionary. (Strictly speaking, BIP39 reserves a few bits of the last word as a checksum, a kind of built-in typo detector, but the idea is the same.)
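Here is a minimal sketch of that step, under some loud assumptions: the dictionary is a fake one filled with placeholder words ("word0001" to "word2048"), the name `random_phrase` is invented for this example, and the checksum that real BIP39 adds is skipped entirely. Python's `secrets` module is used because it is meant for security-sensitive randomness.

```python
import secrets

# Hypothetical stand-in dictionary: "word0001" ... "word2048".
# A real wallet would load the shared 2048-word list instead.
DICTIONARY = {i: f"word{i:04d}" for i in range(1, 2049)}


def random_phrase(word_count=12):
    """Pick word_count random numbers from 1 to 2048 and look up their words."""
    # secrets.randbelow(2048) returns 0..2047, so add 1 to match the
    # 1-to-2048 numbering used in this article.
    numbers = [secrets.randbelow(2048) + 1 for _ in range(word_count)]
    words = [DICTIONARY[n] for n in numbers]
    return numbers, words


numbers, words = random_phrase()
print(numbers)
print(" ".join(words))
```

Every run prints a different list, which is the whole point: nobody should be able to guess your 12 numbers.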
Prior to BIP39, BIP32 established a new standard of private-public key generation known as hierarchical deterministic (HD) wallets. The concept behind it is far less intimidating than the word choice. The magic of HD is that you generate a master private key which you can put through an algorithm that generates another unique private key. You can then use this second private key to generate yet more private keys. Each private key can be put through (yet another) algorithm to produce its corresponding public key. As far as I can tell, BIP32 numbers a key’s children with a 32-bit index, so each key can directly spawn about 4 billion children (roughly 2 billion ordinary ones and 2 billion “hardened” ones), and every child can go on to spawn children of its own.
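To make "deterministic" concrete, here is a toy sketch. It is NOT real BIP32, which involves elliptic-curve maths that this article is deliberately avoiding. The function name `toy_child_key` and the pretend master key are invented for illustration. The only point is that the same parent and the same child number always give the same child key.

```python
import hashlib
import hmac


def toy_child_key(parent_key: bytes, index: int) -> bytes:
    """Derive child number `index` from a parent key.

    Toy scheme, NOT real BIP32: it only shows that the same parent and
    index always produce the same child.
    """
    message = index.to_bytes(4, "big")  # the child's number as 4 bytes
    return hmac.new(parent_key, message, hashlib.sha512).digest()[:32]


# A pretend master key, made by hashing a fixed sentence.
master = hashlib.sha256(b"pretend this is a master private key").digest()

child_0 = toy_child_key(master, 0)
child_1 = toy_child_key(master, 1)
grandchild = toy_child_key(child_0, 0)  # children can have children

print(child_0.hex())
print(child_1.hex())
print(grandchild.hex())
```

Run it twice and the output is identical, so anyone holding the master key can regenerate the whole family tree whenever they like.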
Putting it all together
The mnemonic phrase you have written down can be backward converted to a list of numbers between 1 and 2048 using the dictionary. The words are simply there to make it easier for humans to understand and convey over the phone (for those who don’t believe the NSA exists). So our list of
dove lumber quote board young robust kit invite plastic regular skull history
will be converted to something like
101 900 1781 62 568 123 88 1322 1099 1544 666
Finally, you just bunch that all together as one long string to form a unique string of numbers:
10190017816256812388132210991544666
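The backward conversion is just a dictionary lookup followed by gluing. Below is a minimal sketch using only the first four words and their made-up numbers from the example above; the names `WORD_TO_NUMBER`, `phrase_to_numbers` and `bunch_together` are hypothetical.

```python
# Tiny stand-in dictionary using the made-up numbers from the example above.
WORD_TO_NUMBER = {"dove": 101, "lumber": 900, "quote": 1781, "board": 62}


def phrase_to_numbers(phrase):
    """Look up each word's number in the dictionary."""
    return [WORD_TO_NUMBER[word] for word in phrase.split()]


def bunch_together(numbers):
    """Glue the numbers into one long string of digits."""
    return "".join(str(n) for n in numbers)


numbers = phrase_to_numbers("dove lumber quote board")
print(numbers)                  # [101, 900, 1781, 62]
print(bunch_together(numbers))  # 101900178162
```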
Using a special algorithm, you can convert that final string into a master private key used in the HD wallets. You can then spawn a huge new list of private-public key pairs as desired. You probably only need 1 but Satoshi suggested using a new key pair for every transaction. This standard makes that way easier than ever before. All you need is 12 words, your master key to (almost) endless key pairs.
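For the curious, here is a hedged sketch of what that special algorithm looks like, as far as the BIP39 and BIP32 documents describe it. One difference from the simplified picture above: BIP39 feeds the words themselves, rather than the bunched-up number string, through a key-stretching function called PBKDF2, and BIP32 then hashes the result once more to get the master key. The function names are invented for this example, and the text normalisation step that real wallets perform is left out.

```python
import hashlib
import hmac


def phrase_to_seed(phrase: str, passphrase: str = "") -> bytes:
    """Stretch the seed phrase into a 64-byte seed with PBKDF2-HMAC-SHA512."""
    return hashlib.pbkdf2_hmac(
        "sha512",
        phrase.encode("utf-8"),
        ("mnemonic" + passphrase).encode("utf-8"),  # the salt
        2048,  # number of hashing rounds
    )


def seed_to_master_key(seed: bytes):
    """Split HMAC-SHA512 of the seed into a master private key and a chain code."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    return digest[:32], digest[32:]


phrase = "dove lumber quote board young robust kit invite plastic regular skull history"
seed = phrase_to_seed(phrase)
master_key, chain_code = seed_to_master_key(seed)
print(master_key.hex())
```

The second value, the chain code, is extra data that HD wallets carry alongside each key and mix in when deriving its children. Please don't use the phrase above for real funds: it is printed in a public article.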
Personally, I use a single key phrase to generate 10 unique addresses, which I load with fake ether and use to test my smart contracts locally, but I imagine merchants and exchanges take full advantage of the flexibility and ease of use of this system of key storage and generation.