Using Mnemonics to create Wallets

Vault0x
5 min read · Nov 22, 2017

Blockchain has gone through many iterations to support some of the strongest security features in the world. For Bitcoin, these enhancements are specified as Bitcoin Improvement Proposals (BIPs). Most BIPs are minor changes, but some, like BIP-0032, which introduced Hierarchical Deterministic (HD) wallets, became significant additions and helped the community grow stronger and more secure. BIP-0032 mainly describes how to protect a key pair by deriving child keys from it while the parent key stays in a safe place. One major issue remained: the master key was still being backed up almost 100 times to avoid losing it. BIP-0039 was proposed to mitigate this flaw.

BIP-0039 introduced the mnemonic code, or mnemonic sentence: in a nutshell, a group of easy-to-remember words used to generate deterministic wallets. The proposal has two parts: generating the mnemonic, and converting it into a binary seed. This seed can later be used to generate deterministic wallets using BIP-0032 or similar methods.

You may have heard about the seed. Cryptography is powered by random numbers, but how do you generate a truly random number? The current millisecond? The number of processor threads in use? You need a starting point, and that is where the seed kicks in: it is the initial value from which random numbers are generated.
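As a minimal sketch of where that starting randomness might come from, the snippet below draws random bytes from the operating system's cryptographically secure generator through Python's standard `secrets` module. The function name `generate_entropy` and the 128-bit default are assumptions made for illustration; they are not prescribed by the text above.

```python
import secrets


def generate_entropy(bits: int = 128) -> bytes:
    """Return `bits` bits of randomness from the OS's secure random source."""
    if bits <= 0 or bits % 8 != 0:
        raise ValueError("bits must be a positive multiple of 8")
    return secrets.token_bytes(bits // 8)


entropy = generate_entropy(128)
print(entropy.hex())  # 32 hex characters, different on every run
```

Predictable inputs such as the current millisecond are exactly what this avoids: the operating system mixes many hard-to-guess sources before handing out bytes.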

Why introduce mnemonic code? A mnemonic code or sentence is better suited to human interaction than the raw binary or hexadecimal representation of a wallet seed. The sentence can be written on paper or spoken over the telephone. A general rule of thumb is that English words are easier to remember than hex code.

Generating the Mnemonics

The mnemonic must encode entropy in a multiple of 32 bits. More entropy improves security but lengthens the sentence. Let’s call the initial entropy length ENT; the allowed size of ENT is 128–256 bits. For example, let’s take ENT as 128 bits. A checksum is generated by taking the first

ENT / 32

bits of the SHA256 hash of the entropy. In our case that turns out to be:

128 / 32 = 4

This checksum is appended to the end of the initial entropy. In our example, the combined length is now:

128+4 = 132 bits
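The checksum step can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function name `append_checksum` is hypothetical, and the all-zero entropy is used only so the result is reproducible (never use it for a real wallet).

```python
import hashlib


def append_checksum(entropy: bytes) -> str:
    """Return entropy followed by its checksum, as a string of '0'/'1' bits."""
    ent = len(entropy) * 8                      # ENT, in bits
    if ent % 32 != 0 or not 128 <= ent <= 256:
        raise ValueError("ENT must be a multiple of 32 between 128 and 256 bits")
    cs = ent // 32                              # checksum length, in bits
    entropy_bits = bin(int.from_bytes(entropy, "big"))[2:].zfill(ent)
    digest = hashlib.sha256(entropy).digest()
    digest_bits = bin(int.from_bytes(digest, "big"))[2:].zfill(256)
    return entropy_bits + digest_bits[:cs]      # first CS bits of the hash


bits = append_checksum(bytes(16))               # 128 zero bits, illustration only
print(len(bits))                                # 132
```

The same function handles the larger sizes: 256 bits of entropy gets an 8-bit checksum, for 264 bits in total.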

Next, these concatenated bits are split into groups of 11 bits, each encoding a number from 0 to 2047. In our example there are 132 / 11 = 12 groups. Why the range 0–2047? Bits represent numbers in binary (for example, 0011 represents 3), and 11 bits can represent 2^11 = 2048 distinct values, that is, the numbers 0 through 2047. Each number serves as an index into a wordlist. Finally, we convert these numbers into words and join the words to form the mnemonic sentence.
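The splitting step above can be sketched as follows. The function name `entropy_to_indexes` is hypothetical, and the sketch stops at the indexes because the 2048-word list is too long to embed here; `wordlist` in the final comment is an assumed variable holding that list.

```python
import hashlib
from typing import List


def entropy_to_indexes(entropy: bytes) -> List[int]:
    """Split entropy + checksum into 11-bit groups and return them as ints."""
    ent = len(entropy) * 8                       # entropy length in bits
    cs = ent // 32                               # checksum length in bits
    digest = hashlib.sha256(entropy).digest()
    checksum = int.from_bytes(digest, "big") >> (256 - cs)   # first cs bits
    # Treat entropy followed by checksum as one big integer.
    value = (int.from_bytes(entropy, "big") << cs) | checksum
    groups = (ent + cs) // 11
    # Peel off 11 bits at a time, most significant group first.
    return [(value >> (11 * (groups - 1 - i))) & 0x7FF for i in range(groups)]


indexes = entropy_to_indexes(bytes(16))          # all-zero entropy, illustration only
print(len(indexes))                              # 12
# With a real 2048-word list loaded into `wordlist` (assumed):
# mnemonic = " ".join(wordlist[i] for i in indexes)
```

Masking with `0x7FF` (eleven 1-bits) is what keeps every index within 0–2047.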

Characteristics of Word List

An ideal wordlist has the following characteristics:

  1. Smart selection of words: The wordlist is created in such a way that it’s enough to type the first four letters to unambiguously identify the word.
  2. Similar words are avoided: Word pairs like “build” and “built”, “woman” and “women”, or “quick” and “quickly” not only make remembering the sentence difficult but are also more error-prone and more difficult to guess.
  3. Sorted wordlists: The word list is sorted, which allows for more efficient lookup of the code words (i.e. implementations can use binary search instead of linear search). This also allows a trie (a prefix tree) to be used, e.g. for better compression.
  4. The word list can contain native characters, but they must be encoded in UTF-8.
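A few of these properties are easy to check mechanically. The sketch below uses a ten-word sample (the first entries of the English list; the real list has 2048 words) and hypothetical helper names to test sortedness, four-letter-prefix uniqueness, and binary-search lookup.

```python
from bisect import bisect_left

# First ten words of the English list, used here as a small sample.
sample = ["abandon", "ability", "able", "about", "above",
          "absent", "absorb", "abstract", "absurd", "abuse"]


def is_sorted(words):
    """True if the list is in strictly ascending order."""
    return all(a < b for a, b in zip(words, words[1:]))


def has_unique_prefixes(words, n=4):
    """True if the first n letters identify each word unambiguously."""
    prefixes = [w[:n] for w in words]
    return len(set(prefixes)) == len(prefixes)


def lookup_word(words, word):
    """Binary search in a sorted list; returns the word's index."""
    i = bisect_left(words, word)
    if i == len(words) or words[i] != word:
        raise ValueError(f"{word!r} is not in the word list")
    return i


print(is_sorted(sample), has_unique_prefixes(sample))   # True True
print(lookup_word(sample, "about"))                     # 3
print(has_unique_prefixes(["quick", "quickly"]))        # False
```

The last line shows why pairs like "quick" and "quickly" are kept out: they share the same first four letters.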

You can check a sample English Wordlist here: A sample wordlist

Converting Mnemonic to Seed

The mnemonic is converted to a seed with a key derivation function. A user may decide to protect their mnemonic with a passphrase. If a passphrase is not present, an empty string “” is used instead.

To create a binary seed from the mnemonic, we use the PBKDF2 function (a key derivation function available in all major cryptographic libraries) with the mnemonic sentence as the password and the string “mnemonic” + passphrase as the salt. Both are encoded as UTF-8 in NFKD normalization form. HMAC-SHA512 is used as the pseudo-random function, the iteration count is 2048, and the derived key is 512 bits (64 bytes) long.
Note that in BIP-0039 the words themselves, not their indexes in the word list, are fed to PBKDF2. Checking each input word against the word list and collecting its index is only needed to validate the sentence and its checksum. The resulting seed can then be used to generate the deterministic wallet.
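As a hedged sketch of this step, the PBKDF2 derivation can be written with Python's standard `hashlib`. The iteration count (2048), the 64-byte output, and the NFKD normalization are taken from the BIP-0039 specification, where the mnemonic sentence itself serves as the password; the function name `mnemonic_to_seed` is illustrative.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Derive a 64-byte seed from a mnemonic sentence with PBKDF2-HMAC-SHA512."""
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


sentence = " ".join(["abandon"] * 11 + ["about"])   # example sentence only
seed_plain = mnemonic_to_seed(sentence)
seed_protected = mnemonic_to_seed(sentence, "my passphrase")
print(len(seed_plain))                # 64
print(seed_plain != seed_protected)   # True
```

Changing the passphrase changes the salt, so the same sentence yields a completely different, equally valid seed.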

Pseudocode to convert Mnemonics to Seed

# Note: unlike the PBKDF2 procedure described above, this sketch packs the
# word indexes three at a time and stretches the result with repeated SHA-256.

# Convert each word of the mnemonic into an int in the range [0, 2047]:
# its position in the sorted word list.
i = 0
while i < length(seed_words):
    seed_ints[i] = lookup_word(seed_words[i])
    i = i + 1

num_words = number of words in the word list
num_words2 = num_words * num_words

# Seed string in hex format
seed_hex_str = ""
i = 0
while i < length(seed_words):
    # hex8 converts an int into an ASCII string of exactly 8 zero-padded
    # lowercase hex digits; % is the modulus operator
    seed_hex_str = seed_hex_str + hex8(seed_ints[i]
        + num_words * ((seed_ints[i + 1] - seed_ints[i]) % num_words)
        + num_words2 * ((seed_ints[i + 2] - seed_ints[i + 1]) % num_words))
    i = i + 3

unstretched_seed = ascii_string_to_byte_array(seed_hex_str)
seed = byte_array()  # an empty byte array
i = 0
while i < 100000:
    # sha256 operates on and produces byte arrays
    seed = sha256(seed + unstretched_seed)
    i = i + 1

master_private_key = byte_array_to_int(seed, order=big_endian)

Using this concept, we don’t need to create hundreds of backups of the key; we can use the mnemonic words directly to recover the keys. The described method also provides plausible deniability, because every passphrase generates a valid seed (and thus a deterministic wallet), but only the correct one will make the desired wallet available.
