Analyzing Trezor Firmware: Mnemonic Seed Generation for Bitcoin and Ethereum

An example mnemonic seed phrase.
The first function called in generating the mnemonic phrase
random_buffer() function
#pragma message( \    "NOT SUITABLE FOR PRODUCTION USE! Replace random32() function with your own secure code.")// The following code is not supposed to be used in a production environment. It's included only to make the library testable.
// The message above tries to prevent any accidental use outside of the test environment.
//
// You are supposed to replace the random8() and random32() function with your own secure code. There is also a possibility to replace the random_buffer()function as it is defined as a weak symbol.
The default implementation of random32(), left to the user to change.
random_buffer() function
buf[i] = (r >> ((i % 4) * 8)) & 0xFF;
mnemonic_from_data() function, which takes random bytes and returns the mnemonic phrase.
// checksum
bits[len] = bits[0]
Diagram showing how the checksum is appended, from Andreas Antonopolous’s “Mastering Ethereum”
// data
memcopy(bits, data, len)
What the bits array might look like.
Considering 11 bits at a time from the byte array called bits
Checking whether to add 1 to idx.
The same equation from earlier
1 << 7 - ((i * 11 + j) % 8)
1 << (7 - ((0 * 11 + 5) % 8) 
= 1 << (7 - (5 % 8))
= 1 << 7 - (5)
= 1 << 2
= 0b00000100 // A different byte created from scratch
The same equation from earlier
0b00100100 // bits[0]
AND
0b00000100 // The byte created from scratch, with a "1" at bit "j"
=
0b00000100
Considering 11 bits at a time from the byte array called bits
Extracting the word from the wordlist

Conclusion

  • It is ill-advised to provide a default implementation of a cryptography library that is completely insecure. The git history shows that random32() at one time pulled from /dev/urandom, and another time used a predictable timestamp as the seed. Throwing an exception is the only correct option here, regardless of allowing users to “test” the library out of the box.
  • The code I showed to generate 11-bit indices is wildly inefficient. It does tens of operations to iterate bit-by-bit what could easily be done word-by-word in memory, masking 11 bits at a time. This is strangely juxtaposed with explicitly using a left-shift assignment operator, when any modern compiler would optimize a more readable multiplication by 2.

--

--

--

Building something new! Bitcoin, security, mining. Former early @Gemini.

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

# Interfaces and Dependency Injection: What and why should they matter?

What is an algorithm? — absolutely the most straightforward definition!

How to Prepare for a Hackathon? A Survival Guide

One-click content export, staging and deployment strategy for Drupal 9 websites (with Version C

ASCII vs Unicode

Local development with Serverless

5 Software Developers to Subscribe on YouTube Part 1

🎶🌅 Sunrise Choir: January 🌅 🎶

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Brandon Arvanaghi

Brandon Arvanaghi

Building something new! Bitcoin, security, mining. Former early @Gemini.

More from Medium

Crypto for women: Ethereum, Ether and ETH explained

CENTRALIZED DECENTRALIZATION

STIMA decision-making process

Efinity: The Metaverse Is Live on Polkadot

Don’t be fooled by NFTs.