Over 12,000 Ether Are Lost Forever Due to Typos

Johannes Pfeffer
Published in ConsenSys Media
6 min read · Mar 9, 2018

Alethio Analytics Series: Searching for the places where ether went to die.

Figure: Large and small cases of ether lost to typos for different block ranges (0–1M, 1–2M, 3–4M, 4–5M).

For a long time, I’ve wondered about the amount of ether that has been lost to typos. It happens whenever someone sends ether to a nonexistent address. Maybe you type “d1” instead of “1d” — and the ether is gone.

Never type Ethereum addresses!

When Nick Johnson challenged the Ethereum community to guess the amount of ether lost to typos up to block 5 million, I finally decided to give it a go.

Heuristics for Lost Ether

We’ll never know the exact amount, but we can get pretty close to a lower bound. To understand how this works, let me go over a few basic facts about Ethereum accounts.

There are two major types of accounts in Ethereum: External Accounts and Contract Accounts. External Accounts (often called externally owned accounts) are secured with a private/public key pair, typically stored in a password-protected key file. If you own ether and don’t leave it in the custody of a centralized exchange, you control an External Account. That means you can send transactions on the account’s behalf (e.g. Alice can send 1 ETH from her External Account to Bob’s External Account). Contract Accounts are accounts that have associated code. They are also called Smart Contracts. External Accounts can call a contract’s functions, and the contract reacts according to its internal rules. Both types of accounts have an address, a 40-character hexadecimal identifier (prefixed with “0x”) that looks like this:

0x2910543Af39abA0Cd09dBb2D50200b3E800A63D2

For External Accounts, the address is derived from the public key. For Contract Accounts, it is derived from the address of the contract’s creator and some additional data (the creator’s nonce). Most importantly: you don’t get to choose your account address. If you could, anyone could pick an address with a high ether balance, create a public/private key pair for it, and take control of the funds.

Because the address you create is quasi-random, it is practically impossible for two nearly identical addresses to both have a known private key. There are 16⁴⁰ different Ethereum addresses, and 40×(16–1)=600 addresses that differ by exactly one character from a given address A. Assuming perfect entropy and a uniform distribution of KECCAK-256 outputs, the probability that a newly created account B differs from a given address A by only one character is 600 out of 16⁴⁰,
or 1 out of 2.4×10⁴⁵,
or 1 out of 2,435,836,062,218,171,530,339,474,721,193,805,032,759,887,571.
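
The arithmetic behind these odds can be checked with exact integers. The snippet below is only a sketch of that calculation; the variable names are illustrative and not taken from the original analysis.

```python
# Exact integer check of the odds quoted above (variable names are illustrative).
HEX_DIGITS = 16        # possible values per address character
ADDRESS_LENGTH = 40    # hex characters in an address, without the 0x prefix

total_addresses = HEX_DIGITS ** ADDRESS_LENGTH            # 16^40, the same as 2^160
one_char_neighbours = ADDRESS_LENGTH * (HEX_DIGITS - 1)   # 40 x 15 addresses one character away

# "1 out of N" odds that a freshly generated address is a one-character neighbour of A.
one_in = total_addresses // one_char_neighbours

print(one_char_neighbours)   # number of one-character neighbours
print(f"{one_in:,}")         # full integer with thousands separators
print(f"{one_in:.1e}")       # same value in scientific notation
```

Python integers have arbitrary precision, so the division is exact up to the final rounding down.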

That is so extremely unlikely that we can say: If such a pair of addresses exists in the Ethereum state, it is reasonable to conclude that one of them had funds sent to it in error.

Let’s look at the following example. Both of the addresses in the box below have an ether balance > 0. The only difference between them is the first character: one has a “5”, the other a “4”. The nonce of 17 for Address A means that it has sent 17 transactions so far; Address B, with a nonce of 0, has sent none. To measure the difference between two addresses, we use the concept of edit distance, which was originally formulated by Vladimir Levenshtein in 1965. Edit distance is the minimum number of single-character edits needed to turn one string into the other. In the example, the addresses have an edit distance of 1.

   Address                                      Nonce
A: 0x580992b51e3925e23280efb93d3047c82f17e038   17
B: 0x480992b51e3925e23280efb93d3047c82f17e038   0
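
To make the comparison concrete, here is a minimal sketch that lists the positions at which the two example addresses differ. It counts substitutions only (the Hamming distance), which for equal-length strings is an upper bound on the Levenshtein edit distance; the helper names `normalize` and `differing_positions` are my own and purely illustrative.

```python
def normalize(address: str) -> str:
    """Lower-case an address and drop an optional 0x prefix."""
    address = address.lower()
    return address[2:] if address.startswith("0x") else address


def differing_positions(a: str, b: str) -> list:
    """Return (index, char_in_a, char_in_b) for every position where a and b differ."""
    a, b = normalize(a), normalize(b)
    if len(a) != len(b):
        raise ValueError("addresses must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


ADDRESS_A = "0x580992b51e3925e23280efb93d3047c82f17e038"
ADDRESS_B = "0x480992b51e3925e23280efb93d3047c82f17e038"

diffs = differing_positions(ADDRESS_A, ADDRESS_B)
print(len(diffs), diffs)  # number of substitutions and where they occur
```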

So, what do we know?

  • Address A has been used to send funds. Thus, someone has control over it and therefore knows its private key.
  • It is not possible to choose an address. Thus, it is practically impossible for anybody to control an address that differs by only one character from another address that has a public/private key pair.
  • If we find such a pair of addresses, we can conclude that the one without outgoing transactions is almost certainly an address where ether was sent in error.

Results

More than 2,600 typos were made

At least 12,622 ether are lost forever

It’s true: over 12,000 ether were lost to typos up to block 5 million. At a valuation of $700 each, that’s $8.84M in total.

A close analysis offers a small glimpse into the stories behind these blunders. I imagine some of these losses were more painful than others. Approximately 8 ether were lost because someone mined to the wrong address. About a year and a half ago, another user lost 2,400 ether while attempting to pay out from an exchange. At the time, that was worth $27,000. Now, it amounts to over $2M.


You can download the raw data here.

Methodology: Considerations and Limitations

Definitions:

  • Typo candidates: accounts that have a balance > 0, have never sent a Tx (nonce = 0), have no code, and don’t start with “0x00000000000000000000000000000000000000” (addresses with this prefix are built-ins or are commonly used as burn addresses).
  • Comparator accounts: real accounts. They are either contracts (have a code hash) or have sent at least one Tx (nonce > 0).
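
The two definitions above translate into simple predicates. The sketch below is one possible reading, not the code used for the analysis: the `Account` record, its field names, the example balances, and the assumption that the excluded prefix consists of 38 zero digits are all hypothetical.

```python
from dataclasses import dataclass

# Assumption: the excluded prefix is "0x" followed by 38 zero digits.
ZERO_PREFIX = "0x" + "0" * 38


@dataclass(frozen=True)
class Account:
    """Hypothetical snapshot of an account taken from the Ethereum state."""
    address: str
    balance_wei: int
    nonce: int
    has_code: bool


def is_typo_candidate(account: Account) -> bool:
    """Funded, never sent a transaction, no code, and not a built-in or burn-style address."""
    return (
        account.balance_wei > 0
        and account.nonce == 0
        and not account.has_code
        and not account.address.lower().startswith(ZERO_PREFIX)
    )


def is_comparator(account: Account) -> bool:
    """A 'real' account: either a contract or an account that has sent at least one Tx."""
    return account.has_code or account.nonce > 0


# Balances below are made up for illustration; the nonces follow the example table.
account_a = Account("0x580992b51e3925e23280efb93d3047c82f17e038", 10**18, 17, False)
account_b = Account("0x480992b51e3925e23280efb93d3047c82f17e038", 10**18, 0, False)
burn_like = Account("0x" + "0" * 40, 10**18, 0, False)

for acct in (account_a, account_b, burn_like):
    print(acct.address, is_typo_candidate(acct), is_comparator(acct))
```

Under these definitions an account is never both a typo candidate and a comparator, and funded zero-prefix addresses fall into neither group.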

Three methods were used to identify the actual typos:

  • Naive (Hamming distance) O(n²) [1]
  • BK-Trees O(n) [2, 3]
  • A fuzzy model [4]
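
As an illustration of the first method, a naive search compares every typo candidate with every comparator account and keeps the pairs within a small Hamming distance. This is a simplified sketch, not the optimized implementation referenced in [1]; the function names and the sample address lists are assumptions made for the example.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def find_typos_naive(candidates, comparators, max_distance=1):
    """Compare every candidate with every comparator: quadratic in the number of accounts."""
    matches = []
    for candidate in candidates:
        for comparator in comparators:
            distance = hamming(candidate, comparator)
            if distance <= max_distance:
                matches.append((candidate, comparator, distance))
    return matches


# Sample data: the example pair from this article plus unrelated addresses.
comparators = [
    "0x580992b51e3925e23280efb93d3047c82f17e038",
    "0x2910543af39aba0cd09dbb2d50200b3e800a63d2",
]
candidates = [
    "0x480992b51e3925e23280efb93d3047c82f17e038",
    "0x" + "ab" * 20,  # made-up address with no close comparator
]

matches = find_typos_naive(candidates, comparators)
print(matches)
```

Every candidate is checked against every comparator, which is why the running time grows so quickly with the number of accounts.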

The naive approach works well up to block 4M. After that, the number of addresses in Ethereum grows too large and an O(n²) approach becomes infeasible (for block height 4M, the optimized naive approach runs for 12 hours on an 8-core machine). BK-Trees, especially fixed query BK-Trees (thanks Nick!), are great for exact results at edit distance 1. Fuzzy search was used to get some more results at edit distance 2. However, this type of model does not provide exhaustive results (it will miss some matches).
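
For readers unfamiliar with BK-Trees, the following is a minimal sketch of a plain BK-Tree over the Hamming distance. It is not the fixed query variant mentioned above and not Nick Johnson’s implementation; the class and method names are my own. The tree stores each item under the edge labelled with its distance to the parent, and the triangle inequality lets a query skip every subtree whose label is too far from the query’s distance to the current node.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


class BKTree:
    """Plain BK-Tree; each node is a pair (item, {distance_to_item: child_node})."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None

    def add(self, item):
        if self.root is None:
            self.root = (item, {})
            return
        node = self.root
        while True:
            d = self.distance(item, node[0])
            if d == 0:
                return  # item is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (item, {})
                return
            node = child

    def query(self, item, max_distance):
        """Return (distance, stored_item) for all stored items within max_distance."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            value, children = stack.pop()
            d = self.distance(item, value)
            if d <= max_distance:
                results.append((d, value))
            # Triangle inequality: only edges labelled within [d - max, d + max] can hold matches.
            for label, child in children.items():
                if d - max_distance <= label <= d + max_distance:
                    stack.append(child)
        return results


tree = BKTree(hamming)
for comparator in (
    "0x580992b51e3925e23280efb93d3047c82f17e038",
    "0x2910543af39aba0cd09dbb2d50200b3e800a63d2",
    "0x" + "ab" * 20,  # made-up comparator address
):
    tree.add(comparator)

hits = tree.query("0x480992b51e3925e23280efb93d3047c82f17e038", 1)
misses = tree.query("0x" + "cd" * 20, 1)  # made-up candidate with no close comparator
print(hits, misses)
```

The comparators go into the tree once, and each typo candidate is then looked up with a small `max_distance`, so most comparisons are pruned instead of being computed.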

It is possible (but unlikely) that someone has sent ether to Account B (the one with the typo) on purpose, for example to “burn” it, that is, to make it forever inaccessible. Also, if we find a pair of addresses A/B that differ by only a few characters, and both have received ether but neither has sent any transactions, we cannot know which of the two is the typo and which is the real address.

Takeaways

It should be clear by now: never manually type out account addresses. The chance of misspelling and sending to the wrong address is just too high. Like any system with human actors, errors have been made and ether has been lost over the history of the Ethereum blockchain. If you don’t want this to happen to you, there are a couple of best practices you can follow:

  • Always copy and paste account addresses
  • Check the identicon of the target address before sending
  • Use the checksum version of the address
  • Use ENS (Ethereum Name Service) addresses
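
To show why the checksum version of an address helps, here is a sketch of the mixed-case checksum defined in EIP-55: a hex letter is upper-cased when the matching hex digit of the Keccak-256 hash of the lower-case address is 8 or higher. Python’s `hashlib.sha3_256` uses the later NIST padding and produces different digests, so the sketch carries a small pure-Python Keccak-256 for self-containment. It is written for illustration only; in practice a well-tested library should be used. All function names are my own.

```python
MASK64 = (1 << 64) - 1


def _rol64(value, shift):
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _keccak_f1600(lanes):
    """Keccak-f[1600] permutation on a 5x5 array of 64-bit lanes, indexed lanes[x][y]."""
    r = 1
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4] for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota (round constants generated by an LFSR)
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (padding byte 0x01), as used by Ethereum."""
    rate = 136  # bytes absorbed per block for a 256-bit digest
    padded = bytearray(data) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for start in range(0, len(padded), rate):
        block = padded[start:start + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = _keccak_f1600(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def to_checksum_address(address: str) -> str:
    """Apply EIP-55 mixed-case encoding to a hex address."""
    hex_part = address.lower()
    if hex_part.startswith("0x"):
        hex_part = hex_part[2:]
    digest = keccak256(hex_part.encode("ascii")).hex()
    return "0x" + "".join(
        ch.upper() if int(digest[i], 16) >= 8 else ch for i, ch in enumerate(hex_part)
    )


def has_valid_checksum(address: str) -> bool:
    return address == to_checksum_address(address)


print(to_checksum_address("0x5aaeb6053f3e94c9b9a09f33669435e7ef1beaed"))
```

A wallet that verifies mixed-case addresses with a check like `has_valid_checksum` can reject most mistyped addresses before any ether is sent.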

I hope this case study serves as both a cautionary tale for users and a call to action for anyone who cares about the health of the larger Ethereum ecosystem. Wallet providers, exchanges, developers, even UI designers: we all have an opportunity to create verification measures, checksums, and warning systems that prevent these errors going forward. Ether is our fuel, after all. We’d be wise to conserve it.

Visit aleth.io to learn more about our analytics platform, and follow us at @AlethioEthstats for updates on our progress. Stay tuned for the next installment of our Alethio Analytics series!

Thanks to Nick Johnson for the inspiration and support and his awesome implementation of fixed query BK-Trees.

Data Disclaimer: Although the data in this analysis was carefully collected, there may be errors, inaccuracies, and misinterpretations.

Disclaimer: The views expressed by the author above do not necessarily represent the views of ConsenSys AG. ConsenSys is a decentralized community with ConsenSys Media being a platform for members to freely express their diverse ideas and perspectives. To learn more about ConsenSys and Ethereum, please visit our website.
