Security lessons from the ENS launch

(update: here is project lead Nick Johnson’s own post-mortem and next steps)

Two serious issues were uncovered shortly after the launch of the Ethereum Name Service (ENS) on March 14th, 2017.

As a community contributor to the project, and a student of smart contract security, this is a bittersweet opportunity to learn from what feels like a punch in the gut. This post is my effort at extracting useful lessons from these unfortunate events. Most of them are operational, and relating to human factors rather than the code itself.

It would be easy for spectators to label this as another “TheDAO”, and ask why nothing has been learned since last summer. But this is a very different situation; the outcome is vastly less catastrophic precisely because of the lessons of the TheDAO, and so we have a new set of lessons worth looking at.

I know that Nick will have his own retrospective soon (update: here it is) , and possibly others, so take this as just one perspective.

The bug(s)

Context: Each name registered with the service has an auction period (during which blind bids should be submitted) followed by a reveal period ( during which the contents of a bid should be revealed). 
 Bug #1: Nothing prevents a bid from being submitted during the reveal period, so anyone could see the winning bid, and then outbid it, or underbid it by only a small amount in order to force the winning bidder to pay a much higher price. Nick Johnson’s post does a good job of summarizing the issue, and you can take a look at the proposed fix.
 Issue 2: A bid value could be “faked”, meaning you could indicate a high bid value, without actually submitting that value. It breaks the bidding process completely, allowing anyone to win any name while only paying 0.01 ETH.

The silver lining (what was done well)

With two significant bugs less than 24 hours in, it looks pretty bad… it’s definitely not good. Fortunately, there were fail safes which will minimize the damage.

  1. Most importantly, the contract can be upgraded to an improved registrar by a committee of key holders, which will prevent the loss of any funds already held by the Registrar. One lesson of “TheDAO” (April-June 2016) was that some degree of central control should be maintained over smart contracts, especially in the early years of nascent Ethereum technology development.
  2. The bidding UI had not yet been activated, in order to perform further testing. This protected non-technical users and minimized the amount of money put in.

What was not done well

1. Test coverage was incomplete

Both of the issues uncovered could have been caught with tests to verify proper behaviour of the registrar. The behaviours which the issues enabled were not edge cases, but rather specifically prohibited. 
Unrelated, and inactive code was stored in the same repo as the code which was ultimately deployed. This meant that npm test would run 39 tests, but only 13 of those tests were on the Registrar contract, which has at least as many methods and many more requirements than that.

2. There was no “official” bug bounty

A good bug bounty has a significant marketing component, and real value up for grabs. Unfortunately, this is hard to do on a test net, where the tokens have no value, and stuff is expected not to work. 
WeiFund’s live net bug bounty, which held over 200,000 USD worth of real value in deployed contract, is a good counter-example. Blockchains present an interesting opportunity for a “natural bug bounty”, we as contract developers should be taking advantage of this.

3. There was no Formal Audit

The review period consisted of several months on Ropsten, and an informal review consisting mostly of a few foundation members, and a few community members. For the most part we were playing around with the code and building a dapp on it. 
 There is unfortunately no single document summarizing what review was done, and what was found.

4. Thrashing

It’s nearly impossible to audit an evolving code base. 
Many changes were made between the Ropsten launch, and the live net launch.

I actually looked at the commit that introduced these issues, but it’s a large contract, and I was busy, and the effort required to reacquaint myself with the full code base and understand the full implications of the change felt onerous.

5. Implications for Architecture

The problematic change introduced a state machine for tracking the status (enum Mode { Open, Auction, Owned, Forbidden, Reveal }) of a name. No issues with that state machine have yet been discovered, and I think the approach was sound. The bugs are in the logic which evaluates the validity of a bid, and set its properties. A similar state-machine-based separation of concerns would have made the logic more transparent. 
 Taking that a step further are useful comments in the Gitter chat from Peter Borah:

We also need to think harder about the upgrade/fault-tolerance story. The code is pretty brittle to bugs at the moment. You can’t change anything without completely redeploying the registry and having every user manually move their deed to the new one, and there’s a lot of complexity that bugs can hide in.
I’m going to be annoying and relink the blogpost I wrote around the time of the DAO, because it’s still relevant:

and Dan Finlay:

So there’s a second abstraction that would be helpful: Registrar storage vs registrar logic.

7. Incident response planning

In my opinion, the proper action for the ENS is probably an extended delay of a couple weeks to a couple months. This is painful because of the built up expectation, but bugs were always a possibility, so the need to delay should always be considered in advance.
A common approach is to have a play book, basically, “if X happens, we will do Y”. 
I think that the reaction so far has been suitably transparent and proactive, but would have been better organized if the decisions had been made beforehand.

My role in this

I met learned about the ENS project at Devcon 2 in Shanghai. It was immediately exciting to me, and I recognized Nick as a someone I could learn a great deal from. Taking guidance from the existing registrar code, I wrote the registrar specification as a means of enabling community discussion. Then I wrote the library which the Registrar app uses to communicate with the Registrar contract.

The commit that introduced these issues was actually made in response to another bug I had found. Despite that involvement, it couldn’t be said that I’ve “completed an audit” of the ENS Registrar. Although I’m sad about the outcome of the launch, I’m still proud of the project, and still expect it to succeed.

I am also determined to become a better developer and security practitioner through the experience.

Where to now?

This is an important moment for the Ethereum community to show what it’s made of. The reaction so far has been overwhelmingly understanding and supportive, and I hope that continues. This kind of even headed response will allow developers to learn and improve, and ultimately contribute to a much stronger ecosystem.

By the same token, these projects deserve the scrutiny of a well informed community. ConsenSys’s Smart Contract Best Practices is an excellent set of technical guidelines for reviewers. But non-technical users should equally know what artifacts they should see as evidence of secure code, such as a bug bounty with high rates of participation, and a publicly available code audit.

I still believe the ENS project is foundational and promising to the Ethereum ecosystem, with the potential help Ethereum cross the chasm, by making crypto as user friendly as email.

My hope is that we’ll see the project slow down to reset, and see the community rally to make this a truly robust, and secure contract system.