How 30+ ETH 2.0 Devs Locked Themselves In to Achieve Interoperability

When the going gets tough, the tough get coding. Here’s the latest on the state of Ethereum 2.0.

Ben Edgington
ConsenSys Media
Sep 25, 2019

(Most of) the Interop crew. [Photo: Danny Ryan]

by Ben Edgington of PegaSys, as part of the State of Ethereum 2.0 Series.

It was just a throwaway remark during a team stand-up: “We need to get in a room, lock the door, and not let anyone out until all the clients are talking to each other.”

As far as I knew, nobody took it seriously. So when Joe Delong announced an interoperability lock-in retreat a few weeks later at the Ethereum 2.0 meet-up in Brooklyn, it was a bit of a surprise. He’d found a venue, secured funding from ConsenSys, and set a date—all without saying a word about it. The Ethereum 2.0 Interoperability Lock-in was happening!

This is how 30+ Ethereum 2.0 developers ended up among the lakes of Ontario for a week in early September. So what did we achieve while we were there?

The location was stunning [Photo: Ben Edgington]

The Lead-up to the Lock-in

One of the glorious things about the Ethereum 2.0 development process has been the number of teams building software clients to run the beacon chain. No fewer than nine teams have committed to working on this over the last year, each implementing the evolving spec, and in turn feeding back into the design process.

For the Ethereum 2.0 network to function properly, all of the participating clients need to communicate with each other and come to the same conclusions about the state of the network. Until now, each team has been working more-or-less in isolation, translating the specification into its own particular programming language and architecture. Some teams had already set up testnets with their own clients talking to their own clients, but this Interop event was the first significant opportunity to get cross-client communication up and running.

There are roughly three ways in which different client software needs to interoperate. The first is consensus-related: at every step, given the same information, every client must agree precisely on the state of the system, including validator balances, committee shufflings, and state roots. In short, each client must have a bug-free implementation of the core Phase 0 specification. Writing bug-free code is hard! One huge help here was the massive and very detailed set of reference tests created by the Ethereum Foundation in the weeks leading up to the Interop event.
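To make "agree precisely" concrete, here is a minimal sketch of comparing clients by a commitment to their state. It is illustrative only: it assumes a toy state of a slot number plus a list of balances (in Gwei, where 32 ETH is 32,000,000,000 Gwei) and hashes a canonical JSON encoding with SHA-256. Real beacon chain clients commit to state with SSZ `hash_tree_root`, which this does not reproduce. The helper names `state_root` and `clients_agree` are hypothetical.

```python
import hashlib
import json


def state_root(state: dict) -> str:
    """Toy state commitment: SHA-256 over a canonical JSON encoding.

    Assumption: stands in for SSZ hash_tree_root. The encoding must be
    canonical (sorted keys, fixed separators) so that equal states always
    produce equal roots, whatever order a client built its fields in.
    """
    encoded = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def clients_agree(*states: dict) -> bool:
    """True only if every client's state commits to the same root."""
    return len({state_root(s) for s in states}) == 1


# Two clients hold the same state, built with fields in a different order.
client_a = {"slot": 8, "balances": [32_000_000_000, 31_999_000_000]}
client_b = {"balances": [32_000_000_000, 31_999_000_000], "slot": 8}
# A third client has an off-by-one bug in one balance.
client_c = {"slot": 8, "balances": [32_000_000_000, 31_999_000_001]}

print(clients_agree(client_a, client_b))  # True
print(clients_agree(client_a, client_c))  # False
```

A single wrong Gwei anywhere in the state changes the root, which is why consensus bugs surface so quickly once clients share a network.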

The other interoperability challenges relate to the network. One is that beacon chain clients need to communicate with each other in order to “gossip” blocks and attestations (votes for blocks) across the network. The other challenge is that clients need to be able to join an already running network, “syncing up” with its current state by downloading the data they need from the existing participants.
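Gossip can be pictured with the simplest possible scheme: a node that receives a message for the first time forwards it to all its peers and drops any later duplicates. This is a hedged sketch of naive flooding, not the real protocol; Ethereum 2.0 clients use libp2p gossipsub, which forwards to a bounded mesh of peers rather than to everyone. The peer graph below (a ring of the seven client names) and the function name `flood` are hypothetical.

```python
from collections import deque


def flood(peers: dict[str, list[str]], origin: str) -> tuple[dict[str, int], int]:
    """Naive flooding gossip over an undirected peer graph.

    Returns (hops, sent): the hop count at which each node first saw the
    message, and the total number of transmissions, duplicates included.
    """
    hops = {origin: 0}
    sent = 0
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neighbour in peers[node]:
            sent += 1  # every forward costs one transmission
            if neighbour not in hops:
                # First copy: record it and schedule this node to re-forward.
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
            # Otherwise it is a duplicate and is simply dropped.
    return hops, sent


# Hypothetical topology: seven nodes connected in a ring.
peers = {
    "artemis": ["harmony", "lighthouse"],
    "harmony": ["artemis", "lodestar"],
    "lighthouse": ["artemis", "nimbus"],
    "lodestar": ["harmony", "prysm"],
    "nimbus": ["lighthouse", "trinity"],
    "prysm": ["lodestar", "trinity"],
    "trinity": ["nimbus", "prysm"],
}

hops, sent = flood(peers, "artemis")
print(hops)
print(sent)
```

Every node is reached, but every link carries the message, duplicates and all. That overhead is the reason production networks bound how widely each node forwards.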

For the purposes of the Interop workshop, we focused mainly on the first of these (gossip), although some progress was also made on syncing. This work benefited enormously from the completion of a draft networking specification a couple of months earlier, thanks to a ton of work from a couple of client teams and Protocol Labs. The specification describes in detail how clients connect to each other, the kinds of messages they send, the format of the data passed over the wire, the encryption used, and so on. For basic interop testing, we also simplified things by side-stepping the question of how clients find each other (discovery).

In the weeks leading up to the event, the Ethereum Foundation team worked with Whiteblock to survey all the client teams and judge their readiness. The results were encouraging, but not entirely so. Some clients were still failing tests, and teams took inconsistent approaches to starting up the network. A couple of teams could not even sustain a stable network made up only of their own clients.

Morning stand-up was a little early for some [Photo: Ben Edgington]

Cooperation for Interoperation

On Friday 6th September, at beautiful Skeleton Lake in Ontario, this group of Ethereum 2.0 developers assembled like the Avengers, but with more computer hardware. We were far from any distractions and far from any broadband Internet, with nothing to break the focus beyond an occasional poker game or kayak trip.

Altogether, seven client teams made it to the event: Artemis (PegaSys/ConsenSys), Harmony (Mikhail’s visa arrived just in time!), Lighthouse (Sigma Prime), Lodestar (Chainsafe), Nimbus (Status), Prysm (Prysmatic Labs), and Trinity (Ethereum Foundation). Unfortunately, Wei Tang of Shasper (Parity) was unable to make it, but that didn’t stop him from working remotely to achieve interoperability with Lighthouse! Both of the Yeeth teammates were actually present, but each working with one of the other client teams: Dean Eigenmann with Nimbus and Eric Tu with Lodestar.

In addition, Danny Ryan from the Ethereum Foundation acted as master of ceremonies, along with Diederik Loerakker (aka protolambda), and a team from Whiteblock. Alongside the Interop event, we also invited some of the people working on Ethereum 2.0 Phase 2 (the execution layer): the Quilt team from ConsenSys and the Ethereum Foundation Ewasm team. It was a great opportunity to sync up (no pun intended) with each other’s work.

A breakout meeting to discuss syncing protocols [Photo: Ben Edgington]

Each morning began with a daily stand-up: a progress report from each team. Then it was off to the races. Ahead of time, Danny had made a detailed plan of attack. But in the event, teams pretty much just dived in to try to build networks together, and this worked out OK.

The first couple of days were mostly spent learning how to build and run each other’s clients, debugging the consensus rules, and filling in gaps in infrastructure and network protocol implementations. For the debugging, protolambda created a brilliant tool, zcli, which takes a state and some blocks as input and generates the output state. This proved enormously helpful for working out who was wrong when two clients started to disagree.
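The debugging idea behind a tool like zcli can be sketched as: start two implementations from the same pre-state, feed them the same blocks, and report the first block after which their state roots differ. This is a hypothetical illustration and does not reproduce zcli's actual commands or inputs. The toy state transition (each block credits a reward to its proposer), the deliberately buggy variant, and all function names are assumptions, and the SHA-256-over-JSON root stands in for SSZ `hash_tree_root`.

```python
import hashlib
import json
from typing import Callable, Optional

State = dict
Block = dict
Transition = Callable[[State, Block], State]


def state_root(state: State) -> str:
    # Toy commitment (assumption): SHA-256 of canonical JSON, not SSZ.
    encoded = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def first_divergence(pre: State, blocks: list[Block],
                     a: Transition, b: Transition) -> Optional[int]:
    """Index of the first block after which a and b disagree, else None."""
    state_a, state_b = pre, pre
    for i, block in enumerate(blocks):
        state_a, state_b = a(state_a, block), b(state_b, block)
        if state_root(state_a) != state_root(state_b):
            return i
    return None


def reference(state: State, block: Block) -> State:
    # Hypothetical rule: credit the block reward to the proposer.
    balances = list(state["balances"])  # copy, never mutate the input
    balances[block["proposer"]] += block["reward"]
    return {"slot": block["slot"], "balances": balances}


def buggy(state: State, block: Block) -> State:
    # Hypothetical bug: rewards are skipped for validator index 2 and up.
    balances = list(state["balances"])
    if block["proposer"] < 2:
        balances[block["proposer"]] += block["reward"]
    return {"slot": block["slot"], "balances": balances}


pre = {"slot": 0, "balances": [32, 32, 32]}
blocks = [
    {"slot": 1, "proposer": 0, "reward": 1},
    {"slot": 2, "proposer": 1, "reward": 1},
    {"slot": 3, "proposer": 2, "reward": 1},
]
print(first_divergence(pre, blocks, reference, buggy))  # 2
```

Knowing the exact block where roots split narrows the search from a whole chain to one state transition, which is what made this style of tooling so useful during the week.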

The Lighthouse team was in high demand. Their client was recognized to be among the most stable and spec-conforming, so it made sense for teams to try to interoperate with Lighthouse first.

Pretty soon, the results started coming, and teams rushed to Twitter to announce their successes. On September 9th, at 2:49 AM EST, Lighthouse and Nimbus claimed a first. Not long after, Artemis and Lighthouse were running together. The measure of a healthy beacon chain network is that it is finalising epochs; you can see from the screenshots that it’s happening!
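Why is finalisation the health check? In Phase 0, an epoch checkpoint is justified when attestations for it carry at least two thirds of the total active balance, and finality then follows from justification. The sketch below is a deliberately simplified model, assuming that an epoch is finalised when it and the very next epoch are both justified. The real rules in the spec consider several recent epochs and the source of each justification, so treat this only as an illustration. The balances are made-up numbers and the function names are hypothetical.

```python
def is_supermajority(attesting_gwei: int, total_active_gwei: int) -> bool:
    # Integer form of "attesting >= 2/3 of total", avoiding floating point.
    return attesting_gwei * 3 >= total_active_gwei * 2


def finalized_epochs(attesting: list[int], total_active_gwei: int) -> list[int]:
    """Simplified model: epoch e is final if e and e + 1 are both justified.

    attesting[e] is the balance that attested to epoch e's checkpoint.
    """
    justified = [is_supermajority(a, total_active_gwei) for a in attesting]
    return [e for e in range(len(justified) - 1)
            if justified[e] and justified[e + 1]]


# Made-up attesting balances for five epochs, out of 300 active.
attesting = [250, 210, 150, 200, 290]
print(finalized_epochs(attesting, 300))  # [0, 3]
```

Epoch 2 falls short of the threshold, so finality stalls around it until two consecutive epochs are justified again. A network whose clients disagree cannot gather a supermajority, so steady finalisation is a good sign that everyone is following the same chain.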

Other pairwise interop announcements came thick and fast: Artemis with Nimbus, Lodestar with Lighthouse, Prysm with Lighthouse, Artemis with Trinity, and Trinity with Lodestar.

Then it was time to push the boundaries. On September 11th came a four-way interop: the ArtHouseBusStar network. A day later came five-way: ArtPryHouseBusIty. And later that same day came the crowning achievement of the event: 7-way interoperability, a network running one client from each of the seven teams present.

Jonny Rhea demos 7-way Interoperability for the first time [Photo: Rene Lubov]

What’s next?

Progress during the Interop week exceeded everyone’s expectations. A seven-way network was far beyond the initial goals. Without question, we can chalk it up as a huge success.

Yet, there remains much to do. Some of the work done to get interoperable quickly was a bit “string and sealing wax”, and needs to be made more robust. The networks we built were operational, but fragile. Progress was made on syncing and discovery, but there’s more to do. These were amazing first steps, but there’s a fair way to go before we are running marathons.

We need to work on scaling up to handle thousands of validators, and performance tuning the clients.

Teams will collaborate increasingly closely over the next stage. We are going to build joint public networks and, within a few weeks, perhaps a long-lived testnet with bounties for anyone who can break it.

The target remains to go live in Q1 2020 with a fully operational beacon chain network, secured by 2 million Ether worth of stake. Nothing we found during this event puts that at risk; everything points to progress being well on track.

A Nimbus node and a Lighthouse node running on two Raspberry Pi boards. The phone is the console; the other kit is batteries. [Photo: Ben Edgington]

Reflections

I think we all agree that the timing of the event was perfect. It really hit the sweet spot of being ready, but not over-prepared. The value of having everybody in one place at one time was immense. Doing this remotely would have been agonising. Just learning to build and configure each other’s clients was a big step forward for interoperability.

The venue was wonderful. Being together in an isolated, beautiful cabin, with superb catering laid on and no distractions, promoted a strong sense of purpose and focus. The event let teams and individuals build relationships and friendships that will help us greatly as we work together over the coming months to deliver this thing.

Do we really need so many beacon chain client teams? It’s a question that is often asked. To be honest, I don’t know. Nonetheless, getting so many implementations to interoperate is a thorough test of both the specification and the implementations themselves. Coordination costs have turned out to be relatively low. And having so many first-rate engineers fully focused on Eth2 definitely ups everyone’s game.

My own lasting impression from the event is the sheer level of drive and energy of all the teams. Everyone pulled together in an extraordinary way; I have never worked in such a committed, positive, and simply fun environment. Everyone, both onsite and offsite, put in huge and heroic efforts to make this project work.

Jim Jagielski recently reminded me of Antoine de Saint-Exupéry’s quote, “If you want to build a ship, don’t drum up the men to gather wood, divide the work, and give orders. Instead, teach them to yearn for the vast and endless sea.” This is the spirit of Ethereum 2.0 development.

It wasn’t all work-work-work… Mikhail Kalinin rescues Jonny Rhea’s phone from drowning after a boating incident. [Photo: Ben Edgington]

Huge thanks to Joe Lubin and ConsenSys for making this event possible, and special thanks to Joe Delong for taking the initiative and putting in the hours and stress to make it happen. See the Ethereum Foundation’s Review for a slightly more technical perspective.

Disclaimer. The views, information, and opinions expressed are solely those of the author above and do not necessarily represent the views of Consensys AG. They are meant for informational purposes only and are not intended to serve as a recommendation or investment advice to buy or sell any securities, cryptoassets, or other financial products.

Any reference in this article to any person, organization, activity, products, or services does not constitute an endorsement or recommendation of ConsenSys. This article does not constitute legal or other professional advice or services. ConsenSys is a decentralized community with ConsenSys Media being a platform for members to freely express their diverse ideas and perspectives. To learn more about ConsenSys and Ethereum, please visit our website.

Blockchain protocol engineering at PegaSys, ConsenSys.