How we built the best Ethereum node infrastructure service on 🌎.

And what we really mean when we say ‘best’.

Greg Lang

Published in

Rivet Magazine

8 min read · Nov 9, 2020

In the beginning, there was Infura.

When we started building the open source technology that would ultimately become the beating heart of Rivet, we didn’t have plans to offer it as a service. We were building it for OpenRelay, our 0x order book infrastructure project.

Back then, developers had two choices for Ethereum infrastructure — Infura or DIY. And while the truly OG teams frequently took it upon themselves to build and host their own nodes, the vast majority of request traffic was going through Infura.

While we were bound and determined to go the OG route and host our own nodes, we quickly learned what every team learns when it tries to host a high-capacity, high-availability service for a large number of simultaneous users:

Ethereum node clusters are brittle, complex, compromise-laden, soul-consuming, time-sucking, anxiety-inducing single points of catastrophic failure that require constant babysitting.

Pictured: Nodes behind a load balancer.

The uninitiated might think “what’s the big deal? You should treat it like any server — just spin up another instance and kill the old one. What a bunch of noobs!” But if you’re a seasoned Ethereum developer, you’ll know that Ethereum nodes have to sync with the network before they can respond to requests — a process that means saving the entire history of the Ethereum blockchain to disk. And you’ll know it often takes half a day or more to do so — presuming you’re building from a relatively recent backup. If you’re starting from scratch, it can take days.

Even in the best-case scenario, you’re in for a devastatingly long outage if all of your nodes happen to fail at once.

Population you. Photo by Will Myers on Unsplash

And that’s just the start of it.

If you’re running multiple nodes behind a load balancer, there is almost always going to be a chance that your nodes will not all be synchronized with one another since new blocks propagate across the network unevenly in a peer-to-peer network like Ethereum.

That means your query results can conflict internally! For example, imagine you run a cluster of two nodes behind a load balancer.

Your dapp executes a function that fetches token balances at the latest block and displays them in a UI. First, it queries for the latest block number. Then, based on the result, it queries for a token balance at that specific block number.

Now imagine the load balancer sends the first query to Node 1, which is at that moment one block ahead of Node 2. It returns the result and the function fires off a request to get the token balance at that block.

This time, however, the load balancer routes the request to Node 2, which hasn’t yet caught up to Node 1 — and worse than one might even expect, it doesn’t return an appropriate error response—just an outdated balance.
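To make the failure mode concrete, here is a toy simulation of the two-node scenario above. It is a sketch, not a real Ethereum client: the `Node` and `RoundRobinBalancer` classes, the block numbers, and the balances are all made up for illustration, and the lagging node is modeled the way this article describes it (it answers from the newest block it has instead of returning an error).

```python
from itertools import cycle


class Node:
    """Hypothetical stand-in for an Ethereum node; not a real client API."""

    def __init__(self, name, head, balances):
        self.name = name
        self.head = head          # newest block this node has synced
        self.balances = balances  # block number -> token balance

    def block_number(self):
        return self.head

    def get_balance(self, block):
        # Assumption taken from the scenario above: a node that is behind
        # does not raise an error, it answers from the newest block it knows.
        return self.balances[min(block, self.head)]


class RoundRobinBalancer:
    """Hypothetical load balancer that alternates between nodes."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)

    def route(self):
        return next(self._nodes)


def read_latest_balance(balancer):
    """Two separate requests, each routed independently by the balancer."""
    latest = balancer.route().block_number()
    balance = balancer.route().get_balance(latest)
    return latest, balance


# Node 1 has seen block 101 (balance 75); Node 2 is still at block 100.
node1 = Node("node-1", head=101, balances={100: 50, 101: 75})
node2 = Node("node-2", head=100, balances={100: 50})

latest, balance = read_latest_balance(RoundRobinBalancer([node1, node2]))
print(f"block {latest}: balance {balance}")  # block 101: balance 50
```

The dapp now shows a balance of 50 labeled as block 101, even though the balance at block 101 is actually 75. Nothing in the response tells it anything went wrong.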

Now, this won’t always happen, but it will always have some probability of happening. And it’ll be unpredictable, because how often it actually occurs depends in part on how much traffic you’re getting.

While you might mitigate this problem in several different ways, your mitigation will never be 100% effective, and all of them will make the user experience inconsistent and — for lack of a better word — a little (or a lot) janky.

Topping it all off, if you get a sudden boost in traffic, you’re gonna need a lot of idle capacity since you can’t spin up new nodes very quickly. And if you don’t have enough idle capacity, your nodes will be overrun — degrading your service and potentially causing them to crash.

And those are just the broad strokes of the challenge.

Nightmare, right? You can see why many developers preferred to make it Infura’s problem.

The trouble with Infura.

Given all that can go wrong, you might wonder why we didn’t just set up a free endpoint with Infura and call it a day.

Well, there were three main reasons.

1. Infura and Goliath

I probably don’t have to explain to you that Web3 is all about correcting the mistakes of Web2. That’s why we all got into this to begin with — to build a cure for the moral hazard now reified by Big Tech, inherent to centralized control of vast amounts of user data.

If everyone used Infura, history would just be repeating itself, and we didn’t want to contribute to the trend.

2. Infura and Classic Jaguars

Pictured: expensive-AF taillights that take forever to ship. Photo by Markus Spiske on Unsplash

If you’ve ever tried to get replacement parts for a 1969 E-type Roadster, you’ll know that the parts take a long time to ship from the UK, that they’re expensive AF, and some parts have to be purchased from a salvage yard because the OEM has discontinued them. Waiting 3–5 weeks for a procurement agent to source and ship you a $100 taillight bulb is no fun — and is also a great example of why most people don’t make classic Jaguars their daily drivers.

When you’re trying to build an open source project (OpenRelay, as with all our products, is entirely open source), you don’t want to build dependencies on things that may one day be a lot like that classic Jaguar’s taillight bulbs: proprietary and not guaranteed to be affordable or readily available down the road.

3. Infura and the Power of Control of the Vertical

We wanted to build OpenRelay to take full advantage of the capabilities of Geth, and we wanted to optimize it to be as efficient as possible without worrying that something might change that would undermine or break our optimizations.

When you build dependencies on proprietary code managed by third parties, you’re locked in to their decisions — and the inherent downstream limitations those decisions impose — for as long as the dependency exists.

So all things considered, Infura wasn’t really an option we could live with.

Enter the EtherCattle Initiative: our evil master plan to build a no-compromises easy-to-manage open source Ethereum node cluster architecture.

Rather than bite the bullet and resign ourselves to mitigating all of the issues with the ‘nodes-behind-a-load-balancer’ approach, we came up with something different — and the EtherCattle Initiative was born.

Thanks in part to grant funding from the 0x Project, we built a system that enabled us to spin up additional node capacity in minutes. The solution — streaming replication — wasn’t especially exotic or even new. It had just never been implemented in an Ethereum client before.

While it took us a number of months and a lot of instrumental innovations, once we got it together, it worked like a charm. With it, we could kill unhealthy instances and spin up new ones with no trouble at all — and add or reduce capacity based on current usage metrics rather than attempted tea-leaf reading based on anticipated capacity.
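For readers who haven’t run into streaming replication before, here is a toy sketch of the general idea. It is not the EtherCattle implementation: the `Master` and `Replica` classes, the in-memory log, and the key names are all hypothetical. The point it illustrates is that a new replica starts from a snapshot and only replays the writes that happened after it, rather than syncing the whole history from the peer-to-peer network.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Toy primary: applies writes locally and appends them to an ordered log."""

    state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key, value):
        self.state[key] = value
        self.log.append((key, value))

    def snapshot(self):
        """Return a copy of the state and the log offset it corresponds to."""
        return dict(self.state), len(self.log)


@dataclass
class Replica:
    """Toy replica: starts from a snapshot, then follows the master's log."""

    state: dict
    offset: int

    @classmethod
    def from_snapshot(cls, master):
        state, offset = master.snapshot()
        return cls(state, offset)

    def catch_up(self, master):
        """Apply only the log entries written since this replica's offset."""
        new_entries = master.log[self.offset:]
        for key, value in new_entries:
            self.state[key] = value
        self.offset += len(new_entries)
        return len(new_entries)


master = Master()
for block in range(1, 1001):
    master.write(f"block:{block}", f"hash-{block}")

replica = Replica.from_snapshot(master)   # new capacity starts from a snapshot
master.write("block:1001", "hash-1001")   # the master keeps moving meanwhile
master.write("block:1002", "hash-1002")

applied = replica.catch_up(master)
print(applied, replica.state == master.state)  # 2 True
```

In this sketch the new replica replays two log entries instead of a thousand, which is the property that makes adding capacity fast.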

The only problem? OpenRelay didn’t need nearly the kind of capacity afforded by the minimum viable size of a high-availability cluster. What would we do with all the extra capacity?

Rivet is born.

That’s where the idea for Rivet came from: a competitor to Infura that could 1) reduce dependency on a single centralized provider, 2) give projects an open source alternative to Infura and others that were then emerging (such as Alchemy), 3) leverage the remarkable capabilities of the EtherCattle Initiative technology to deliver a service that was qualitatively just plain better, ultimately contributing to the ascendency of Ethereum-based projects and the advent of Web3, 4) give developers and their supporting teams a tool that would lower barriers to entry into Ethereum development by making their lives less complicated, and 5) help fund the continued development of the open source EtherCattle Initiative.

How we made Rivet the best on 🌎 (and what we mean by ‘best’)

The result in real terms is this—we took the underlying technology and paired it with a service that:

1. Has predictable, transparent pricing. Nobody likes getting surprised by (or explaining) bigger-than-expected bills — a problem that emerges from complex billing models based on compute credits or t-shirt sizes and overages. In fact, nobody really likes talking about it at all — it’s just not that interesting or rewarding to consider the nuances of a byzantine grid of prices.

People building the future have better things to worry about. So our pricing strategy was intentionally designed to be as simple and straightforward as possible.

$1 = 100k requests. It’d be hard to invent something simpler.
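As a quick back-of-the-envelope helper, the stated rate works out like this. The function name `cost_usd` is made up for illustration, and the sketch assumes a purely linear rate with no minimums, rounding, or volume tiers, none of which are specified here.

```python
from decimal import Decimal

REQUESTS_PER_DOLLAR = 100_000  # from the stated rate: $1 = 100k requests


def cost_usd(requests: int) -> Decimal:
    """Estimated cost in USD, assuming a purely linear rate (an assumption)."""
    if requests < 0:
        raise ValueError("requests must be non-negative")
    return Decimal(requests) / Decimal(REQUESTS_PER_DOLLAR)


print(cost_usd(2_500_000))  # 25
print(cost_usd(150_000))    # 1.5
```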

2. Self-service, minimal data capture. We didn’t want to pepper developers with a ton of webforms or require sales calls and consultations before developers could just get started using the service. We also would prefer not to spend a lot of time doing that kind of stuff. So Rivet is self-service and fast to get started. All you need is an email address or an Ethereum wallet and you’re off to the races. No meetings required.

3. Minimalism in design and function. Twiddly bits are sometimes a fun distraction, but the fact remains: they’re a distraction. That’s why you won’t find a whole lot of gizmos, fancy analytics tools, weird metrics, or finicky options in the Rivet dashboard. Because ultimately the most important thing about Rivet is that it does what it’s supposed to quietly, and otherwise stays out of your way.

Fun Tidbit: We did consider putting an easter egg in the dashboard at one point — we held back. Who wants to look up at the clock in horror and realize they’ve been playing Galaga in their infrastructure provider’s dashboard all afternoon, right?🤪

4. Friendly, collegial expert support by people who know our software inside and out. When you reach out for help, we’re prompt and ready to help you with whatever you might need.

If you reach out more than once, we’ll remember who you are the second time. And best of all — we don’t assume by default you’re doing something wrong. We work with you to work out whatever comes up.

5. Security, reliability, privacy, availability, and performance you can’t get anywhere else. There are untold nuances glossed over in this short history of how Rivet came to be, and we know it is without equal because we refused to compromise. We weren’t rushing. We weren’t trying to save on short-term costs to get it on the market fast. We built it for ourselves, for our own project. And then we designed the Rivet service around it to be the one that would’ve stopped us from building our own had it existed at the time.

The best part? We’re still just getting warmed up.

Photo by Louis Hansel @shotsoflouis on Unsplash

Watch this space — you ain’t seen nothin’ yet.

— ❤️Rivet