Regarding Bitcoin and Ethereum dev and client decentralization, I think those four graphs miss two…
Vitalik Buterin

Sure, agreed with all that. I do still think that the general idea of enumerating subsystems and using something like the Nakamoto coefficient starts to firm up our intuitions about decentralization.

As an analogy, it’s a bit like the Computer Language Benchmarks Game site.

Each subsystem we pick is like a different benchmark. Any individual benchmark is flawed, but the collection of benchmarks helps us determine where a given language tends to be fast and slow.
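To make the subsystem idea concrete, here is a minimal Python sketch. It assumes each subsystem is summarized as a mapping from entity to share (hash rate, nodes, commits, and so on) and that compromising a subsystem means controlling more than 50% of it. Both are simplifying assumptions, and the threshold could reasonably differ per subsystem. The entity names and numbers are hypothetical placeholders, not measurements.

```python
from typing import Dict


def nakamoto_coefficient(shares: Dict[str, float], threshold: float = 0.5) -> int:
    """Smallest number of entities whose combined share exceeds `threshold`
    of the subsystem total (assumed 50% by default)."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("subsystem has no measured share")
    running = 0.0
    # Taking the largest entities first gives the minimum count.
    for count, share in enumerate(sorted(shares.values(), reverse=True), start=1):
        running += share
        if running > threshold * total:
            return count
    raise ValueError("threshold cannot be exceeded (is it >= 1.0?)")


def minimum_nakamoto_coefficient(subsystems: Dict[str, Dict[str, float]]) -> int:
    """Score every subsystem separately, like a suite of benchmarks, and
    report the weakest one, since it bounds the system as a whole."""
    return min(nakamoto_coefficient(s) for s in subsystems.values())


# Hypothetical data for illustration only.
subsystems = {
    "mining": {"pool_a": 30, "pool_b": 25, "pool_c": 20, "pool_d": 15, "pool_e": 10},
    "clients": {"client_x": 70, "client_y": 20, "client_z": 10},
}
print({name: nakamoto_coefficient(s) for name, s in subsystems.items()})
# {'mining': 2, 'clients': 1}
print(minimum_nakamoto_coefficient(subsystems))  # 1
```

The per-subsystem numbers play the role of individual benchmark results, while the minimum points at the subsystem an attacker would most plausibly target.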

By way of a more detailed reply:

Many of the so-called “alternative Bitcoin clients” are in fact forks of the same codebase as Bitcoin Core, whereas all Ethereum implementations have fully separate codebases created from scratch.

Agreed, and that’s a good possible refinement of that subsystem metric (“truly independent codebases”). If we use that definition, then Ethereum’s client decentralization is better than Bitcoin’s, because truly independent Bitcoin codebases like btcd and bcoin don’t have as large a share among Bitcoin nodes as Parity has among Ethereum nodes. Nevertheless, both systems would still be fairly centralized by this measure.
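The “truly independent codebases” refinement amounts to one extra step before counting: merge every fork into the codebase it descends from. A Python sketch follows, where the client names, the fork-to-parent `lineage` mapping, and the node shares are all hypothetical placeholders. In this toy data, merging forks drops the coefficient from 2 to 1.

```python
from collections import defaultdict
from typing import Dict


def nakamoto_coefficient(shares: Dict[str, float], threshold: float = 0.5) -> int:
    """Smallest number of entities whose combined share exceeds `threshold`."""
    total = sum(shares.values())
    running = 0.0
    for count, share in enumerate(sorted(shares.values(), reverse=True), start=1):
        running += share
        if running > threshold * total:
            return count
    raise ValueError("threshold cannot be exceeded")


def shares_by_codebase(client_shares: Dict[str, float], lineage: Dict[str, str]) -> Dict[str, float]:
    """Attribute each client's node share to its root codebase.
    Clients absent from `lineage` count as independent codebases.
    A fork of a fork must be mapped directly to its root here."""
    merged: Dict[str, float] = defaultdict(float)
    for client, share in client_shares.items():
        merged[lineage.get(client, client)] += share
    return dict(merged)


# Hypothetical node shares (percent) and fork relationships.
client_shares = {"core": 40, "core_fork_a": 25, "core_fork_b": 15, "indep_1": 12, "indep_2": 8}
lineage = {"core_fork_a": "core", "core_fork_b": "core"}

merged = shares_by_codebase(client_shares, lineage)
print(merged)  # {'core': 80.0, 'indep_1': 12.0, 'indep_2': 8.0}
print(nakamoto_coefficient(client_shares), nakamoto_coefficient(merged))  # 2 1
```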

Ethereum doesn’t really have the concept of a “reference client”. If you take the literal meaning of “client that people refer to to improve their understanding of protocol rules”, then in many cases that’s actually pyethereum because Python is easier to read. The C++ client is the client that generates the test suites. So counting commits to Geth imo understates the decentralization of the ecosystem.

Sure, Geth was basically chosen for an approximate apples-to-apples comparison in the sense of “developer commit distribution over Ethereum’s most popular production client vs. Bitcoin’s most popular production client”. You could use an alternate definition, such as the distribution of commits across all independent codebases used in production. Or one could argue that commit count doesn’t matter.
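Under either definition the computation is the same: count commits per developer, then apply the Nakamoto coefficient to those counts. A Python sketch, assuming author names are a good enough proxy for developer identity (in practice one person may commit under several names) and that each codebase is given as a list of per-commit author names, such as the lines printed by `git log --format=%an`. The names and counts are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def nakamoto_coefficient(shares: Dict[str, float], threshold: float = 0.5) -> int:
    """Smallest number of developers whose combined commits exceed `threshold`."""
    total = sum(shares.values())
    running = 0.0
    for count, share in enumerate(sorted(shares.values(), reverse=True), start=1):
        running += share
        if running > threshold * total:
            return count
    raise ValueError("threshold cannot be exceeded")


def commit_counts(*codebases: Iterable[str]) -> Counter:
    """Pool per-author commit counts across one or more codebases.
    Each argument holds one author name per commit."""
    counts: Counter = Counter()
    for authors in codebases:
        counts.update(authors)
    return counts


# Hypothetical per-commit author lists for two independent codebases.
client_a = ["alice"] * 6 + ["bob"] * 3 + ["carol"]
client_b = ["dave"] * 5 + ["erin"] * 5

print(nakamoto_coefficient(commit_counts(client_a)))            # 1: most popular client only
print(nakamoto_coefficient(commit_counts(client_a, client_b)))  # 2: all independent codebases
```

Pooling raises the coefficient here only because the second codebase has a disjoint set of developers; contributors who work on both codebases are merged by name.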

Nevertheless, like the benchmarks game, the discussion at least starts to firm up specific, quantitative measures of what it means for something to be decentralized.

In the case of mining and wealth, the problem is that there is such a long tail of very slightly interested amateur contributors that the Gini index likely ends up measuring artefacts of the cutoff of where one particular source of data started counting users more than anything else.

We actually did think of that, and you’re right that it would be an issue if we did the calculation across every ETH or BTC address: the Gini coefficient would then be very close to 1.0, because the vast majority of addresses hold 0 BTC/ETH (as do the vast majority of the world’s inhabitants).
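A small sketch of that effect in Python, using the standard sorted-values formula for the Gini coefficient. The balances are made-up placeholders: padding three holders with 997 zero-balance addresses moves the coefficient from 0.2 to about 0.998, even though who holds what has not changed.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient via G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n,
    with x sorted ascending and i counted from 1."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


holders = [50, 30, 20]
# Hypothetical empty addresses that happen to fall inside a data source's cutoff.
padded = holders + [0] * 997
print(round(gini(holders), 4))  # 0.2
print(round(gini(padded), 4))   # 0.9976
```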

In this case, for the wealth/addresses calculation, we limited it to the top N addresses for ETH and BTC, so that one has a measure of “how centralized is wealth among the top N”. We don’t argue that this is a critical metric, just an illustrative one. While you wouldn’t want a Gini coefficient of 1.0 for BTC or ETH (as then only one person would have all of the digital currency, and no one else would have an incentive to help boost the network), in practice it appears that a very high level of wealth centralization is still compatible with the operation of a decentralized protocol.
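Restricting to the top N is a one-line change before the same Gini computation. A Python sketch with hypothetical balances and an illustrative N of 5; the point is that zero-balance addresses below the cutoff no longer affect the result.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient via G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n, x ascending, i from 1."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def top_n_gini(balances: Sequence[float], n: int) -> float:
    """Gini coefficient among the n largest balances only."""
    return gini(sorted(balances, reverse=True)[:n])


# Hypothetical balances: ten funded addresses plus ninety empty ones.
balances = [500, 300, 100, 50, 30, 10, 5, 3, 1, 1] + [0] * 90
print(round(top_n_gini(balances, 5), 4))  # 0.4857
print(top_n_gini(balances, 5) == top_n_gini(balances[:10], 5))  # True
```

One caveat on reading the extremes: with this formula, a single holder owning everything among N addresses gives (N - 1)/N rather than exactly 1.0, a value that only approaches 1.0 as N grows.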

For the mining/block reward calculation, the time window imposes a natural limit: only miners that found at least one block within it are counted. So we didn’t go deep into the tail here.
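A sketch of why the window bounds the tail, assuming blocks have already been attributed to producers (for example by pool tags, a step not shown) and using hypothetical pool names. Only producers that found at least one block in the last `window` blocks appear, so there can never be more than `window` entries.

```python
from collections import Counter
from typing import Dict, Sequence


def window_shares(block_producers: Sequence[str], window: int) -> Dict[str, float]:
    """Fraction of the most recent `window` blocks found by each producer."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = block_producers[-window:]
    if not recent:
        raise ValueError("no blocks to count")
    counts = Counter(recent)
    return {producer: c / len(recent) for producer, c in counts.items()}


# Hypothetical producers of eight consecutive blocks, oldest first.
chain = ["pool_a", "pool_b", "pool_a", "pool_c", "pool_a", "pool_b", "pool_d", "pool_a"]
shares = window_shares(chain, window=4)
print(shares)  # {'pool_a': 0.5, 'pool_b': 0.25, 'pool_d': 0.25}
```

Here pool_c found a block earlier in the chain but falls outside the window, so it simply does not appear.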

So looking at the Nakamoto coefficient, or similar measurements like the share of the top 100, is definitely superior.

Yep, the Nakamoto coefficient is also useful because it has an intuitive interpretation (“minimum number of entities required to compromise the system”), whereas the Gini coefficient doesn’t give you anything quite so concrete.
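The two measures mentioned above are closely related: the Nakamoto coefficient is the smallest k whose top-k share exceeds the threshold. A Python sketch with hypothetical pool shares and an assumed 50% threshold; note that two pools holding exactly 50% together do not count as a majority under the strict inequality used here.

```python
from typing import Dict


def top_k_share(shares: Dict[str, float], k: int) -> float:
    """Fraction of the total held by the k largest entities."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("no positive share to measure")
    return sum(sorted(shares.values(), reverse=True)[:k]) / total


def nakamoto_coefficient(shares: Dict[str, float], threshold: float = 0.5) -> int:
    """Minimum number of entities that together exceed `threshold`,
    i.e. the smallest k with top_k_share(shares, k) > threshold."""
    for k in range(1, len(shares) + 1):
        if top_k_share(shares, k) > threshold:
            return k
    raise ValueError("threshold cannot be exceeded")


# Hypothetical mining-pool shares (percent).
pools = {"a": 28, "b": 22, "c": 15, "d": 12, "e": 10, "f": 8, "g": 5}
print(top_k_share(pools, 2))        # 0.5: not a strict majority
print(top_k_share(pools, 3))        # 0.65
print(nakamoto_coefficient(pools))  # 3
```

The result of 3 reads directly as “three pools acting together could exceed half of this subsystem”, which is the kind of concrete statement a single Gini value does not offer.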

Like what you read? Give Balaji S. Srinivasan a round of applause.
