Powering DeFi: (In)efficient Sources of Truth

Ugur Mersinlioglu
5 min read · Feb 17, 2022

In my previous two articles I laid out how opaquely oracle networks like Chainlink operate and showed that, despite labelling themselves as ‘decentralized’, the vast majority merely serve centralized data feeds in a decentralized manner. With this article I want to dive deeper into how (in)efficiently most oracles operate. I’ve been picking heavily on the OXT/USD price feed of Band Protocol in my last articles, which is why I want to switch it up and take a look at the price feeds of a large-cap asset like Solana.

SOL/USD Price feed from https://data.bandprotocol.com/symbol/SOL

Compared to the OXT/USD feed, we can instantly see that more than one source is being used for Band’s SOL/USD feed. Obviously, this is a large improvement, since the feed is now more resilient against a single source misreporting. However, we’re still left with somewhat of a dilemma: we’re looking at 16 third-party nodes serving data from far fewer data sources (in this case 3). Why are we doing something like this in the first place?

One of my colleagues, Ryan, wrote a pretty neat little section about this in one of his articles:

While third-party oracles have solved the problem of connecting APIs to blockchains, they’ve introduced a new problem. In addition to trusting the API provider, we also have to trust the third party oracle in the middle. The typical way this new trust problem has been addressed is to follow the lead of the blockchains themselves and decentralize oracle nodes.

As stated before, the benefit of decentralization is that it reduces the risk of an individual lying. But the thing that’s decentralized when we’re talking about third-party oracle networks is the third-party oracle nodes, not necessarily the API providers. In fact, it’s quite normal for third-party oracle nodes to acquire their data from the same APIs. In some cases third-party oracle networks even “decentralize” data from only a single source and then “aggregate” it back together.

In these situations the only problem being solved by decentralization is the one created by third-party oracles.

This essentially means that because we are not able to trust a single third-party node we are forced to use an oracle network, even if that means that over a dozen nodes only report on a single (or very few) data source(s).

Practically, this means that consumers are forced to pay for additional nodes without necessarily getting more decentralized data in return. In fact, if you take the SOL/USD price feed as an example and let CoinGecko, CoinMarketCap and Binance run first-party oracles and use them directly to build the price feed, you would end up with a feed of 3 nodes that is just as decentralized at the source level as the construct of 16 third-party nodes that Band created. This means that consumers are paying for 13 redundant nodes (16 instead of 3, so over 4x more) without receiving any additional benefits.

Chainlink faces similar issues by the sheer nature of its (mostly) third-party architecture, but I simply cannot confirm this easily. You see, if you haven’t gotten the message from the previous article, nobody can really tell how many and which data sources Chainlink node operators use for the numbers they come up with. Very conveniently, this makes it quite tricky to analyze how (source-level) decentralized or efficient their products are.

What we’re left with is a guessing game of how many sources are used in a Chainlink price feed, which isn’t impossible — just annoying…

A quick glance at the SOL/USD feed of Chainlink on Ethereum Mainnet allows us to guesstimate that 10 data sources are being used by 16 nodes. (There is no way that different sources will actually report the same value to the exact decimal, so an exact answer match means the same source is being used.)
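The guessing game described above can be sketched in a few lines of Python. This is only a rough illustration of the heuristic (identical answers are assumed to come from the same source). The answer values below are made up for the example and are not taken from any real Chainlink round, and `estimate_sources` is a hypothetical helper name.

```python
from collections import Counter
from decimal import Decimal


def estimate_sources(answers):
    """Group node answers by exact value.

    Heuristic from the article: two nodes reporting the same value down to
    the last decimal are assumed to be reading from the same source.
    Returns a Counter mapping each distinct answer to its node count.
    """
    # Decimal avoids float rounding, so only truly identical answers match.
    return Counter(Decimal(a) for a in answers)


# Hypothetical node answers for one round (illustrative values only).
answers = ["94.51", "94.51", "94.48", "94.5023", "94.51", "94.48"]

groups = estimate_sources(answers)
print(f"{len(answers)} nodes, {len(groups)} presumed sources")
for value, node_count in groups.most_common():
    print(f"  {value}: {node_count} node(s)")
```

The number of distinct answers is only an estimate of the number of sources, which is exactly why this remains a guessing game.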

SOL/USD on ETH from https://data.chain.link/ethereum/mainnet/crypto-usd/sol-usd

We can play the same guessing game on the exact same price pair (SOL/USD) on Binance Smart Chain and then we’re left with an even less efficient feed.

SOL/USD on BSC from https://data.chain.link/bsc/mainnet/crypto-usd/sol-usd

16 nodes serving data from 7 data sources, which means we’re overpaying by a factor of over 2x…
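The node-to-source arithmetic behind these comparisons is simple enough to write down. The sketch below uses the node and source counts quoted in this article (the Chainlink source counts are my guesstimates, not published figures). `redundancy` is a hypothetical helper name.

```python
def redundancy(nodes: int, sources: int) -> tuple[int, float]:
    """Return (extra nodes, nodes per source) for a feed.

    'Extra' nodes are those beyond one node per distinct data source.
    """
    if sources <= 0 or nodes < sources:
        raise ValueError("expected 0 < sources <= nodes")
    return nodes - sources, nodes / sources


# (feed label, nodes, sources) as quoted in the article.
feeds = [
    ("Band SOL/USD", 16, 3),
    ("Chainlink SOL/USD on ETH (guessed)", 16, 10),
    ("Chainlink SOL/USD on BSC (guessed)", 16, 7),
]

for label, nodes, sources in feeds:
    extra, factor = redundancy(nodes, sources)
    print(f"{label}: {extra} extra nodes, {factor:.2f}x as many nodes as sources")
```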

The SOL/USD price feed on BSC also reveals another concerning thing: 6 nodes get their data from source #2 and 4 nodes from source #3. Most of this construct (10 of 16 nodes, or 62.5%) relies on only 2 data sources, which effectively means that if these 2 sources misreport, the entire feed will misreport, and the fact that it has 16 oracle nodes “securing” it won’t change a thing. Quite shocking to figure this out while only looking for inefficiencies in price feeds, but hey, Chainlink allowed me to show you how “decentralized” and “efficient” they are in one example, even without revealing any of their sources.
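The concentration figure above can be checked with a small sketch. Only the two largest clusters (6 and 4 nodes) come from my reading of the BSC feed. How the remaining 6 nodes are split across the other 5 presumed sources is an assumption made purely so the numbers add up to 16 nodes and 7 sources. `top_k_share` is a hypothetical helper name.

```python
def top_k_share(cluster_sizes, k):
    """Share of all nodes that rely on the k most-used presumed sources."""
    total = sum(cluster_sizes)
    if total == 0:
        raise ValueError("no nodes given")
    largest = sorted(cluster_sizes, reverse=True)[:k]
    return sum(largest) / total


# Nodes per presumed source: 6 and 4 are from the article,
# the split [2, 1, 1, 1, 1] is assumed for illustration.
bsc_clusters = [6, 4, 2, 1, 1, 1, 1]

share = top_k_share(bsc_clusters, k=2)
print(f"Top 2 sources feed {share:.1%} of nodes")
print("Majority of nodes depend on 2 sources:", share > 0.5)
```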

While some investigative work is always fun and helped show that Chainlink’s price feeds (just like Band’s) are, by nature of a third-party design, insanely inefficient (even considering things like OCR or the Cosmos SDK), I’m left with a bigger question:

Why doesn’t Chainlink make something as crucial as this public, the way Band does (*clap*), so everybody (but most importantly, paying customers) knows what it is that is being consumed here? Why do I have to compare oracle responses to guess how many data sources might be used in one of their price feeds?

I thought this space was all about transparency?
