This is the second post in our series, “Getting APIs on the Blockchain”. See our first post here, where I define “API” through a bit of computer science history.
As discussed in our previous post, APIs permeate our digital world and let developers build applications faster and with more complexity than ever before. In recent years, businesses have increasingly used APIs to monetize their data and services through fully API-centric business models. However, existing APIs are not natively compatible with blockchains or with the decentralized applications that run on them.
Just as Web 2.0 was marked by interoperability, user-generated content, and participatory culture, Web 3.0 is defined by decentralization. In practice, this means distributing computation and consensus across a network. To enforce network-wide consensus rules, each node must verify the global network state by locally computing proposed state changes, which arrive in the form of transactions.
As a result, smart contracts running on a blockchain network can only operate on information that is accessible to, and agreed upon by, every node in the network. The blockchain is therefore walled off from off-chain information. This limitation is widely known as the “oracle problem”, a name that refers to an ideal, abstract entity that can deliver Truth about the outside world to the blockchain.
The “Oracle Problem”
The “Oracle Problem” is a three-body problem: data source, oracle, and on-chain data consumer. Existing solutions fall into the trap of modelling their ecosystem as composed solely of oracles and data consumers, while ignoring where the data originates. In other words, their models wrongly treat the oracle node as the mythical oracle that is the source of truth, rather than as what it really is: a middleman that transports data from the source to the blockchain.
More fundamentally, the oracle problem is ill-posed: its name suggests an impossible solution. By analogy, it is like framing the problem of getting from Point A to Point B as the “teleportation problem”. Further, the ideal architecture for solving the oracle problem changes drastically depending on the type of data at hand (e.g. objective vs. subjective information). A problem framed this way inevitably leads to impractical and/or sub-optimal solutions.
For a more formal treatment of these issues, see Section 3 of the API3 whitepaper.
The problem with price feeds: a quick example
To illustrate my point, here’s an example. Price feeds are currently the most common use case for oracle networks, given the recent and rapid growth of DeFi. Under an architecture like the one shown above, where oracles are neither incentivized nor required to report their sources, we can easily outline several problems:
- A price feed served by 10 oracles (for example) does not represent 10 unique data points. All of the oracles could very well be serving data from the same API provider, and we’d be none the wiser. There is a lack of transparency here: the number of oracles serving a data feed does not necessarily translate into higher-quality or more robust data, even though providers of such feeds might imply as much.
- Oracles have an incentive to gather cheap, easily accessible data, because nothing requires or rewards them for doing otherwise (again, they are not required in any way to report their sources). This creates something of a Schelling point around cheap and easily available data. Further, it makes staking difficult, if not impossible, in such systems, because high-quality, curated data sources now become outliers. (Issues regarding staking in such systems will be covered in more detail in a later post in this series.)
- Source-blind aggregation reflects what can only be called statistical illiteracy. As already mentioned, a data feed served by x oracles does not necessarily correspond to x unique data sources. This is especially true as the number of oracles grows, because unique data sources are far less abundant and scalable than oracle nodes. Consequently, a source-agnostic aggregation method produces a skewed aggregate, since the oracle-to-data-source ratio is very unlikely to be the same for every data source. A (likely small) subset of data sources has a disproportionate effect on the final aggregate result. And, again for game-theoretic reasons, the aggregate ends up skewed toward cheaper and more easily accessible data sources.
- Let’s narrow our example down to price feeds again. Another problem with source-agnostic aggregation is that it makes properly weighted and normalized aggregation impossible. Consider a price feed contract served by data from price aggregator APIs: oracle responses arrive at different times, the prices represent different trading volumes, and they come from different aggregators, each certainly with its own proprietary aggregation method. Blindly computing a mean or median over these data points amounts to an “apples-to-oranges” comparison. That is, you are computing a statistic over different kinds of data while implicitly treating them as if they were the same, something any data scientist would recognize as ill-informed.
- And I haven’t even gotten to the legal repercussions of data source agnosticism. Most API terms of service prohibit the resale or unauthorized distribution of API data, which puts an oracle node operator serving such APIs in breach of those terms and exposes them to broad legal liability, including claims by the API provider.¹
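The skew described in the list above is easy to reproduce with made-up numbers. The following is a minimal sketch, not API3’s method: it assumes each oracle report carries a hypothetical `source_id` naming the API it relayed (precisely the information a source-blind design lacks), and it uses a median of per-source medians as one simple source-aware alternative. All names, oracle counts, and prices are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class OracleReport:
    """One oracle's answer. `source_id` is a hypothetical field: it only exists
    if oracles are required to disclose where their data came from."""
    oracle_id: str
    source_id: str
    price: float


def naive_median(reports: list[OracleReport]) -> float:
    # Source-blind: every oracle counts once, regardless of which API it relays.
    return median(r.price for r in reports)


def source_aware_median(reports: list[OracleReport]) -> float:
    # Collapse duplicate relays first: one representative value per data source...
    by_source: dict[str, list[float]] = defaultdict(list)
    for r in reports:
        by_source[r.source_id].append(r.price)
    per_source = [median(prices) for prices in by_source.values()]
    # ...then aggregate across sources, so each source gets exactly one vote.
    return median(per_source)


def unique_sources(reports: list[OracleReport]) -> int:
    # Number of distinct APIs actually behind the feed.
    return len({r.source_id for r in reports})


if __name__ == "__main__":
    # Hypothetical feed: 10 oracles, but only 3 underlying APIs.
    reports = (
        [OracleReport(f"oracle-{i}", "cheap_api", 100.0) for i in range(7)]
        + [OracleReport(f"oracle-{i}", "curated_a", 103.0) for i in range(7, 9)]
        + [OracleReport("oracle-9", "curated_b", 104.0)]
    )
    print(len(reports), unique_sources(reports))  # 10 3
    print(naive_median(reports))                  # 100.0
    print(source_aware_median(reports))           # 103.0
```

With seven of the ten oracles relaying the same cheap API, the naive median simply returns that API’s price, while the source-aware version gives each of the three underlying sources one vote. This only addresses duplicated sources; normalizing for response time, trading volume, and differing aggregator methodologies would need further information about each source.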
The API Connectivity Problem
We reduce the problem of getting objective² data on the blockchain to a two-body problem by redefining it as the API Connectivity Problem. This is how we cut the Gordian knot described here. (Note: this also resolves the data-source-agnosticism issues above, since the data source is now represented on-chain.)
There are real-world businesses that create real value via the internet, but they can’t create real value on the blockchain because they are not connected to it. Indeed, the primary use of oracle solutions today is to deliver asset prices curated by API providers to DeFi applications. Emerging use cases such as prediction markets and parametric insurance have similar requirements.
Conclusions & next blog post
The API Connectivity Problem formalizes and specifies the problem of how to connect off-chain businesses, monetized and represented digitally by their APIs, with the blockchain (in a decentralized, cost-efficient, and secure way, of course). Connecting such APIs to the blockchain directly brings off-chain value on-chain.
How exactly do we bring API providers onto the blockchain? How does the transmission of such monetizable data and services differ from existing “oracle network” solutions? Keep an eye out for next week’s installment of this series, where I discuss the pros and cons of third-party oracles versus first-party oracles!
¹ Practical Law, “Data licensing: Taking into account data ownership and use.” https://legal.thomsonreuters.com/en/insights/articles/data-licensing-taking-into-account-data-ownership.
² I must note that there are suitable approaches to getting subjective data on the blockchain by posing a question and crowdsourcing the answer. A good example would be the resolution of a judicial dispute.