The Web3 Shift to First-Party Oracles

Published in API3

Feb 1, 2022

What is an oracle?

Smart contracts, which run on a blockchain, need off-chain data, but they can’t call Web APIs the way regular web applications do. With smart contracts, many nodes in a distributed network perform the same computation on the same input data, and they must agree on the results before the chain can move forward. If the nodes don’t all have exactly the same input data, they can’t reach agreement, and “time” on the blockchain can’t move forward. If every node tried to call the same API at the same time, nothing would guarantee that they’d all get the same response. And no node would have any reason to trust the API responses that other nodes claimed to have used as inputs.

Because of 1) this distributed nature of smart contracts, 2) the need for all nodes to share the same inputs, and 3) the need for the network to reach consensus on the results, smart contracts are isolated from the internet. Smart contracts can’t get data directly from the real world.

Something has to put the real-world data needed by smart contracts on-chain so that all the nodes can use it as input to their computations. This “thing” that decides the correct value of real-world data and puts it on-chain is called an oracle. Doing this correctly, without sacrificing the benefits of using a blockchain in the first place, is a complex and difficult problem known as “the oracle problem”.
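To make the consensus argument concrete, here is a minimal Python sketch under simplifying assumptions: each node’s state transition is modeled as a SHA-256 hash of the previous state and one input price, and `hypothetical_api_call` is an invented stand-in for a live Web API whose responses drift between calls. Nodes that each call the API themselves end up with different states, while nodes that all read one oracle-posted value agree.

```python
import hashlib
import random

def next_state(prev_state: str, price: float) -> str:
    """Deterministic state transition: hash of the previous state and the input."""
    payload = f"{prev_state}|{price:.2f}".encode()
    return hashlib.sha256(payload).hexdigest()

def hypothetical_api_call(rng: random.Random) -> float:
    """Invented stand-in for a live Web API whose price drifts between calls."""
    return 100.0 + rng.uniform(-0.5, 0.5)

def distinct_states(inputs: list[float]) -> set[str]:
    """Apply the same transition on every node; return the set of resulting states."""
    return {next_state("genesis", price) for price in inputs}

rng = random.Random(42)  # fixed seed so the sketch is reproducible
n_nodes = 5

# Case 1: every node calls the API itself and may see a different response.
independent = [hypothetical_api_call(rng) for _ in range(n_nodes)]

# Case 2: an oracle posts one value on-chain and every node reads that value.
oracle_value = hypothetical_api_call(rng)
shared = [oracle_value] * n_nodes

print(len(distinct_states(independent)))  # more than one state: no consensus
print(len(distinct_states(shared)))       # 1: all nodes agree
```

The computation is identical on every node in both cases; only the source of the input changes, which is the whole reason an oracle is needed.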

A brief history of the oracle problem

The oracle problem is an interesting one in blockchain. But the idea didn’t originate in the blockchain space. Long before the Bitcoin white paper was released, the concept of an oracle was used in software testing. The code that’s being tested is executed with known inputs and something verifies that it produced the expected result. The “thing” that decides whether the test has passed or failed is referred to as the oracle.
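In testing terms, the idea fits in a few lines of Python. This is only an illustration: `sort_under_test` is a hypothetical function being tested, and this oracle assumes a trusted reference implementation (Python’s built-in `sorted`) is available, which is just one of several ways a test oracle can decide pass or fail.

```python
def sort_under_test(xs: list[int]) -> list[int]:
    """Hypothetical code being tested: a hand-written insertion sort."""
    result: list[int] = []
    for x in xs:
        i = len(result)
        # Walk left past every element larger than x, then insert x there.
        while i > 0 and result[i - 1] > x:
            i -= 1
        result.insert(i, x)
    return result

def oracle(inputs: list[int], output: list[int]) -> bool:
    """Test oracle: decides pass/fail by comparing with a trusted reference."""
    return output == sorted(inputs)

known_inputs = [[], [3, 1, 2], [5, 5, -1, 0]]
for case in known_inputs:
    verdict = "pass" if oracle(case, sort_under_test(case)) else "fail"
    print(case, verdict)
```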

The oracle problem in software testing is also hard, especially when testing distributed systems, and even more so when testing an online system in real time. Blockchains are online distributed systems, which helps explain why the blockchain oracle problem is so difficult.

The earliest reference I can find to the oracle problem in blockchain is a Reddit post from December 13th, 2014. The author described the problem of decentralized applications running on the Bitcoin network that depend on external data. In their words, “I think of it as ‘The Oracle Problem’.” They went on to discuss how hard the problem is and how any solution seemed susceptible to being gamed. The gauntlet had been thrown down.

An early example of a blockchain oracle was the Augur prediction market, which was manual: humans were crowdsourced to answer questions, and the winning answers were used on-chain. Later came programmatic oracles such as Oraclize, which pulled data automatically and objectively from the web instead of manually and subjectively from humans, but which were more centralized. Then came decentralized programmatic oracles such as Chainlink, where multiple oracle nodes run at the same time, fetch data from Web APIs, and aggregate it into a single feed. Today, the term oracle usually implies an API-driven one such as Chainlink.

As in software testing, much progress has been made in the blockchain oracle space, but the problem has not been solved. Oracle technology has evolved, but there is no general solution. Engineers have made trade-offs, gaining in one area by sacrificing in another, to better fit different use cases and situations.

Where are we now?

In theory, oracle data can come from a wide variety of sources, but in practice it almost always comes from Web APIs. To avoid having smart contracts (decentralized applications) rely on individual API providers, the most common approach to the oracle problem today is to aggregate data from multiple oracle nodes. But those oracle nodes still get their data from Web APIs.
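A minimal Python sketch of that aggregation step, assuming the feed takes the median of the node reports (a common choice because a single outlier cannot drag it far; real networks may use different rules). The node names and values are made up for illustration.

```python
from statistics import median

def aggregate(reports: dict[str, float]) -> float:
    """Combine node reports into a single feed value using the median."""
    if not reports:
        raise ValueError("no reports to aggregate")
    return median(reports.values())

# Hypothetical reports from five oracle nodes for the same price pair.
reports = {
    "node-a": 2001.5,
    "node-b": 2000.9,
    "node-c": 2002.1,
    "node-d": 2001.2,
    "node-e": 9999.0,  # one faulty or malicious node
}
print(aggregate(reports))  # 2001.5
```

The median tolerates a minority of faulty nodes, but it says nothing about where each node got its number.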

Today, decentralized oracles are usually operated separately from the API providers that own the data. Because these oracle nodes are operated by third parties rather than by the API providers themselves, we call them third-party oracles. Third-party oracles introduce additional problems:

It’s difficult to tell which APIs the data came from because there is an opaque layer of third-party nodes in the middle. Third-party oracles are not transparent.
The goal was to decentralize the data, but that only happens if the data itself comes from multiple, independent APIs. Since developers can’t see which APIs the data came from, it’s difficult for them to know whether the data is actually decentralized. They can only see that the third-party oracle nodes are decentralized. In practice, those nodes often get their data from the same APIs, so the data itself isn’t as decentralized as we’re led to believe. For some data, only one or a few reputable API sources exist in the first place.
Since smart contract developers can’t see where the data came from, it’s difficult for them to judge its quality. In an attempt to avoid trusting API providers, we’re instead trusting third-party oracle networks to source high-quality data and aggregate it for us accurately and honestly. Rather than solving the problem, we’ve just kicked the can down the road. On top of that, third-party oracle node operators have an incentive to maximize profit by sourcing low-cost data, which is often lower quality.
A network of third-party oracle nodes comes at a cost. Each node has operational costs and needs to make a profit. If developers can’t see and verify that the data is coming from multiple, independent, high-quality API sources then why should they believe they’re getting any benefit in return for the cost of decentralized oracle nodes?
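The decentralization point can be illustrated with a small Python sketch. It assumes each node’s report is tagged with the upstream API it used, which is exactly the information a developer usually cannot see with third-party oracles; the API names and values are hypothetical. Five nodes feeding a median look decentralized, but if four of them read the same API, that API alone decides the feed value.

```python
from statistics import median

# Hypothetical reports from five third-party nodes: (upstream API, value).
# With third-party oracles, the first field is usually invisible to developers.
node_reports = [
    ("api-x", 1850.0),
    ("api-x", 1850.0),
    ("api-y", 2001.3),
    ("api-x", 1850.0),
    ("api-x", 1850.0),
]

def feed_value(reports: list[tuple[str, float]]) -> float:
    """Median of the reported values, as an aggregating feed might compute it."""
    return median(value for _, value in reports)

def distinct_sources(reports: list[tuple[str, float]]) -> int:
    """Number of independent upstream APIs actually behind the feed."""
    return len({source for source, _ in reports})

print(len(node_reports))               # 5 oracle nodes
print(distinct_sources(node_reports))  # 2 data sources
print(feed_value(node_reports))        # 1850.0: api-x alone sets the value
```

If api-x is stale or wrong, adding more nodes that read it does not help; only more independent sources do.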

The Gordian knot

Burak Benligiray, a founder of API3, wrote a great article on Medium, “The Gordian Knot called The Oracle Problem”, in which he compares the oracle problem to the legend of Alexander the Great and the Gordian knot. King Midas had tied a knot that seemed impossible to untangle. Rather than untie it, Alexander cut it in half with his sword.

Burak’s metaphor captures a trend that’s playing out in the oracle space now. Solving the oracle problem with decentralized third-party oracles is like trying to untie the Gordian knot. Rather than continuing to struggle with an impossible problem, Alexander took a pragmatic approach and reduced the problem to something simpler and more effective. Likewise, the industry is realizing that the bottleneck to decentralizing off-chain data lies in finding multiple, independent, high-quality API providers, not in the oracle nodes.

The pragmatic approach to off-chain data is to simplify “the oracle problem” down to “the API connectivity problem”. Rather than trusting opaque networks of third-party oracle nodes, smart contracts need transparency and direct access to the APIs that provide the data. Transparency is the key to minimizing trust when bringing data on-chain. Data decentralization should happen trustlessly on-chain, not opaquely off-chain.

The solution is for the API providers themselves to be the oracles. We call these first-party oracles. When the API provider who owns the data runs the oracle themselves, it’s a first-party oracle. When a third-party who doesn’t own the data runs the oracle, it’s a third-party oracle.
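Here is a toy Python model of what transparent, on-chain aggregation of first-party data might look like. It is a sketch, not API3’s or any other project’s implementation: `TransparentFeed` and the provider names are hypothetical, the median is an assumed aggregation rule, and the caller-supplied `provider` string stands in for the transaction-level authentication a real smart contract would rely on. Every submission is stored under the identity of the API provider that made it, so anyone can see which sources contributed to the feed.

```python
from dataclasses import dataclass, field
from statistics import median

@dataclass
class TransparentFeed:
    """Toy model of an on-chain aggregator for first-party data.

    Hypothetical: a real contract would authenticate the submitter from the
    signed transaction; here the `provider` argument is trusted as given.
    """
    allowed_providers: set[str]
    submissions: dict[str, float] = field(default_factory=dict)

    def submit(self, provider: str, value: float) -> None:
        # Only registered API providers (first-party oracles) may submit.
        if provider not in self.allowed_providers:
            raise PermissionError(f"{provider} is not a registered API provider")
        # Stored publicly per provider, so anyone can audit the inputs.
        self.submissions[provider] = value

    def value(self) -> float:
        if not self.submissions:
            raise ValueError("no submissions yet")
        return median(self.submissions.values())

    def sources(self) -> list[str]:
        return sorted(self.submissions)

feed = TransparentFeed(allowed_providers={"provider-a", "provider-b", "provider-c"})
feed.submit("provider-a", 2001.0)
feed.submit("provider-b", 2002.0)
feed.submit("provider-c", 2003.0)
print(feed.sources())  # ['provider-a', 'provider-b', 'provider-c']
print(feed.value())    # 2002.0
```

Unlike the third-party case, the list of sources is part of the public state rather than hidden behind a layer of nodes.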

The trend toward first-party oracles

Intuitively, the advantages of first-party oracles over third-party oracles are clear. All other things being equal, getting data directly from a reputable, high-quality source is preferable to getting it from third-party middlemen, even if there are several of those middlemen.

Leading oracle projects are acknowledging this. Chainlink is promoting oracle nodes run by reputable API providers such as AccuWeather. API3, Pyth, Flux, IOTA, and other oracles tout their first-party architecture as an improvement. The next generation of oracles is first-party.

This trend isn’t just a sudden realization that first-party oracles are better than third-party oracles. Something else has changed. Actually, two things have changed that finally make first-party oracles a reality:

Blockchain has gone mainstream. A few years ago, most reputable traditional businesses were barely thinking about blockchain and smart contracts. Now many of them are jumping in headfirst. They used to fear blockchain, and now they want in.
Oracle nodes used to be more difficult to run. Now it’s easy for an API business to run its own oracle node with little to no cost or maintenance, and it doesn’t even have to deal with cryptocurrency in order to monetize it.

These two changes in the blockchain landscape are monumental. Now API providers are both motivated and able to become oracles themselves and deliver their data directly on-chain.

What are the consequences of this? It means the API providers who own the data can offer it directly and transparently to smart contracts. Middlemen are no longer necessary. When you need decentralized data feeds aggregated from multiple sources, they can be aggregated securely and transparently on the trustless blockchain rather than off-chain in an opaque network of third-party oracles.

What does the future look like?

Is there still room for third-party oracles? Of course. They’ll still exist and, in my opinion, they’ll continue to grow along with the entire oracle space. But as a result of these two key environmental changes, they’re no longer the only feasible option. As on the traditional web, companies can decide whether to outsource hosting to third-party providers or do it themselves in their own cloud accounts with providers such as AWS, GCP, and Azure.

But there is one important difference. In Web3, minimizing reliance on trust (being trustless) is considerably more important. Making data sources transparent and cutting out unnecessary middlemen is the most practical way to minimize trust when bringing data on-chain.

We’re seeing a massive uptick in both the depth and breadth of how smart contracts use off-chain data. Demand is increasing rapidly for existing, heavily used data feeds, while use cases that require new kinds of off-chain data are appearing more and more frequently. For example, parametric insurance, escrow contracts, and logistics are newer use cases that depend on off-chain data.

The transparency, efficiency, and scalability of first-party oracles will play a massive role in bringing these new use cases to life.