Setting up Chainlink on Ethereum, Part 3.: Aggregation of results and synthesis

Adam Z. Nagy
9 min read · Nov 28, 2019

You should be familiar with the Chainlink basics by now. If not, read Part 1 and Part 2 first.

Let’s see a more complicated setup. For this part, everything is already in place so you don’t really have to deploy anything yourself. The extended system for aggregating results of multiple Chainlink nodes is the following:

Figure: the setup. N is the yet-undetermined number of Oracle nodes.

The parts

Nodes and Oracle contracts

There are N Chainlink nodes instead of one. It is important to note that Chainlink nodes do not communicate or coordinate with each other off-chain; they are totally independent.

There are N Oracle contracts, each belonging to its corresponding off-chain node. These are also independent of each other.

I talk about choosing an optimal N later.

The EWT/EUR Job

Each Node runs the following job:

{
  "initiators": [
    {
      "type": "runlog",
      "params": {
        "address": "0x3b903c60ef60c4b29655608576cc16c0ddcd6da0"
      }
    }
  ],
  "tasks": [
    {
      "type": "httpget",
      "confirmations": 0,
      "params": {
        "get": "https://api.liquid.com/products/560"
      }
    },
    {
      "type": "jsonparse",
      "confirmations": null,
      "params": {
        "path": [
          "last_traded_price"
        ]
      }
    },
    {
      "type": "multiply",
      "confirmations": null,
      "params": {
        "times": 10000
      }
    },
    {
      "type": "ethuint256",
      "confirmations": null,
      "params": {}
    },
    {
      "type": "ethtx",
      "confirmations": null,
      "params": {}
    }
  ],
  "startAt": null,
  "endAt": null
}

For this Job, the client contract passes no parameters, which keeps the on-chain message size down. The Job is hardcoded to fetch the price from https://api.liquid.com/products/560, multiply it by 10000 and return the value in a transaction.
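To make the task chain concrete, here is a minimal Python sketch of what the tasks do to the data. It is not Chainlink's adapter code: the function names only mirror the task types, the sample response body is hypothetical (the real API payload is not reproduced here), and truncating the decimals in the uint256 step is an assumption.

```python
import json
from decimal import Decimal

# Hypothetical response body; only the field the Job reads is shown.
SAMPLE_RESPONSE = '{"last_traded_price": "1.2345"}'

def json_parse(body, path):
    """Walk the parsed JSON along the given list of keys (the "jsonparse" task)."""
    value = json.loads(body)
    for key in path:
        value = value[key]
    return value

def multiply(value, times):
    """Scale the price so decimals survive as an integer (the "multiply" task)."""
    return Decimal(str(value)) * times

def eth_uint256(value):
    """Encode as a 32-byte big-endian unsigned integer (the "ethuint256" task).
    Assumption: any remaining fraction is truncated."""
    number = int(value)
    if not 0 <= number < 2**256:
        raise ValueError("value does not fit into uint256")
    return number.to_bytes(32, "big")

def run_job(body):
    """Chain the tasks in Job order; the "ethtx" step would submit the result."""
    price = json_parse(body, ["last_traded_price"])
    scaled = multiply(price, 10000)
    return eth_uint256(scaled)

encoded = run_job(SAMPLE_RESPONSE)
print(int.from_bytes(encoded, "big"))  # 12345
```

With the factor 10000 a consumer gets four decimals of precision and has to divide by the same factor when reading the value back.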

Contracts

All of them are included in this shared GitHub gist; just import it into Remix as in Part 1: 21db5a581949c4ab7882add2bb0d5867.

Aggregators

The Aggregator contract is the heart of this whole operation. Chainlink provides a reference implementation (Aggregator.sol) that can be modified and extended. The Aggregator contract also derives from ChainlinkClient.sol. In general, Aggregators work as follows:

  1. The Aggregator usually maintains a list of Oracles and JobIDs. New ones can be added or removed similarly to any other permissioning contract.
  2. Whenever a request is made to the Aggregator, it forwards that request to all the Oracles, separately.
  3. The Aggregator pays for the Oracles. The Aggregator contract must own enough Chainlink tokens to work.
  4. The Oracles run the invoked jobs independently and return the answer to the Aggregator’s callback function.
  5. The Aggregator waits until a certain number of results are collected for a query, and then calculates a statistical value, e.g. mean or median, which is the final answer.
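The five steps above can be modelled off-chain in a few lines of Python. This is a simplified sketch, not the Solidity contract: the class and method names are hypothetical, payment is reduced to a counter, and answers arriving after the threshold is reached are simply ignored for the final value.

```python
from dataclasses import dataclass, field
from statistics import median

@dataclass
class AggregatorModel:
    oracles: list             # step 1: the maintained oracle list
    payment_per_oracle: int   # step 3: LINK paid for each oracle request
    min_responses: int        # step 5: answers needed before aggregating
    link_balance: int = 0
    pending: dict = field(default_factory=dict)  # request id -> answers so far
    answers: dict = field(default_factory=dict)  # request id -> final answer
    next_id: int = 0

    def request(self):
        """Steps 2 and 3: pay for and forward one sub-request per oracle."""
        cost = self.payment_per_oracle * len(self.oracles)
        if self.link_balance < cost:
            raise RuntimeError("not enough LINK to pay the oracles")
        self.link_balance -= cost
        request_id = self.next_id
        self.next_id += 1
        self.pending[request_id] = []
        return request_id, list(self.oracles)

    def callback(self, request_id, oracle, value):
        """Steps 4 and 5: collect an answer, aggregate once enough are in."""
        if oracle not in self.oracles:
            raise PermissionError("unknown oracle")
        responses = self.pending[request_id]
        responses.append(value)
        if len(responses) == self.min_responses:
            self.answers[request_id] = median(responses)

agg = AggregatorModel(["a", "b", "c", "d", "e"], payment_per_oracle=1,
                      min_responses=3, link_balance=10)
rid, targets = agg.request()
for oracle, value in [("a", 10100), ("b", 10000), ("c", 10250)]:
    agg.callback(rid, oracle, value)
print(agg.answers[rid])  # 10100
```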

The Aggregator provided by Chainlink is just an example. You are expected to modify it to fit your needs (result data types and so on).

For this PoC I use ExecutorPublicPriceAggregator.sol, which I developed specifically for this price oracle use case:

  • It handles uint256 numbers instead of int256s.
  • It can execute an arbitrary transaction as soon as enough results are collected. Here, enough results means replies from 3 out of 5 Oracles.
  • Instead of the Aggregator paying, the requester has to approve the Aggregator to spend a certain amount of tokens on its behalf. If the requester doesn’t do so, the requests fail.
  • It is meant to be a PoC of public price oracle infrastructure.

Let’s try it out!

Using the price Oracle

Fetch the necessary contracts

In Remix, compile the contracts and load them at their deployed addresses. A copyable list of the addresses is here: https://gist.github.com/ngyam/6d1ffdc727eea2cfdc8636c53a12a677

Approve some tokens for the PriceAggregator

This was a little bit tricky. I wanted my PriceAggregator contract to be public infrastructure, so how are payments made to the Oracle contracts? Who pays? The Aggregator is the Chainlink client contract, which submits requests with the “transferAndCall” function of the LinkToken. How can I make sure that the user pays, so that I don’t have to keep topping up the Aggregator’s balance in some tricky way, or keep a registry of users and their tabs? In the end I decided to simply use the “approve” feature of ERC-20 token contracts.

First, check your Volta-Link token balance with the LinkToken contract’s “balanceOf” function. If you need more, go back to Part 1, section 3 to see how to get some.

There is a convenience function “calculateRequestFee” in the PriceAggregator that returns how much you have to pay for 1 request, which is basically the price of 5 individual Oracle requests.

Then go back to the LinkToken contract. With the “approve” function, approve some tokens for the Aggregator contract address, at least one request’s worth. Each time you make a request, the Aggregator tries to pay the Oracle nodes on your behalf. If you didn’t approve enough to pay for a request, the call will fail.
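The approve-then-spend flow can be illustrated with a toy ERC-20 model in Python. This is only a sketch of the bookkeeping: the class is hypothetical, the account names are placeholders, and the fee formula (a flat per-oracle payment times the number of oracles) is an assumption based on the description of “calculateRequestFee” above.

```python
class TokenModel:
    """Toy ERC-20 ledger with balances and allowances."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.allowances = {}  # (owner, spender) -> remaining approved amount

    def approve(self, owner, spender, amount):
        self.allowances[(owner, spender)] = amount

    def transfer_from(self, spender, owner, recipient, amount):
        """Spender moves the owner's tokens, limited by the approved amount."""
        allowed = self.allowances.get((owner, spender), 0)
        if allowed < amount or self.balances.get(owner, 0) < amount:
            raise RuntimeError("insufficient allowance or balance")
        self.allowances[(owner, spender)] = allowed - amount
        self.balances[owner] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount

def calculate_request_fee(payment_per_oracle, oracle_count):
    # Assumption: one aggregated request costs one payment per oracle.
    return payment_per_oracle * oracle_count

token = TokenModel({"requester": 12})
fee = calculate_request_fee(payment_per_oracle=2, oracle_count=5)
token.approve("requester", "aggregator", fee)

# First request: the aggregator pulls the fee from the requester.
token.transfer_from("aggregator", "requester", "aggregator", fee)
print(token.balances["requester"])  # 2

# Second request: the allowance is used up, so the pull fails.
try:
    token.transfer_from("aggregator", "requester", "aggregator", fee)
except RuntimeError as error:
    print("request failed:", error)
```

Approving exactly one fee means every request needs a fresh approval; approving a multiple of the fee prepays several requests.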

Now you should be ready to interact with the Aggregator.

Request a rate update from the Oracle

This is the part where we try things out. You can call 2 PriceAggregator functions:

  • requestRateUpdate: simply updates the “currentAnswer” and “updatedHeight” storage variables once enough results are in. “updatedHeight” is the block number at which the last change happened. Users can check it to decide whether the current answer is fresh enough or a new request is needed.
  • requestRateUpdateWithTransaction: with this you can fire a transaction at the end. I added this because an external user (meaning an external smart contract) usually wants to take some immediate action once the oracle results are in. Loading in a transaction works exactly the same way as in Gnosis Multisig wallets. If you are not familiar with Gnosis Multisig wallets, check out my interactive notebook tutorial here (js) or here (py).
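For the first function, the freshness decision a consumer might make from “updatedHeight” could look like the sketch below. The function name and the maximum-age threshold are hypothetical; the contract itself only stores the two values.

```python
def is_fresh(current_block, updated_height, max_age_blocks):
    """True if the stored answer changed within the last max_age_blocks blocks."""
    return current_block - updated_height <= max_age_blocks

# Answer updated 10 blocks ago, with at most 20 blocks tolerated: reuse it.
print(is_fresh(current_block=1000, updated_height=990, max_age_blocks=20))  # True

# Answer updated 100 blocks ago: request a new rate update instead.
print(is_fresh(current_block=1000, updated_height=900, max_age_blocks=20))  # False
```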

After calling e.g. “requestRateUpdate”, you should see the same things happening as in the first part of the tutorial, with the difference that the results arrive aggregated.

How results are aggregated

Everything happens in the smart contract on-chain:

  1. We wait until the required number of answers are in, e.g. 3.
  2. We sort the answers and choose the middle one from the list.
  3. If the answer-list size is even, we use the mean value of the 2 middle elements.
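The same three steps, written out as a short Python sketch. The contract does this on-chain in Solidity; the integer division for the even case is an assumption that mirrors unsigned integer arithmetic.

```python
def aggregate(answers):
    """Median of the collected answers, using integer arithmetic."""
    if not answers:
        raise ValueError("no answers collected")
    ordered = sorted(answers)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: the middle element is the answer.
        return ordered[middle]
    # Even count: mean of the two middle elements, rounded down.
    return (ordered[middle - 1] + ordered[middle]) // 2

print(aggregate([10250, 10000, 10100]))         # 10100
print(aggregate([10000, 10100, 10200, 10301]))  # 10150
```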

Of course, you can implement your own strategy too, depending on the use case. If you use Oracles for e.g. random numbers, this strategy would not make sense.

How to choose the number of nodes N and required number of answers k

In theory

Usually you want an odd number of nodes, and at least 3, e.g. a TMR (triple modular redundancy) config. Why? Because of majority voting and error detection. If 1 node says A and the other 2 say B, you can detect who lies. If there were e.g. 4 nodes and 2 said A and 2 said B, then who is right?

  • 3 nodes can tolerate 1 failure (2-out-of-3), 5 nodes can tolerate 2 (3oo5), 4oo7, 5oo9 and so on.
  • 4 nodes tolerate 1 failure too (3oo4), but that is just more expensive than 2oo3.
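The k-out-of-N figures above follow from simple majority arithmetic: a majority needs k = floor(N/2) + 1 agreeing nodes, which leaves N - k nodes that may fail. A small sketch (the function name is hypothetical):

```python
def majority_config(n):
    """Return (k, tolerated_failures) for majority voting among n nodes."""
    k = n // 2 + 1
    return k, n - k

for n in (3, 4, 5, 7, 9):
    k, tolerated = majority_config(n)
    print(f"{k}oo{n}: tolerates {tolerated} failure(s)")
```

Going from 3 to 4 nodes, or from 5 to 6, adds cost without raising the number of tolerated failures, which is why odd node counts are the usual choice.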

In addition, you want some protection against 2 types of Oracle failures:

  • A node is down or inactive, so the Oracle never sends an answer.
  • A node lies or is malicious, so it sends a bogus answer or tries to mess with you on purpose.

In practice

Let’s map theory to my price Oracle use case.

If a node is malicious, there are 2 scenarios:

  1. The Oracle reports a price value that is far too low or too high. I protect against this by setting the minimum number of required answers to an odd number larger than 1. Let’s settle on 3. This way the outrageously low or high number ends up at the very beginning or end of the sorted list and doesn’t get chosen as the answer.
  2. If the malicious answer is roughly in the same range as the other answers and gets chosen as the median, it is not malicious at all. 😀

So if I chose e.g. a 3oo5 config, I would be protected against:

  • 1 malicious and 1 inactive node, or
  • 2 inactive nodes, or
  • 1 malicious node only

If there were 2 malicious nodes, potentially one of them could be chosen as median.

I could also go with a 3oo4 config which would protect against 1 inactive or malicious node.
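The 3oo5 cases above can be checked with a few lines of Python. The price values and the outlier are hypothetical, and the lists give the answers in arrival order; the last case is the worst-case ordering in which both malicious answers arrive first.

```python
from statistics import median

OUTLIER = 10**9  # hypothetical absurd price reported by a malicious node

def first_k_median(arrivals, k=3):
    """Median of the first k answers in arrival order; None if fewer arrive."""
    if len(arrivals) < k:
        return None
    return median(arrivals[:k])

# 1 malicious and 1 inactive node: 4 answers arrive, the outlier is filtered.
print(first_k_median([OUTLIER, 10000, 10050, 10100]))  # 10050

# 2 inactive nodes: exactly 3 honest answers arrive.
print(first_k_median([10000, 10050, 10100]))  # 10050

# 2 malicious nodes answering first: the outlier becomes the median.
print(first_k_median([OUTLIER, OUTLIER, 10000, 10050, 10100]))  # 1000000000
```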

Reflection, limitations and further improvements

This is just a simple PoC to test Chainlink’s capabilities. Get inspired, take it one step further, but don’t take it too seriously as it is.

  • I feel like the ERC-20 token system is a limitation and annoyance that forced me to use this “approve” based scheme to pay for the Aggregator. Could Chainlink oracles work without the LinkToken? Absolutely. With native tokens life would be simpler, but the whole ecosystem is built around this token so you have to buy it. For now let’s just accept that open source projects have to make a living somehow and that’s how the team solved it. If I ever get too bored I might fork Chainlink and make it work with native tokens only.
  • An issue with this PoC design is that the last Oracle to answer pays for everything: its transaction calculates the aggregation of results and fires the loaded user transaction. An improvement could be made where the executor node gets paid extra.
  • Scalability issue: increasing the number of nodes linearly increases the number of transactions: N requests, N answers. Moreover, aggregating a bunch of results and firing the final user transaction in one go can easily bump into the gas limit.
  • The Chainlink network doesn’t feel like a “network”. Nodes are standalone, do not coordinate and do not see each other at all. They just offer simple modular redundancy, nothing else. It is completely up to the users how they want to rely on this.
  • All public-facing nodes have to somehow publish the Jobs they offer and how they accept requests. There is no service discovery. You have to tell a friend, or list yourself on a service market, e.g. https://market.link/.
  • The other thing I’m missing is a comprehensive incentive scheme. Oracles get paid for work, but they are not punished if they do not fulfil their duties or are proven malicious. A staking/slashing system would be interesting to consider here. Modular redundancy helps, but you’d get stronger guarantees with something like this. There are already some ongoing efforts that you can read about here.

When would you use Chainlink?

Smart contracts need to be poked from outside, so most DApps have off-chain components and workers. Why would you use Chainlink then? Why would you aggregate on-chain?

First of all, you want to put things on-chain that are meant to be public and disputable. This means that Chainlink comes in handy when:

  • You want to fetch data from a public, third-party API, i.e. publicly verifiable information. Think of price pairs, weather, scores, etc.
  • You want to show undeniable proof of honesty to your users: what data you fetched, when you fetched it, and how you calculated the outcome. All this with an immutable, historical trace. You need to prove your honesty to preserve trust.
  • You want to spread risk. Airplanes have 3 of everything from different manufacturers. You can achieve the same with Chainlink’s modular redundancy and aggregate results of different providers.

Second, the UX of Chainlink is amazing (or should I say DX, for developer experience?). You can set up one node/Oracle in minutes. If your DApp consists mostly of smart contracts, why would you bother developing off-chain components for simple things? You get one out of the box.

The end

I hope you enjoyed this journey with Chainlink on our chain. I keep my nodes running on AWS, so the PriceAggregator is quasi-public infrastructure on Volta that you can rely on for your PoCs. It is not production grade, though. If you have a cool addition or a different Aggregator, feel free to create a PR to my GitHub repo. I plan to share some other cool things I’m involved with in the future, so stay tuned!

Need help with your blockchain based project? Shoot me a mail: adam.zsolt.nagy@gmail.com
Did you learn something useful? Made your job easier? Tips are welcome too: Ethereum: 0x74dd76E24B2CFB43C1b1a4498295d553D0843746
