Exploring RedStone Oracle, Part III: Data Aggregation

Jan 23, 2024

We’ve reached Part III of this miniseries, which concludes our exploration of RedStone’s key features!

Part I: The modular design of RedStone
Part II: The three methods of integration
Part III: Data aggregation (we are here!)

After looking at what a modular design is, what an Oracle is for, and how RedStone handles both, we now reach the last part: data aggregation.

You probably came across the idea several times in the previous parts without the term being fully spelled out. Data aggregation is crucial not only for RedStone but for many web3 applications, which is why we will unpack it today!

What is Data Aggregation?

Data aggregation is the process of compiling data (often from multiple data sources) to provide high-level summary information that can be used for statistical analysis. An example of a simple data aggregation is finding the sum of the sales in a particular product category for each region you operate in. This statistical summary makes it easier to analyze large volumes of data. Common data aggregations include sum, average, max, min, and count.
https://www.alteryx.com/fr/glossary/data-aggregation#:~:text=Data%20aggregation%20is%20the%20process,each%20region%20you%20operate%20in.

Aggregation is a broad term that applies to many domains, including business and economics, computer science, and statistics.

However, one idea runs through all of them: merging many pieces of data into a single value that summarizes them. The result is easier to work with, whatever field it is used in later.

Let’s take the example of an “aggregate function”:

In database management, an aggregate function or aggregation function is a function where multiple values are processed together to form a single summary statistic.
https://en.wikipedia.org/wiki/Aggregate_function

In database management, computing the median of a data set can be very useful. Let’s take a simple example:

Imagine a database holding records for more than a billion people, with their salaries and personal information such as location, gender, and age. We could summarize it by reading every salary and computing the median with an aggregate function. The same query could also filter on other criteria if needed, such as gender or the field the person works in.
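The salary example can be sketched in a few lines of Python. This is only an illustration: the `Person` record, the `median_salary` helper and the sample rows are all hypothetical, and a real database would run the aggregate function inside its query engine rather than in application code.

```python
from dataclasses import dataclass
from statistics import median
from typing import Callable, Iterable, Optional


@dataclass
class Person:
    # Hypothetical record; a real table would have many more columns.
    salary: float
    gender: str
    field: str


def median_salary(
    people: Iterable[Person],
    where: Optional[Callable[[Person], bool]] = None,
) -> float:
    """Aggregate many salaries into one summary value, with an optional filter."""
    salaries = [p.salary for p in people if where is None or where(p)]
    if not salaries:
        raise ValueError("no rows match the filter")
    return median(salaries)


# Made-up sample rows, for illustration only.
people = [
    Person(30_000, "F", "finance"),
    Person(45_000, "M", "finance"),
    Person(52_000, "F", "engineering"),
    Person(61_000, "M", "engineering"),
    Person(250_000, "F", "engineering"),
]

overall = median_salary(people)
engineering = median_salary(people, where=lambda p: p.field == "engineering")
print(overall, engineering)
```

Note that the single very high salary barely moves the median, which is one reason the median is popular as a summary statistic.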

Example of object composition / aggregation

Let’s take another example of aggregation, called “object composition”:

In computer science, object composition and object aggregation are closely related ways to combine objects or data types into more complex ones
https://en.wikipedia.org/wiki/Object_composition#Aggregation

In computer science, an object often models a real-world thing in a form that a program can use. Objects are described by “properties”, and several objects can be combined into a more complex one that holds them as its own properties.

Despite the different use cases, the idea remains much the same: the individual pieces may not share a goal, but once merged they serve a new purpose together.
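As a minimal sketch of object composition, here is a hypothetical `Car` built from an `Engine` and several `Wheel` objects. The class names and properties are invented for illustration and do not come from any library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Engine:
    horsepower: int


@dataclass
class Wheel:
    diameter_inches: int


@dataclass
class Car:
    # The car is composed of simpler objects stored as its properties.
    engine: Engine
    wheels: List[Wheel]

    def describe(self) -> str:
        return f"{self.engine.horsepower} hp, {len(self.wheels)} wheels"


car = Car(engine=Engine(horsepower=150), wheels=[Wheel(17) for _ in range(4)])
print(car.describe())
```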

Now, what happens if we don’t process the data correctly before saving it on-chain? We risk storing data that is wrong or that deviates from the correct value. A decentralized Oracle, or several Oracles, can help solve or at least reduce this problem.

The importance of data aggregation in blockchains

A blockchain has to be almost certain that the data it receives is clean and usable. This quickly becomes difficult if the data is processed badly or if one link in the processing chain goes down: data integrity is no longer assured.

Furthermore, as we saw in the previous part, blockchains are cut off from off-chain data. They have to rely entirely on external services, so choosing one that processes the data correctly is essential. As described earlier, an Oracle is the solution: it receives data from external sources and processes it correctly.

Every blockchain has to decide between using multiple Oracles or a single Oracle such as RedStone, which offers a modular design while remaining decentralized. Moreover, the risk of it going down is very low. Why? The relayers and the data come from multiple sources, so if a few sources stop working, it doesn’t matter much in the grand scheme of things: data availability is preserved because the sources don’t rely on each other.

However, relying on several Oracles can also quickly become cumbersome if they don’t work well together, and it increases the chance of data corruption. A modular design instead handles many sources and layers of data with only one program in mind, which is RedStone in this case.
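To make the availability argument concrete, here is a small sketch of aggregating over independent sources while some of them are down. It is not RedStone’s actual code: the `aggregate_available` helper, the `min_sources` threshold and the source names are assumptions made for this example.

```python
from statistics import median
from typing import Callable, Dict


def aggregate_available(
    sources: Dict[str, Callable[[], float]],
    min_sources: int = 3,  # assumed threshold, chosen for the example
) -> float:
    """Take the median of every source that answers; skip the ones that fail."""
    values = []
    for fetch in sources.values():
        try:
            values.append(fetch())
        except Exception:
            # A failing source does not affect the others.
            continue
    if len(values) < min_sources:
        raise RuntimeError(f"only {len(values)} sources answered")
    return median(values)


def down() -> float:
    raise ConnectionError("source unavailable")


# Hypothetical sources: two of the five are offline.
sources = {
    "source_a": lambda: 2000.0,
    "source_b": lambda: 2001.5,
    "source_c": down,
    "source_d": lambda: 1999.0,
    "source_e": down,
}

price = aggregate_available(sources)
print(price)
```

With two of the five sources offline, the function still returns a value. It only gives up when fewer than `min_sources` answer, which is the point where too little data is left to trust the result.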

The quality of data delivered by an oracle service depends on two main criteria:
Data availability — which means that the oracle data should be always available for end users (or smart contracts) and should be updated with the promised frequency
Data correctness — it may be defined in different ways and usually depends on the type of data. E.g. correctness of objective data (like results of the given football match) can be easily verified, but with less-objective data (like ETH token price denominated in USD) it can be way more difficult to define correctness
https://blog.redstone.finance/2022/08/17/what-you-must-know-about-data-agregation-and-its-role-in-blockchain-oracles/

How is RedStone processing the data?

In the previous part, we saw that RedStone draws on a large number of data feeds coming from multiple DEXes and aggregators.

RedStone uses several methodologies to aggregate the data: median, TWAP (Time-Weighted Average Price), and LWAP (Liquidity-Weighted Average Price).
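To give an intuition for the two weighted averages, here is a sketch based on their general textbook definitions. It is not taken from RedStone’s code base, and the function names, input shapes and sample numbers are assumptions made for this example.

```python
from typing import List, Tuple


def twap(samples: List[Tuple[float, float]], end_time: float) -> float:
    """Time-weighted average price.

    samples: (timestamp, price) pairs sorted by timestamp. Each price is
    assumed to hold until the next sample (or until end_time for the last one).
    """
    if not samples:
        raise ValueError("at least one sample is required")
    start_time = samples[0][0]
    if end_time <= start_time:
        raise ValueError("end_time must be after the first sample")
    # End of each sample's interval: the next timestamp, or end_time.
    boundaries = [t for t, _ in samples[1:]] + [end_time]
    weighted_sum = 0.0
    for (t, price), t_next in zip(samples, boundaries):
        weighted_sum += price * (t_next - t)
    return weighted_sum / (end_time - start_time)


def lwap(quotes: List[Tuple[float, float]]) -> float:
    """Liquidity-weighted average price over (price, liquidity) pairs."""
    total_liquidity = sum(liquidity for _, liquidity in quotes)
    if total_liquidity <= 0:
        raise ValueError("total liquidity must be positive")
    return sum(price * liquidity for price, liquidity in quotes) / total_liquidity


# Price was 100 for 60 s, 110 for 30 s, then 105 for 30 s.
print(twap([(0, 100.0), (60, 110.0), (90, 105.0)], end_time=120))  # 103.75

# A deep pool at 2000 outweighs a shallow pool at 1900.
print(lwap([(2000.0, 900.0), (1900.0, 100.0)]))  # 1990.0
```

TWAP smooths out short-lived spikes because a price only counts for as long as it lasted. LWAP gives more weight to sources where more liquidity backs the quoted price.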

Median price value

There is another approach, which uses a median value calculation. It’s way better than the average value and definitely more resistant to manipulations by corrupted sources. However, even this method cannot be considered a perfect way to calculate price value. As an example, assume that you take the same ETH/USD value from one large crypto exchange ($100m daily trading volume on ETH/USD market) and 4 small ones (~$10k daily trading volume on ETH/USD market), and the large exchange provides value $2000, but all the small ones — less than $1900. Then the aggregated median value, in this case, will be less than $1900, but, as you can guess, it’s not close enough to the “real” market value.
https://blog.redstone.finance/2022/08/17/what-you-must-know-about-data-agregation-and-its-role-in-blockchain-oracles/
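The scenario in the quote can be reproduced with a few lines of Python. The exact prices of the four small exchanges are not given in the quote (only “less than $1900”), so the values below are assumed for illustration, as is the comparison with a volume-weighted average.

```python
from statistics import median

# (price in USD, daily trading volume in USD)
# The large exchange comes from the quote; the small-exchange prices are assumed.
quotes = [
    (2000.0, 100_000_000.0),  # one large exchange
    (1890.0, 10_000.0),       # four small exchanges, all below $1900
    (1880.0, 10_000.0),
    (1895.0, 10_000.0),
    (1885.0, 10_000.0),
]

median_price = median(price for price, _ in quotes)

total_volume = sum(volume for _, volume in quotes)
volume_weighted_price = sum(price * volume for price, volume in quotes) / total_volume

print(median_price)           # 1890.0, pulled down by the small exchanges
print(volume_weighted_price)  # just under 2000, dominated by the large exchange
```

The median treats every source as equally important, so four thin markets outvote one deep market. This is why weighting by volume or liquidity can be a better fit for some feeds.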

As mentioned above, it’s impossible to guarantee 100% correctness at all times, but the goal isn’t perfection: it’s to have the most accurate data possible.

In RedStone’s defense, this becomes less of a problem because the data usually comes from a much larger pool of sources, so one wrong source is outweighed by the others in most cases. But, as they say themselves, this isn’t 100% risk-free. For example, a newly released token may not yet be available on many relayers, which reduces the number of data sources to pick the price from.

Other use cases and graphs from RedStone are available in this article. What’s important to understand is that RedStone’s modularity extends to data aggregation and can be adapted to many scenarios. That way, RedStone can adapt to the needs of the protocol and choose the best method for each case. The whole process is called “on-chain aggregation” and ensures that the data is processed before being sent to the consumer contract. The default method is the median.

This is the end!

Now that we’ve covered all these subjects in this miniseries, I hope the examples and use cases helped you better understand the role of an Oracle in blockchains and web3 applications. Knowing how difficult it is to build such a product, you can now see why RedStone stands out in the Oracle field!

Thank you for reading until the end! If any information seems incorrect, don’t hesitate to leave a comment under this article.


Written by Charlotte Kindt