Extracting Value From On-Chain Data

Nick Gardner
Oct 13, 2022

Putting data to use has been a popular concept for many years, particularly among engineers and application developers. In a well-architected system, the quality and movement of data allow applications to run at low cost and to optimize for what matters most to their users.

Users care about performance.

All successful web-based applications are driven by performance. Users demand it because reliability and speed improve their experience. Businesses focus on performance because it’s easy to build a strategy around things that remain stable over time.

Over the past 5 years, crypto has ushered in millions of new users, and growing interest in more complex use cases has accelerated the industry’s obsession with performance. Most of today’s discussion about scaling blockchains is about writes, which are usually measured in transactions per second (TPS), alongside latency and the cost of block space. What is often overlooked, however, is reading blockchain data. How do we effectively pull on-chain information and make it useful?
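Write throughput in TPS is simply the number of transactions included over a window divided by the time that window spans. A minimal Python sketch, assuming each block is summarized as a hypothetical `(timestamp_seconds, tx_count)` pair (not any real client’s format):

```python
from typing import Sequence, Tuple

# Hypothetical block summary: (block timestamp in seconds, number of transactions)
Block = Tuple[int, int]


def transactions_per_second(blocks: Sequence[Block]) -> float:
    """Average TPS over a run of consecutive blocks.

    The first block's timestamp marks the start of the window, so its
    transactions are excluded; every later block's transactions are counted
    against the time elapsed since that first block.
    """
    if len(blocks) < 2:
        raise ValueError("need at least two blocks to measure elapsed time")
    elapsed = blocks[-1][0] - blocks[0][0]
    if elapsed <= 0:
        raise ValueError("block timestamps must increase")
    txs = sum(count for _, count in blocks[1:])
    return txs / elapsed


# A starting block followed by three blocks at 12-second intervals
sample = [(0, 150), (12, 180), (24, 120), (36, 150)]
print(transactions_per_second(sample))  # 12.5
```

Here 450 transactions over 36 seconds gives 12.5 TPS. Note that this only measures writes; it says nothing about how quickly that data can later be read back.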

This article provides an overview of indexers and the increasingly important role they play as blockchains grow and become more data rich.

Indexing

What is indexing?

“Indexing is a way to optimize the performance of a database by minimizing the number of disk accesses required when a query is processed” — Stately World

This definition is a great place to start, but let’s simplify it.

Indexing increases database performance. This entails organizing data (or pointers to it) in such a way that its contents can be retrieved accurately, quickly, and at a low and predictable cost.

In a centralized database, indexing is used to locate data quickly without scanning every row of a table whenever information is requested. Such a request is called a query.

Web2 developers who work with relational databases have probably used, or at least heard of, this concept before. The most important things to understand when working with centralized databases are what to index and how each index will improve query (data retrieval) response time.
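To make “what to index” concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. The `transfers` table and its columns are hypothetical; the point is that SQLite’s query plan switches from a full table scan to an index search once an index exists on the filtered column. The exact plan wording varies between SQLite versions.

```python
import sqlite3

# In-memory table of hypothetical transfers; column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfers (id INTEGER PRIMARY KEY, sender TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transfers (sender, amount) VALUES (?, ?)",
    [(f"addr{i % 100}", i) for i in range(10_000)],
)

QUERY = "SELECT amount FROM transfers WHERE sender = ?"


def plan(sql: str) -> str:
    """Return SQLite's query-plan description for a parameterized query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("addr7",)).fetchall()
    # The human-readable detail text is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


before = plan(QUERY)  # a full scan, e.g. "SCAN transfers"
conn.execute("CREATE INDEX idx_transfers_sender ON transfers (sender)")
after = plan(QUERY)   # e.g. "SEARCH transfers USING INDEX idx_transfers_sender (sender=?)"

print(before)
print(after)
```

The query and its results are identical in both cases; only the access path changes. Choosing `sender` as the indexed column is the kind of decision that depends on knowing which queries the application will run.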

The Centralized Database

Centralized databases have historically been used as structures for storing information. They started out as flat-file and hierarchical systems, similar to filing cabinets, before evolving to support much more complex use cases. These include querying data to gather insights for structured reporting, financial decision-making, and, most importantly, web-based applications.

We’ve relied on centralized databases for data storage and access for decades, for two main reasons that aren’t so obvious:

1. They don’t need to be time-sequenced

2. The different state updates do not need to be tracked

That is because all of the data can be modified, managed, and controlled by a single user or entity for their desired purpose.
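The contrast behind the two points above can be sketched in a few lines of Python. Both classes are illustrative and not taken from any particular database: the mutable store overwrites state in place, as a centralized database may, while the append-only log keeps every time-sequenced update so that any past state can be recomputed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class MutableStore:
    """Centralized-style store: an update overwrites the previous value in place."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}

    def set_balance(self, account: str, amount: int) -> None:
        self.balances[account] = amount  # the prior value is lost


@dataclass
class EventLog:
    """Ledger-style store: every update is appended in time order and kept."""

    events: List[Tuple[int, str, int]] = field(default_factory=list)

    def record(self, seq: int, account: str, delta: int) -> None:
        if self.events and seq <= self.events[-1][0]:
            raise ValueError("events must be appended in increasing sequence order")
        self.events.append((seq, account, delta))

    def balance_at(self, account: str, seq: int) -> int:
        """Replay the log up to and including `seq` to reconstruct past state."""
        return sum(d for s, a, d in self.events if a == account and s <= seq)


store = MutableStore()
store.set_balance("alice", 100)
store.set_balance("alice", 40)
print(store.balances)  # {'alice': 40}: the earlier 100 is gone

log = EventLog()
log.record(1, "alice", 100)
log.record(2, "alice", -60)
print(log.balance_at("alice", 1), log.balance_at("alice", 2))  # 100 40
```

The mutable store is cheaper to read from but cannot answer “what was the balance earlier?”; the log can, at the cost of keeping and replaying history.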

While these systems are internally optimized to meet their performance requirements, a lot depends on indexing and on the application developers themselves, because only a developer knows which queries their application has to perform.

The only requirement is that the database makes the data accessible to the software applications that request it.

Financial ledger, not a database

A blockchain is a type of data structure with some new properties. It’s a way to store information that anyone can access, but it is designed in a fundamentally different way from a centralized database. Instead of a single entity managing and modifying the data, nodes around the world validate and store the complete history of transactions on the network. One benefit of this design is that all the data is publicly available and the cost of verifying the state is distributed.

This enables three strong properties:

  1. Decentralization: eliminates single points of failure and allows truly serverless applications
  2. Inclusive accountability: users can verify that what they are seeing is correct
  3. Governance: includes a mechanism to change the rules of the system
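Inclusive accountability rests on users being able to recompute what they are shown. A toy Python sketch, assuming a simplified block format (real chains hash block headers that commit to Merkle roots, not JSON): each block commits to its parent’s hash, so altering any past transaction is detectable by anyone who re-verifies the chain.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64


def block_hash(prev_hash: str, transactions: List[Dict]) -> str:
    """SHA-256 over canonical JSON of the parent hash plus this block's transactions."""
    payload = json.dumps({"prev": prev_hash, "txs": transactions}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def build_chain(batches: List[List[Dict]]) -> List[Dict]:
    """Link each batch of transactions to the hash of the block before it."""
    chain, prev = [], GENESIS
    for txs in batches:
        h = block_hash(prev, txs)
        chain.append({"prev": prev, "txs": txs, "hash": h})
        prev = h
    return chain


def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every link; any edit to past transactions breaks a hash."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or block_hash(prev, block["txs"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True


chain = build_chain([[{"from": "alice", "to": "bob", "value": 5}],
                     [{"from": "bob", "to": "carol", "value": 2}]])
print(verify_chain(chain))         # True
chain[0]["txs"][0]["value"] = 500  # tamper with history
print(verify_chain(chain))         # False
```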

In theory, blockchains are designed to provide a global market for sharing and supplying data and computing resources without requiring an organization to maintain them.

In practice, this is not what is happening.

Problem 1

When people talk about blockchains, they commonly refer to the trust-minimization properties, leaderless consensus, and the relationship between users and network operators. Blockchains are designed to be a network of peers (operators), but they are not designed in such a way that it’s really possible for everyone to be one of those peers.

It’s surprising that so many resources and so much time have gone into creating networks with trust-minimized properties, yet virtually all users who interact with them through dApps do so by trusting the outputs of Alchemy and Infura. These two companies achieved their market dominance by building their own enhanced APIs and custom indexing tools that scrape real-time and historical transactions and store them in a data center.

This looks familiar…

Problem 2

Despite the prevailing sentiment that all blockchain data is “open and available to use”, most applications and users are unaware of the difference between having access to a large amount of transaction data and being able to extract value from it. What many don’t understand is that blockchain data is structured for consensus, mainly ordering, and not for analytics.

Because on-chain data is time-ordered, transactions are linked sequentially across many blocks, and there is no built-in mechanism to identify, categorize, or query that information. Simply put, blockchains have no native indexing. This means most of today’s dApps are unable to derive semantic value from the platforms they are built on, because they can’t perform real-time multi-chain analysis of assets and other protocols.
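To illustrate the gap, here is a hypothetical Python sketch over an in-memory list of blocks (not a real client API). Finding every transfer that touches an address requires a full sequential scan of the chain; an indexer does that scan once and builds a lookup table, so later queries become direct lookups.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical raw chain: blocks in height order, each a list of (sender, recipient, value).
Tx = Tuple[str, str, int]
blocks: List[List[Tx]] = [
    [("alice", "bob", 5), ("carol", "dave", 1)],
    [("bob", "alice", 2)],
    [("dave", "alice", 7), ("carol", "bob", 3)],
]


def transfers_by_scan(address: str) -> List[Tuple[int, Tx]]:
    """Without an index, answering the query means walking every block in order."""
    return [(height, tx) for height, txs in enumerate(blocks)
            for tx in txs if address in (tx[0], tx[1])]


def build_address_index(chain: List[List[Tx]]) -> Dict[str, List[Tuple[int, Tx]]]:
    """One pass over the chain builds an address -> transfers lookup table."""
    index: Dict[str, List[Tuple[int, Tx]]] = defaultdict(list)
    for height, txs in enumerate(chain):
        for tx in txs:
            for addr in {tx[0], tx[1]}:  # a set, so self-transfers are stored once
                index[addr].append((height, tx))
    return index


index = build_address_index(blocks)
print(index["alice"])
# [(0, ('alice', 'bob', 5)), (1, ('bob', 'alice', 2)), (2, ('dave', 'alice', 7))]
```

The scan costs time proportional to the whole chain on every query; the index pays that cost once and must then be kept up to date as new blocks arrive.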

Without a trust-minimized solution for reads:

  1. Blockchains fall into the centralized trappings of the past.
  2. On-chain data remains public in theory but underutilized in practice.

Superchain solves these two problems.

Superchain

At its core, Superchain is an indexing protocol that organizes on-chain data and allows dApp developers and users to extract value from it.

The protocol is purpose-built and enables four key features:

  1. The ability for users to become network operators
  2. Access to historical and real-time low-latency data
  3. Advanced analytics to improve utilization
  4. Customizable toolboxes for application-specific use

Instead of packaging all of these components into a Web2 API service, Superchain has built a Web3 business model for Web3 data. It is a crypto-native solution for on-chain analytics.

For traders to gather insights from transactions, and for developers building applications to get the data they request, that data must first be organized by a custom indexing solution. Current offerings in the market are slow, difficult to use, and not optimized for specific DeFi use cases.

This chart compares the latency of fetching prices across current solutions in the market.

Data Access Re-Thought

Instead of all on-chain information being indexed and controlled by a few entities, this process can be carried out by operators of the network.

  • This structure localizes the queries, resulting in dramatically increased performance.
  • Operators are paid to produce indices and data streams for users. Streams are modeled with toolboxes, and developers earn royalties for toolbox usage.

The Superchain toolboxes allow other developers and traders to interact with indexed data via an SDK. This avoids using slow presentation layers like REST and GraphQL and reduces the iteration time between building, testing, and releasing from weeks to minutes.

Superchain’s data is low-latency, combines data across chains, and can stream analytics in real time. The level of granularity available opens up entirely new strategies and enriches the developer experience.
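Combining data across chains in real time amounts to merging several time-ordered streams into one. A minimal Python sketch using the standard library’s `heapq.merge`, assuming each chain’s events already arrive sorted by a comparable timestamp (real cross-chain ordering also has to handle clock skew and reorgs). The `Event` type and chain names are illustrative only.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp_ms: int  # used as the merge key
    chain: str
    payload: str


def merge_chains(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge per-chain streams, each sorted by time, into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda e: e.timestamp_ms)


ethereum = [Event(1000, "ethereum", "swap A"), Event(1300, "ethereum", "swap B")]
arbitrum = [Event(1100, "arbitrum", "swap C"), Event(1250, "arbitrum", "swap D")]

merged = [e.payload for e in merge_chains(ethereum, arbitrum)]
print(merged)  # ['swap A', 'swap C', 'swap D', 'swap B']
```

Because `heapq.merge` is lazy, it works on unbounded iterators as well as lists, which is what a live stream needs.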

Instead of averaging out the market price of USDC/WETH over 10-second intervals, Superchain is able to retrieve the last traded price, allowing traders to better utilize volatility.
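The difference is easy to see in a small Python sketch over a hypothetical list of trades on the USDC/WETH pool, with prices expressed as USDC per WETH. A fixed 10-second bucket average smooths over the most recent move, while the last traded price reflects it directly. The trades and function names are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical trades as (unix time in seconds, price); values are invented.
Trade = Tuple[float, float]


def bucket_averages(trades: List[Trade], width: float = 10.0) -> Dict[int, float]:
    """Average price per fixed time bucket (bucket id = floor(time / width))."""
    prices: Dict[int, List[float]] = {}
    for t, price in trades:
        prices.setdefault(int(t // width), []).append(price)
    return {b: sum(p) / len(p) for b, p in prices.items()}


def last_price(trades: List[Trade]) -> float:
    """Price of the trade with the latest timestamp."""
    return max(trades, key=lambda tr: tr[0])[1]


trades = [(0.5, 1300.0), (3.0, 1300.0), (9.5, 1330.0)]
print(bucket_averages(trades))  # {0: 1310.0}
print(last_price(trades))       # 1330.0
```

The bucket average reports 1310 even though the market last traded at 1330, which is exactly the information a trader reacting to volatility needs.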

Purpose Built Technology

One of the fundamental flaws of current infrastructure and tooling is that, to make sense of on-chain data, users must look to Web2-hosted service providers that charge per query and own all of the data. If all the interesting properties of blockchains emerge from trust minimization, then it doesn’t make a lot of sense for Alchemy and Infura to have visibility into almost all the read requests from nearly all users in almost all dApps. This also doesn’t seem like the best privacy solution.

How does Superchain fix this?

Well, it’s built differently.

Instead of applying an old centralized database service model to new internet infrastructure, Superchain has built its own specialized database with customized storage. The protocol unifies data into a consistent schema rather than simply aggregating it. This allows the data to be deterministic, distributed and replicated horizontally.
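Unifying data means mapping differently shaped records from each source onto one schema, rather than storing them side by side. A minimal Python sketch, assuming two hypothetical sources whose field names are invented for illustration (this is not Superchain’s schema): both are normalized into a single `Swap` record with the same types and units, so downstream queries need no per-source logic.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Dict


@dataclass(frozen=True)
class Swap:
    """One source-agnostic schema for a DEX swap (illustrative fields)."""
    chain: str
    block_number: int
    tx_hash: str
    amount_in: Decimal
    amount_out: Decimal


def from_source_a(raw: Dict[str, Any]) -> Swap:
    """Hypothetical source A: hex block numbers, amounts in integer base units."""
    return Swap(
        chain="chain-a",
        block_number=int(raw["blockNumber"], 16),
        tx_hash=raw["transactionHash"].lower(),
        amount_in=Decimal(raw["amount0In"]) / Decimal(10) ** raw["decimals0"],
        amount_out=Decimal(raw["amount1Out"]) / Decimal(10) ** raw["decimals1"],
    )


def from_source_b(raw: Dict[str, Any]) -> Swap:
    """Hypothetical source B: decimal block heights, amounts already in whole units."""
    return Swap(
        chain="chain-b",
        block_number=int(raw["height"]),
        tx_hash=raw["hash"].lower(),
        amount_in=Decimal(raw["in"]),
        amount_out=Decimal(raw["out"]),
    )


a = from_source_a({"blockNumber": "0x10", "transactionHash": "0xABC",
                   "amount0In": "1500000", "decimals0": 6,
                   "amount1Out": "1000000000000000000", "decimals1": 18})
b = from_source_b({"height": "42", "hash": "0xDEF", "in": "2.5", "out": "0.001"})
print(a)
print(b)
```

Using `Decimal` rather than `float` keeps the unit conversion exact, which matters if independent replicas are expected to produce identical, deterministic results from the same raw input.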

Some use cases include:

  • Application developers can now perform real-time multi-chain analysis of other assets and protocols.
  • Wallet users will have the granularity and speed to make better-informed investment decisions.
  • TradFi institutions looking to form a DeFi strategy will have more efficient market access.

Conclusion

All innovation in data can be boiled down to finding new ways to store more information and to reduce the time and cost of accessing it.

Over the past 5 years, on-chain data has been public in theory but underutilized in practice. Growing interest in specific and more complex use cases has led to a new generation of custom indexers, redefining the blockchain data landscape.

As layer 2 solutions like optimistic and zk-rollups continue to grow in usage, and high-throughput chains like Celestia, Aptos, and Sui come online, the volume of writes to individual chains is going to explode. Superchain’s indexing solution presents a novel way to scale the reading of blockchain data. By incentivizing economically driven users to perform the functions of a centralized database (reads and queries), the supply side of the network will be able to serve the exponential increase in read demand.
