Data is not the new oil

SQD (previously Subsquid)
Published in SQD Blog
5 min read · Sep 12, 2024

For some time, “data is the new oil” was a slogan crypto data companies used to appeal to VCs and convince them that there were enormous profits to be made. This throws up a few questions:

  • Why would you want to associate yourself with oil spills in the ocean killing birds?
  • How are VCs making money from you empowering users to own their data and not allowing Big Tech to exploit it for advertising revenue?
  • Is it, though?

To set expectations straight: we cannot and won’t deal with the first two questions. Those are something to ask the people who made these claims. For SQD, the more critical question is data’s role in a web increasingly navigated through the lens of AI agents.

One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin. […] Samsa was a traveling salesman, but the only places he traveled were the roads of 0s and 1s. “Whatever,” he thought and tapped the screen of his phone with one of his legs, opening the interface for Alexis. “Hey Alexis, please pull up my most recent sales conversation and all relevant research notes I made,” he instructed the artificial assistant. With the other arm (or was it a leg?), he lifted the glass of water from his nightstand and took a sip. Gregor Samsa took his new state in with a nonchalance only found in modern-day nihilists. Sifting through his notes while scrabbling with one leg to fetch a sock off the floor, only to realize he wouldn’t need it, it crossed his mind that maybe having so many legs (or arms) would give him an advantage in navigating the world of 0s and 1s. After all, he could now type on three devices at a time…

It’s pure speculation, but had Gregor lived a century or so later, with RAG and LLMs fully established, he might never have gone through the painful process of hiding his metamorphosis. He could have just stayed home and still accomplished his work.

But how far are we from this scenario? In a recent talk, our CEO, Dima Zhelezov, shared his thoughts on AI, RAG, and the role SQD will play in it. Listen to it here or read on for an overview.

AI & RAG are here to stay.

Let’s face it: regardless of whether you fall in the d/acc or e/acc camp, the underlying belief is that technology is here to stay. The same applies to Large Language Models, which quickly gained adoption by everyone from LinkedIn influencers looking to out-cringe each other to housewives (and husbands) just trying to figure out what to cook with their random assortment of leftovers.

AI agents are convenient and alleviate the mental load of their users. In the context of crypto, such agents take on- or off-chain actions on behalf of their users.

The way this works is that they consume data from a Web3 data source along with further data provided by the user, and then execute actions without the user needing to do anything. An easy example would be maintaining a certain portfolio of DeFi assets. A user can specify an allocation across the top 10 coins, and the agent will ensure that the ratios are maintained throughout market cycles, selling and buying as necessary.

One way to think of these types of agents is as “trading bots on steroids.”
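To make the portfolio example concrete, here is a minimal sketch of the rebalancing step such an agent might run. Everything in it is an assumption for illustration: the `rebalance` function, the `Trade` type, the tolerance threshold, and the sample holdings are hypothetical, and a real agent would read balances and prices from an on-chain data source and submit actual transactions rather than print them.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    asset: str
    usd_amount: float  # positive = buy, negative = sell


def rebalance(holdings_usd: dict[str, float],
              target_weights: dict[str, float],
              tolerance: float = 0.01) -> list[Trade]:
    """Return the trades that move a portfolio back to its target weights.

    holdings_usd: current USD value per asset (hypothetical input).
    target_weights: desired share per asset, summing to 1.0.
    tolerance: ignore drifts smaller than this fraction of the portfolio.
    """
    if abs(sum(target_weights.values()) - 1.0) > 1e-9:
        raise ValueError("target weights must sum to 1.0")

    total = sum(holdings_usd.values())
    # Cover assets that are held but no longer targeted (target weight 0).
    assets = list(dict.fromkeys([*target_weights, *holdings_usd]))

    trades = []
    for asset in assets:
        current = holdings_usd.get(asset, 0.0)
        desired = target_weights.get(asset, 0.0) * total
        delta = desired - current
        if abs(delta) > tolerance * total:
            trades.append(Trade(asset, round(delta, 2)))
    return trades


# Hypothetical portfolio that has drifted away from a 50/30/20 target.
holdings = {"BTC": 7000.0, "ETH": 2000.0, "SOL": 1000.0}
targets = {"BTC": 0.5, "ETH": 0.3, "SOL": 0.2}

for trade in rebalance(holdings, targets):
    side = "buy" if trade.usd_amount > 0 else "sell"
    print(f"{side} ${abs(trade.usd_amount):,.2f} of {trade.asset}")
```

The tolerance is there so the agent does not trade on every small price move; how wide to set it is a design choice that trades off tracking accuracy against fees.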

Further use cases include governance, other DAO activities, and automated trading. Adding RAG takes this a step further.

What’s RAG?

RAG is short for Retrieval-Augmented Generation and describes a way to fill the gaps of existing LLMs.

Have you ever wondered where ChatGPT gets its answers from, or what is going on when it answers with a nonsensical paragraph? Unlike humans, GPT-4 relies on statistics to combine words without understanding their actual meaning. And it’s limited to the data it was trained on.

Not anymore.

RAG adds access to current information while allowing users to peek behind the curtain and understand where an answer came from. The benefits are obvious: increased transparency on the sources, enhanced factual accuracy, improved consistency, adaptability, and coherence.

Take GitHub Copilot. It doesn’t just rely on the data it’s been trained on; it also accesses commits, issues, and responses from GitHub relevant to a developer’s project to provide more relevant responses. Similarly, you could imagine an organization super-charging its support system with a bot that has access to all historical and current open tickets.
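The support-bot idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how any particular product works: the tickets are made up, the `retrieve` and `build_prompt` helpers are hypothetical, and plain word overlap stands in for the embedding-based similarity search that production RAG systems typically use. The final call to an LLM is left out; the sketch stops at the prompt that would be sent.

```python
import re

# Hypothetical support tickets; a real system would pull these from a ticket store.
TICKETS = {
    "T-101": "Bridge transfer from Arbitrum stuck pending for two hours",
    "T-102": "Cannot log in after resetting my password",
    "T-103": "Gas fees on bridge transfer were higher than the quoted estimate",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Return up to k (id, text) pairs ranked by word overlap with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(text)), doc_id, text) for doc_id, text in docs.items()]
    scored = [s for s in scored if s[0] > 0]      # drop documents with no overlap
    scored.sort(key=lambda s: (-s[0], s[1]))      # best score first, id breaks ties
    return [(doc_id, text) for _, doc_id, text in scored[:k]]


def build_prompt(question: str, docs: dict[str, str], k: int = 2) -> str:
    """Augment the question with retrieved sources, each tagged with its id."""
    sources = retrieve(question, docs, k)
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in sources)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )


print(build_prompt("Why is my bridge transfer stuck?", TICKETS))
```

Because every source in the prompt carries its id, the model can cite where an answer came from, which is the "peek behind the curtain" part of RAG.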

For crypto, RAG presents a new way to shape user experiences by leveraging AI. Instead of having to navigate the Kafkaesque labyrinth of bridges and chains, users could simply say something like “Send $200 in USDC to my friend Gregor.” It’d be accomplished even if the funds in their wallet are on Arbitrum and Gregor only accepts TRC-20 tokens. The system could even look up the best times to transfer funds on given chains to optimize for gas fees and so on.

The user interface could even be voice-based, with tasks running in the background, simplifying all Web3 interactions.

SQD CEO Dmitry Zhelezov foresees LLMs in crypto being leveraged this way for on-demand risk assessments (taking in relevant on- and off-chain data), human-intent-centric UI, data visualizations, and analytics, among other use cases.

Does that mean we should all now train LLMs?

If you have deep pockets, maybe.

However, you might want to consider our CEO’s take.

LLMs will become a commodity; data won’t

Data isn’t the new oil; it’s something different. Dima explains that, eventually, AI interactions will become cheap while the volume of data explodes. At the same time, chances are that data retrieval won’t drop as much in price, and on top of that, blockchains keep growing with every block.

Before you go ahead and build your business entirely on AI agents, you might want to consider the cost of retrieving the terabytes of data you’ll need.
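A back-of-the-envelope projection shows the shape of this argument. It is a sketch under loudly stated assumptions: the `yearly_costs` function is hypothetical, and every number passed to it is a placeholder chosen for illustration, not a measured price or growth rate. The only point is that when LLM costs fall quickly while data volume grows faster than retrieval prices fall, retrieval takes up a growing share of the bill.

```python
def yearly_costs(years: int,
                 llm_cost: float, llm_price_decline: float,
                 data_tb: float, data_growth: float,
                 price_per_tb: float, retrieval_price_decline: float):
    """Project yearly LLM and data-retrieval spend under constant yearly rates.

    All rates are fractions per year, and all inputs are hypothetical.
    Returns a list of (year, llm_spend, retrieval_spend) tuples.
    """
    rows = []
    for year in range(years + 1):
        llm = llm_cost * (1 - llm_price_decline) ** year
        volume = data_tb * (1 + data_growth) ** year
        price = price_per_tb * (1 - retrieval_price_decline) ** year
        rows.append((year, llm, volume * price))
    return rows


# Placeholder inputs, chosen only to illustrate the trend.
projection = yearly_costs(years=5,
                          llm_cost=1000.0, llm_price_decline=0.5,
                          data_tb=10.0, data_growth=0.6,
                          price_per_tb=20.0, retrieval_price_decline=0.1)

for year, llm, retrieval in projection:
    share = retrieval / (llm + retrieval)
    print(f"year {year}: LLM ${llm:,.0f}, retrieval ${retrieval:,.0f}, "
          f"retrieval share {share:.0%}")
```

Swap in your own estimates for the rates; the conclusion only holds as long as data growth outpaces the decline in retrieval prices.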

Our solution to the onchain data retrieval dilemma is modular indexing. SQD supports bulk extraction and filtering to the instruction level for more than 150 blockchains, including leading L1s, L2s, and native VMs.

It’s fast and scalable at a significantly lower cost than the competition while providing the dev tools necessary to power any AI and non-AI blockchain data use case you can think of.

Whether you want to facilitate future Gregors’ Hikikomori lifestyle or simply build better ways to interact with Web3, get in touch if you need data to accomplish that.


SQD is the Web3 data layer, powering devs with access to data on 150+ chains, including EVM, SVM, Substrate and more. https://sqd.dev/