Subgraphs demystified

Published in Beethoven X · Mar 10, 2023

As soon as you get even slightly involved with the tech behind a DeFi project, you will come across the term ‘subgraph’. Over time, I have come to realize that for many DeFi enthusiasts, the subgraph is some magical entity capable of doing anything.

What I want to do is break down the concept of subgraphs so that non-technical people get a better understanding of what a subgraph is, how it works and what it can be used for. Before I start, I should say that I only use subgraphs in the context of Beethoven-X. Although subgraphs are commonly associated with DeFi use cases, they can be applied in other areas as well. Everything in this article draws on my experience using and learning about subgraphs in that context.

Where do subgraphs come from?

The subgraph is a product from https://thegraph.com/en/. On their page they describe it as follows:

The Graph is an indexing protocol for querying networks like Ethereum and IPFS. Anyone can build and publish open APIs, called subgraphs, making data easily accessible.

This means subgraphs are basically APIs. API stands for Application Programming Interface, which is a fancy way of saying that you can query data using code: a developer or a web3 dApp writes code that talks to the API and retrieves the data it needs. So subgraphs offer an interface to query data and, more specifically, blockchain data.

By the definition above from The Graph, only the API is called “subgraph”. However, the entire system also includes, and depends on, an indexer and a database. Often, the entire system is called “the subgraph” and we will use this terminology here as well.

It’s all about data

The description from The Graph says that subgraphs index data. In other words, they gather and store data. The question is, what data?

To answer this, one must know something critical about smart contracts: They can emit events. When you write a smart contract, you can choose to emit an event that also contains some data. For example, the Reliquary contract includes the following line of code in the harvest() function:

emit ReliquaryEvents.Harvest(poolId, _pendingReward, harvestTo, relicId);

When a user harvests rewards for a relic, an event is emitted with data referencing the specific Reliquary pool harvested, the amount harvested, the recipient, and the relic ID number. Most contracts will emit such events in many different shapes or forms. It’s these events, and the data they contain, that are consumed and stored by the subgraph and can then be made accessible through an API afterwards.
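To make the idea of "events carrying data" a bit more concrete, here is a minimal TypeScript sketch of an event log and of an indexer picking out only the events it cares about. Everything in it is an assumption for illustration: the ContractEvent shape, the parameter names (poolId, amount, to, relicId), the sample values and the Deposit event are hypothetical and not taken from the Reliquary contract.

```typescript
// Hypothetical, simplified shape of an event emitted by a smart contract.
interface ContractEvent {
  name: string; // event name, e.g. "Harvest"
  blockNumber: number; // block in which the event was emitted
  params: Record<string, string | number>; // data attached to the event
}

// A made-up event log; a real indexer would read this from an RPC node.
const eventLog: ContractEvent[] = [
  { name: "Deposit", blockNumber: 100, params: { to: "0xaaa", amount: 500 } },
  {
    name: "Harvest",
    blockNumber: 101,
    params: { poolId: 0, amount: 250, to: "0xaaa", relicId: 7 },
  },
  {
    name: "Harvest",
    blockNumber: 105,
    params: { poolId: 1, amount: 40, to: "0xbbb", relicId: 9 },
  },
];

// Keep only the events with the given name, in their original order.
function eventsNamed(log: ContractEvent[], name: string): ContractEvent[] {
  return log.filter((event) => event.name === name);
}

const harvestEvents = eventsNamed(eventLog, "Harvest");
for (const event of harvestEvents) {
  console.log(event.blockNumber, event.params);
}
```

The point of the sketch is only that an event is a named record with some data attached, and that a subgraph reacts to the event names it has been told to watch.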

Create a subgraph

Once you understand that you can use events from smart contracts and their data, you can start to think about which problems the subgraph can solve and which problems it can’t solve.

For example, if no event were emitted when a user harvests the rewards for a relic, there would be nothing to trigger any action on the subgraph. But since an event is emitted when someone harvests, we can actually build something with it. Let’s say we would like to find out how many rewards each relic and user has harvested in total. You won’t be able to query this directly from the smart contract, so you’ll need to aggregate the data in a subgraph.

We will not talk about all the technicalities of writing a subgraph and will omit a lot of code. If you would like to write a subgraph yourself, check out this tutorial on thegraph.com.

The code snippet in the subgraph that aggregates all the harvested rewards for a particular user would look something like this:

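export function harvest(event: Harvest): void {
  const params = event.params;
  // Load the user entity for the recipient, creating it if it does not exist yet
  const user = getOrCreateUser(params.to);
  // Add this harvest to the user's running total
  user.totalHarvestAmount = user.totalHarvestAmount.plus(params.amount);
  // Persist the updated entity to the subgraph's database
  user.save();
}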

Each time a harvest event is emitted, the subgraph calls this function. The function reads the user from the event’s to parameter and the harvested amount from its amount parameter, adds the amount to the user’s totalHarvestAmount and saves the result to the database.

Inside such a function, you can not only access the data from the event but also any data that you have previously stored from other events. This provides a comprehensive record of the activity over time. Additionally, you can make on-chain calls to query even more data.
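The aggregation described above can be imitated outside of a subgraph with a few lines of plain TypeScript. This is only a sketch under stated assumptions: the in-memory Map stands in for the subgraph's database, plain numbers stand in for the big-number types a real subgraph would use, and the HarvestEvent and UserRecord shapes, the getOrCreateUser implementation and the sample events are all hypothetical.

```typescript
// Hypothetical, simplified stand-in for the data carried by a Harvest event.
interface HarvestEvent {
  to: string; // recipient address
  amount: number; // harvested amount (a real subgraph would use a big-number type)
}

// Hypothetical entity stored per user.
interface UserRecord {
  id: string;
  totalHarvestAmount: number;
}

// In-memory map standing in for the subgraph's database.
const userStore = new Map<string, UserRecord>();

// Return the stored user for this address, creating it with a zero total if needed.
function getOrCreateUser(address: string): UserRecord {
  let user = userStore.get(address);
  if (user === undefined) {
    user = { id: address, totalHarvestAmount: 0 };
    userStore.set(address, user);
  }
  return user;
}

// Called once per Harvest event, like the handler in the subgraph.
function handleHarvest(event: HarvestEvent): void {
  const user = getOrCreateUser(event.to);
  user.totalHarvestAmount += event.amount;
  userStore.set(user.id, user); // plays the role of user.save()
}

// Made-up events, processed in the order they were emitted.
const sampleEvents: HarvestEvent[] = [
  { to: "0xaaa", amount: 100 },
  { to: "0xbbb", amount: 50 },
  { to: "0xaaa", amount: 25 },
];
for (const event of sampleEvents) {
  handleHarvest(event);
}

userStore.forEach((user) => {
  console.log(user.id, user.totalHarvestAmount);
});
```

Because the handler only ever sees one event at a time, the running total has to live in the store between calls. That is exactly the role the database plays in a subgraph.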

Query data from the subgraph

This is where the subgraph actually got its name. The subgraph exposes a GraphQL API to query the data. GraphQL is a query language for APIs that is used across many different use cases and technologies and is not unique to subgraphs.

In a GraphQL API, one always needs to define a so-called schema. For our example above, it would look as simple as:

type User @entity {
  id: Bytes!
  address: Bytes!
  totalHarvestAmount: BigDecimal!
}

This schema can now be queried through the GraphQL API. The Graph provides you with filtering and sorting out of the box, which is a real value-add. For example, you can now query all users that have harvested more than 1000 reward tokens and sort them by address. That would look like this:

{
  users(
    where: {totalHarvestAmount_gt: 1000},
    orderBy: address,
    orderDirection: asc
  ){
    address
    totalHarvestAmount
  }
}

Or you could even query the data as it was at a specific block number by passing block: {number: 123456} as an additional argument next to where.

You basically tell the API which fields you want to display (address and totalHarvestAmount) and how you want to filter (where) and sort (orderBy) the dataset of all users.
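From a dApp, such a query is usually assembled in code and sent to the subgraph as an HTTP POST request with a JSON body. The following TypeScript sketch builds the query and the request body for the example above. The function names, the UsersQueryOptions shape and the optional blockNumber setting are assumptions made for illustration, and actually sending the request is left out because the query URL depends on where the subgraph is deployed.

```typescript
// Hypothetical options for the example query.
interface UsersQueryOptions {
  minTotalHarvest: number; // only users who harvested more than this
  blockNumber?: number; // optional: read the data as of this block
}

// Build the GraphQL query text for the users example.
function buildUsersQuery(options: UsersQueryOptions): string {
  const args: string[] = [
    `where: {totalHarvestAmount_gt: ${options.minTotalHarvest}}`,
    "orderBy: address",
    "orderDirection: asc",
  ];
  if (options.blockNumber !== undefined) {
    // The block filter is its own argument, next to where.
    args.push(`block: {number: ${options.blockNumber}}`);
  }
  return `{ users(${args.join(", ")}) { address totalHarvestAmount } }`;
}

// GraphQL over HTTP conventionally wraps the query in a JSON object.
function buildRequestBody(query: string): string {
  return JSON.stringify({ query });
}

const usersQuery = buildUsersQuery({ minTotalHarvest: 1000 });
const requestBody = buildRequestBody(usersQuery);
console.log(requestBody);
```

The resulting requestBody would then be posted to the subgraph's query URL with the Content-Type header set to application/json, and the response comes back as JSON containing the requested fields.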

Where does it run?

Since this is not only a piece of software but actually an entire system, a subgraph needs several components:

A server or compute engine to run on
Access to an RPC node to listen for the events
A database to store all the data

The Graph offers the hosted-service, which includes all of these components but will unfortunately be sunset in a staged approach. The big advantage of this service is that you don’t need to worry about running anything yourself. You simply deploy your subgraph code and, best of all, it’s free. These are the main reasons so many projects use it.

While the hosted-service is being sunset, The Graph offers a new product called the decentralized network. Here you can deploy your subgraph in a decentralized fashion, but querying your subgraph is no longer free. Slowly but surely, all subgraphs from the hosted-service will need to migrate to the decentralized network.

Another option is to run everything yourself. I did this for quite some time and hosted it on AWS. The advantage is that you have everything under control and the performance is much better. The downside is that you need to take care of the three components mentioned above yourself. This is not only time-consuming and requires a certain skillset, it also comes at a cost.

Subgraph alternatives

There are also alternatives to subgraphs, but I must admit that I don’t have practical experience with any of them and therefore can’t offer much insight. Nevertheless, I find it important to touch on them.

As we learned before, the subgraph is an API that lets you query aggregated blockchain data. There are paid services out there that let you achieve a very similar thing with a different approach, for example https://www.covalenthq.com/ or https://fura.org/.

Another alternative that sounds very promising and that you can run yourself is https://substreams.streamingfast.io/. Unfortunately it is not available for Fantom and I therefore never took the time to look into it.

I hope this article made the entire concept of the “subgraph” easier to understand for both technical and non-technical users. If you have any further questions or suggestions on how to improve this article, don’t hesitate to reach out on the Beethoven-X Discord.