Subgraph Development, Part 2: Handling Arrays and Identifying Entities

Protofire.io
Published in Protofire Blog
6 min read · Jan 13, 2021

This blog post provides practical recommendations on how to use arrays efficiently and generate entity IDs that are unique and referenceable.

In part 1 of this series, we provided an overview of subgraphs to help developers understand their basic structure. We also shared our insights on mapping and aggregating data.

This time, we will discuss two more topics: handling arrays and generating entity IDs that are both unique and easy to reference. Along the way, we will share our recommendations on how to manage arrays efficiently and name entities properly.

Handling arrays efficiently

Adding arrays to entities is useful in certain scenarios. For instance, there are cases where we need to model a list of addresses for a data source or track historical changes for a particular field over time.

Without prior knowledge of how arrays work within subgraphs, we could consider creating a field of an array type on an entity type definition (within the schema.graphql file) and initializing an empty array whenever a new entity of the same type is created. When new data is added to the array, we can push the data and save the entity. While this sounds intuitive, it unfortunately does not work.

Manually handling arrays on subgraphs, specifically in the scenario above, has a few caveats. Whenever you access an array of an entity, what you actually get is a copy of the array. Thus, if you add new data and save the entity, it will not work as you would expect, since you are simply modifying a copy of the array, while the original is left unchanged.

In order to update the actual array, we can place the copy of the array in a variable and then modify the data. Next, we can set the variable as the new array on the entity. This way, the old array is replaced by the copy. This process of updating the array is exemplified in the following code.

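// This won't work: entity.numbers returns a copy, so the push is lost
entity.numbers.push(BigInt.fromI32(1))
entity.save()

// This will work: modify the copy, then assign it back to the entity
let numbers = entity.numbers
numbers.push(BigInt.fromI32(1))
entity.numbers = numbers
entity.save()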
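
The difference between the two variants can be reproduced outside a subgraph. The following is a minimal sketch in plain TypeScript, not the graph-ts implementation: the hypothetical ModelEntity class simply assumes, as described above, that reading an array field hands back a copy, and it uses number in place of BigInt to stay self-contained.

```typescript
// Hypothetical stand-in for a generated entity class (not graph-ts code).
// Assumption: the getter returns a copy of the stored array, as described above.
class ModelEntity {
  private stored: number[] = [];

  get numbers(): number[] {
    return this.stored.slice(); // caller receives a copy
  }

  set numbers(value: number[]) {
    this.stored = value.slice(); // assignment replaces the stored array
  }
}

const entity = new ModelEntity();

// Pushing onto the getter's result changes only the temporary copy.
entity.numbers.push(1);
const lengthAfterPush = entity.numbers.length; // 0: the push was lost

// Copy, modify, assign back: the stored array is replaced.
const numbers = entity.numbers;
numbers.push(1);
entity.numbers = numbers;
const lengthAfterAssign = entity.numbers.length; // 1
```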

While you can update an array in the manner demonstrated above, it is not an ideal solution. Besides being inconvenient, there is another reason not to handle arrays manually: time-travel queries. (Read part 1 of the series to learn more about time-travel queries.)

Time-travel queries are possible only because subgraphs keep track of every change made to every entity over time. If many entities have array fields that are large and updated often, copies of all those arrays also need to be stored. This takes a toll on the performance and disk space of any indexer that indexes your subgraph.
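
A rough way to picture the cost is to count how many array elements end up stored when every save keeps a full snapshot of the array. The sketch below is a toy model in plain TypeScript under that assumption. It is not how an indexer actually lays out its data, and storedElementsAfter is a hypothetical helper.

```typescript
// Toy model: every save of an entity stores a full snapshot of its array field,
// so that past states stay queryable. Not an indexer's real storage format.
function storedElementsAfter(appends: number): number {
  const snapshots: number[][] = []; // one snapshot per save
  let current: number[] = [];

  for (let i = 0; i < appends; i++) {
    current = current.concat([i]); // append one item
    snapshots.push(current.slice()); // snapshot kept for historical queries
  }

  // Total elements held across all snapshots: 1 + 2 + ... + appends
  return snapshots.reduce((sum, snapshot) => sum + snapshot.length, 0);
}

const totalStored = storedElementsAfter(1000); // 500500 elements across all snapshots
```

In this model, an array that ends up holding 1,000 items has required storing 500,500 items in total, so the cost grows quadratically with the number of updates.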

At the time of writing, The Graph’s hosted service is the only active indexer available. In the future, more indexers will be able to join with the addition of The Graph’s decentralized network. These new indexers will be able to choose which subgraphs to index. If your subgraph is poorly optimized because of arrays, it is unlikely to be picked up by any indexer.

To optimize our arrays, we can use the @derivedFrom annotation. It lets any array field defined in an entity type be filled automatically with all entities of the specified type that link to the entity we are defining. The following example shows the usage of the @derivedFrom annotation.

type User @entity {
  id: ID!
  positions: [Position!]! @derivedFrom(field: "user")
}

type Position @entity {
  id: ID!
  user: User! # This is the ID string of the User
}

In the example above, we have a user with an automatically generated list of Position entities. Whenever our subgraph receives a query asking for the positions field of a User entity, it performs a reverse lookup for all the Position entities whose user field points to that specific User entity. In other words, linked entities are those that store the string ID of another entity in one of their fields.
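
Conceptually, the reverse lookup behaves like a filter over all Position entities. The following plain TypeScript sketch models that idea only. The positionStore array, the PositionEntity interface, and the savePosition and derivePositions functions are hypothetical names, and the real lookup is performed by the indexer at query time rather than in mapping code.

```typescript
// Simplified model of the schema above: a Position stores the ID of its User.
interface PositionEntity {
  id: string;
  user: string; // ID string of the owning User
}

// Hypothetical in-memory stand-in for the stored Position entities.
const positionStore: PositionEntity[] = [];

function savePosition(position: PositionEntity): void {
  positionStore.push(position);
}

// Models what @derivedFrom(field: "user") resolves at query time:
// every Position whose `user` field equals the given User ID.
function derivePositions(userId: string): PositionEntity[] {
  return positionStore.filter((position) => position.user === userId);
}

savePosition({ id: "p1", user: "alice" });
savePosition({ id: "p2", user: "bob" });
savePosition({ id: "p3", user: "alice" });

const alicePositionIds = derivePositions("alice").map((p) => p.id); // ["p1", "p3"]
```

Note that the user side never stores a list of positions in this model. Each position only records which user it belongs to, which is why nothing has to be copied when a new position is added.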

Using the @derivedFrom annotation, we can define the entity type that we want for our array data, define the field used when deriving the array, and link it to the original entity via their ID. There is also the benefit of being able to add more data (e.g., creation or update metadata) to the entities representing the array data. Since these are fully fledged entities, we can update them easily by loading their IDs instead of looking them up in the array.

While handling arrays with the @derivedFrom annotation is easier, there are still some considerations to be aware of. First, it only works with one-to-many relationships. In many-to-many relationships, one side still needs to handle the array manually. Second, you cannot access the array data while the subgraph is being indexed, because the array is populated only when it is queried.

Creating a naming convention for entity IDs

All the entities defined in the schema.graphql file are identified by an ID field that is declared as an ID! type represented as a string. The ID field is important as it is used to load, create, and save entities.

Since the ID field is the primary means of identifying an entity, it should always be unique. That said, guaranteeing the uniqueness of an ID is not difficult. Data available at indexing time can be combined to generate unique IDs, as in the following example.

event.transaction.hash.toHex() + "-" + event.logIndex.toString()

By taking the transaction hash of an event (unique for each transaction) and appending the log index of that particular event (which distinguishes events emitted by the same transaction), we can generate a unique compound ID. This way, we can identify a particular entity among other entities of the same type, provided that only a single entity is created for any single event. If needed, we can also append more data to uniquely identify any number of entities created in the same event. For example, we could keep a counter that increases each time an entity is created and append its value to the ID of the newly created entity.
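
The pattern, including the optional counter suffix, can be wrapped in a small helper. This is a sketch in plain TypeScript with the transaction hash and log index passed in as strings. The name buildEventEntityId and its parameters are hypothetical, and in a real mapping the inputs would come from event.transaction.hash.toHex() and event.logIndex.toString().

```typescript
// Hypothetical helper: builds "<txHash>-<logIndex>" and, when several entities
// are created from the same event, "<txHash>-<logIndex>-<counter>".
function buildEventEntityId(txHash: string, logIndex: string, counter?: number): string {
  const base = txHash + "-" + logIndex;
  return counter === undefined ? base : base + "-" + counter.toString();
}

// Illustrative values only (the hash is shortened for readability).
const singleId = buildEventEntityId("0xabc123", "7"); // "0xabc123-7"
const firstOfMany = buildEventEntityId("0xabc123", "7", 0); // "0xabc123-7-0"
const secondOfMany = buildEventEntityId("0xabc123", "7", 1); // "0xabc123-7-1"
```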

While having an easy method for generating unique IDs for our entities is convenient, we should also strive to generate IDs that are predictable and can be referenced. If we have entities related to a part of our domain that is likely to be queried by end users via their ID, we can generate an ID that references the domain we are working on.

As an example, consider a scenario where we are creating an Account entity on a DEX subgraph. This Account entity will store the user’s balance, as well as other information. If we create the entity ID based on the transaction hash, the user could look up the transaction that created the account and rebuild the ID from it, but that would not be intuitive. A better alternative is to create an ID based on the user’s Ethereum address and, if needed, combine it with something else relevant to the domain. This way, we can also distinguish a particular account from other accounts of the same user.
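
As a sketch of such a domain-specific ID, the helper below combines a user address with a second value from the domain. Using a token address as that second component is an assumption made purely for illustration, as is the name buildAccountId. The inputs are assumed to be hex strings as produced by toHex().

```typescript
// Hypothetical helper for a DEX subgraph: the ID is the user's address, or
// "<user>-<token>" when a user can hold one account per token (an assumption).
function buildAccountId(userAddress: string, tokenAddress?: string): string {
  return tokenAddress === undefined ? userAddress : userAddress + "-" + tokenAddress;
}

// Illustrative, shortened addresses.
const plainAccountId = buildAccountId("0xabc0"); // "0xabc0"
const perTokenAccountId = buildAccountId("0xabc0", "0xdef1"); // "0xabc0-0xdef1"
```

Because the ID depends only on values the end user already knows, it can be rebuilt on the client side without looking up any transaction.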

In summary, generic unique IDs without any domain-specific data can be useful for entities that won’t be constantly updated. This is ideal for entities created to save metadata for domain-specific events that will be consumed from a derived array on a main entity. For instance, generic unique IDs are well suited for transfers, mints, burns, and swaps.

On the other hand, domain-specific IDs are ideal for main entities and any other entity that will get frequent updates. You will likely use a combination of an Ethereum address and other domain-specific IDs. In most cases, a smart contract generates unique IDs and logs them in its events. If this is not the case, you will need to study the smart contract, identify what makes your entity unique, and use that data to generate an ID.

As a side note, the toHex() and toHexString() methods (commonly used to generate IDs out of addresses or hashes) return a lowercase string. This means that when you query a subgraph for entities, the ID string you provide should be lowercase, because the query is case-sensitive.
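
The effect of case sensitivity can be illustrated with a simple lookup. The sketch below is plain TypeScript with a hypothetical in-memory table, accountsById, standing in for entities stored under lowercase IDs. It does not call a real subgraph endpoint.

```typescript
// Hypothetical stand-in for entities keyed by lowercase ID, as toHex() produces them.
const accountsById: { [id: string]: { balance: number } } = {
  "0xabcdef": { balance: 10 },
};

// Case-sensitive lookup, modeling a query that filters by ID.
function findAccount(id: string): { balance: number } | undefined {
  return Object.prototype.hasOwnProperty.call(accountsById, id) ? accountsById[id] : undefined;
}

const mixedCaseId = "0xABCdef"; // e.g. an address copied in mixed-case form

const missed = findAccount(mixedCaseId); // undefined: the mixed-case ID does not match
const found = findAccount(mixedCaseId.toLowerCase()); // { balance: 10 }
```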

For more information about subgraph development, please check out The Graph’s official documentation. Additional details can also be found in the project’s GitHub repository. The Graph also has an active and growing community ready to help and answer questions as they arise. We encourage anyone interested in developing their own subgraphs to join The Graph’s Discord server.
