Create your Graph node to query complex data from the blockchain via GraphQL

Don’t waste time with a custom API.

Kevin Thizy
InTech
8 min read · Aug 5, 2019


I’m a software engineer working for InTech, a Luxembourgish IT company providing consulting and expertise in information technology and systems, management and project development.

Let’s face it, fetching information from the blockchain is sometimes tedious. Complex queries often take multiple calls to the RPC endpoint or, worse, require replaying events locally.

As an example, and this is the use case that will be my guideline, take Blockchain Identities. I encourage you to read more about Blockchain Identities by searching for the ERC735 standard proposal. A more complete overview and implementation is available in the InvestorID standard (https://investorid.org), which I have taken part in.

To summarize, let’s say an Identity is a Smart Contract that can store claims that are proof of a verification process (for instance, the proof that the holder of the Identity is Luxembourgish). Let’s try to answer a very simple question:

“What are all the claims the Identity has?”

Elementary, right?

Maybe not as easy as it seems. Most Smart Contracts won’t use an array to store a list, but mappings, and maintaining an additional array of elements just to expose a method that returns them makes the smart contract execution more costly. Usually, however, functions that create or remove elements in the mapping emit events such as ElementAdded and ElementRemoved. To retrieve all elements, you would fetch the logs of these events and replay them to rebuild the current list of elements.
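To make the replay idea concrete, here is a minimal sketch in TypeScript. It assumes the logs have already been fetched and decoded; the ElementEvent shape and the rebuildElements helper are hypothetical names used for illustration, not part of any contract or library.

```typescript
// Hypothetical shape of a decoded log; real logs carry more fields.
interface ElementEvent {
  kind: "ElementAdded" | "ElementRemoved";
  elementId: string;
  blockNumber: number;
  logIndex: number;
}

// Replay the events in chain order and return the ids that are still present.
function rebuildElements(events: ElementEvent[]): string[] {
  // Logs may arrive unordered, so sort by block number, then by log index.
  const ordered = events.slice().sort(
    (a, b) => a.blockNumber - b.blockNumber || a.logIndex - b.logIndex
  );
  const present: Record<string, boolean> = {};
  for (const event of ordered) {
    if (event.kind === "ElementAdded") {
      present[event.elementId] = true;
    } else {
      delete present[event.elementId];
    }
  }
  return Object.keys(present);
}
```

Every client that needs the list has to download and replay the whole event history this way, which is exactly the work we would like to delegate to an indexer.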

But suppose our Identity contract did have a .getClaims() method (which is not the case in the InvestorID standard contracts for various reasons, including performance and costs). We now want to answer this question:

“For each of these claims that isn’t self-attested, how many other claims has the claim issuer issued?”

You probably don’t have an easy solution to this one, at least not one that works with every variety of Claim Issuer contract, since most of them don’t store the list of claims they have issued. You could fetch all ClaimAdded events and filter the event content by the Claim Issuer contract address. Yet you would have to do it for each claim… What if we wanted to execute this operation on several identities? We could implement a server that listens for blockchain events, builds these relations in a cache database, and exposes endpoints to retrieve them.
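As a rough sketch of that manual approach, the following TypeScript counts claims per issuer from a list of already-decoded ClaimAdded events. The ClaimAdded interface and its field names are assumptions made for illustration, addresses are assumed to be normalized to the same case, and removed claims are ignored for brevity.

```typescript
// Hypothetical decoded ClaimAdded log; field names are illustrative.
interface ClaimAdded {
  identity: string; // contract that received the claim
  issuer: string;   // contract that issued the claim
  claimId: string;
}

interface IssuerActivity {
  claimId: string;
  issuer: string;
  otherClaims: number;
}

// Count how many claims each issuer has issued across all identities.
function countClaimsByIssuer(events: ClaimAdded[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const event of events) {
    counts[event.issuer] = (counts[event.issuer] || 0) + 1;
  }
  return counts;
}

// For one identity, list each claim that is not self-attested together with
// the number of other claims its issuer has issued.
function otherClaimsPerIssuer(identity: string, events: ClaimAdded[]): IssuerActivity[] {
  const counts = countClaimsByIssuer(events);
  return events
    .filter((e) => e.identity === identity && e.issuer !== identity)
    .map((e) => ({
      claimId: e.claimId,
      issuer: e.issuer,
      otherClaims: counts[e.issuer] - 1,
    }));
}
```

Note that the input is the full set of ClaimAdded events on the network, not only those of the identity, which is what makes this expensive to do on the client.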

But what if we could retrieve all this information with a single request as simple as:

One request to rule them all…
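To give an idea of the shape such a request could take, here is a sketch written as a TypeScript constant plus a helper that builds the JSON body of the HTTP POST. The entity and field names (identity, claims, issuer, issuedClaims) are hypothetical: the real ones depend on the schema the subgraph declares.

```typescript
// Entity and field names are hypothetical; the real ones come from the
// subgraph's schema.graphql.
const CLAIMS_WITH_ISSUERS = `
  query ClaimsWithIssuers($id: ID!) {
    identity(id: $id) {
      claims {
        id
        issuer {
          id
          issuedClaims {
            id
          }
        }
      }
    }
  }
`;

// Build the JSON body of the HTTP POST that carries the query.
// The id is lowercased on the assumption that entity ids are stored as
// lowercase hex addresses.
function buildRequestBody(identityAddress: string): string {
  return JSON.stringify({
    query: CLAIMS_WITH_ISSUERS,
    variables: { id: identityAddress.toLowerCase() },
  });
}
```

The nesting of the query mirrors the traversal: from the identity to its claims, from each claim to its issuer, and from each issuer to the claims it has issued.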

And there just might be tools that bring this capability in a few lines of code.

Now that I have your attention, let me introduce you to GraphQL and The Graph.

GraphQL

The official GraphQL website defines GraphQL as follows:

“GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools.”

Its main advantage is to be able to “traverse” entities (this is exactly the use-case I have explained above when I wanted to retrieve the claims issued by the Issuers of the claims on an Identity). But it also minimizes the amount of data that is sent over the network, allowing for faster, safer and cheaper communication.

Today, blockchain calls for events return the complete set of information for every log. When you are fetching hundreds of events, you are wasting a lot of bandwidth, and loading the memory of your machine (browser?) with data you won’t use.

In its 1.9.0 release, Geth, the Go Ethereum node, added a new endpoint for accessing data. In addition to the JSON-RPC protocol, Geth now supports GraphQL queries. The node only sends back the fields the application requests. The GraphQL endpoint also supports subscriptions via WebSockets, and has nothing to envy its RPC sibling.
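As an illustration of field selection, the sketch below builds a query that asks a node only for the topics and the transaction hash of the logs emitted by one contract. The field names (logs, filter, fromBlock, addresses, topics, transaction) follow my understanding of Geth’s GraphQL schema and should be checked against the schema your node exposes; the endpoint URL is left out because it depends on how the node was started.

```typescript
// Build the JSON body of a GraphQL request that fetches only two fields per log.
// Field names are assumed from Geth's GraphQL schema; verify them on your node.
function buildLogsRequestBody(contract: string, fromBlock: number): string {
  // Validate inputs before interpolating them into the query text.
  if (!/^0x[0-9a-fA-F]{40}$/.test(contract)) {
    throw new Error("contract must be a 20-byte hex address");
  }
  if (fromBlock < 0 || Math.floor(fromBlock) !== fromBlock) {
    throw new Error("fromBlock must be a non-negative integer");
  }
  const query = `{
  logs(filter: { fromBlock: ${fromBlock}, addresses: ["${contract}"] }) {
    topics
    transaction { hash }
  }
}`;
  return JSON.stringify({ query: query });
}
```

Everything not listed between the braces, such as the log data or the block details, is simply not sent back.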

Yet, you probably don’t want to run a full local Ethereum node (and wait for it to synchronize). Moreover, the node doesn’t allow us to traverse entities the way we would like to. That’s because it has no knowledge of the entities represented and used by the Smart Contracts.

This is where The Graph protocol comes in.

The Graph

The Graph official documentation states that:

“The Graph is a decentralized protocol for indexing and querying data from blockchains, starting with Ethereum. It makes it possible to query data that is difficult to query directly.”

The main advantage of The Graph is that it knows about the entities it indexes. It is composed of subgraphs, crafted with love by developers of the community, ideally those who created the smart contracts the subgraph will analyze.

There is a global instance of The Graph that accepts subgraph definitions, but since we can’t foresee the future or the business model that will be adopted for querying The Graph, and to demonstrate that it’s possible to do so, we will run our own instance.

Our first SubGraph

Basic setup: Ethereum network and Graph instance

The Graph isn’t an Ethereum node, so it needs one to fetch data from. For this example, I’ll start a local node and blockchain using Ganache. Of course, you could use the Ropsten network or the mainnet by connecting to another node, or to Infura.

First step, install Ganache and run a local workspace. Make sure to note the RPC server address somewhere.


Then, let’s run our local instance of The Graph. There are many installation methods, but I like simplicity, therefore I’ll go for the docker-compose setup:

  • Clone the graph-node repository: git clone https://github.com/graphprotocol/graph-node.git
  • cd graph-node/docker
  • Should your RPC server endpoint differ from http://127.0.0.1:8545, update the ethereum environment variable in docker-compose.yaml
  • docker-compose up 🎉 (the ports 4001, 5001, 8000, 8001, 8020, 8080 and 5432 must be available, as it will start a PostgreSQL database, a local IPFS node and The Graph node itself.)

Our Graph instance is now running, and should be ready to accept our subgraphs!

Define our first subgraph

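  • Let’s create a new node project. I recommend using yarn, but npm is equivalent: yarn init
  • Set up the package.json scripts: codegen, build, create-local, deploy-local and watch-local, which wrap the graph codegen, graph build, graph create and graph deploy commands of graph-cli. (Note: a global installation of graph-cli might be required depending on your operating system and node configuration. If the script commands don’t work, try yarn global add @graphprotocol/graph-cli.)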

Update the domain/graph-name part to mirror the name of the SubGraph, such as investorid/id.

  • Install these dependencies: yarn add --dev @graphprotocol/graph-cli @graphprotocol/graph-ts
  • Load (copy&paste) your contract ABIs (.json files) into a ./abis folder.

Alternatively, you may install a package containing these ABIs, such as yarn add --dev @investorid/solidity, which is probably what you should do.

  • Create a subgraph.yaml file to describe the SubGraph:
  • Create a schema.graphql file to describe the entities used by the SubGraph and exposed via GraphQL.

Note the custom types added by The Graph, such as BigInt and Bytes. The Graph’s documentation of GraphQL API explains the usage of relationships and specific reverse properties such as @derivedFrom.

  • Create the file where the handlers will be implemented: mkdir -p src/handlers && touch src/handlers/identity.ts.
    Implement the event handlers (let them be no-ops for now). Imports for entities and events do not exist yet but will be generated, so ignore any IDE error. The former will come from the schema.graphql file, and the latter from the subgraph.yaml definition.
  • Declare the ABI to be used and scanned by the graph node in the subgraph.yaml file.
    Any entity used by the event handlers must be declared in the entities property.
    A list of all events to be scanned and handled must also be declared in the eventHandlers property. The event property must match the exact event signature contained in the ABI (if an event is not recognized, an error will be thrown displaying the list of available events to help with the correction).
  • Now that some entities are defined, generate the typings and the automated code parts that will allow for developing the event handlers: yarn run codegen.
    The handler file ./src/handlers/identity.ts should no longer have errors for non-existing imports.
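Because the event property has to match the ABI exactly, it can help to derive the signature from the ABI entry instead of typing it by hand. The following TypeScript is a small sketch of that idea. The manifestEventSignature helper is hypothetical (it is not part of graph-cli), and it assumes the manifest format where indexed parameters are prefixed with the indexed keyword; tuple parameters are not handled.

```typescript
// Minimal subset of the JSON ABI format needed for events.
interface AbiInput {
  name: string;
  type: string;
  indexed?: boolean;
}

interface AbiEntry {
  type: string;
  name?: string;
  inputs?: AbiInput[];
}

// Build the signature string expected in the eventHandlers section,
// e.g. "Transfer(indexed address,indexed address,uint256)".
function manifestEventSignature(entry: AbiEntry): string {
  if (entry.type !== "event" || !entry.name) {
    throw new Error("not an event ABI entry");
  }
  const params = (entry.inputs || []).map(
    (input) => (input.indexed ? "indexed " : "") + input.type
  );
  return entry.name + "(" + params.join(",") + ")";
}
```

Running it over every entry of type event in an ABI file gives the list of strings that can be pasted into subgraph.yaml.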

Implementation of event handlers

This is where the fun starts. The time has come to create the event handlers. Refer to The Graph documentation to learn how to write mappings. Here are the very basics:

  • To create a new entity instance, call new <Entity>(<id>). The <id> is a string that you must generate yourself. To be able to retrieve instances from blockchain data, the IDs should be composed of addresses, hashes, etc.
  • To load an existing entity, call <Entity>.load(<id>).
  • To save a new entity or persist the update on an existing one, call <entity>.save().
  • To destroy and remove an entity from the store, call store.remove('EntityName', id).
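The load, create, save and remove cycle above can be sketched as follows. This is not the code from the repository, and it does not use the generated classes: the Key class and the keyStore object are in-memory stand-ins written so that the snippet runs on its own, and the KeyEvent shape is a hypothetical, already-decoded event. In a real mapping, the entity class is generated by codegen and removal goes through store.remove.

```typescript
// In-memory stand-in for the store that the Graph node provides to mappings.
const keyStore: Record<string, Key> = {};

// Stand-in for a generated entity class (hypothetical fields).
class Key {
  purposes: number[] = [];
  constructor(public id: string, public identity: string) {}
  static load(id: string): Key | null {
    return keyStore[id] || null;
  }
  save(): void {
    keyStore[this.id] = this;
  }
}

// Hypothetical, already-decoded event parameters.
interface KeyEvent {
  identity: string;
  key: string;
  purpose: number;
}

// Ids are built from on-chain values so they can be recomputed from any event.
function keyId(identity: string, key: string): string {
  return identity.toLowerCase() + "-" + key.toLowerCase();
}

function handleKeyAdded(event: KeyEvent): void {
  const id = keyId(event.identity, event.key);
  let key = Key.load(id);
  if (key === null) {
    key = new Key(id, event.identity.toLowerCase());
  }
  if (key.purposes.indexOf(event.purpose) === -1) {
    key.purposes.push(event.purpose);
  }
  key.save();
}

function handleKeyRemoved(event: KeyEvent): void {
  const id = keyId(event.identity, event.key);
  const key = Key.load(id);
  if (key === null) {
    return;
  }
  key.purposes = key.purposes.filter((p) => p !== event.purpose);
  if (key.purposes.length === 0) {
    delete keyStore[id]; // plays the role of store.remove('Key', id)
  } else {
    key.save();
  }
}
```

The important design choice is the deterministic id: because it is derived from the identity address and the key, the removal handler can find the entity created by the addition handler without any extra lookup.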

I have made an example with Identities that you can access on the InvestorID Organization: github.com/investorid/subgraph-experiment.

Here are the event handlers for the key events:

Run our subgraph

  • Build the SubGraph: yarn run build
  • Declare the SubGraph on the local Graph node: yarn run create-local
  • Deploy the SubGraph to the local Graph node: yarn run deploy-local (there is also a watch script, yarn run watch-local, that redeploys the SubGraph after each code update. All events will be processed again with the updated handlers.)
  • A GraphQL UI for queries is accessible at http://127.0.0.1:8000/subgraphs/name/domain/graph-name/graphql (replace domain/graph-name with the name of the subgraph). Have fun! Refer to the GraphQL documentation for the query syntax, especially about filters.
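The same endpoint can also be queried from code. The sketch below builds the URL and body of such a request in TypeScript. The keys collection and its identity field are hypothetical and depend on your schema, while first and where are the pagination and filter arguments The Graph generates for entity collections.

```typescript
interface SubgraphRequest {
  url: string;
  body: string;
}

// Build a POST request for the local Graph node.
// `keys` and `identity` are hypothetical schema names.
function buildKeysRequest(subgraphName: string, identity: string, first = 10): SubgraphRequest {
  const query = `
    query KeysOfIdentity($identity: String!, $first: Int!) {
      keys(first: $first, where: { identity: $identity }) {
        id
      }
    }
  `;
  return {
    url: "http://127.0.0.1:8000/subgraphs/name/" + subgraphName + "/graphql",
    body: JSON.stringify({
      query: query,
      variables: { identity: identity.toLowerCase(), first: first },
    }),
  };
}
```

The returned url and body can be handed to any HTTP client as a POST request with a Content-Type: application/json header.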

Note: Whenever the Ethereum network has been reset (e.g. Ganache restarted, computer rebooted…), you must delete the ./docker/data folder located in the graph-node folder cloned from the repository.
This is required to clean the existing database, which checks the genesis block of the current Ethereum network.

Afterword

The Graph is an amazing tool that greatly reduces the amount of time required to build an API over blockchain data. The team is even working on historical exploration of the data over time.

I can only recommend that any company needing to build an intermediary API between blockchain data and its application (not necessarily a Dapp) give it a try and participate in its development. Having an open-source solution that could compete with the blockchain explorers that will inevitably come from Amazon, Azure and Google would only benefit the whole community.

The Graph is still young, and I would not be surprised if some breaking changes appeared in the coming months. The documentation is also still in its early stages, with a lack of fully fledged examples. Yet it’s up to us to improve it, and I hope this article will help in that direction.
