Query Ethereum with SQL

Anyblock Analytics GmbH
Coinmonks
Published in
8 min readJan 17, 2019

--

TL;DR — You can now easily query the Ethereum blockchain using our eth.events SQL database. Down to the smart contract event level, already enriched and decoded into human-readable text, and updated in real-time with the latest blocks from the Ethereum Mainnet and various other networks. You can very quickly register here to get free unlimited access instantly.

In this article you can learn about

  • How to play around with the visualization of our SQL demo use case of Ethereum blockchain data
  • A step-by-step walkthrough to try the SQL Ethereum search engine yourself
  • The blockchain SQL data model we use to store the enriched Ethereum data
  • Why we choose to offer a blockchain SQL database besides the existing ElasticSearch index
  • How we added the Ethereum SQL database to our tech stack
  • The status overview of our real-time, reorg-safe Ethereum SQL index for all Ethereum chains

Ethereum SQL data use case: DAI Unique Recipients per Day

A picture is worth a thousand words, so have a look at our example use case that is created using our Ethereum SQL index:

Ethereum SQL data use case: DAI Unique Recipients per Day https://insights.eth.events/public/dashboard/52bf213a-59cf-4e63-a57e-d0d63175f310

The chart shows the unique recipients (addresses) per day of the @MakerDAO stablecoin “DAI”. Right after launch on Dec 18, 2017 it shows the initial excitement. And even with a little stagnation during summer, ever since raising the debt ceiling in the beginning of July and the inaugural governance vote in September 2018, the DAI user base has been growing strongly. The dotted trend line might rise even faster in the future as more and more stablecoin use cases become apparent and when multi-collateral DAI finally launches.

This simple chart illustrates a category of use cases where utilizing a powerful SQL database is very appropriate to store enriched Ethereum blockchain data. Querying an Ethereum client using the JSON-RPC API would take a very long time as you would have to traverse the whole chain with multiple requests. And even then you would have to manually decode the smart contract method names into a human-readable format. And lastly, you probably need to visualize the data in order to uncover the trends and discuss it with your peers.

If you click on this link, you can play around with some filters to get a feeling for actually working with block chain data to gain insights.
For this demonstration, we have used the open source tool Metabase as our visualization layer. Check out the extensive features in the comprehensive Metabase documentation.

If you want to use Metabase to visualize your own blockchain queries, feel free to contact us and we can set you up.

Try the Ethereum SQL search engine yourself

To access the raw SQL interface you first need to retrieve your access credentials. You can register for a free account and get the SQL access credentials right now at https://account.eth.events/ in just a few seconds (or see our detailed step-by-step guide).
Then you need to decide on a proper PostgreSQL client to access our Ethereum database. We suggest the free PGAdmin4, but any other client supporting PostgreSQL 11 will do.

SQL client pgAdmin4 accessing Ethereum blockchain data

Setup your SQL connection as mentioned in the account backend. If you need help or any errors occur, please don’t hesitate to ask us for assistance.
You can find our SQL documentation here, including a database schema description with the entity-relationship diagram as well as a brief SQL tutorial.

However, please keep in mind that the Ethereum SQL interface is still in beta, hence the service might not be constantly available and the database schema could change from one day to another.

We might also be able to help you to pin down and visualize the data relevant for your specific use case. Just email us at contact@eth.events — we are excited to see our data used in many new ways!

Data model for the Ethereum SQL index

This entity relationship diagram shows the schema of the Ethereum SQL database:

Data model for the Ethereum SQL index | https://docs.eth.events/en/latest/sql/schema/index.html

SQL tables and data sources

Blocks (Block), transactions (TX) and logs (Log) are read from the Ethereum blockchain. Trace is client specific — here we only save the data from our Parity node.

The Call and Event table contain some of our enriched data, e.g. the decoded name of the contract method (in method and event columns respectively). To enrich the data, we use the tables Contract and Token to match the hashes etc. As you can see, we are only able to make the data human-readable if we have the contract ABI.
If you are interested in monitoring your own (non-public) contract — contact us and we can integrate your ABI very easily so that you can interpret the data for your business case more easily.

Most data in Contract, Token and Link is pulled from Etherscan at the moment. Particularly regarding contract ABIs we have been in talks about establishing an independent registry.
If you want to help or can provide funding to support this mission to help the whole Ethereum ecosystem — please contact us.

The arguments are encoded as a JSON array in their enclosing tables of either the Contract, the Call or the Event . In case of referencing a contract that would mean the constructor arguments at the time of creation.
You can find a more detailed diagram and description of the Ethereum blockchain SQL data model in our SQL documentation.

Why we added SQL access besides Elasticsearch

Our Elasticsearch index of the Ethereum blockchain offers lightning fast access to the data, even for complex queries down to the smart contract event level. So why add an SQL database (Postgres in our case)?

The main reason is that a lot more people are familiar with SQL as a query language than with the Elasticsearch Query DSL. SQL has been around a lot longer and the SQL standard has been followed very closely by many different commercial and open-source SQL database products. Also quite a number of people that are not developers themselves can use or at least read SQL queries. And while many business domain users will require charts and diagrams when discussing data, most of them are familiar with the table format of rows and columns and might even know how to combine and aggregate things from using Excel spreadsheets.
The visualization layer is actually another reason why we offer the Ethereum data via SQL. While Elasticsearch can be included in many business intelligence tools like Grafana, Kibana or Plotly, there is a whole other set of programs or libraries available like Tableau, QlikView or Jetbrains DataGrip that will interface more easily with SQL data.

We will publish a more comprehensive comparison between SQL and Elasticsearch blockchain data index in the future. But for now you can rest assured that no matter if you are using eth.events via the Elasticsearch API or our SQL Ethereum index, you will get access to the same data — enriched and decoded with smart contract and token information for your convenience. The index is updated in real-time with the latest blocks and both SQL and Elasticsearch enable very fast query response times. In that sense, it presents a solution to the challenge described previously here: “Why reading data from the Ethereum blockchain is hard and how to speed it up”

How we added the Ethereum SQL database into our tech stack

Our technical architecture made it relatively easy to add SQL via Postgres, so that you can now search the data of the Ethereum blockchain also in the most common query language.
It all starts with our Ethereum clients participating in the Mainnet. We use both Geth and Parity as archive nodes to get all the raw blockchain data. Having both Geth and Parity clients active adds redundancy and resilience to our infrastructure and is one piece that enables us to offer enterprise-grade service level agreements (SLA). Some private consortium blockchains also do not support every client, e.g. the Tobalaba chain of the Energy Web Foundation (EWF) relies on the Parity client only at this point.
We drain the data and put it into our Apache Kafka streaming data pipeline / message queue. The Ethereum data is then transformed and enriched in our Apache Storm cluster topology and sent to our database layer.
While we previously used only Elasticsearch, we now simply added another topology to feed the Ethereum data into the PostgreSQL database.

To be able to load the ~2 TB of Ethereum blockchain data into the server, we had to wiggle quite a lot with the configuration and also tweak the Java database abstraction layer (JDBC). As you might know, any SQL relational database will require a more thorough structure or schema than doc-based Elasticsearch. So the stream of Ethereum data being quite big and unordered needed to be channeled wisely. Another characteristic of SQL databases is the reliance on constraints and indexes which had to be dealt with for the historic import of the Ethereum data in particular.

Ethereum SQL real-time index for all Ethereum chains

More importantly, however, our SQL indexes are updated in real-time as new blocks are propagated through the network! This is made possible due to the fact that there are no caches, batch processes or anything to delay the data being enriched and indexed. This has already been possible for our Elasticsearch API, but it also applies to the Ethereum SQL index. This also makes the data reorg-safe, as updating the index with the newly verified blocks as well as removing the old entries is just a matter of sub-second operations after being notified by the node.

Real-time status of eth.events Ethereum SQL indexed networks | https://account.eth.events/status

These great features are not only available for Ethereum Mainnet, but also for all the Ethereum test nets (Ropsten, Kovan, Rinkeby). And of course we can quickly index Ethereum-based private blockchains on request as well, like we did for the EWF Tobalaba chain. Currently we have these available via our Elasticsearch API, but within a few days also in SQL. We are also in progress to develop an eth.events virtual appliance so an instance of eth.events can be operated within a private network or VPN.
You can see the full list with the index status here: https://account.eth.events/status

Please let us know if we are missing a chain you are working with!

Happy BUIDLing and stay awesome!

Get Best Software Deals Directly In Your Inbox

Originally published at eth.events on February 20, 2019.

--

--

Anyblock Analytics GmbH
Coinmonks

Anyblock Analytics is a German blockchain solution provider. We offer consulting, tools and data to integrate business processes with blockchain.