Exporting and Analyzing Ethereum Blockchain

Evgeny Medvedev
May 10, 2018 · 7 min read

There are many cases when you want to export Ethereum data to the csv format and run some queries, perform analysis, draw diagrams/charts/graphs etc. Some of the questions you might be interested in:

  1. When did the first in history ERC20 transfer happen?
  2. How many ERC20 compatible contracts are there on the blockchain?
  3. What are the most popular tokens?
  4. What fraction of ERC20 transfers are sent to new addresses, i.e. addresses that had 0 balance before?
  5. In what order are transactions included in a block in relation to their gas price?
  6. What fraction of all transactions are calls to contracts?
  7. What was the highest transaction throughput?
  8. What is the total Ether volume?
  9. Token balance for any address on any date?
  10. What is the total gas used in all transactions?

In this article I will walk you through the process of exporting Ethereum blocks, transactions and ERC20 transfers using Ethereum ETL, running SQL queries with AWS Athena and visualizing the data in AWS QuickSight.

Read until the end and you will know the answers to the questions above.

Exporting Data to CSV

I exported the first 5 million blocks and the whole process took me:

Below is the step-by-step guide:

Wait until geth downloads the blocks that you need. You can check it by running:

Normally geth will download the latest state after the blocks are loaded. You don’t need to wait until the full sync though. As soon as geth downloads the blocks you can start exporting the data.

  • Clone Ethereum ETL
  • Install dependencies
  • Run Ethereum ETL (you may want to comment out the parts that you don’t need in export_all.sh before running it):

Wait until it finishes by inspecting nohup.out. The output will be partitioned in Hive style, every partition containing 100k blocks. Totally 146 files (no ERC20 transfers in first 400k blocks).

Blocks — 4.8 GB:

Transactions — 51.5 GB:

ERC20 transfers — 7.9 GB:

You can generate a new access and secret keys in IAM console https://console.aws.amazon.com/iam/home

  • Upload the files to S3:

Here is the price breakdown for Singapore:

You don’t pay data transfer costs into EC2 when synchronizing with the Ethereum network. Data transfers between EC2 and S3 are also free within the same region.

Before you proceed you might want to convert the CSV files into a columnar format: Converting Ethereum ETL files to Parquet. This step is not necessary but it will significantly reduce the running time and the cost of your SQLs.

Running SQLs in AWS Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. You can also connect to it with the JDBC driver.

  • Run sanity check queries:
Image for post
Image for post

Image for post
Image for post

Image for post
Image for post

Image for post
Image for post

To narrow down the query to partitions that cover a certain block range ($query_start_block, $query_end_block) filter by end_block and start_block — this will reduce the cost and run time:

Image for post
Image for post
  • Token balance for any address on any block height:
  • The first ERC20 transfer:
  • How many ERC20 compatible contracts are there on the blockchain?
  • What fraction of ERC20 transfers are sent to new addresses, i.e. addresses that had 0 balance before the transfer? (48%)
  • What was the highest transaction throughput, assuming 15 seconds block time? (25 transactions per second)
  • Maximum number of ERC20 transfers per transaction (1000):
  • Total Ether volume (5.8 billion Ether ~= $2.9 trillion for $500/Ether):
  • Total value of gas used (746 thousand Ether ~= $373 million ) (this should use receipts.gas_used instead of transactions.gas):

AWS Athena charges you 5$ per TB of data scanned. You can save on costs and get better performance if compress data, or convert it to columnar formats such as Apache Parquet. Read my article on how to convert the CSVs to Parquet: Converting Ethereum ETL files to Parquet.

Visualizing the Data

Amazon QuickSight is a cloud-powered business analytics service that makes it easy to build visualizations, perform ad-hoc analysis, and get business insights from your data. It allows querying data from Amazon Redshift, Amazon RDS, Amazon Athena, Amazon S3, and Amazon EMR (Presto and Apache Spark); connect to databases like SQL Server, MySQL, and PostgreSQL, in the cloud or on-premises.

  • Sign in to QuickSight console https://us-east-1.quicksight.aws.amazon.com/sn/start
  • Go to Manage QuickSight on the top right > Account Settings > Edit AWS Permissions. Check the box for Amazon Athena, Amazon S3 and select all S3 buckets
  • Return to the QuickSight dashboard and click on Manage Data
  • Click on New Data Set and select Athena
  • Enter the Data source name and click on Create Data Source
  • Select ethereumetl database, then blocks table
  • TODO

Here are some visualizations I created:

  • Transaction Count
Image for post
Image for post
  • ERC20 Transfer Count
Image for post
Image for post

The peak on November 27 with over 1 million transfers is when INS Promo token sale happened https://blog.ins.world/insp-ins-promo-token-mixup-clarified-d67ef20876a3

  • ERC20 Tokens with Greatest Number of Transfers
Image for post
Image for post

AWS QuickSight charges you $12 per user-month, the first user is free https://aws.amazon.com/quicksight/.

Also read:

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch

Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore

Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store