Evgeny Medvedev
Nov 13, 2018 · 4 min read

Recently we announced beta availability of transaction traces (aka internal transactions) in our public Ethereum dataset in BigQuery: https://bigquery.cloud.google.com/table/bigquery-public-data:crypto_ethereum.traces. They allow querying all Ethereum addresses with their balances.

Below is the query that retrieves the top 10 addresses:

Run it in BigQuery

Here is the result, on the day of writing (as csv file):

Here is the high-level overview of the workflow we used to get this result:

  1. Export all traces from Parity node to BigQuery using Ethereum ETL.
  2. Enrich the traces with the status field which accounts for parent trace failures.
  3. Add genesis and daofork traces.
  4. Write an SQL to calculate the balances.

Along the way I had to solve a few interesting challenges, which I will uncover in the following sections.

Calculating Trace Statuses

As explained in the Parity’s Trace Module Wiki:

The traceAddress field of all returned traces, gives the exact location in the call trace [index in root, index in first CALL, index in second CALL, …].

i.e. if the trace is:


then it should look something like:

[ {A: []}, {B: [0]}, {G: [0, 0]}, {C: [1]}, {G: [1, 0]} ]

Parity also provides an error string for each trace in case it failed. Unfortunately it doesn’t account for the top-level trace failures. For example, if the call to C in the example above succeeded, but the top-level call to A failed due to say out-of-gas error, then the trace for the call to C will not return any error with it. This is problematic because any state changes in the call, including Ether transfers, are reverted. We want to filter out such nested calls that look like successes in the traces but actually were reverted due to top-level trace failures.

Your first thought may be to reconstruct the call tree from the flat traces returned by Parity, for each call traverse up the tree and see if any top-level calls failed, set the status for each call accordingly.

Another solution, which I ended up using, is to first query all failed traces, then find all their child traces and mark them as failures. This is easily done in BigQuery with the following SQL: https://github.com/blockchain-etl/ethereum-etl-airflow/blob/master/dags/resources/stages/enrich/sqls/traces.sql

The status field is 0 when the trace succeeded, and 1 when it failed due to any operation that can cause the call itself or any top-level call to revert. This is consistent with the status field in transaction receipts.

Handling Irregular State Changes in DAO Fork

If you simply query balances from the traces returned by Parity, these two addresses will come up at the top:

In reality, the above addresses have only 7 Wei each, on the day of writing. If you google them you will encounter this post about the DAO fork by Vitalik Buterin https://blog.ethereum.org/2016/07/20/hard-fork-completed/. Here is the summary:

Block 1920000 contained the execution of an irregular state change which transferred ~12 million ETH from the “Dark DAO” and “Whitehat DAO” contracts into the WithdrawDAO recovery contract.

The full specification of the changes is in this EIP https://github.com/ethereum/EIPs/blob/master/EIPS/eip-779.md.

These irregular state changes don’t appear in Parity traces, so I had to query them with this SQL: https://github.com/blockchain-etl/ethereum-etl-airflow/blob/master/sqls/daofork_traces.sql. The result is then hardcoded in the Ethereum ETL tool and ingested to the BigQuery dataset.

trace_type column for these irregular traces is daofork. Another special type of traces that I had to add to the table is genesis, which are the initial Ether allocations. They are not associated with any transactions and thus not returned in transactions or traces APIs.


Johannes Pfeffer from alethio helped identify addresses for which the balances shown in the query results doesn’t match with the balances reported by the Ethereum nodes. Here is the list of the addresses:


The presumable root cause is this issue in Parity https://github.com/paritytech/parity-ethereum/issues/7765, which results in that the calls to precompiled contracts are missing from API results.

We will need to resync our Parity nodes, rerun the ETL, after which this issue will be resolved.


It’s possible to query running Ether balances for every address on every date. Try to compose a query that will return the number of addresses with non-zero balance, and plot it over time. Post your SQL in the comments. (Solution)

Also read:

Google Cloud Platform - Community

A collection of technical articles published or curated by Google Cloud Platform Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Evgeny Medvedev

Written by

Creator of https://github.com/blockchain-etl, Co-founder of https://d5.ai, Google Cloud GDE, AWS Certified Solutions Architect, Data Science

Google Cloud Platform - Community

A collection of technical articles published or curated by Google Cloud Platform Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade