Towards production-grade open-source Ethereum clients

At TokenAnalyst we provide core on-chain infrastructure that is used by Ðapps, researchers, and investors. From past experience running Ethereum clients, we are big fans of the Parity client, and we recently ran nearly 100 Parity nodes to extract data from the entire blockchain.

Running 95 Parity archive nodes in parallel

Missing internal transactions

Note: This is not a consensus issue but a usability issue.

We rely on the trace functionality of Parity nodes to analyze contract calls and internal transactions. While integrating internal transactions into our historical data pipeline, we encountered a data discrepancy between two Parity versions: the newer versions were missing more than 1 million internal transactions that the older version reported. A detailed overview is available separately.
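As a rough illustration of how such a discrepancy can be detected, the sketch below compares the output of Parity's `trace_block` JSON-RPC method from two nodes for the same block. It is not our production pipeline. It assumes that an internal transaction is any trace with a non-empty `traceAddress` (that is, a call nested below the top-level transaction), and the helper names and node URLs are hypothetical.

```python
import json
from urllib import request


def rpc_call(url, method, params):
    """POST a JSON-RPC 2.0 request and return its "result" field."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    ).encode()
    req = request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]


def internal_tx_keys(traces):
    """Return one (transaction hash, trace address) key per internal transaction.

    Assumption: a trace counts as an internal transaction when its
    traceAddress is non-empty. Traces without a transaction hash
    (for example block rewards) are skipped.
    """
    return {
        (t["transactionHash"], tuple(t["traceAddress"]))
        for t in traces
        if t.get("transactionHash") and t["traceAddress"]
    }


def missing_internal_txs(reference_traces, candidate_traces):
    """Internal transactions present in the reference but absent in the candidate."""
    return sorted(
        internal_tx_keys(reference_traces) - internal_tx_keys(candidate_traces)
    )


def compare_block(reference_url, candidate_url, block_number):
    """Fetch traces for one block from both nodes and report what the candidate lacks."""
    block = hex(block_number)
    reference = rpc_call(reference_url, "trace_block", [block])
    candidate = rpc_call(candidate_url, "trace_block", [block])
    return missing_internal_txs(reference, candidate)
```

With two nodes that have tracing enabled, a call such as `compare_block("http://localhost:8545", "http://localhost:8546", 5000000)` (placeholder URLs and block number) would return the internal transactions that only the first node reports. Keying on the transaction hash plus trace address keeps the comparison independent of the order in which a node returns its traces.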


State of the clients today

Transactions in a blockchain are immutable, so the internal transactions derived from them should be immutable and consistent too. As of today, 40% of all Parity nodes in the network run versions with incomplete internal transaction data.

40% of nodes report incomplete internal transactions. Source: https://ethstats.net/

This particular discrepancy is just one of the things we have to deal with when normalizing the data we get from different Ethereum clients. There are also several inter-client quirks, which we have written about before. As a community we need to improve on this so that we can rely on the data coming from open-source Ethereum clients in production-grade systems.


Things to ponder

Every client has to agree at the protocol level to join the network, since the consensus rules enforce it. Outward-facing data schemas and data types are not enforced in the same way, yet they should also be standardized and transparent across all Ethereum clients. We are thankful to the teams behind go-ethereum and Parity, and we encourage a debate on the exposed data and its schema. We would also welcome a stable, backward-compatible version of the open-source nodes on which we can build our production architecture.


TokenAnalyst parses and classifies every on-chain transaction with the goal of deriving fundamental insights to understand crypto-assets.