TrueBlocks / Covalent Comparison
This article highlights key advantages of using TrueBlocks, a local-first solution to blockchain data extraction, by comparing our software to the popular Covalent API. The advantages are speed, accuracy, privacy, and flexibility.
We conducted our analysis by running TrueBlocks locally on a “beefy” Mac laptop (Apple M1 Max, 64GB memory, Monterey version 12.2.1). On that same machine, we ran the Erigon Ethereum mainnet client as an archive node.
We queried the Covalent APIs described in this documentation from that same machine. Note that Covalent is a “shared resource” and therefore necessarily rate-limited.
We used the TrueBlocks APIs to extract transactional histories for 5,000 randomly-selected Ethereum addresses. We used the corresponding APIs from Covalent to pull the same histories. We then compared the results.
The entire process, along with links to the shell scripts we used and information about how to access the data we produced, is detailed here.
We discuss each of the four advantages below and then conclude with a discussion of potential sources of error.
Speed: TrueBlocks is faster
We ran two sets of data extraction, one against TrueBlocks and the other against Covalent. In the following table, we present the amount of time taken to complete these two presumably identical tasks.
Note: When extracting from Covalent, we were forced to slow down the processing; otherwise, our requests timed out. After experimenting, we decided to add a one-second delay to each Covalent request in order to avoid these timeouts. This added 83.3 minutes to the processing. While there are much better ways of backing off of an API, we felt that the additional programming effort required was not warranted, especially given that we did not have to do the same thing for TrueBlocks.
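To make the note above concrete, here is a minimal sketch of the delay arithmetic and of one of the "better ways" of backing off. It is not the code we ran for this study. The 83.3-minute figure is consistent with one delayed request per address (5,000 seconds in total). The `request` callable, the retry limits, and the jitter amount are all hypothetical choices made for illustration, and nothing here calls a real Covalent endpoint.

```python
import random
import time


def fixed_delay_overhead_minutes(n_requests, delay_seconds=1.0):
    """Total wall-clock time, in minutes, added by a fixed per-request delay."""
    return n_requests * delay_seconds / 60.0


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() and retry with exponentially growing, jittered delays.

    `request` is a hypothetical zero-argument callable that raises
    TimeoutError when the remote API times out. It stands in for whatever
    function actually performs the HTTP call.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the timeout to the caller
            # Wait base_delay * 2^attempt seconds, plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))


if __name__ == "__main__":
    # One delayed request per address, 5,000 addresses, one second each.
    print(round(fixed_delay_overhead_minutes(5000), 1))
```

With a scheme like this, the script only slows down when the API actually pushes back, rather than paying the one-second penalty on every request.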
Upshot: TrueBlocks is nearly twice as fast as Covalent.
Accuracy: TrueBlocks is more complete
After doing the data extraction from each source, we compared the results. The results surprised even us.
We queried 5,000 randomly selected addresses. In no case did Covalent return more results than TrueBlocks. For 3,174 of the addresses, TrueBlocks returned more results than Covalent.
A summary of results is presented here:
| nAddrs | Covalent returned data that TrueBlocks did not | TrueBlocks returned data that Covalent did not |
|---|---|---|
| 5,000 | - | 3,174 |
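The per-address comparison can be sketched as a pair of set differences. This is an illustrative sketch, not our post-processing tool: it assumes each source's results have already been loaded into a dictionary mapping an address to the set of transaction hashes returned for it, and the function and key names are hypothetical.

```python
def compare_sources(covalent, trueblocks):
    """Count addresses for which one source returned data the other did not.

    Both arguments are hypothetical dicts: address -> set of transaction
    hashes returned by that source for that address.
    """
    addresses = set(covalent) | set(trueblocks)
    covalent_only = 0
    trueblocks_only = 0
    for address in addresses:
        c = covalent.get(address, set())
        t = trueblocks.get(address, set())
        if c - t:  # Covalent has at least one transaction TrueBlocks lacks
            covalent_only += 1
        if t - c:  # TrueBlocks has at least one transaction Covalent lacks
            trueblocks_only += 1
    return {
        "nAddrs": len(addresses),
        "covalent_only": covalent_only,
        "trueblocks_only": trueblocks_only,
    }
```

Comparing sets of hashes rather than raw counts guards against the case where both sources return the same number of transactions but not the same transactions.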
That same information viewed by the total number of transactions returned is here:
| nTxs (Covalent) | nTxs (TrueBlocks) | Diff | Material |
|---|---|---|---|
| 1,336,508 | 1,534,997 | 198,489 | 45,328 |
Note: In our parlance, a “material” transaction is one in which the Ether balance of the address changed as a result of the transaction. TrueBlocks has the ability to expand this definition further to include any ERC20 token balance, but we chose not to do that. If we had, the above results would have been skewed even more in our favor.
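As a quick check of the figures above, the following sketch recomputes the difference and a few ratios from the reported totals. TrueBlocks returned roughly 15% more transactions than Covalent, about 23% of the extra transactions were material, and about 63% of addresses were affected. The `is_material` helper is a hypothetical illustration of the definition in the note, not TrueBlocks code.

```python
# Totals as reported in the tables above.
N_ADDRS = 5_000
N_ADDRS_TRUEBLOCKS_MORE = 3_174
N_TXS_COVALENT = 1_336_508
N_TXS_TRUEBLOCKS = 1_534_997
N_MATERIAL = 45_328


def is_material(balance_before_wei, balance_after_wei):
    """A transaction is 'material' if the address's Ether balance changed."""
    return balance_before_wei != balance_after_wei


diff = N_TXS_TRUEBLOCKS - N_TXS_COVALENT
extra_vs_covalent = diff / N_TXS_COVALENT          # extra txs relative to Covalent
material_share = N_MATERIAL / diff                 # share of extra txs that are material
address_share = N_ADDRS_TRUEBLOCKS_MORE / N_ADDRS  # share of addresses affected

print(f"diff={diff:,}")
print(f"extra vs Covalent: {extra_vs_covalent:.1%}")
print(f"material share of diff: {material_share:.1%}")
print(f"addresses affected: {address_share:.1%}")
```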
Upshot: TrueBlocks finds more data than other methods.
Why is TrueBlocks more accurate?
How is this even possible? It turns out to be rather simple. TrueBlocks digs more deeply into the data. This ability to “dig more deeply” is closely related to the fact that, being fully local, TrueBlocks is not rate-limited. We discuss this exact issue in the first version of our Specification for the Unchained Index.
We studied the “material” transactions (the 45,328). We looked at the function calls that constituted those transactions — the first four bytes of their input data.
We found 436 different four-byte patterns, of which 253 were “known” to us (that is, we were able to download the ABI file from EtherScan). The function calls can be found in the reason folder of the data store (see how to get the data in the description of our data pipeline).
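Grouping transactions by function call amounts to slicing the first four bytes off each transaction's input data and counting the distinct values. The sketch below shows the idea. It assumes the input data is available as a hex string, and the function names are hypothetical rather than part of TrueBlocks.

```python
from collections import Counter


def four_byte_selector(input_hex):
    """Return the first four bytes of a transaction's input data.

    The result is '0x' followed by 8 lowercase hex characters, or None
    when the input holds fewer than four bytes (e.g. a plain Ether transfer).
    """
    data = input_hex[2:] if input_hex.lower().startswith("0x") else input_hex
    if len(data) < 8:
        return None
    return "0x" + data[:8].lower()


def count_selectors(inputs):
    """Count how often each four-byte pattern appears in a list of inputs."""
    counts = Counter()
    for input_hex in inputs:
        selector = four_byte_selector(input_hex)
        if selector is not None:
            counts[selector] += 1
    return counts
```

The number of keys in the resulting counter is the number of distinct four-byte patterns. Mapping a pattern back to a function name still requires the contract's ABI.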
We summarize the “known” functions in this word cloud:
I discuss the most obvious function (donate) below, but look at the words: addLiquidity. These words scream, “transfer of value.” Covalent does not return them. No wonder per-block accounting on the mainnet doesn’t work for shit.
Upshot: One cannot even hope to do perfect accounting on an 18-decimal-place accurate ledger if one is missing transactions.
Privacy: TrueBlocks is local-first
All of the processing done by TrueBlocks, including accessing the Erigon node, is run locally (on a laptop!). We are happily hidden behind a firewall. We call this mode of operation, “running behind the node.” We consider the node a shield, not only from incomplete or inaccurate data, but also as protection against censorship and prying eyes. TrueBlocks asks no third party for any data at all (other than ABIs :-). None of this is true of Covalent.
Upshot: TrueBlocks enables permissionless access to better data faster and is perfectly private.
Flexibility: TrueBlocks is a platform
TrueBlocks is an open-source software package consisting of many components. There is a collection of command line tools, an API server, a Docker package, an address monitoring system, an indexer, and an increasingly robust collection of well-documented GoLang packages and SDKs. You can program with it, as we’ve done to complete this study.
Upshot: TrueBlocks is a platform of tools and libraries for individual users and developers that works locally, accurately, privately, and flexibly.
Sources of Error
To complete the article, we present a few possible sources of error in our analysis. We welcome any and all comments intended to help us improve our work.
Predominance of GitCoin-related addresses in the dataset
Many of the addresses we studied have interacted with the GitCoin smart contracts. For this reason, many of the “missing” function calls were donate (about 71%). While it is accurate to say that TrueBlocks returned these transactions and Covalent did not, it is also accurate to say that Covalent could easily add this “special case” to their processing. The important point I want to make, however, is that TrueBlocks doesn’t have “special cases.” TrueBlocks purposefully processes the data unaware of its meaning. This is the main reason we find transactions while other methods do not. We call this aspect of our work, “not suffering from the long-tail problem.”
Upshot: Our data skews towards a certain type of address.
Misuse of Covalent APIs
We did our best to study the Covalent APIs and use them properly, but we may have missed something. Please let us know if we did. Perhaps Covalent has for-pay API endpoints that deliver better data. But, if that’s the case, ask yourself if that is in keeping with the web3 ethos.
Upshot: We may have misused the public Covalent APIs.
Focus on smaller addresses
We purposefully limited our study to addresses with fewer than 6,000 transactions. While TrueBlocks can easily handle addresses with many more transactions, Covalent imposes rate limits. We felt that if we queried for addresses with more than 6,000 transactions we would encounter two problems: (1) Covalent would become too slow to be practicable, and (2) Covalent would ban us from their site.
Upshot: Rate limiting sucks. If you’re running your own node (and TrueBlocks), you will not be rate limited.
Block range limit
We purposefully excluded transactions prior to block 3,000,000 from our analysis. We found, surprisingly, that Covalent had excluded many of the transactions from the October 2016 DDoS attack on Ethereum. While it is justifiable for them to have done so, as many of those transactions were not “material,” it reminds us that centralized APIs can become the arbiters, without our permission (or knowledge), of what data they show their users. This is not true with TrueBlocks. If we had included those records, the results of this study would have been significantly more skewed in our favor.
Upshot: Some older blocks were ignored in this analysis.
Some people say “blockchain data is not rocket science.” Others say it is. We say it should at least be science: the ledger is, after all, 18-decimal-place accurate and immutable. Everything should be reproducible. Here’s how to do that:
Using this repo
- Clone this repo and change into the
- Visit the Covalent website and get an API key. Put the key (alone) in a file called
- Run ./init to set up the folders and build a simple post-processing tool (requires go language version 1.18 or later).
- Collect together your own list of addresses (or use ours; see the file called addresses.txt in the folder). Create a shell script called ./download (or use ours). Make the shell script repeatedly call into ./download.1 (as ours does) to process each address.
- Run the
- The results of that command will be placed in appropriately named subfolders in a folder called
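The per-address loop described in the steps above can be sketched as follows. This is an illustration of the approach, not a replacement for the repo's ./download and ./download.1 scripts. It assumes a text file with one address per line. The `process_one` callable is hypothetical and stands in for whatever does the actual per-address work, and the one-second pause mirrors the delay we added for Covalent.

```python
import re
import time

# An Ethereum address is '0x' followed by 40 hexadecimal characters.
ADDRESS_RE = re.compile(r"^0x[0-9a-fA-F]{40}$")


def load_addresses(path):
    """Read one address per line, skipping blanks, invalid lines, and repeats."""
    seen = set()
    addresses = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            candidate = line.strip()
            if not ADDRESS_RE.match(candidate):
                continue
            key = candidate.lower()  # addresses are case-insensitive
            if key not in seen:
                seen.add(key)
                addresses.append(key)
    return addresses


def process_all(addresses, process_one, delay_seconds=1.0, sleep=time.sleep):
    """Call process_one(address) for each address, pausing between calls."""
    results = {}
    for i, address in enumerate(addresses):
        if i > 0:
            sleep(delay_seconds)  # fixed delay between consecutive requests
        results[address] = process_one(address)
    return results
```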
Obtaining the data
We tried very hard to make the data reproducible. All you need is a Covalent API key, a locally running version of TrueBlocks, a locally running version of Erigon, and a few days or weeks to wait for the data to download. Alternatively, you may download our data directly from IPFS using the following commands:
Join the paradigm shift. Run your own node. Index your own data. Speed up your life! We’ve shown that local-first software can be faster than an API even on smaller hardware. We’ve shown that local-first software can be private. But, most importantly, we’ve shown that one can “dig deeper.” This is because of speed. What are you waiting for? Join us in our mission.
Support Our Work
First, to our biggest supporters, Meriam Zandi and Dawid Szlachta, thanks for your help with this article.
TrueBlocks is self-funded from our own personal funds and grants from our supporters including The Ethereum Foundation (2018), Consensys (2019), Moloch DAO (2021), Filecoin/IPFS (2021), and, of course, our GitCoin donors.
If you like this article and wish to support our work please donate to our GitCoin grant https://gitcoin.co/grants/184/trueblocks. Even small amounts have a big impact.
If you’d rather, feel free to send ETH or any other token to us directly at trueblocks.eth or