Ethereum Datafarm: Parsing Historic Event Data from the Ethereum Blockchain into CSV files using the Etherscan API
TLDR. Ethereum Datafarm parses Events from Smart Contracts and produces handy CSV files that can then be analyzed using SQL, Python, R, Excel, etc. It does not require access to an archive node, since it uses the Etherscan API.
For students without a background in Computer Science, collecting the data required for quantitative analyses can be painful: Economics students, for example, may have the theoretical knowledge to analyze price data, but often cannot parse blockchain data or set up an archive node.
Ethereum Datafarm is a project I originally started to collect data for my Master Thesis on Ethereum-based Stablecoins. I needed all Transfer Events for every Stablecoin contract on Ethereum and consequently started developing an application that does exactly that job without requiring access to an archive node. The tool can parse every Event from every (published) Smart Contract and produces handy CSV files that can then be analyzed using SQL, Python, R, Excel, etc.
Btw, find the thesis here (it’s crazy to see how much has changed since 2017).
Ethereum Datafarm uses the Etherscan.io API to retrieve the Event data from the specified contracts. Therefore, a (free) Etherscan API key is a prerequisite.
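Under the hood, such a query corresponds to Etherscan's getLogs endpoint. As a minimal sketch (no request is sent here; the block range is an arbitrary example, and you would substitute your own API key), building one such request URL looks like this:

```python
from urllib.parse import urlencode

# Placeholder for your free Etherscan API key
API_KEY = "YourApiKeyToken"

params = {
    "module": "logs",
    "action": "getLogs",
    "fromBlock": 12422079,            # example start block
    "toBlock": 12472079,              # example end of one chunk
    "address": "0xDe30da39c46104798bB5aA3fe8B9e0e1F348163F",
    # topic0 is the Keccak-256 hash of the canonical event signature
    # Transfer(address,address,uint256)
    "topic0": "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
    "apikey": API_KEY,
}
url = "https://api.etherscan.io/api?" + urlencode(params)
print(url)
```

Ethereum Datafarm issues such calls chunk by chunk, which is why no archive node is needed on your side.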
How to use it:
Clone Github Repository:
git clone https://github.com/Nerolation/ethereum-datafarm
Move into folder:
cd ethereum-datafarm
Create virtual environment:
python3 -m venv .
Activate virtual environment:
source bin/activate
Install requirements:
pip install -r requirements.txt
Edit contracts.csv file
This file is used to configure the contracts and events to be parsed. Make sure to use the appropriate format. Sample input can be the following:
0xDe30da39c46104798bB5aA3fe8B9e0e1F348163F,gitcoin,Transfer(address,address,uint256),12422079,50000
This tells the tool to parse the Gitcoin contract at address 0xDe30… starting from block 12422079, with a chunksize of 50000 blocks per API request (the chunksize is adapted dynamically during runtime). The event to be parsed is the Transfer event, represented by its canonical signature Transfer(address,address,uint256).
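For illustration, here is how one such line splits into its five logical fields (the field names are my own labels, inferred from the description above). Note that the event signature itself contains commas, so a naive comma split would over-split it; the sketch below therefore splits from both ends:

```python
# Sketch: split one contracts.csv line into its five logical fields.
# Field names are hypothetical labels, not the tool's internal names.
line = ("0xDe30da39c46104798bB5aA3fe8B9e0e1F348163F,"
        "gitcoin,Transfer(address,address,uint256),12422079,50000")

parts = line.split(",")
address, name = parts[0], parts[1]
start_block, chunksize = int(parts[-2]), int(parts[-1])
# Everything in between is the event signature, commas included
event_signature = ",".join(parts[2:-2])

print(address, name, event_signature, start_block, chunksize)
```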
Start parsing
python3 src/run.py
Optionally, use the flag -c
to set the number of cores to be used. This is required if your machine has so many cores that the API's rate limit is reached; for example, -c 3
restricts parsing to 3 cores in order to avoid such cases.
Optionally, use the flag -loc
to set the storage location for the CSV files. The default location is ./data.
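Once parsing finishes, the CSV files can be loaded with any standard tool. As a quick sketch, here is how a parsed-event file could be aggregated with Python's csv module; the file name and column headers below are hypothetical placeholders (check the header of the files the tool actually writes), and a temporary sample file stands in for real output:

```python
import csv
import tempfile
from pathlib import Path

# Create a stand-in for one output file; columns are hypothetical.
data_dir = Path(tempfile.mkdtemp())
sample = data_dir / "gitcoin_Transfer.csv"
sample.write_text(
    "blocknumber,from,to,value\n"
    "12422080,0xaa..,0xbb..,1000\n"
    "12422081,0xbb..,0xcc..,2500\n"
)

# Sum the transferred values across all rows
total = 0
with sample.open() as f:
    for row in csv.DictReader(f):
        total += int(row["value"])
print(total)  # → 3500
```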
Feedback appreciated, have fun!
Anton Wahrstätter