Ethereum Datafarm: Parsing Historic Event Data from the Ethereum Blockchain into CSV files using the Etherscan API

Toni Wahrstätter
2 min read · Aug 29, 2022

TL;DR: Ethereum Datafarm parses Events from Smart Contracts and produces handy CSV files that can then be analyzed using SQL, Python, R, Excel, etc. It does not require access to an archive node, since it uses the Etherscan API.

For students without a background in Computer Science, collecting the data required for quantitative analyses can be painful. Economics students may have the theoretical knowledge to analyze price data, but often cannot parse blockchain data or set up an archive node.

Ethereum Datafarm is a project I originally started to collect data for my Master's thesis on Ethereum-based Stablecoins. I needed all Transfer Events for every Stablecoin contract on Ethereum, so I started developing an application that does exactly that job without requiring access to an archive node. The tool can parse every Event from every (published) Smart Contract and produces handy CSV files that can then be analyzed using SQL, Python, R, Excel, etc.
Btw, find the thesis here (it’s crazy to see how much has changed since 2017).

Ethereum Datafarm uses the Etherscan.io API to retrieve the Event data from the specified contracts. A (free) Etherscan API key is therefore a prerequisite.
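To give a feeling for what such a request looks like, here is a minimal sketch that builds a URL for Etherscan's event-log endpoint. It is not taken from the Ethereum Datafarm source code. The endpoint and parameter names reflect my understanding of the Etherscan logs API, so check the current Etherscan documentation before relying on them. The function name build_get_logs_url and the placeholder API key are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed base URL of the Etherscan API; verify against the current docs.
ETHERSCAN_API = "https://api.etherscan.io/api"

# keccak256 hash of "Transfer(address,address,uint256)".
# Logs are filtered by this value in the topic0 field.
TRANSFER_TOPIC0 = (
    "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"
)


def build_get_logs_url(address, from_block, to_block, topic0, api_key):
    """Build (but do not send) a getLogs request URL. Hypothetical helper."""
    params = {
        "module": "logs",
        "action": "getLogs",
        "address": address,
        "fromBlock": from_block,
        "toBlock": to_block,
        "topic0": topic0,
        "apikey": api_key,
    }
    return f"{ETHERSCAN_API}?{urlencode(params)}"


url = build_get_logs_url(
    address="0xDe30da39c46104798bB5aA3fe8B9e0e1F348163F",
    from_block=12422079,
    to_block=12472078,
    topic0=TRANSFER_TOPIC0,
    api_key="YOUR_API_KEY",  # placeholder, replace with your own key
)
print(url)
```

Sending this URL with any HTTP client should return a JSON response with the matching logs, which is the raw material that ends up in the CSV files.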

How to use it:

Clone GitHub repository:

git clone https://github.com/Nerolation/ethereum-datafarm

Move into folder:

cd ethereum-datafarm

Create virtual environment:

python3 -m venv .

Activate virtual environment:

source bin/activate

Install requirements:

pip install -r requirements.txt

Edit contracts.csv file

This file is used to configure the contracts and events to be parsed. Make sure to use the appropriate format. Sample input can be the following:

0xDe30da39c46104798bB5aA3fe8B9e0e1F348163F,gitcoin,Transfer(address,address,uint256),12422079,50000

This tells the tool to parse the Gitcoin contract at address 0xDe30… starting from block 12422079 with a chunk size of 50000 blocks per API request (the chunk size is adapted dynamically at runtime). The event to be parsed is the Transfer event, represented by its canonical signature Transfer(address,address,uint256).
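The following sketch shows how such a line could be read and how the block range could be cut into chunks. It is an illustration only, not the tool's actual code: the field names, the helpers parse_contracts_line and block_ranges, and the latest block number used in the example are assumptions. Note that the event signature itself contains commas, so the line cannot simply be split into five fields.

```python
from dataclasses import dataclass


@dataclass
class ContractConfig:
    # Field names are illustrative; the tool may name them differently.
    address: str
    name: str
    event_signature: str
    start_block: int
    chunk_size: int


def parse_contracts_line(line):
    """Parse one contracts.csv line into a ContractConfig."""
    fields = line.strip().split(",")
    # The signature contains commas, so take the first two and the last two
    # fields and rejoin everything in between.
    return ContractConfig(
        address=fields[0],
        name=fields[1],
        event_signature=",".join(fields[2:-2]),
        start_block=int(fields[-2]),
        chunk_size=int(fields[-1]),
    )


def block_ranges(start_block, latest_block, chunk_size):
    """Yield inclusive (from_block, to_block) pairs of at most chunk_size blocks."""
    lo = start_block
    while lo <= latest_block:
        hi = min(lo + chunk_size - 1, latest_block)
        yield lo, hi
        lo = hi + 1


line = (
    "0xDe30da39c46104798bB5aA3fe8B9e0e1F348163F,gitcoin,"
    "Transfer(address,address,uint256),12422079,50000"
)
config = parse_contracts_line(line)

# 12550000 is a made-up "latest block", used only for this example.
ranges = list(block_ranges(config.start_block, 12550000, config.chunk_size))
for from_block, to_block in ranges:
    print(from_block, to_block)
```

Each pair would correspond to one API request. This sketch uses a fixed chunk size, whereas the tool adjusts it while running.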

Start parsing

python3 src/run.py 

Optionally, use the flag -c to set the number of cores to be used. This is required if your machine has so many cores that the API rate limit is reached. For example, -c 3 sets the number of cores to 3 to avoid such cases.

Optionally, use the flag -loc to set the storage location for the CSV files. The default location is ./data.

Feedback appreciated, have fun!

Anton Wahrstätter

Photo by GuerrillaBuzz Crypto PR on Unsplash
