GSoC 2018 Stories 04: Analyzable blockchain structure

Umesh Prabushitha Jayasinghe
Jun 27, 2018


Image Courtesy : Behance

Recap

As part of GSoC 2018, I’m working with the Score Lab organization on optimizing blockchain extraction and implementing a parser for the EtherBeat project. We no longer need RPC or IPC endpoints to extract the important information from the Ethereum blockchain: we have developed an Ethereum blockchain extractor API that reads blockchain information directly from the LevelDB files of the Ethereum blockchain. The C++ implementation of the extractor can be found here. If you have any doubts, you can read my previous articles, where I’ve described the key challenges I faced when implementing the extractor.

What’s new!

The next task is to implement a builder that uses the extractor API (which we have already implemented) to transform the blockchain into an analyzable structure. For that we use SQLite and RocksDB. The builder iterates through all blocks one by one, reading the block information and then iterating again through that block’s transactions. Each block’s information is stored in a SQLite block table, while transactions are stored in a SQLite transaction table. The transactions belonging to a block are recorded in another table that holds the transaction id and the block id.

Each transaction is stored under an id starting from 1, so a transaction hash -> id mapping is needed. The id corresponding to a transaction hash is stored in RocksDB, with the prefix tx_ added before the hash to avoid mixing up transaction hashes with other hashes. Otherwise we would need a separate RocksDB database for each mapping, which I don’t think is necessary. Similarly, block hashes are mapped to block numbers under the key prefix block_, so the block number can be retrieved from a block hash.
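The prefixed-key idea can be sketched as follows. This is only an illustration under stated assumptions: a std::map stands in for the single RocksDB database, and the names KeyValueStore, putId and getId are hypothetical, not taken from the builder’s code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Stand-in for one RocksDB database: an ordered string -> string map.
// (Assumption: the real builder talks to RocksDB instead of std::map.)
using KeyValueStore = std::map<std::string, std::string>;

// Key prefixes as described in the text.
const std::string kTxPrefix = "tx_";
const std::string kBlockPrefix = "block_";

// Store hash -> id under a prefix, so different kinds of hashes
// can share one database without colliding.
void putId(KeyValueStore& db, const std::string& prefix,
           const std::string& hash, std::uint64_t id) {
    assert(!hash.empty());
    db[prefix + hash] = std::to_string(id);
}

// Look up the id for a hash under a prefix.
// Returns false when the key is absent and leaves `id` untouched.
bool getId(const KeyValueStore& db, const std::string& prefix,
           const std::string& hash, std::uint64_t& id) {
    auto it = db.find(prefix + hash);
    if (it == db.end()) {
        return false;
    }
    id = std::stoull(it->second);
    return true;
}
```

With this layout, the same hash string stored under tx_ and under block_ resolves to two independent entries, which is what lets one database serve every mapping.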

The main reason for assigning a unique numerical id starting from 1 is to support fast querying. SQLite uses a binary search to find a row by its id. (Since the blockchain doesn’t change, no intermediate record will ever be removed. I should check whether this could be turned into a direct lookup with O(1) performance.)
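To make the difference between the two lookups concrete, here is a small sketch over an in-memory array. It does not describe how SQLite works internally. It only shows why ids that run 1, 2, 3, ... without gaps would allow a direct lookup. TxRow and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, trimmed-down transaction row.
struct TxRow {
    std::uint64_t id;
    std::uint64_t blockNumber;
};

// O(log n): binary search over rows sorted by id.
const TxRow* findByBinarySearch(const std::vector<TxRow>& rows, std::uint64_t id) {
    auto it = std::lower_bound(
        rows.begin(), rows.end(), id,
        [](const TxRow& row, std::uint64_t wanted) { return row.id < wanted; });
    return (it != rows.end() && it->id == id) ? &*it : nullptr;
}

// O(1): valid only because ids start at 1 and no row is ever removed,
// so the row with id i always sits at position i - 1.
const TxRow* findByDirectIndex(const std::vector<TxRow>& rows, std::uint64_t id) {
    if (id == 0 || id > rows.size()) {
        return nullptr;
    }
    assert(rows[id - 1].id == id);  // the "no gaps" invariant
    return &rows[id - 1];
}
```

Both functions return the same row for any valid id. The direct version simply replaces the search with an offset computation, and it stops being correct as soon as a row is deleted or an id is skipped.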

Transaction receipts are stored in a transaction receipt table. We don’t need to map receipts to transactions because they have a one-to-one relationship: each receipt id is identical to its transaction id.

Similarly, each account (external address or contract address) is assigned a unique id. Account addresses are discovered from transactions (the from and to values), so a single normal transaction yields two accounts. We have to be careful here because the same address can appear in many transactions, and it must not receive a new id each time; it must reuse the id already assigned to that address. Once ids are assigned to the addresses in a transaction, we store the sender, receiver, amount and transaction id in a separate SQLite table called the fromto table. Finally, this table has to be indexed on sender id so that queries run fast.
The address hash -> address id mapping is also stored in RocksDB, using the key prefix address_.
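The id assignment described above can be sketched like this. It is a minimal illustration, assuming an in-memory std::unordered_map in place of RocksDB and a plain struct in place of a row in the fromto table. AddressIndex, FromToRow and makeFromToRow are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// One row of the fromto table. The amount is kept as a decimal string
// because Ethereum values can exceed 64 bits.
struct FromToRow {
    std::uint64_t senderId;
    std::uint64_t receiverId;
    std::string amount;
    std::uint64_t txId;
};

// Stand-in for the address_ mapping kept in RocksDB.
class AddressIndex {
public:
    // Return the existing id for an address, or assign the next free one.
    std::uint64_t idFor(const std::string& address) {
        assert(!address.empty());
        const std::string key = "address_" + address;
        auto it = ids_.find(key);
        if (it != ids_.end()) {
            return it->second;  // already seen: reuse the id
        }
        const std::uint64_t id = ids_.size() + 1;  // ids start from 1
        ids_.emplace(key, id);
        return id;
    }

    std::size_t size() const { return ids_.size(); }

private:
    std::unordered_map<std::string, std::uint64_t> ids_;
};

// Resolve both addresses of a transaction and build the fromto row.
FromToRow makeFromToRow(AddressIndex& index, const std::string& from,
                        const std::string& to, const std::string& amount,
                        std::uint64_t txId) {
    FromToRow row;
    row.senderId = index.idFor(from);
    row.receiverId = index.idFor(to);
    row.amount = amount;
    row.txId = txId;
    return row;
}
```

An address that shows up as a receiver in one transaction and as a sender in a later one keeps the same id, so the number of assigned ids always equals the number of distinct addresses seen so far.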

The implementation of the builder explained above can be found here; it is again written in C++.

The next step is to check the performance and optimize the process.

Click here for the previous article.
