Indexing/dashboarding billions of bitcoin transactions with Elasticsearch and Kibana

Fabien Stepho
Jan 29, 2018 · 4 min read

Bitcoin transactions became very expensive at the end of 2017, with fees rising as high as $55 as demand increased considerably. So I wanted to analyze what was happening on the bitcoin network. This story describes how I proceeded to analyze billions of bitcoin transactions.

The tools and frameworks I used for this are mostly open source and include:

  • Bcoin: JavaScript bitcoin library for Node.js
  • Elasticsearch: RESTful, distributed search & analytics
  • Kibana: tool to explore, visualize, and discover data
  • Docker: open platform for developers and sysadmins to build, ship, and run distributed applications
  • Microsoft Azure: cloud platform

Here are the steps I went through to accomplish my goal:

  • Step 1: Installing the cloud infrastructure
  • Step 2: Running and syncing a full bitcoin node with the complete blockchain data
  • Step 3: Extracting bitcoin data from the local blockchain
  • Step 4: Indexing the data into Elasticsearch
  • Step 5: Visualizing and analysing data

Step 1 — Installing and defining the cloud infrastructure

The first step was to set up the platform. A virtual machine with at least 500 GB of disk space is required to hold the blockchain data and the Elasticsearch indexes.

Here is an overview of the Azure cloud portal interface:

Docker is a must-have for packaging and running isolated containers. My Docker Compose definition consists mainly of the bitcoin node container, the Elasticsearch container and the Kibana container:

Docker compose definition file

As you can see, the data is not stored inside the containers but outside, on the VM host, through Docker's “volumes” feature. The generated data is therefore not lost when containers are restarted or deleted.

Docker images and containers are built and started with the command docker-compose up. You can list the running containers on the host VM with the command docker ps.

Running docker containers

Step 2 — Running and syncing a full bitcoin node with the complete blockchain data

I chose to run the open source bcoin library. The project is hosted on GitHub: https://github.com/bcoin-org/bcoin. I first tried the Bitcore library, but I ran into performance issues while syncing. The synchronization of the blockchain data was slow and lasted several days, so I decided to switch to the bcoin library, on top of which I can write Node.js code (steps 3 and 4).

Bitcoin node synchronizing data

Step 3 — Extracting bitcoin data from the local blockchain

Thanks to the bcoin API, I wrote some code to retrieve bitcoin transaction data from the running full node. The API function I used is client.getBlock(height). Once the data was extracted, I pushed it to the Elasticsearch indexes.

Node js Bcoin code
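
A minimal sketch of such an extraction loop, not the exact code from the repository, could look like the following. It only relies on the client.getBlock(height) call mentioned above; the function name fetchBlocks, the stand-in fakeClient and the idea that getBlock resolves to null past the chain tip are assumptions made for illustration.

```javascript
// Sketch only: `client` is any object exposing getBlock(height) -> Promise.
// Assumption: getBlock resolves to null when no block exists at that height.
async function fetchBlocks(client, fromHeight, toHeight) {
  const blocks = [];
  for (let height = fromHeight; height <= toHeight; height++) {
    const block = await client.getBlock(height);
    if (block === null || block === undefined) {
      break; // nothing stored at this height yet, stop here
    }
    blocks.push(block);
  }
  return blocks;
}

// Hypothetical stand-in for the real bcoin client, so the sketch runs offline.
const fakeClient = {
  getBlock: async (height) => (height <= 2 ? { height, txs: [] } : null),
};

fetchBlocks(fakeClient, 0, 10).then((blocks) => {
  console.log(blocks.map((block) => block.height)); // logs [ 0, 1, 2 ]
});
```

For the full chain, the height range would be walked in small slices rather than in one call, so that only a limited number of blocks is held in memory at a time.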

Step 4 — Indexing the data into Elasticsearch

I created three indexes to handle the data: one for transaction outputs, one for transaction inputs and one for the transactions themselves. Each index contains fields such as address, block height, block hash, fee, amount and coinbase.

Elasticsearch indexes
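
To make the three-index split more concrete, here is a hedged sketch of how one block could be flattened into three lists of documents. The shape of the block object (hash, height, txs, and for each transaction hash, fee, inputs with an optional coin, outputs with value and address) is an assumption about the JSON returned by the node, and the function name blockToDocs and the document field names are hypothetical. The only protocol fact used is that the first transaction of a block is its coinbase.

```javascript
// Sketch only: turns one block into documents for the three indexes.
// The input shape and the output field names are assumptions.
function blockToDocs(block) {
  const docs = { transactions: [], inputs: [], outputs: [] };

  block.txs.forEach((tx, txIndex) => {
    // Fields shared by every document derived from this transaction.
    const base = {
      blockHeight: block.height,
      blockHash: block.hash,
      txHash: tx.hash,
      coinbase: txIndex === 0, // the first transaction of a block is the coinbase
    };

    // Total amount moved by the transaction = sum of its output values.
    const amount = tx.outputs.reduce((sum, output) => sum + output.value, 0);
    docs.transactions.push({ ...base, fee: tx.fee || 0, amount });

    tx.inputs.forEach((input, index) => {
      const coin = input.coin || {}; // coinbase inputs spend no previous coin
      docs.inputs.push({
        ...base,
        index,
        address: coin.address || null,
        amount: coin.value || 0,
      });
    });

    tx.outputs.forEach((output, index) => {
      docs.outputs.push({
        ...base,
        index,
        address: output.address || null,
        amount: output.value,
      });
    });
  });

  return docs;
}
```

Repeating the block height and block hash on every document costs some disk space, but it lets each index be queried on its own, without joins.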

The code that pushes the data into Elasticsearch uses the JavaScript Elasticsearch API and its bulk indexing feature. The bulk API lets me send multiple index requests in a single call. This is particularly useful since I need to index a lot of transactions, which can be queued up and indexed in large batches (100,000 documents in my case).

Here is the code for indexing:

Elasticsearch bulk indexation
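
As a rough illustration of the batching idea, and not the repository code itself, the sketch below splits documents into batches and builds the alternating action/document array that the bulk API expects. The helper names chunk, buildBulkBody and indexInBatches are hypothetical. The call client.bulk({ body }) matches the older JavaScript clients as far as I know; newer client versions may name the parameter differently, and Elasticsearch 6.x and earlier would also expect a _type in the action line.

```javascript
// Sketch only: split a list into batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Build a bulk body: one action line followed by one document, repeated.
function buildBulkBody(indexName, docs) {
  const body = [];
  for (const doc of docs) {
    body.push({ index: { _index: indexName } });
    body.push(doc);
  }
  return body;
}

// Send the documents batch by batch. `client` is assumed to expose
// bulk({ body }) returning a Promise, as the Elasticsearch JS client does.
async function indexInBatches(client, indexName, docs, batchSize = 100000) {
  for (const batch of chunk(docs, batchSize)) {
    const response = await client.bulk({ body: buildBulkBody(indexName, batch) });
    // Some client versions wrap the result in `body`, others return it directly.
    const result = response && response.body ? response.body : response;
    if (result && result.errors) {
      throw new Error(`Bulk request to ${indexName} reported item errors`);
    }
  }
}
```

Awaiting each batch before sending the next keeps memory use bounded and avoids flooding the cluster with concurrent bulk requests.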

Once the indexing was started, I waited several hours before the 500,000 blocks were fully indexed:

Bulk indexation processing result

Step 5 — Visualizing and analysing data

Once the data was indexed, I connected Kibana, from the Elastic Stack, to the indexes:

Discovering transaction data with Kibana

I defined several visualizations and created a simple dashboard with metrics like “Total transaction fees”, “Number of unspent transaction outputs”, “Miner distribution” and “Most popular addresses”.

Kibana dashboard
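
Kibana visualizations are built on Elasticsearch aggregations, so two of these metrics can be approximated with plain search requests. The sketch below only builds the request bodies. The field names fee and address are taken from the field list above but their exact mapped names are assumptions, the function name dashboardQueries is hypothetical, and the address field is assumed to be mapped as a keyword so that a terms aggregation works on it.

```javascript
// Sketch only: request bodies for two dashboard metrics.
// Field names are assumptions about the index mappings.
function dashboardQueries(feeField = 'fee', addressField = 'address') {
  return {
    // "Total transaction fees": sum of the fee field over all transactions.
    totalFees: {
      size: 0, // no hits needed, only the aggregation result
      aggs: { total_fees: { sum: { field: feeField } } },
    },
    // "Most popular addresses": the ten most frequent address values.
    popularAddresses: {
      size: 0,
      aggs: {
        popular_addresses: { terms: { field: addressField, size: 10 } },
      },
    },
  };
}

const queries = dashboardQueries();
console.log(JSON.stringify(queries.totalFees));
// A body like queries.totalFees would then be passed to the client's search
// call against the transactions index (index name hypothetical).
```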

Conclusion

With these five steps done, I can now do things like:

  • Transaction monitoring
  • Complex querying
  • Exploring anomalies with machine learning
  • Analyzing transaction relationships with Graph

My next story will cover these use cases, stay tuned!

Note: The source code is freely available in my GitHub repository: https://github.com/fstepho/bcoin-es


Written by Fabien Stepho

Software Architect @ Object’ive · Blockchain enthusiast · http://www.object-ive.com · https://twitter.com/fstepho