Blockchain Migrator — From scratch to explorer.

Praful Parashar
Blockwala
Published in
8 min readAug 4, 2018

I came to know about the Bitcoin when it’s price skyrocketed in 2017. The crypto-currencies have always been a topic of speculation. It is decentralising the financial system and transferring the control from governments and banks to people themselves.

This article will guide you from understanding the bitcoin Node to extracting information from it. The explorer is based on Django with database in Postgre Sql. So, here we go.

Bitcoin Node — What is it ?

Bitcoin is a decentralised network (P2P network protocol), similar to torrents but in regard to currency with no central authority. Any computer that is connected to the Bitcoin network is a Bitcoin Node. To work upon bitcoin you need to set up the node on your computer, you may or may not download the full node to work upon. The steps in this link will help you setup the bitcoin-core in your system.

https://www.linuxbabe.com/ubuntu/install-bitcoin-core-wallet-ubuntu-16-04-16-10

So, now you are done installing the bitcoind or bitcoin-qt according to your preference. Run it and it will download the .blk and .lbd files to your root folder in blocks directory.
Well wait, what are .blk and .lbd files ?

BLK files are used for controlling the data that is published to a shareable file. In reference to bitcoin, Every block that your node receives gets appended to a blk.dat file. Also, instead of the entire block chain being stored in one massive file, they are split in to multiple blk*.dat files.

An LDB file is a lock information file that prevents any Access database from being changed by more than one user at a time. In reference to bitcoin they lock the folder to establish an uninterrupted connection. To use the data, you should download the full node or stop the node syncing process.

These files have different encoding to store a large amount of information, so special parser have to be used to convert the hex into usable format. The parser will pick up the blk files in ordered fashion to calculate the height of the blocks as well.

Block Structure in Bitcoin.

The above picture depicts the block structure of block chain. Each block is connected to each other by a previous block hash(connecting it in inverse order). A block hash includes transactions which in turn are performed between addresses.

Currently the maximum size of a single bitcoin block is 1mb; which is only big enough to support approximately 7 transactions per second, which quite frankly is rather disappointing. At a future time the block size may be increased but, if that were to occur, the already massive block chain would blow up out of control. This is a major issue for the bitcoin protocol. It cannot in any way support a massive number of transactions which could accommodate global commerce traffic, as of yet.

Migrating into database — From .blk to postgres through django.

Bitcoin Parser — From Node to usable data

A Block Parser reads the Bitcoin’s blocks. The blockchain is a decentralised database. Bitcoin is a pseudonymous system. Meaning, ECDSA key pairs are used to abstract the identity of users. There is encrypted and public data stored on the blockchain. Public data can be accessed by nodes to view the state of transactions. Encrypted data is used to create such transactions. All of the binary data in the blockchain can be read.

The Bitcoin protocol dictates its structure and is the means through which each node maintains a duplicate copy. Overall, the block chain is just a data structure for storing blocks. The blockchain stores blocks in a series, beginning with the genesis block.

Parsing the blk files requires ‘parsing libraries’ such as: binascii, double_sha_256, hashlib.

  • Block Details
  1. Block Header — The first 4 bytes of a block in the block chain file is a 32 bit header referred to as a ‘magic id’. This is simply an identifier to let us know that we are at the beginning of a block.
  2. Block Length — The next 4 bytes of the block header contain the length of this block.
  3. Previous Block Hash — This entry contains a 32 byte hash of the ‘previous’ block. Think of this like a linked list. Each block in the block chain points to a previous block, except the genesis block.
  4. Merkle Root — This 32 byte hash is not needed to parse the bitcoin transactions. It is required for rapid access of bitcoin data history by bitcoin core.
  5. Difficulty — A measure of how difficult it is to find a hash below a given target.
  6. Nonce — This value is a random number value used as part of the mining process.
  • Transaction Details
  1. Version — It’s start with the transaction’s version number(expected to be 0 or 1).
  2. Transaction Number — This part depicts the number of transactions contained in this transaction.
  3. Transaction Hash — These 32 bytes represent a hash of some previous transaction in the block chain. It is very important to note, this transaction hash is stored nowhere in the block chain itself!! It is a computed value.
  4. Transaction Index — This value says which output of that transaction comprises this input.
  5. Script Length — It shows the length of the input script.
  6. Script Data — This raw data comprises of input-script. The input-script combined with the output script, which we will cover in a moment, are used by the bitcoin system to validate transactions.
  7. Sequence Number — A number intended to allow unconfirmed time-locked transactions to be updated before being finalised; not currently used except to disable lock time in a transaction.
  8. Output Number — It shows the number of outputs involved in the transaction.
  9. Output Value — Value is the number of Satoshi (1 BTC = 100,000,000 Satoshi) that this output will be worth when claimed.
  10. Output Script Length — It shows the length of the output script.
  11. Output Script — Even though we don’t have to execute the script to validate it, we do need to inspect the script to figure out where the output goes.

The SQL database

Firstly, why to use RDBMS?
Traditionally, relational databases are considered better for atomic transactions. In financial sectors, the relational databases are preferred because of tables’ closely inter-bound nature using foreign keys and indexes. They are used to uniquely identify any atomic piece of data within the table. Bitcoin hashes and stores the data such that every hash is unique to the transaction or the block. These hashes can be used to identify particular transactions or blocks and hence can be used as a primary key.

Finally, why to use Postgres?
Compared to other RDBMSs, PostgreSQL differs itself with its support for highly required and integral object-oriented and/or relational database functionality, such as the complete support for reliable transactions, i.e. Atomicity, Consistency, Isolation, Durability (ACID).

  • Data integrity:

When reliability and data integrity are an absolute necessity without excuses, PostgreSQL is the better choice.

  • Complex, custom procedures:

If you require your database to perform custom procedures, PostgreSQL, being extensible, is the better choice.

  • Integration:

In the future, if there is a chance of necessity arising for migrating the entire database system to a propriety (e.g. Oracle) solution, PostgreSQL will be the most compliant and easy to handle base for the switch.

  • Complex designs:

Compared to other open-source and free RDBMS implementations, for complex database designs, PostgreSQL offers the most in terms of functionality and possibilities without giving up on other valuable assets.

  • Setting up the Database, Postgres is easy to setup, Follow the link below for the steps to setup Postgres in ubuntu.

https://www.digitalocean.com/community/tutorials/how-to-install-and-use-postgresql-on-ubuntu-16-04

After you are done installing postgres, type in the following commands to setup a database and become a user granting all privileges.

$ sudo -u postgres psql

This will open the command line for postgres, then create user, database granting all privileges :

CREATE DATABASE yourdbname;
CREATE USER youruser WITH ENCRYPTED PASSWORD 'yourpass';
GRANT ALL PRIVILEGES ON DATABASE yourdbname TO youruser;

The Server:

Django, a web application based framework of python, lets the developer create API calls and database structure like a breeze.
It follows the MVT (Model-View-Template) architectural pattern and is maintained by the Django Software Foundation, an independent organisation established as a 501 non-profit.

Why to use Django:

  • Fast.
    Django was designed to help developers take applications from concept to completion as quickly as possible.
  • Reassuringly secure.
    Django takes security seriously and helps developers avoid many common security mistakes.
  • Exceedingly scalable.
    Some of the busiest sites on the Web leverage Django’s ability to quickly and flexibly scale.

Now, let’s setup the Django server :
Prerequisites :
1. Python (2.7 or above and path added to the environment variable)
2. pip
Go to your terminal:

pip install django

Now, to setup a project in django, type in the following commands:

$ django-admin startproject "your_project_name"
cd into the project directory
$ django-admin startapp "your_app_name"

And yes, that’s pretty much it.
To learn the MVT structure, you should pick up this official tutorial from django developers. This will give you a deep insight about web app developement.

Getting back to bitcoin — We need to design a database structure to store it in a way that it’s optimised and easily usable.

So, we need to design tables namely, Block, Transactions, Output Addresses, Input Addresses. As the data contain multi-million entries, to make the read query efficient the relations in the tables are indexed. The indexing algorithm works on the B-tree structure.

Block Table
Transaction Table

The Input and Output addresses table contains Transaction hash as the foreign key of the transaction table.

Input Table
Output Table

To view the data in a more organised and refined fashion, we created web page views with Django -templates.

Transaction table is linked to the block table by one to one mapping through the block height relation.

Address details also include the transactions included through those addresses along with their details. The address tables are mapped to transaction tables by making the transaction hash a foreign key in each case.

Conclusion :

I hope this gives you an insight about the bitcoin and the explorers that work on it. The challenges Ifaced regarding the parsing and migration, I have thoroughly tried to explain them. Let me know if you have any other doubts at info@blockwala.io .

You can check our working explorer on this link :
http://explorer.blockwala.io/btc/index/

References :

  1. https://www.digitalocean.com/community/tutorials/sqlite-vs-mysql-vs-postgresql-a-comparison-of-relational-database-management-systems
  2. http://codesuppository.blogspot.com/2014/01/how-to-parse-bitcoin-blockchain.html

--

--

Praful Parashar
Blockwala

Backend developer with a keen interest in Machine learning.