Blockchain and Big Data for Healthcare Data

Blockchain technology is no longer tied only to Bitcoin. It has made its own way into many industries, including finance, accounting, supply chain and logistics, insurance, and more. For example, financial institutions can settle securities in minutes instead of days. Manufacturers, OEMs, and regulators, like businesses of every type, can manage the flow of goods and related payments more closely, with greater speed and less risk.

Businesses have adopted blockchain for the reasons given above, and many more are preparing to adopt it to make their operations faster and smoother. Hyperledger Fabric and BigchainDB are among the most widely used frameworks for bringing blockchain to business processes.

Business processes today do not just generate data; they generate a wide variety of data, in huge volumes and at high velocity. Once such data is put on a blockchain, it is very important to process and analyze it efficiently.

Hadoop is the classic ecosystem for processing and analyzing such data efficiently, and it offers numerous functionalities. A lot of business logic already resides on the Hadoop platform to analyze and process this data. That is why it will be much easier for industries to adopt blockchain technology if a scalable blockchain framework exists within the Hadoop ecosystem. To this end, HBasechainDB is a first step towards providing a scalable blockchain framework in the Hadoop ecosystem. It does so by bringing the blockchain characteristics of immutability and decentralization to the HBase database. HBasechainDB was started by the High Performance Computing and Data group at the EMI foundation.

At the EMI foundation, our scalable framework follows the approach taken by BigchainDB. Rather than focusing on advancing the consensus protocol, BigchainDB begins with a distributed database, MongoDB, and adds the blockchain features of decentralized control and immutability, while supporting the creation and movement of digital assets, to provide a scalable decentralized database.

The major contribution of BigchainDB that enables this scalability is the concept of blockchain pipelining. In blockchain pipelining, blocks are added to the blockchain without waiting for the current block to be agreed upon by the other nodes. Consensus is taken care of by the underlying database. Blocks are not validated when they are added but later, through a process of voting among nodes. This yields large performance gains: BigchainDB points to transaction throughputs of over a million transactions per second and sub-second latencies.

Framework description: HDBChain (Hadoop DataBase Chain) is a super-peer peer-to-peer network operated by a federation of nodes. All nodes in the federation have equal privileges, which gives HDBChain its decentralization. This super-peer design was inspired by the Internet Domain Name System. Any client can submit or retrieve transactions or blocks, but only the federation nodes can modify the blockchain. The federation can grow or shrink during the operation of HDBChain.

Let us say there are n federation nodes N1, N2, ..., Nn. When a client submits a transaction t, it is assigned to one of the federation nodes, say Nk. The node Nk is now responsible for entering this transaction into the blockchain. Nk first checks the validity of the transaction. A transaction is valid if it has a correct transaction hash and correct signatures, if its inputs (when it has any) exist, and if those inputs have not already been spent. Once Nk has validated a set of transactions, it bundles them together in a block and adds the block to the blockchain. A block can contain only a specified maximum number of transactions. Let us say t was added in block B.
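
The validation and bundling steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transaction layout, the helper names (`make_tx`, `is_valid`, `assign`, `bundle`) and the block size limit are hypothetical, and the signature check is left out because it needs a signing library.

```python
import hashlib
import json
import random

MAX_TX_PER_BLOCK = 3  # hypothetical cap; the article does not state the real limit


def tx_hash(body: dict) -> str:
    """SHA3-256 over a canonical JSON encoding of the transaction body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha3_256(encoded).hexdigest()


def make_tx(inputs: list, outputs: list) -> dict:
    """Build a transaction whose id is the hash of its inputs and outputs."""
    body = {"inputs": inputs, "outputs": outputs}
    return {"id": tx_hash(body), **body}


def is_valid(tx: dict, known_outputs: set, spent: set) -> bool:
    """Check the hash, the existence of inputs, and double spending.

    Signature verification is deliberately omitted in this sketch.
    """
    body = {"inputs": tx["inputs"], "outputs": tx["outputs"]}
    if tx["id"] != tx_hash(body):
        return False
    return all(i in known_outputs and i not in spent for i in tx["inputs"])


def assign(tx: dict, nodes: list, rng: random.Random) -> str:
    """Pick the federation node that becomes responsible for the transaction."""
    return rng.choice(nodes)


def bundle(txs: list, known_outputs: set, spent: set) -> list:
    """Validate transactions and group the valid ones into size-capped blocks."""
    spent = set(spent)  # local copy, so inputs spent earlier in the batch are seen
    valid = []
    for tx in txs:
        if is_valid(tx, known_outputs, spent):
            valid.append(tx)
            spent.update(tx["inputs"])
    return [valid[i:i + MAX_TX_PER_BLOCK]
            for i in range(0, len(valid), MAX_TX_PER_BLOCK)]
```

In this sketch a second transaction that spends the same input as an earlier one in the batch is dropped, as is any transaction whose id no longer matches its contents.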

When block B is added, its validity is undecided. Since the federation is allowed to grow or shrink during the operation of HDBChain, each block also includes a list of voters based on the current federation. All the nodes in the voter list for a block vote on B's validity. To vote on a block, a node validates all the transactions in the block. A node votes the block valid only when all of its transactions are found to be valid; otherwise it votes the block invalid. If the block gathers a majority of valid votes, its status changes from undecided to valid; if it gathers a majority of invalid votes, its status changes to invalid. Only the transactions in a valid block are considered to have been recorded in the blockchain. Those in invalid blocks are ignored altogether, though the chain retains both valid and invalid blocks. A block being invalid does not imply that all the transactions in it are invalid. Therefore, the transactions of an invalid block are reassigned to other federation nodes for a further chance of inclusion in the blockchain. This reassignment can be done randomly. Thus, if a particular rogue node was trying to include an invalid transaction in the blockchain, that transaction will likely be assigned to a different node the next time around and dropped from consideration. So, if block B gains a majority of valid votes, the transaction t will have been irreversibly added to the ledger. If B is invalid, t is reassigned to other nodes in the federation until it is added to the chain or rejected by the consensus.
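
A minimal sketch of the vote tally and the random reassignment, assuming a simple-majority rule over the block's voter list. The function names and the data shapes (a dict from node id to a boolean vote) are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def block_status(votes: dict, voters: list) -> str:
    """Return 'valid', 'invalid' or 'undecided' from the votes cast so far.

    votes maps a node id to True (valid) or False (invalid). Only nodes
    that appear in the block's voter list are counted.
    """
    tally = Counter(votes[n] for n in voters if n in votes)
    majority = len(voters) // 2 + 1
    if tally[True] >= majority:
        return "valid"
    if tally[False] >= majority:
        return "invalid"
    return "undecided"


def reassign(tx_ids: list, nodes: list, rng: random.Random) -> dict:
    """Randomly hand each transaction of an invalid block to a federation node."""
    return {tx_id: rng.choice(nodes) for tx_id in tx_ids}
```

With three voters, one valid vote leaves the block undecided, two valid votes make it valid, and two invalid votes make it invalid, at which point its transactions would go through `reassign`.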

Note that the chain is not formed at the time blocks are created. When a block is written to the HBasechainDB tables, it is stored in HBase in the lexicographical order of its ID. The chain is in fact formed at voting time. When a node votes on a block, the vote also states the previous block that the node voted upon. Thus, instead of waiting for all the federation nodes to validate the current block before proceeding to create a new one, blocks are generated irrespective of validation. This is the technique of blockchain pipelining. Over time, the blockchain accumulates both valid and invalid blocks. To keep the chain immutable, invalid blocks are not deleted. While it would seem that different nodes could have different views of the chain depending on the order in which they see incoming blocks, this does not happen in practice in HBasechainDB, thanks to the strong consistency of HBase and the fact that the blocks to be voted upon are ordered by timestamp. Every node therefore sees a single order of blocks, and all nodes share the same view of the chain.
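
To illustrate how the chain order comes from the votes rather than from storage order, here is a small sketch. It assumes each vote is a record with a `block` field and a `previous_block` field (`None` for the first block); these field names are hypothetical.

```python
def chain_from_votes(votes: list) -> list:
    """Rebuild the block order from one node's votes.

    Each vote names the block voted upon and the block voted upon before it,
    so following these links yields the chain regardless of storage order.
    """
    next_of = {v["previous_block"]: v["block"] for v in votes}
    chain, seen = [], set()
    current = next_of.get(None)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = next_of.get(current)
    return chain
```

In this sketch the blocks may sit in the table in lexicographical order of their ids, while `chain_from_votes` returns them in the order in which they were voted upon.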

To tamper with any block in the blockchain, an adversary has to modify the block, which changes its hash. The modified hash will no longer match the vote information for the block in the votes table, nor the subsequent votes that refer to this block as the previous block. The adversary would therefore need to modify the vote data all the way up to the present. However, we require that all votes appended by nodes be signed. Thus, unless an adversary can forge a node's signature, which is cryptographically hard, he cannot modify that node's votes. In fact, he would have to forge multiple signatures to effect any modification in the blockchain, which rules out practically any chance of tampering. In this way HBasechainDB provides a tamper-proof blockchain over HBase.
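
The hash mismatch that exposes tampering can be sketched as follows. The record layout (`block_id`, `block_hash`, `voter`) is an assumption made for the example, and signature verification of the votes is not shown because it requires a signing library.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA3-256 over a canonical JSON encoding of the block."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha3_256(encoded).hexdigest()


def tampered_blocks(blocks: dict, votes: list) -> set:
    """Return the ids of blocks whose current hash disagrees with a recorded vote."""
    return {
        v["block_id"]
        for v in votes
        if block_hash(blocks[v["block_id"]]) != v["block_hash"]
    }
```

Because the votes themselves are signed in the real system, an adversary cannot simply rewrite the stored `block_hash` values to hide the change.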

Exploiting HBase: In this section we describe the differences between MongoDB and HBase. We also explain how the proposed system design achieves greater performance.

MongoDB is a document store database. A document is a large JSON object with no fixed schema or format. This gives it an edge for dynamic use cases and ever-changing applications. MongoDB does not offer database triggers. Although MongoDB has its own benefits, its document store design degrades performance for the following operations:

1. Working with individual columns.

2. Performing join operations.

HBase is a wide column store database. It is a distributed, scalable, reliable, and versioned storage system capable of providing random read/write access in real-time. It provides a fault-tolerant way of storing large quantities of sparse data. HBase features compression, in-memory operation and Bloom filters on a per-column basis.

We use the following characteristics of HBase extensively to derive performance:

1. HBase data is organized into tables, and tables are further split into column families. Column families must be declared in the schema, and they let us group a given set of columns together. One of the major operations in a blockchain transaction is the check for double spending. To make this check more efficient, we can keep the input column of all transactions in a separate column family. The check then runs faster because the region server needs to load only the one column family that contains the transaction inputs. In a database such as MongoDB, the database server needs to load the whole document before filtering out the input column and performing the double-spending check.

2. HBase is optimized for reads and is backed by a single-write master, which results in a strict consistency model. Its use of ordered partitioning supports row scans. A blockchain needs one write and many reads, because transactions are written only once but read many times for purposes such as checking for double spending and checking whether any tampering took place.

3. HBase provides various ways to run our custom code on the region server. HBase coprocessors and custom filters are two such ways. HBase coprocessors can act as database triggers. In our implementation we use these features in the following ways:

a. The check for double spending is generally done by loading the transactions onto the federation nodes (i.e. the client system). Loading this many transactions from the region server to the federation node is a major bottleneck for system throughput. In our approach, instead of pulling the data required for the double-spending check onto the client system, we push the check to the region server using an HBase custom filter. This approach improves performance in two ways:

i. Data does not move towards the computation node; rather, the computation moves towards the data node. Since the code is far smaller than the data, we improve the system by decreasing the communication time.

ii. The computation for double spending is done in parallel on multiple region servers, compared to the traditional approach of checking on a single client node.

b. A changefeed brings a great benefit to the blockchain framework. We use an HBase coprocessor to implement a changefeed that notifies us immediately whenever an attacker tries to change or delete the content of the database.
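
The push-down idea in item (a) can be illustrated without HBase itself. The sketch below models each region as an in-memory dict and runs the double-spending check "next to the data", so that only a single flag per region travels back to the caller. The row layout with an `inputs` family and a `details` family, and the function names, are assumptions for illustration; this is not the HBase filter API.

```python
from concurrent.futures import ThreadPoolExecutor

# Each hypothetical region holds rows shaped like:
#   {row_key: {"inputs": [...], "details": {...}}}


def region_has_spent(region: dict, candidate_inputs: set) -> bool:
    """Stand-in for a server-side filter: reads only the inputs family
    of each row and returns a single boolean instead of the rows."""
    return any(candidate_inputs & set(row["inputs"]) for row in region.values())


def is_double_spend(regions: list, candidate_inputs: set) -> bool:
    """Run the check on all regions in parallel and combine the flags."""
    with ThreadPoolExecutor(max_workers=max(1, len(regions))) as pool:
        flags = pool.map(lambda r: region_has_spent(r, candidate_inputs), regions)
        return any(flags)
```

The point of the design is that the bulky `details` of each transaction never leave the region, and the regions are scanned concurrently rather than one after another on the client.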

Implementation Details: The federation nodes in HBasechainDB are initialized with a key pair for a public-key signing system. The SHA3-256 hashing scheme is used for hashing transactions and blocks. The current implementation of HBasechainDB uses six HBase tables. A critical issue in the design of HBase tables is the design of the row key, since region splits and scans on HBase tables follow the lexicographical order of the row key. The row key pattern depends on the access pattern for the data in the HBase table.

Following is the description of the HBase tables:

1. backLog: When a transaction is submitted to the federation nodes, it is randomly assigned to one of the nodes. All such assigned transactions are stored in the backLog table, with each transaction stored in a single row. A node scanning the backLog table should only have to read the transactions assigned to itself. Thus, the first segment of the row key for the backLog table is the public key of the node to which the transaction was assigned, so that a node can scan the table using its own public key as the row prefix. The last segment of the row key contains the transaction reference id. The row key therefore looks like: <publicKey>_<transactionId>

2. block: This is the table that contains all the blocks in the blockchain. Each block is a logical block that contains only the ids of the transactions present in the block. The actual transaction details are stored in the "hbasechaindb" table. Since the access pattern for this table is looking up blocks by block id, the row key for this table is just the block id: <blockId>

3. hbasechaindb: This is the table where all the transaction details are stored after a transaction is put on the blockchain. In this table each row corresponds to a single transaction. Since the access pattern for this table is looking up a transaction by its transaction link id, the row key of this table is <transaction link id>. The transaction link id consists of <block_id>_<transaction_id>. When an asset is spent, the transaction link id of the previous output is used in the inputs of the current transaction.

4. toVote: Every new block has to be voted upon by the federation nodes. For this, we need to inform the federation nodes that a newly created block awaits their vote. To this end, every block created is added to this table to signal each node to vote. The entry is removed from the table once the node has finished voting on the block. The row key of this table is: <federation node's signing key>_<block id>

5. vote: This is the table in which all the votes are recorded. There must be an entry for every federation node that votes on a block. The row key of the table is: <block_id>_<decision>_<Fed. Node public key>

6. reference: This is the table that stores the map between transaction link id and transaction id. This table acts as an index when the details of a transaction are queried. Since the access pattern of the table is by transaction reference id, the row key of this table is just the transaction reference id: <transaction_link_id>
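
The row key layouts listed above, and the prefix scan they enable, can be sketched with plain strings. A sorted Python list stands in for the lexicographically ordered HBase table here; the helper names are hypothetical.

```python
import bisect


def backlog_key(node_public_key: str, tx_id: str) -> str:
    """<publicKey>_<transactionId>"""
    return f"{node_public_key}_{tx_id}"


def transaction_link_id(block_id: str, tx_id: str) -> str:
    """<block_id>_<transaction_id>"""
    return f"{block_id}_{tx_id}"


def vote_key(block_id: str, decision: str, node_public_key: str) -> str:
    """<block_id>_<decision>_<Fed. Node public key>"""
    return f"{block_id}_{decision}_{node_public_key}"


def prefix_scan(sorted_keys: list, prefix: str) -> list:
    """Return all keys starting with prefix from a lexicographically sorted list.

    Because the keys are sorted, matching keys are contiguous: seek to the
    first candidate and stop at the first key that no longer matches.
    """
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches
```

Putting the node's public key first is what lets a node read only its own backlog entries: `prefix_scan(keys, "pkA_")` touches the contiguous run of `pkA_` rows and nothing else.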

When a transaction is submitted to HBasechainDB, it is first put in the backLog table. Federation nodes pick up transactions from the backLog table at regular intervals, check their validity, bundle them into blocks and add those blocks to the blockchain. As shown in Fig. 1, when a federation node forms a block, it updates three HBase tables. In the block table, the transaction ids of all the transactions in the block are stored as one block. In the hbasechaindb table, all the transaction details are stored. In the toVote table, the information about the newly created block is stored. The federation nodes refer to this toVote table to vote on the block. At regular intervals, all the federation nodes check the toVote table and cast their votes after checking the validity of the block. All the votes are stored in the vote table. Once a block is found valid, entries corresponding to all its transactions are made in the reference table.
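
The flow through the six tables can be followed in a small in-memory model. This is a sketch under assumptions, not the HBasechainDB code: dicts stand in for HBase tables, validation is skipped, the function names are hypothetical, and removing a transaction from backLog once it is placed in a block is an assumption the article does not state.

```python
from collections import defaultdict

# table name -> {row key: value}; a stand-in for the HBase tables
tables = defaultdict(dict)


def submit(node_pk: str, tx_id: str, details: dict) -> None:
    """A submitted transaction first lands in backLog under its assigned node."""
    tables["backLog"][f"{node_pk}_{tx_id}"] = details


def form_block(node_pk: str, block_id: str, voters: list) -> None:
    """Bundle this node's backlog into a block and update three tables."""
    prefix = f"{node_pk}_"
    mine = sorted(k for k in tables["backLog"] if k.startswith(prefix))
    tx_ids = [k[len(prefix):] for k in mine]
    tables["block"][block_id] = tx_ids  # logical block: transaction ids only
    for key, tx_id in zip(mine, tx_ids):
        # Assumption: the entry leaves backLog when it enters a block.
        tables["hbasechaindb"][f"{block_id}_{tx_id}"] = tables["backLog"].pop(key)
    for voter in voters:
        tables["toVote"][f"{voter}_{block_id}"] = block_id


def cast_vote(voter_pk: str, block_id: str, decision: str) -> None:
    """Record the vote and clear the node's pending toVote entry."""
    tables["vote"][f"{block_id}_{decision}_{voter_pk}"] = decision
    tables["toVote"].pop(f"{voter_pk}_{block_id}", None)


def finalize(block_id: str) -> None:
    """Once the block is valid, index its transactions in the reference table."""
    for tx_id in tables["block"][block_id]:
        tables["reference"][f"{block_id}_{tx_id}"] = tx_id
```

A typical run would call `submit` for each incoming transaction, `form_block` when a node's interval elapses, `cast_vote` once per voter, and `finalize` after the block has gathered a majority of valid votes.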

The complete implementation of HBasechainDB is written in Java, since the HBase API for Java performs best among the HBase APIs available for different languages. The HBase API for Java also has the advantage of supporting custom filters and coprocessors.