BigchainDB… What’s the buzz?

What is the best way to store large amounts of data in a trustworthy environment today? Storing big data has become easy, but what about the environment where that data is stored? Can you trust the environment and the third party storing your data? What about policies that force companies to expose your personal data? BigchainDB provides technology to store big data in a secure and efficient environment by decentralizing the storage.

The scalable blockchain database.

Project Test Bunnies

It all started with a project for a consortium of pharmaceutical companies. These companies carry out medical tests for their newly developed medicines, and test persons get paid to undergo these kinds of tests. The problem lies in the lack of coordination between pharmaceutical companies in different countries. A test subject is only allowed to undertake one test within the same category in a time span of five years. Because there is no coordination, however, test subjects can take the same kind of test in multiple countries and earn a lot of money by doing so. This is strictly forbidden because it can pose a threat to their health and can lead to incorrect test results for the pharmaceutical companies.
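The five-year rule can be written down as a small check. This is only a sketch of the rule as described above, not code from the project: the function names, the category labels and the way a test history is represented are assumptions made for illustration.

```python
from datetime import date

WINDOW_YEARS = 5  # from the rule above: one test per category per five years


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def may_take_test(history, category, on):
    """Return True if the subject may take a test of `category` on date `on`.

    `history` is a list of (category, date) tuples for one test subject,
    ideally collected across all countries (hypothetical representation).
    """
    for past_category, past_date in history:
        if past_category == category and on < add_years(past_date, WINDOW_YEARS):
            return False
    return True


# Hypothetical history of a single test subject.
history = [("cardiology", date(2015, 3, 1))]
```

With this history, a second cardiology test in 2017 would be refused, while a test in another category, or a cardiology test from 1 March 2020 onwards, would be allowed. The check only works if the history is shared between companies, which is exactly the coordination that is missing today.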

Blockchain technology might be the answer to this problem. Blockchain is not just a buzzword; it has a goal to make the world a better place… Sounds familiar, right? But it really can, if you apply blockchain to the correct use case. Blockchain has the power to make data tamper resistant and eliminate the need for a central (data) authority.
There is a strong need for data protection within business environments. Businesses need a mechanism of trust so that they can rely on their data.

There are two stakeholders involved in the ‘Test Bunnies’ project: the test person and the pharmaceutical company. A pharmaceutical company has permission to both create and transfer assets on the blockchain. In this case, a pharmaceutical company registers a medical test which involves a test subject. The next step is to transfer this medical test to the test subject. In this way, test subjects are given full control over their tests and thereby over their privacy. In the application, this transfer happens automatically: when a pharmaceutical company creates a test, the test is transferred to the linked test person. A test subject, on the other hand, can only retrieve a list of the tests that have been transferred to them.
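To make the create-then-transfer flow concrete, here is a minimal in-memory sketch. It deliberately does not use the BigchainDB driver: ToyLedger, register_test and the participant names are all hypothetical, and there is no cryptography or networking involved. It only models who owns an asset after each step.

```python
import uuid


class ToyLedger:
    """In-memory stand-in for a blockchain ledger (illustration only)."""

    def __init__(self):
        self.transactions = []  # append-only list of CREATE/TRANSFER records

    def create(self, creator, data):
        """Record a new asset owned by its creator and return its id."""
        asset_id = uuid.uuid4().hex
        self.transactions.append(
            {"operation": "CREATE", "asset_id": asset_id, "data": data, "owner": creator}
        )
        return asset_id

    def owner_of(self, asset_id):
        """The current owner is whoever the most recent transaction names."""
        for tx in reversed(self.transactions):
            if tx["asset_id"] == asset_id:
                return tx["owner"]
        raise KeyError(asset_id)

    def transfer(self, asset_id, sender, recipient):
        """Hand an asset over; only the current owner may do this."""
        if self.owner_of(asset_id) != sender:
            raise PermissionError("only the current owner can transfer an asset")
        self.transactions.append(
            {"operation": "TRANSFER", "asset_id": asset_id, "owner": recipient}
        )

    def assets_owned_by(self, owner):
        """List the ids of all assets currently owned by `owner`."""
        asset_ids = dict.fromkeys(tx["asset_id"] for tx in self.transactions)
        return [a for a in asset_ids if self.owner_of(a) == owner]


def register_test(ledger, company, subject, data):
    """Create a medical test and immediately transfer it to the test subject."""
    asset_id = ledger.create(company, data)
    ledger.transfer(asset_id, company, subject)
    return asset_id


ledger = ToyLedger()
test_id = register_test(ledger, "pharma-a", "subject-1", {"type": "cardiology"})
```

After register_test, the test shows up in the list of "subject-1" and no longer in the list of "pharma-a", and the company can no longer transfer it. In a real blockchain, ownership is enforced with key pairs and signatures rather than with plain names.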

Next, I want to give you a better idea of the different technologies that interact and together form my application. Angular2 forms the interface for both the company and the test person. I use MongoDB for basic user storage. A BigchainDB asset consists of four attributes: test person name, type of test, date and remarks. BigchainDB itself consists of a cluster of RethinkDB (big data) database nodes and BigchainDB Python driver nodes. Together, these nodes accept data, build blocks and push them to the blockchain.
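As an illustration, the four attributes of an asset could be modelled like this. The class name, the field names and the ISO date format are my assumptions for this sketch; the exact payload the application sends to BigchainDB is not shown here.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class MedicalTest:
    """The four attributes of an asset (field names are hypothetical)."""

    test_person_name: str
    test_type: str
    test_date: date
    remarks: str = ""

    def to_asset_data(self) -> dict:
        """Return a JSON-friendly dict, with the date as an ISO 8601 string."""
        payload = asdict(self)
        payload["test_date"] = self.test_date.isoformat()
        return payload


# Made-up example values.
example = MedicalTest("Jane Doe", "cardiology", date(2017, 5, 2), "no side effects")
asset_data = example.to_asset_data()
```

The frozen dataclass mirrors the idea that a registered test should not change afterwards; a dict like asset_data is the kind of structure that would be stored as the data of an asset.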

Technology Stack Project Test Bunnies

Thoughts about BigchainDB

BigchainDB still remains a bleeding edge technology. It is definitely not ready for use in production. What I noticed is the instability of BigchainDB and its sometimes poor handling of errors. I managed not just to double spend, but even to ‘octo spend’ an asset on the blockchain (which might have been my fault). Sometimes, requests just died on the BigchainDB nodes. However, during my five-month internship working with BigchainDB I have seen huge progress. The most helpful progress is the release of some proper Docker images to easily set up a working BigchainDB cluster for testing (single node) or ‘production’ (full clusters). I also liked the added support for JavaScript-based development. On GitHub, you can find repositories with some good BigchainDB code examples.

I see a lot of opportunities in BigchainDB. BigchainDB takes a distinctive approach to blockchain. It is decentralized and tamper resistant like all other blockchains, but it combines this with the power of a big data database (RethinkDB or MongoDB). What is most impressive about it is its capability to scale linearly when the number of requests increases. Compared to the blockchain behind Bitcoin, this is a massive advantage, especially in performance. Besides that, Bitcoin has opted for a fully replicated network. This means that every node has a full copy of the ledger. In contrast, BigchainDB has a replication factor of only three. According to calculations, it is still nearly impossible to tamper with the data. Finally, BigchainDB improves its performance by keeping invalid blocks on the blockchain. If a block is voted invalid, BigchainDB just adds an invalid attribute to it. Removing the block from the blockchain, which is the case with Bitcoin, would take more resources and time.
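The idea of flagging invalid blocks instead of removing them can be sketched in a few lines. This toy model is not BigchainDB code: ToyChain, the simple majority vote and the field names are assumptions for illustration only.

```python
class ToyChain:
    """Append-only chain that flags invalid blocks instead of deleting them."""

    def __init__(self):
        self.blocks = []

    def add_block(self, transactions, votes):
        """Append a block; `votes` holds one boolean per node (True = valid).

        A simple majority is assumed here. A block that is voted down still
        stays on the chain and only gets an `invalid` attribute.
        """
        is_valid = sum(votes) > len(votes) / 2
        self.blocks.append(
            {
                "height": len(self.blocks),
                "transactions": list(transactions),
                "invalid": not is_valid,
            }
        )

    def valid_transactions(self):
        """Readers simply skip the blocks that are flagged as invalid."""
        return [
            tx
            for block in self.blocks
            if not block["invalid"]
            for tx in block["transactions"]
        ]


chain = ToyChain()
chain.add_block(["tx-1"], votes=[True, True, True])
chain.add_block(["tx-2"], votes=[True, False, False])  # voted invalid, but kept
chain.add_block(["tx-3"], votes=[True, True, False])
```

All three blocks stay on the chain, yet only "tx-1" and "tx-3" count as valid. Nothing has to be removed or rewritten when a block is rejected, which is where the performance gain would come from.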

Conclusion

Personally, I believe that BigchainDB is not just a buzzword. It is a technology which enables global intercommunication and sharing of data without the need for central authorities. It creates trust and provides a performant blockchain to process data. Even after having a lot of trouble with BigchainDB, I’m optimistic about this technology. I believe that the technology will be fully optimized within three to five years and that it has the potential to be one of the leading technologies for building decentralized applications.

Note: BigchainDB has since dropped RethinkDB as its database and moved completely to MongoDB.