The Blockchain Essential — A Hybrid Data Model

Naga Vangala
NHCT - NanoHealthCare Token
May 31, 2018

The volume of data generated and used every day is growing at an astounding pace. Data sources are multiplying too, from standard structured sources to unstructured data from social media, connected devices, trackers and monitors. Traditional relational data stores alone can no longer manage this confluence of data, so NoSQL stores are used to store and retrieve unstructured data effectively. Applications that need both traditional and unstructured data are adopting a model that uses both kinds of store. With the advent of blockchain, however, there are additional requirements and a new set of constraints on the data that applications share. Innovative ways of managing massive data are needed to ensure that data privacy and confidentiality are maintained. On top of this, mandatory compliance requirements make data modelling a more complex exercise for data and solution architects.

A hybrid data model helps to store this varied data according to its importance and its privacy and governance requirements. Such an architecture supports a data placement strategy in which data is held in a traditional RDBMS, in NoSQL, in the native blockchain store, or in all of them, while still enabling management of the full range of data: structured and unstructured, in repositories and in transactions.

Traditional database data models fail to satisfy the real-world needs of emerging blockchain applications, particularly once production requirements are considered. Data management platforms often cannot be pigeonholed into a single technology, viz., “just” RDBMS, “just” NoSQL or “just” a blockchain store. Except for very trivial applications or PoCs, a blockchain store alone will not suffice. Real-world applications need to store large to very large volumes of data. For example, any healthcare application will have structured patient data, large files for diagnostic reports, and unstructured data from connected devices and ad hoc encounters. Organizations need to focus on finding the correct tool for each data requirement rather than attempting to fit square pegs into the available round holes.

Large data applications have typically combined the economical storage and processing of NoSQL (e.g., HBase) with the fast response times of a traditional database (e.g., MySQL). In these cases, data is stored and prepared where it makes the most sense. Based on the application's needs, a subset of that information is then shared with a higher-performance platform to serve the workload.

Proposed Hybrid Data Model: The proposed hybrid data model uses all data stores that are available for blockchain applications.

Hybrid Data Model for Blockchain Applications

(a) Native Blockchain Store: The native blockchain store can be used for application data that must be shared in order to access the application's features, for example user identities and roles/responsibilities. Blockchain platforms like Hyperledger Fabric provide this as a feature of the platform itself, the membership service provider (MSP). Care should be taken not to store large data natively, as it is an exorbitantly expensive store. Also, the storage capacity available on-chain is limited in most platforms.

(b) IPFS: Applications that need to store large files within the blockchain environment can deploy IPFS. IPFS returns a hash for each stored file, which needs to be saved for later retrieval. These hashes can be stored in the native blockchain store or in a DDBS transaction for later use.
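
The store-the-file, keep-the-hash pattern can be sketched as follows. This is a minimal illustration, not an IPFS client: `ContentStore` is a hypothetical in-memory stand-in for an IPFS node, the `ledger` list stands in for the native blockchain store, and a plain SHA-256 hex digest is used where real IPFS would return a content identifier (CID).

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory stand-in for an IPFS node (assumption)."""

    def __init__(self):
        self._blobs = {}

    def add(self, data: bytes) -> str:
        # Content addressing: the key is derived from the bytes themselves.
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Re-hash on read so tampered content is detected.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("content does not match its hash")
        return data


def record_report(store: ContentStore, ledger: list, patient_id: str, report: bytes) -> str:
    """Put the large file off-chain and only its hash on the ledger."""
    digest = store.add(report)
    ledger.append({"patient_id": patient_id, "report_hash": digest})
    return digest


store = ContentStore()
ledger = []  # stand-in for the native blockchain store (assumption)
report_hash = record_report(store, ledger, "patient-001", b"diagnostic report bytes")
retrieved = store.get(ledger[0]["report_hash"])
```

Only the fixed-size hash is written to the ledger, so the on-chain footprint stays the same however large the file is.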

(c) Distributed Database Systems: DDBS not only store traditional data, both transactional and archival, but also come with the added advantage of built-in fault tolerance and recovery.

  • RDBMS: Traditional transactions and blobs can be stored in centrally managed but distributed data environments. These stores provide the ease of retrieval and querying, for reports and filters, that most applications need.
  • NoSQL: Unstructured data can be archived in big data repositories using NoSQL stores, viz., columnar or key-value stores. These are highly scalable and provide search and retrieval with very low latency.
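
The placement strategy across stores (a) to (c) can be expressed as a small routing function. This is a sketch under stated assumptions: the `Item` fields, the `choose_store` rules and the `MAX_ON_CHAIN_BYTES` threshold are all hypothetical and would be set by the application's own requirements.

```python
from dataclasses import dataclass
from enum import Enum


class Store(Enum):
    BLOCKCHAIN = "native blockchain store"
    IPFS = "IPFS"
    RDBMS = "RDBMS"
    NOSQL = "NoSQL"


@dataclass
class Item:
    name: str
    size_bytes: int
    structured: bool
    shared_on_chain: bool = False  # must be visible to all participants
    is_file: bool = False          # large binary such as a diagnostic report


# Hypothetical cut-off; on-chain storage is expensive and limited.
MAX_ON_CHAIN_BYTES = 1024


def choose_store(item: Item) -> Store:
    """Pick a store for one data item using illustrative rules."""
    if item.shared_on_chain and item.size_bytes <= MAX_ON_CHAIN_BYTES:
        return Store.BLOCKCHAIN
    if item.is_file:
        return Store.IPFS
    if item.structured:
        return Store.RDBMS
    return Store.NOSQL


roles = Item("user roles", 200, structured=True, shared_on_chain=True)
scan = Item("MRI scan", 50_000_000, structured=False, is_file=True)
placement = {item.name: choose_store(item) for item in (roles, scan)}
```

Keeping the rules in one function makes the placement policy explicit and easy to review against privacy and governance requirements.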

Hybrid Model Advantages:

  • Best of both worlds: Identifying the right platform for each application workload and need gets the best from each platform; e.g., high throughput and huge data volumes from traditional stores and immutability from the blockchain store.
  • High Productivity and High Control: By taking advantage of each platform's unique features, the hybrid model delivers high productivity while enabling tight control of data within the application.
  • Compliance: Data protection compliance requirements can be met in the database layers that support them. The native blockchain store provides the core immutability and audit trail, while the traditional data stores help implement the “right to be forgotten” (right to erasure), especially under GDPR, which is very difficult in the blockchain world.
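
One way to picture the compliance split is to keep personal data in an erasable off-chain table and write only a salted hash to the append-only ledger. The sketch below assumes that design; `off_chain`, `ledger`, `register`, `verify` and `erase` are hypothetical names, and whether a leftover hash meets a given regulation is a question for the compliance team rather than something this code settles.

```python
import hashlib
import os

off_chain = {}  # stand-in for an erasable RDBMS table (assumption)
ledger = []     # stand-in for the append-only blockchain store (assumption)


def register(record_id: str, personal_data: bytes) -> str:
    """Store the data off-chain and a salted commitment on the ledger."""
    salt = os.urandom(16)
    commitment = hashlib.sha256(salt + personal_data).hexdigest()
    off_chain[record_id] = {"data": personal_data, "salt": salt}
    ledger.append({"record_id": record_id, "commitment": commitment})
    return commitment


def verify(record_id: str) -> bool:
    """Check that the off-chain row still matches the ledger entry."""
    row = off_chain.get(record_id)
    entry = next((e for e in ledger if e["record_id"] == record_id), None)
    if row is None or entry is None:
        return False
    return hashlib.sha256(row["salt"] + row["data"]).hexdigest() == entry["commitment"]


def erase(record_id: str) -> None:
    """Delete the data and its salt; the ledger entry is left untouched."""
    off_chain.pop(record_id, None)


register("rec-1", b"name and contact details")
intact_before = verify("rec-1")
erase("rec-1")
intact_after = verify("rec-1")
```

After `erase`, the audit trail still shows that a record existed, but the data and the salt needed to recompute the commitment are gone from the off-chain store.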

With the right mix of team, process, and technology, an effective top-down/bottom-up hybrid model can be developed and deployed that is appropriate for a production-ready blockchain application.



Naga Vangala
NHCT - NanoHealthCare Token

Experienced technology evangelist and blockchain adviser for healthcare, fintech and others. Triathlon enthusiast.