Distributed Hash Tables And Why They Are Better Than Blockchain For Exchanging Health Records

Michael Dufel
8 min readDec 26, 2017

--

Introduction

Distributed Hash Tables are a form of a distributed database that can store and retrieve information associated with a key in a network of peer nodes that can join and leave the network at any time. The nodes coordinate among themselves to balance and store data in the network without any central coordinating party. Distributed Hash Tables are both fault tolerant and resilient when key/value pairs are replicated. The ability to distribute data among the peers is in strong contrast to the Blockchain model in which every node has a copy of the entire ledger. This is a critical distinction and could allow for the ability to move health records out of the silos of the EHR systems.

Distributed Hash Table Overview

Structure of keys

DHT’s require that information to be evenly distributed across the network. To achieve this goal, the concept of consistent hashing is used. A key is passed through a hash algorithm that serves as a randomization function. This ensures that each node in the network has an equal chance of being chosen to store the key/value pair.

Routing and Lookup Cost

Because DHT nodes don’t store all the data, there needs to be a routing layer so that any node can locate the node that stores a particular key. The mechanics of how the routing table works, and how the table is updated as nodes join and leave the network is a key differentiator between different DHT algorithms. In general, DHT nodes will only store a portion of the routing information in the network. This means that when a node joins the network, routing information only needs to be disseminated to a small number of peers.

Because each node only contains a portion of the routing table, the process of finding or storing a key/value pair requires contacting multiple nodes. The number of nodes that needs to be contacted is related to the amount of routing information each node stores. Common lookup complexity is O(log n) where n is the number of nodes in the network. For example, in a network of a million nodes, only 20 nodes would need to be found. Some DHT algorithms allow for a tradeoff of increasing the amount of router state to further reduce the worst case lookup cost. Route tables are relatively small, and a production DHT operating over a WAN would likely choose to increase the routing table size and could likely further reduce the worst case lookup by half or more.

Iterative and Recursive Lookup

There are two common styles of lookup operations. In an iterative lookup, a requesting node will query another node asking for a key/value pair. If that node does not have that node, it will return one or more nodes that are “closer”. The requesting node will then query the next closest node. This process repeats until either the key/value is found and returned, or the last queried node returns an error saying that the key simply cannot be found.

With recursive lookups, the first requesting node will query the next closes node, which will then query the next closest node until the data is found. The data is then passed back along the chain of requests back to the requestor.

Example Routing Table

In Figure 3 we see a worked example. The example is not representative of any particular DHT algorithm and is intended to illustrate the general concepts of how DHT’s work. Node ID’s and keys must be random numbers are typically the output of a hashing algorithm. Each node is assigned a portion of the keyspace and maintains portion of routing table. The details of how the routing table is maintained as nodes join and leave the network, and how replication is implemented are key differentiators between different DHT algorithms.

DHT’s are not like traditional hash tables in computer science that of a lookup complexity of O(1) — or only one memory address needs to be accessed in order to find the associated value. Each DHT node only contains a portion of the key/value pairs, and only a portion of the

Healthcare Example

DHT’s — like all distributed databases — are good at key/value lookups, but are not good with joins. In the context of a DHT, a join is when multiple key/values need to be looked up to retrieve the information needed. For example, the entire patient record could be stored as a single key/value pair. This reduces the lookup time, for a single record, but if the patient record is sizeable, it would take a lot of network bandwidth and time to download the entire file. Compromises can be made and in the example below, there is a key that stores a list of keys to other documents. It acts like a search index for a patient’s record. The record is broken up and each component is stored under its own key/value pair. This approach allows a requestor to only download the records they want and more evenly distributes the data across the network and thus increases data durability and availability.

Comparing security in Blockchains and DHT’s

Blockchain nodes need to store the entire database in order to maintain their security guarantees. This means that it is simply not practical to store the patient’s record directly on a Blockchain because the amount of data in all the patients in the United States would mean that each Blockchain node would need to store a HUGE amount of data. This would be cost prohibitive. Additionally, even if the Blockchain is optimized to store only small and compressed amounts of patient data, the security risks would be massive. The compromise of even a single Blockchain node would be a massive data breach on the scale of the Equifax breach. The liability would be massive. In general, the only safe way to store private information in a Blockchain is to encrypt the data. However, with encrypted data, you also limit the overall usefulness of storing information in a Blockchain. For example, if data was encrypted, it is not clear how a common use case such as access to patient records in an emergency would be implemented.

These limitations are commonly understood in the Blockchain community, and as such the use of Blockchain is more realistically being limited to storing hashes and locations of patient’s records stored elsewhere. This reduces the risk of data disclosure, but in doing so, reduces the value proposition of the Blockchain for healthcare which is about everyone having access to data. To recap, if you store patient records on a Blockchain, you run into both scalability problems and create a security nightmare. If you only store hashes and pointers, the Blockchain doesn’t offer anything more than existing non-blockchain systems such as CareQuality and CommonWell.

Blockchain Vs CareQuality and CommonWell

In order to achieve healthcare data exchange, you some or all of the following:

- A provider registry

- A master patient index or single patient identifier

- API (FHIR / XDS.b / HL7 / etc) and data (ICD / SNOMED / etc) standards

- A patient search index

- A service registry

- Trust Framework

o Common security standards (SAML / X.509 / PKI / OAuth, etc)

o Identity management for network members (certificate registry)

o Legal agreements and terms of use

These elements are commonly provided via a consortium such as CareQuality and CommonWell as well as some HIEs. These consortiums currently provide the necessary ingredients to exchange data in a scalable manner without the use of Blockchain. For healthcare exchange, a Blockchain network would need a governing consortium and it’s not clear that a consortium would benefit from the use of Blockchain when it can use traditional software approaches at a lower cost. The great touted use of Blockchain as a means of providing strong integrity guarantees can be easily implemented by a consortium by no means requires the use of a Blockchain.

The records themselves would still be stored at source EHR databases. While this is a feasible use of a Blockchain, it provides limited value. The Blockchain search index may be highly available and durable, but that does not extend to the patients data as the data itself is stored in the EHR databases that may not maintain the data over time and the API’s to access may themselves not be highly available.

Key Innovations of a DHT Based Healthcare Record System

A DHT operated by a consortium would offer a number of advantages that would revolutionize healthcare data exchange. In a DHT based system, data can be moved out of the EHR silos and into a database that is centrally governed by the consortium. Data is now highly available and durable and can even survive when a doctor goes out of business or loses patient data. In effect, it is possible to have a true cradle to grave patient record. Because all DHT nodes share common software, which would be managed by the consortium, the consortium would provide a single point of access for other stakeholders that need access to data such as researchers, disease registries, third party vendors,, and more. The central governance of the network would allow the network to have monetary value which can be capture and passed on to the healthcare providers. This would replace the need for meaningful use grants and provide a sustainable source of funding for continuing maintenance and advances of EHR systems.

Additional Cost Advantages

A common difficulty with interoperability is the delay in getting agreements to define, use, and most importantly, implement a standard. In a large enterprise, there are hundreds of individual companies that need to update their software and even more users that need to pay for and install upgrades that implement those standards. The wide number of individual parties means that innovation proceeds at a very expensive crawl. In comparison, in a DHT managed consortium, the database software and management is controlled by a single entity. This dramatically reduces the cost and greatly increases the pace of innovation and change.

--

--