SVInsight: How Blockchain Collects Rare Disease Data & Breaks Data Monopoly

On July 31st, SVInsight published an interview with DxChain’s two co-founders, Wei Wang and Allan Zhang, about how DxChain addresses big data problems. The interview follows.

“There are data everywhere,” but the questions facing the industry are what economic models can encourage users to share data and how to get data for organizations that need it urgently. For example, how can medical institutions efficiently discover genetic data, and how do patients who are critical data holders share their data safely and reliably?

DxChain, the Silicon Valley startup we interviewed today, wants to solve the problems mentioned above with blockchain and big data. How? We asked DxChain’s two co-founders, Wei Wang and Allan Zhang.

“Absolutely private” or “relatively open”?

Wang’s idea of co-founding DxChain was inseparable from his work experience. Before starting DxChain, Wang worked on blockchain research at AT&T, and before that he researched big data at Hortonworks.

At that time, Wang discovered that many of their customers, such as Walmart, Sears, and other retailers, had a huge volume of data but did not know how to extract valuable information from it, such as which goods are the most profitable, how to target different customers, and how to sell goods better.

The problem involves data collection, data cleaning, and a set of data processing for calculation and analysis, which are a significant burden for most companies. Therefore, building a platform that can provide such a set of services at a low price is undoubtedly promising, and the blockchain technology made it possible.

DxChain wants to use blockchain technologies to build a platform with data collection, mining, and analysis that can come up with useful business conclusions. The technology behind this lies in storage and computing on the blockchain.

Wang told us that as you can tell from the company name DxChain: D is the first letter of “Data” in English, and “x” is multiplication. DxChain combines data and blockchain technology to leverage the value of data.

To extract the value of data, you usually take four steps: data collection, cleaning, analysis, and conclusion. But the first step, data collection, comes with privacy issues.

We have witnessed some major privacy incidents in the U.S.: Facebook’s stock price fell 20 percent last month, a reflection of the fermenting Facebook data breach scandal earlier this year.

All the data sits in the black boxes of the big Internet companies. We know nothing about whether or how they use it, which is unsettling. Suppose we go to the other extreme and protect our data so that “no one can see it and no one knows about it.” Wouldn’t that work?

Take the US medical field as an example. The US has the Health Insurance Portability and Accountability Act (HIPAA), which emphasizes the protection of each patient’s medical data. Under this law, each patient’s specific circumstances and medical records can only be seen by hospitals and insurance companies.

Protecting personal medical privacy is of course important. But if a research institution wants to use the data to develop a new drug, it will hit a wall unless it receives written permission from each patient involved in the study.

Another critical issue is that, from a data perspective, individual patient information has little value. Only collections of patient information are valuable. Is there a platform that, with patients’ consent, lets them share data while protecting their privacy, and that can bring together thousands of patients so the pooled data has research value?

DxChain hopes to use the decentralization and immutability of blockchain to protect user privacy, allowing users to keep track of their own data while sharing it. Organizations can obtain a large amount of user data through such platforms and develop groundbreaking technologies.

The industry has adopted a variety of practices to strengthen data privacy protection on blockchains. Homomorphic encryption and multi-party computation, for example, protect privacy through cryptography. Intel SGX is another widely adopted technique, which relies on hardware-based protection.

DxChain uses a more practical solution — encrypting critical data information for privacy protection with a robust data processing capability.

“Since we can do fine-grained operations on data, data have structures when they are stored on chains. For example, in a data spreadsheet, there is one column for persons’ names. We encrypt that critical information, but disclose other information instead of encrypting the entire file. This is called data model-based data encryption,” Wang said.
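The column-level idea Wang describes can be illustrated with a minimal sketch. It assumes rows are plain dictionaries and that the set of sensitive columns is known in advance. The function names (encrypt_field, decrypt_field, encrypt_columns) are hypothetical and are not DxChain’s API. The cipher is a toy stand-in built from the standard library, and a real system would use a vetted authenticated encryption library instead.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a keystream with HMAC-SHA256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return out[:length]


def encrypt_field(key: bytes, value: str) -> str:
    """Encrypt one cell; a fresh random nonce is prepended to the ciphertext."""
    nonce = os.urandom(16)
    data = value.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    cipher = bytes(a ^ b for a, b in zip(data, stream))
    return (nonce + cipher).hex()


def decrypt_field(key: bytes, token: str) -> str:
    """Reverse encrypt_field for a holder of the key."""
    raw = bytes.fromhex(token)
    nonce, cipher = raw[:16], raw[16:]
    stream = _keystream(key, nonce, len(cipher))
    return bytes(a ^ b for a, b in zip(cipher, stream)).decode("utf-8")


def encrypt_columns(rows, sensitive, key):
    """Encrypt only the sensitive columns; every other column stays readable."""
    return [
        {
            col: encrypt_field(key, str(val)) if col in sensitive else val
            for col, val in row.items()
        }
        for row in rows
    ]


if __name__ == "__main__":
    key = os.urandom(32)
    table = [{"name": "Alice", "age": 34, "diagnosis": "condition A"}]
    shared = encrypt_columns(table, {"name"}, key)
    print(shared[0]["age"], shared[0]["diagnosis"])  # still in the clear
    print(decrypt_field(key, shared[0]["name"]))     # recoverable with the key
```

Only the name column is hidden here, so an analyst without the key can still compute on age and diagnosis, which is the point of encrypting by column rather than by file.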

Data collection: breaking the data monopoly

Solving the data encryption problem lets users share data without worrying about privacy. Beyond that, blockchain is likely to spur a revolution in data ownership.

An indisputable fact: most of the data today is monopolized by Internet giants like Google and Facebook. These companies obtain user data and use it to make profits, but there is a problem.

Consider an example:

Air conditioning systems in the United States are highly complex, and maintenance is expensive and time-consuming. It usually takes ten days to half a month to get an appointment. If a consumer’s air conditioner breaks, they have to go back to the home appliance store where they bought it and pay for repairs.

Air conditioner manufacturers collect temperature data from the ACs in users’ homes and detect failing air conditioners in advance. Then they sell the information to the home appliance stores, department stores, and others responsible for air-conditioning maintenance. The latter then pass that information on to targeted consumers, who will later say, “Oh, it’s awesome! My air conditioner really has a problem!”

That model is great, but if you think it through, it is a bit strange: the data is obtained by collecting consumers’ information, but consumers still need to pay for the service. That means consumers’ information is used for free.

Under DxChain’s assumption, future consumers will be given an option to put their information on chains. If a third party needs to use this information as a raw material for analysis — whether it is an air conditioner manufacturer or a climate research organization — they need to pay data providers, who in this case are the AC customers at home. In this way, consumers can make profits by sharing their own data.

Let’s return to the medical example mentioned earlier.

We have a lot of medical information about common illnesses such as colds and fevers, and there are also drugs for them. But for patients with rare diseases, things are more complicated: the fragmentation and lack of data have made it difficult for medical institutions to develop medicines. Because patient data is scarce, drug research and development institutions have had to sign contracts with hospitals and universities to collect it, and patients have faced long waits for treatment.

However, if there were a platform that allowed patients to share data about their own illnesses, drug research and development institutions could purchase that information directly and use it to develop treatments. Patients would be able to sell data through the platform, raising more money to treat their ailments. That matters, particularly in a healthcare market as expensive as that of the United States.

DxChain wants to be that platform.

Storage and computation are both indispensable

DxChain needs to solve the two problems of data storage and computing on blockchains.

DxChain is a decentralized big data storage and computing network, an open public chain that applies the decentralization of blockchain to storage and computing.

So how are storage and computing handled in the current blockchain world?

Let us talk about storage first.

We all know that Bitcoin and Ethereum have very limited computing and data storage capabilities. But blockchain projects are proliferating, and IPFS, a “decentralized, distributed file storage system,” has emerged.

However, IPFS is a file system without a chain and lacks an incentive mechanism. That is to say, participants contribute purely out of interest. IPFS is a bit like the blockchain version of BitTorrent: everyone wants to download from seeds, but no one wants to seed. To seed, you need both bandwidth and hard disk space, and there is no incentive except a thank-you note.

So some people say: let’s add a blockchain to IPFS, so it now has an incentive. That is why we have Filecoin, which is currently at a very early stage.

Now let us turn to computing.

The blockchain project Dfinity aims to solve the computational problem of the blockchain. Dfinity describes itself as an infinitely scalable intelligent distributed cloud computing system and a third-generation blockchain. It is highly compatible with existing Ethereum applications and has high potential.

However, Dfinity has not been able to solve the problem of where data comes from.

DxChain believes that storage and computing can’t be separated, so it wants to combine the two and put a particular focus on data. Of course, this is not to say that simply “putting Dfinity and Filecoin together” can solve the problem. Building a blockchain that enables both storage and computing is a grand challenge and needs infrastructural innovation.

DxChain believes a single chain cannot meet the demand for storage, computation, and privacy. So it adds a data side chain and a computation side chain in a way akin to the Lightning Network. The master chain runs smart contracts and manages the data chain and the computing chain. The three-chain design is called “chains-on-chain.”

DxChain’s “chains-on-chain” design is inspired by Hadoop, a collection of open-source software utilities developed by the Apache Software Foundation.

Over the past decade, Hadoop has addressed the issue of distributed storage of data within an organization or a company. However, for distributed storage across organizations, the problem of establishing trust between different organizations and participants remains unsolved. Blockchain technology offers a way to solve it.

DxChain starts with Hadoop, which has long been validated in industry. By integrating Hadoop with blockchain, DxChain aims to solve the problems of distributed storage and computing in a decentralized environment.

From a technical point of view, DxChain has three significant innovations:

DxChain uses two mechanisms to ensure the correctness of computations: a verification game algorithm to verify the accuracy of the computation, and Provable Data Computation to verify the correctness of the computation result.

DxChain uses two mechanisms to enable data storage: Proof of Spacetime (PoSt) and Provable Data Possession (PDP), which verify miners’ continuous contributions to the data storage chain.

DxChain’s data model is built on top of storage and defines the data so that it becomes valuable and convenient to compute on. The data model also helps to implement two privacy protection mechanisms: data model-based encryption and differential privacy.
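To make differential privacy concrete, here is a minimal sketch of the standard Laplace mechanism applied to a counting query. The interview does not say which mechanism DxChain uses, so this is an illustration of the general technique only. The names private_count and laplace_noise and the sample records are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a noisy count that satisfies epsilon-differential privacy.

    Adding or removing one person changes a count by at most 1, so the
    query has sensitivity 1 and the Laplace scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random()
    # Hypothetical patient records: 30 with a rare condition, 70 without.
    patients = [{"rare": True}] * 30 + [{"rare": False}] * 70
    noisy = private_count(patients, lambda p: p["rare"], epsilon=0.5, rng=rng)
    print(f"noisy count of rare-disease patients: {noisy:.1f}")
```

A smaller epsilon adds more noise and gives stronger privacy. A researcher still sees roughly how many patients match, but cannot tell from the answer whether any one individual is in the data set.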

Next stop: value internet

DxChain wants to provide a platform that connects many personal computers or specially designed mining machines to achieve low-cost storage and computing. That platform can also ensure that massive data sets will not be monopolized by a large corporation and that value is distributed fairly.

Data buyers and sellers can get what they need in a so-called data market. Different information will have different prices, and the price of the same product (i.e., data) will fluctuate over time. For example, during the holiday season, data relating to shopping preferences will rise in price.

According to DxChain co-founder Allan Zhang, DxChain’s ultimate goal is to become a “data factory” whose raw materials are the many kinds of data generated in our lives. DxChain’s storage is like a warehouse, and its computing is like a processing room. DxChain transforms disordered, messy data into clear and valuable information, turning today’s Internet, which carries both noise and signal, into the value Internet of the future.

It will be a beautiful new world, but behind it lies continuous research effort on blockchain storage and computing. It won’t be easy, but the promise is big.

About DxChain: A Decentralized Big Data and Machine Learning Network Powered by a Computing-Centric Blockchain.