Privacy-Preserving Decentralized Marketplace for Data: A Use Case for Blockchain
Data sharing is inevitable in the current digital ecosystem. Individuals and entities end up sharing data, knowingly or otherwise, related to identity, transactions, personal preferences and so on. This is aptly called a ‘digital trail’ or ‘digital exhaust’ and needs to be carefully controlled. If privacy is not preserved, data sharing in the digital economy poses significant risks to individuals and organizations.
However, sharing data has clear benefits: better insights lead to superior services and more value for the data provider. A good example is the traffic congestion shown on Google Maps. The data required to determine congestion is provided by commuters through the telecom operators. Internet companies such as Google, Facebook and Twitter are well-known providers of value-added services offered at zero cost to customers. They are also effectively monetizing the data collected through these services.
Data should be regarded as the raw material for the value-added services on offer, and the availability of raw data is a must-have for the current economy. Several data monetization models have emerged in which a ‘trusted third party’ ends up owning an incredible amount of data. Unfortunately, the individuals and organizations providing the data are not well informed about the mechanisms or the risks involved[i]. The concept of privacy-preserving data sharing may seem like a death blow to these businesses. Digital identity is a case in point.
This is changing very rapidly from both perspectives. On one side, governments across the globe are realizing the need to regulate this process by providing basic protection to individuals in the form of data privacy laws or by creating infrastructure for data governance and authenticity (e.g. GDPR in the EU)[ii]. It is also becoming clear that ‘trusted third parties’ cannot be trusted completely: the Sony PlayStation Network and Target database breaches clearly highlight the risk[iii]. A data breach or a single point of failure can have a devastating impact on the digital economy.
On the other side, recent developments in privacy-preserving anonymization techniques, big data, decentralized databases, distributed ledgers and decentralized apps (DApps) may provide the crucial trade-off. This article is an attempt to join the dots and propose a solution.
Privacy preserving Data Sharing
What is data privacy? Privacy is the privilege of having some control over how personal information is collected and used. Information privacy is the capacity of an individual or group to stop information about themselves from becoming known to people other than those they give the information to. One serious user privacy issue is the identification of personal information during transmission over the Internet.
It is worth understanding the difference between data privacy and security:
Privacy is the appropriate use of a user’s information
Security is the confidentiality, integrity and availability of data
Privacy is the ability to decide what information about an individual goes where
Security offers confidence that these decisions are implemented
Privacy is the consumer’s right to safeguard their information from other parties
Security provides the confidentiality needed to protect the data provider
It is possible to have poor privacy and good security
It is difficult to have good privacy without adequate security
The payment processing industry has been at the forefront of this. The adoption of EMV, the PCI DSS standard and tokenization are prominent examples.
Several technology solutions have emerged that preserve privacy while data is being shared. Examples include:
· Identity-based anonymization
This topic has been well researched, and several advanced algorithms are now available.[iv]
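As a rough illustration of what identity-based anonymization can look like in practice, the sketch below replaces identifying fields with keyed-hash tokens (HMAC-SHA256) before a record is shared. This is only one simple pseudonymization technique, chosen here for illustration; it is not one of the specific algorithms from the cited research. The field names, the sample record and the key are assumptions made for the example.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a stable token for an identifier using a keyed hash.

    The same identifier and key always give the same token, so records
    can still be linked, but the token cannot be recomputed without the key.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, id_fields: set, secret_key: bytes) -> dict:
    """Replace the identifying fields of a record with tokens; keep the rest."""
    return {
        field: pseudonymize(str(value), secret_key) if field in id_fields else value
        for field, value in record.items()
    }


# Hypothetical example data; the key would be held by the data owner or a token vault.
key = b"demo-key-held-by-the-data-owner"
record = {"name": "Alice Example", "phone": "555-0100", "city": "Pune", "spend": 1200}
shared = anonymize_record(record, {"name", "phone"}, key)
print(shared["city"], shared["spend"], shared["name"][:12])
```

Tokenizing direct identifiers is only a first step: the remaining attributes (city, spend and so on) can still act as quasi-identifiers, which is why the more advanced techniques in the literature exist.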
Apart from privacy, several other features are desirable in a data sharing infrastructure:
· Decentralized: Avoid Single point of failure
· Highly Available
· Provides Data Security
· Control mechanism for Data ownership & sharing (defines who owns the data and who is authorized to share it after appropriate anonymization)
· Data Marketplace: a structure for incentivizing Data sharing and for sharing costs
· Directories & Token Vaults (where Individuals can create Tokens)
· Access based on Privileges (E.g. special privileges to Regulators and Govt Agencies)
· Ability to run sophisticated Data Analytics & Machine learning
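The privilege-based access item in the list above can be sketched as a simple role-to-field policy. The roles, field names and the policy table are hypothetical and only illustrate the idea that a regulator may see more than an ordinary data consumer; a real consortium would define these in its charter.

```python
# Hypothetical policy: which fields each role is allowed to read.
POLICY = {
    "consumer": {"city", "spend"},
    "regulator": {"city", "spend", "name"},
}


def view_for(role: str, record: dict) -> dict:
    """Return only the fields of a record that the given role may see.

    Unknown roles get an empty view (deny by default).
    """
    allowed = POLICY.get(role, set())
    return {field: value for field, value in record.items() if field in allowed}


record = {"name": "Alice Example", "city": "Pune", "spend": 1200}
print(view_for("consumer", record))
print(view_for("regulator", record))
print(view_for("unknown", record))
```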
Centralized vs Decentralized approach
A centralized system has the following inherent drawbacks:
· Creates a Single point of failure
· Does not provide Immutability: changes can be made to the Database
· Can lead to monopoly & unfair pricing
The advent of distributed ledger technologies (e.g. Hyperledger, BigchainDB), open source databases and decentralized applications (DApps) promises a decentralized infrastructure that can not only address the issues above but also provide the following features and advantages:
· Federation of Permissioned Members, each with a stake in the network
· Open source Technologies
· Benign & Byzantine Fault Tolerant system
· Ability to mitigate Sybil attacks
· Create incentive and cost sharing mechanism using Digital Currency, Assets and Smart Contracts
· Provide Proof of Process (e.g. Proof of Data validation & authenticity)
These features can truly create a ‘marketplace for data’[v] with multiple competing data providers and data consumers. With an appropriate access mechanism and logic for de-anonymization, regulators and government agencies can be given privileged access that provides the required visibility into this data (e.g. for financial crime investigation[vi] or anti-money laundering). Banks, insurance companies and e-commerce players can participate as data providers as well as consumers.[vii]
The proposed data infrastructure can also provide a mechanism that allows individuals to share digital content selectively and securely with other participants while retaining complete control.
A privacy-preserving decentralized infrastructure can also open avenues to monetize the data through value-added services such as a data analytics computation platform.
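The incentive mechanism described above, in which value flows from data consumers to data providers under contract-like rules, can be sketched as a small in-memory class. This is a toy model and not a real smart contract: the class name, the account and dataset names and the integer "token" balances are all assumptions made for illustration.

```python
class DataMarketplace:
    """Toy marketplace: consumers pay providers in tokens to gain access."""

    def __init__(self):
        self.balances = {}   # account -> token balance
        self.listings = {}   # dataset id -> (provider, price)
        self.grants = set()  # (consumer, dataset id) pairs with access

    def deposit(self, account: str, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balances[account] = self.balances.get(account, 0) + amount

    def list_dataset(self, provider: str, dataset_id: str, price: int) -> None:
        self.listings[dataset_id] = (provider, price)

    def purchase_access(self, consumer: str, dataset_id: str) -> None:
        """Move the price from consumer to provider, then record the grant."""
        provider, price = self.listings[dataset_id]
        if self.balances.get(consumer, 0) < price:
            raise ValueError("insufficient balance")
        self.balances[consumer] -= price
        self.balances[provider] = self.balances.get(provider, 0) + price
        self.grants.add((consumer, dataset_id))

    def has_access(self, consumer: str, dataset_id: str) -> bool:
        return (consumer, dataset_id) in self.grants


market = DataMarketplace()
market.deposit("insurer", 100)
market.list_dataset("hospital", "claims-2016", 30)
market.purchase_access("insurer", "claims-2016")
print(market.balances, market.has_access("insurer", "claims-2016"))
```

On a distributed ledger the same rules would be enforced by every participating node rather than by a single process, which is what removes the need for a trusted intermediary.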
The section below provides a logical and conceptual technical view of the proposed solution. It assumes a federated model (with no need for a trusted third party) in which participants join based on an agreed protocol. Initially, a lead agency would need to create the charter for the participants. Data providers and consumers will need to be ‘permissioned’ to join the consortium.
The role of lead agency can be performed by regulators or government agencies to kick-start the process and ensure that the charter created for the consortium is fair and complies with the regulations or law of the land.
Technology stack options
Here is a view of a possible technology stack[viii], which:
· Uses Open source technology
Key advantages of using a Blockchain or Distributed Ledger:
· Provides Immutability
· Proof of Process
· Helps in transferring value to Data providers from Consumers
· Automation using Smart Contracts
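The immutability and proof-of-process properties listed above rest on a simple idea: each ledger entry includes the hash of the previous one, so changing an old entry breaks every later link. The sketch below shows that idea with a hash chain in a single process. It is not how Hyperledger or BigchainDB are implemented, and the event names are made up for the example.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(payload: dict, prev_hash: str) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Add an entry that is linked to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev_hash,
                  "hash": entry_hash(payload, prev_hash)})


def verify(chain: list) -> bool:
    """Recompute every hash and check that every link is intact."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(entry["payload"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


ledger = []
append(ledger, {"event": "dataset_validated", "dataset": "claims-2016"})
append(ledger, {"event": "access_granted", "consumer": "insurer"})

tampered = copy.deepcopy(ledger)
tampered[0]["payload"]["dataset"] = "something-else"
print(verify(ledger), verify(tampered))  # prints: True False
```

A hash chain alone only makes tampering detectable. In a distributed ledger the copies held by other federation members, plus the consensus protocol, are what make tampering impractical.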
Case for Decentralized shared service for Identity management
For the digital ecosystem to survive, a robust digital identity management system is a must. A comprehensive digital identity management system with broad coverage is still elusive. Identity management systems need to store various attributes related to identity.
Some important characteristics of the desired solution are as follows:
· Privacy preserving and secure
· Avoids Single point of failure
· Ability to manage
o Ownership of Data
o Data Validation process
o Selective and secure sharing of data
o Comprehensive attributes
o Lifecycle of Identity Management
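One way to picture selective and secure sharing of identity attributes is with salted hash commitments. A commitment to every attribute is recorded once, for example after validation. Later the owner reveals only the chosen attributes with their salts, and the recipient checks them against the recorded commitments. This is a minimal sketch of that general technique and is not taken from any of the platforms mentioned in this article; the attribute names and values are hypothetical.

```python
import hashlib
import secrets


def commit(value: str, salt: bytes) -> str:
    """Commitment to one attribute value: SHA-256 over salt + value."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


def issue(attributes: dict):
    """Create a random salt and a commitment for every attribute."""
    salts = {name: secrets.token_bytes(16) for name in attributes}
    commitments = {name: commit(value, salts[name])
                   for name, value in attributes.items()}
    return salts, commitments


def disclose(attributes: dict, salts: dict, fields: list) -> dict:
    """The owner reveals only the chosen attributes, with their salts."""
    return {name: (attributes[name], salts[name]) for name in fields}


def verify_disclosure(disclosed: dict, commitments: dict) -> bool:
    """The recipient checks revealed values against the recorded commitments."""
    return all(commit(value, salt) == commitments.get(name)
               for name, (value, salt) in disclosed.items())


identity = {"name": "Alice Example", "date_of_birth": "1990-01-01", "city": "Pune"}
salts, commitments = issue(identity)       # commitments could be recorded on the ledger
shared = disclose(identity, salts, ["city"])
print(list(shared), verify_disclosure(shared, commitments))
```

The random salts matter: without them, a recipient could guess low-entropy values such as a date of birth by hashing candidates and comparing them with the published commitments.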
There has been rapid advancement in the development of wide-coverage identity management systems. Several organizations, both private and state controlled, have emerged in the last couple of years. Notable examples are the Aadhaar database operated by UIDAI in India[ix] and private organizations such as IdentityMind Global and Trulioo.[x]
These organizations provide critical solutions for identity verification, fraud detection and P2P payments. While these platforms are a definite improvement over current solutions, there is a need for further improvement.
Platforms such as KYC Chain[xi] have attempted to provide some of those improvements by creating a potential decentralized platform.
A decentralized data marketplace for identity can be a long-term solution providing wide coverage. Such common identity data can help address key issues faced by governments, regulators, banks and law enforcement agencies without compromising data privacy.
Case for Decentralized Data sharing for Health Care Industry
The availability of quality health care data can do wonders for society. Data such as genome data can provide insights that benefit individuals, research organizations and pharma companies.
Collecting genomic data through genome sequencing and cheaper “SNP arrays” is important both for scientific research and for commerce involving genome sequencing and human health. It is potentially of particular benefit to personal genomic medicine. While numerous databases already exist to capture genomic data and to use it in science and commerce, current schemes to accumulate and proliferate that data for use are insufficiently secure or open.
This applies not just to genome or individual DNA data but also to other health-related information. It can help prevent illness and provide critical medical help.
Privacy preservation and security are equally critical elements of this solution. Any breach or misuse of medical information can be disastrous.
Availability of this data along with powerful Analytical and Machine learning algorithms can make this platform extremely useful to Individuals, Health care industry, Insurance companies & Government.
Several decentralized solutions are emerging. One notable example is “Gene-Chain”, a solution by Encrypgen for enhancing privacy, security and utility in genomic databases.[xii]
Realizing the benefits of big data, machine learning and AI assumes the availability of quality data covering a large population. A decentralized, privacy-preserving data marketplace can address several issues in sharing data without compromising privacy. Developments in technologies related to anonymization, distributed ledgers and distributed databases hold the promise of delivering this.
[i] Battery Status Not Included: Assessing Privacy in Web Standards, by Arvind Narayanan, Steven Englehardt and others (2016)
[iii] http://www.nytimes.com/2011/05/01/business/01stream.html?_r=2&ref=health, article in The New York Times by Natasha Singer
[iv] Big data privacy: a technological perspective and review, by Priyank Jain, Manasi Gyanchandani and Nilay Khare (2016); Enhancing cloud security using Data anonymization, by Jeff Sedayao (2012); Hitachi develops technology to anonymize Encrypted personal data (2016); Enigma: Decentralized Computation Platform with Guaranteed Privacy, by Guy Zyskind, Oz Nathan and Alex “Sandy” Pentland (2016); A Precautionary Approach to Big Data Privacy, by Arvind Narayanan, Joanna Huey and Edward W. Felten (2015)
[v] https://dataprivacylab.org/people/sweeney/new.html, by Dr. Latanya Sweeney (2016)
[vii] Could banks be the consumers’ data champion?, BANKNXT, by Chris Skinner
[viii] BigchainDB: A Scalable Blockchain Database, by Trent McConaghy, Rodolphe Marques, Andreas Muller and others (2016); Enhancing cloud security using Data anonymization, by Jeff Sedayao (2012); Privacy-preserving Machine Learning Algorithms for Big Data Systems, by Kaihe Xu, Hao Yue, Linke Guo, Yuanxiong Guo and Yuguang Fang (2015)
[ix] https://uidai.gov.in/ (n.d.)
[xi] https://kyc-chain.com/ (n.d.)
[xii] https://www.encrypgen.com/ (n.d.)