The Combined Might of Differential Privacy & Blockchain
How a combination of blockchain and differential privacy can unlock powerful new business use cases while keeping data secure and confidential.
Businesses that store and process sensitive customer data often struggle to provide compliant, privacy-preserving access to that data. A growing number of our enterprise partners want to give internal teams the ability to discover and use data from across their customer base in order to improve products and services. Regulations such as GDPR lead companies to institute data access controls and approval processes that are onerous and prolonged, and compliance teams often refuse access on regulatory grounds. Methods such as database anonymization are cumbersome: they scrub or abstract the data in ways that make it less valuable, while still leaving it exposed to correlation-based re-identification.
To make this more concrete, imagine you are an automaker. You have a sophisticated pipeline that collects telemetry data, such as GPS, changes in vehicle speed, fuel usage, use of vehicle features, and average charge levels, from the vehicles that you manufacture and sell. You store this data in a database that is accessed through a Data Access Object (DAO), which abstracts the database and exposes a SQL dialect. You have consent to collect the data and to use it to build applications and vehicle features that give your customers a better experience. Now imagine a product team that wants to analyze driving behaviors across all vehicles in a given fleet in order to build new products that estimate the value of the fleet. In such scenarios, sharing the data requires stringent regulatory compliance and the involvement of internal privacy and compliance teams, perhaps even seeking fresh consent for the new uses of the collected data.
Privacy-preserving technologies, as described in this post, applied to DAOs can alleviate much of the regulatory burden of managing sensitive data while still providing access to it for both internal teams and external partners. Combine this with a ledger that provides an immutable store, and every access by every data analyst can be checked by any independent audit authority. Add the ability to specify not just who can run queries but also which columns they can access, while permitting only statistical queries, and you have a powerful combination of capabilities that accelerates data use without deep approval processes and shortens your innovation cycle. But we are skipping ahead! Let’s first describe the privacy technology at play and then show how merging it with a policy framework and a ledger provides end-to-end compliant, auditable use of sensitive data.
A case for differential privacy
Imagine you have a database of employee salaries, and the one query you permit on it is the average salary of the employees in the database. If Bob knows the number of employees in the company and runs this query before and after Chloe joins the organization, then Bob can calculate Chloe’s salary as follows:
- Bob knows the number k of employees in his company
- Bob runs an average salary query and gets N
- Chloe joins his company
- Bob runs the average salary query and gets M
- Chloe’s salary = M(k + 1) - Nk
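The steps in the list can be checked with a few lines of Python. This is only an illustrative sketch: the salary figures are made up, and the function names `average` and `recover_new_salary` are ours, not part of any Oasis API.

```python
def average(salaries):
    """The only query Bob is allowed to run: the mean salary."""
    return sum(salaries) / len(salaries)


def recover_new_salary(avg_before, avg_after, k):
    """Salary of the new joiner, given both averages and the prior headcount k.

    The total payroll is avg_after * (k + 1) after Chloe joins and
    avg_before * k before, so the difference is exactly her salary.
    """
    return avg_after * (k + 1) - avg_before * k


# Hypothetical data, invented for this example.
salaries = [50_000, 60_000, 70_000]
k = len(salaries)                        # Bob knows the headcount
n = average(salaries)                    # first query result, N
chloe_salary = 90_000                    # the value Bob should not learn
m = average(salaries + [chloe_salary])   # second query result, M

recovered = recover_new_salary(n, m, k)
```

Two exact answers to a seemingly harmless aggregate query are enough to recover one individual's value, which is the leak that noisy answers are meant to prevent.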
Differential privacy is a technique that guarantees that the results of statistical queries cannot be used to glean information about specific individuals or, more broadly, to access specific rows in a database. Information can only be accessed in the aggregate. The Oasis solution for differential privacy works for SQL databases, is based on query rewriting, and is shown in Figure 1. One advantage of the query rewriting approach is that any DAO supporting a SQL dialect that includes the mathematical functions abs, random, ln, and sign can serve as the backend data access layer, and a wide variety of SQL databases support these functions. The mechanism renders queries intrinsically private by sampling from a suitable distribution and adding noise to the query before it is submitted to the DAO. The added noise balances utility with privacy, and the mechanism fails to return a result if the requested accuracy cannot be met without compromising privacy. Once rewritten, the query can be submitted, and its results are guaranteed to be privacy preserving.
Figure 1: The differential privacy mechanism
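To give a feel for why abs, random, ln, and sign are the functions that matter, here is a minimal Python sketch of one common way to generate such noise. The post only says the mechanism samples from "a suitable distribution"; the choice of the Laplace distribution here is our assumption, not a statement about the Oasis implementation. The names `laplace_noise` and `private_count`, and the parameters `epsilon`, `max_error`, and `beta`, are hypothetical.

```python
import math
import random


def sign(x):
    """Return -1, 0, or 1, mirroring the SQL sign function."""
    return (x > 0) - (x < 0)


def laplace_noise(scale, rng=random):
    """Draw one Laplace(0, scale) sample by inverting the CDF.

    Uses only random, abs, ln, and sign, the same primitives a
    rewritten SQL query would need (assumed distribution: Laplace).
    """
    u = rng.random() - 0.5           # uniform in [-0.5, 0.5)
    while u == -0.5:                 # avoid ln(0) at the boundary
        u = rng.random() - 0.5
    return -scale * sign(u) * math.log(1.0 - 2.0 * abs(u))


def private_count(true_count, epsilon, max_error, beta=0.05, rng=random):
    """Return a noisy count, or None if the accuracy target cannot be met.

    Adding or removing one row changes a count by at most 1, so Laplace
    noise with scale 1/epsilon is enough. The noise magnitude exceeds
    max_error with probability exp(-max_error / scale); if that is larger
    than beta, refuse to answer rather than weaken privacy.
    """
    scale = 1.0 / epsilon
    if math.exp(-max_error / scale) > beta:
        return None
    return true_count + laplace_noise(scale, rng)
```

With `epsilon=1.0` and `max_error=10`, a count query gets an answer; with `epsilon=0.01` and the same `max_error`, the function returns `None`, mirroring the "fail rather than compromise privacy" behavior described above.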
A mechanism that provides differentially private statistical queries has the following properties:
- By definition, the presence or absence of any given individual’s data in a given database does not alter the results of the queries much. Therefore, independently of consent revocation, sensitive data from a given individual (or corporation) can never be gleaned from the results of queries
- The data is never replicated or stored outside of the business unit that is responsible for gathering and maintaining it, as the mechanism only sends results of statistical queries to the data analysts and external partners
These properties make it possible to glean insights from the data while shortening the approval process, because you are never sharing PII, only information in the aggregate. Access is granted to specific data analysts and to specific columns in the database. A policy checker verifies that the queries an analyst submits are statistical and touch only columns the analyst has access to. A ledger durably records audit logs of the queries submitted and whether or not the privacy checks permitted them.
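The policy check described above might look roughly like the following. This is a sketch under simplifying assumptions: queries arrive as a small structured object rather than as raw SQL, and the `Query` type, the `AGGREGATES` set, the `check_policy` function, and the analyst and column names are all hypothetical.

```python
from dataclasses import dataclass

# Assumed set of query types that count as "statistical".
AGGREGATES = {"count", "sum", "avg"}


@dataclass(frozen=True)
class Query:
    analyst: str     # who is asking
    aggregate: str   # e.g. "avg"
    column: str      # the column the aggregate is computed over


def check_policy(query, grants):
    """Return (permitted, reason) for a query.

    grants maps each analyst to the set of columns they may query.
    """
    if query.aggregate.lower() not in AGGREGATES:
        return False, "not a statistical query"
    if query.column not in grants.get(query.analyst, set()):
        return False, "column not granted"
    return True, "ok"


# Hypothetical grant table for the automaker example.
grants = {"dana": {"speed", "fuel_usage"}}

allowed = check_policy(Query("dana", "avg", "speed"), grants)
denied = check_policy(Query("dana", "avg", "gps"), grants)
```

Returning a reason alongside the decision is a deliberate choice here: the same tuple can be written to the audit log, so an auditor sees why a query was refused and not only that it was.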
What about securing the data path from vehicles to the database, you ask? For this, the Oasis consent-based data capture provides a solution. The data is captured with customer consent, perhaps shown and approved in the vehicle dashboard or mobile application. The DAO integrates with the Oasis platform and is backed by a ledger. It provides a controlled conduit for data access by maintaining consent, checking policies, rewriting queries, and returning differentially private results, with an audit trail of all actions. The blockchain ledger provides an enduring and tamper-resistant repository of data genesis and data access for complete end-to-end confidential storage, privacy-preserving access, and audits. This secures the entire pipeline, from vehicle endpoints to data use.
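To illustrate what "tamper-resistant" buys an auditor, here is a toy hash-chained audit log in Python. It is a stand-in for a real blockchain ledger, not a description of the Oasis ledger: the `AuditLog` class, its record fields, and the example entries are assumptions made for this sketch.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def _digest(prev_hash, record):
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log in which each entry commits to all earlier ones."""

    def __init__(self):
        self.entries = []

    def append(self, record):
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        entry_hash = _digest(prev_hash, record)
        self.entries.append({"record": record, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self):
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != _digest(prev_hash, entry["record"]):
                return False
            prev_hash = entry["hash"]
        return True


# Hypothetical audit records for two analyst queries.
log = AuditLog()
log.append({"analyst": "dana", "query": "avg(speed)", "permitted": True})
log.append({"analyst": "dana", "query": "avg(gps)", "permitted": False})
```

Because every hash depends on the previous one, quietly flipping a refused query to "permitted" after the fact makes `verify()` fail. A real ledger adds replication and consensus on top of this idea, so that no single party can rewrite the chain.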
Customers can revoke consent at any time from their automotive mobile app or in-dash software, which assures them that they retain their right to be forgotten. Once they revoke consent, you can ensure that their data will no longer be collected or used, with verifiability provided by the ledger. The system enables a wealth of new applications to be built by gleaning insights at the speed at which data is generated and uploaded into your databases.
This is similar to our vision for our partnership with the BMW Group! In short, the Oasis platform enables privacy-preserving data analysis that reduces regulatory overhead and delivers new value from customers’ sensitive data.
We leave the reader with the following intriguing possibilities that open up when we combine controlled access to SQL databases with differential privacy and a blockchain. A DAO that controls both incoming queries and provides differential privacy in the outputs is a data token that can participate in data markets! It is a conduit to data. The data never leaves the database but access to it is provided via the DAO. Watch out for a subsequent blog post that takes this idea further!