Differential Privacy: Implications for Data Sharing and Access in the Cloud

Motivation


We routinely deal with sensitive data: PII (personally identifiable information) as well as other kinds of private and confidential material.

The traditional approach has been to anonymize data by removing PII, or to apply cipher-based (block or stream) encryption. But even with these mechanisms in place, there have been significant and very public data breaches.

In recent history there have been several re-identification breaches, even of datasets that were anonymized or encrypted:

In the AOL case, for example, "anonymized" search data was re-identified in a matter of hours with the help of auxiliary information.

– Gov. Weld’s medical record reidentified using voter records [Swe97]

– Netflix Challenge database reidentified using IMDb reviews [NS08]

– AOL search users reidentified by contents of their queries [BZ06]

– Even aggregate genomic data is dangerous [HSR+08]

The Netflix Prize contest led to a privacy breach: in 2008, a professor and his PhD student at the University of Texas at Austin identified individuals in the published anonymized dataset with the help of auxiliary information from IMDb [NS08].

Core Problems

  1. High-dimensional data is unique.
  2. Simple data anonymization is unsafe.
  3. Encrypting only key data elements such as PII is vulnerable to auxiliary-information attacks.
  4. Most data breaches at large organizations are perpetrated by insiders who have access to relevant auxiliary information about the encrypted datasets. This is a HUGE threat.

Privacy — Data Size — Accuracy Tradeoff

We can’t achieve all three at the same time to the same degree: stronger privacy generally requires either more data or reduced accuracy.

Differential Privacy

A strong notion of privacy that:

• Is robust to auxiliary information possessed by an adversary

• Degrades gracefully under repetition/composition

• Allows for many useful computations

Emerged from a series of papers in theoretical CS:

[Dinur-Nissim '03 (+Dwork), Dwork-Nissim '04, Blum-Dwork-McSherry-Nissim '05, Dwork-McSherry-Nissim-Smith '06]

A randomized algorithm C is ε-differentially private if for all databases D, D’ that differ in one row, all query sequences q1, …, qt, and every set S of possible outputs:

Pr[C(D, q1, …, qt) ∈ S] ≤ e^ε · Pr[C(D’, q1, …, qt) ∈ S]

That is, the output distribution of C on D is nearly indistinguishable from its output distribution on D’, no matter what auxiliary information an adversary holds.
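To make the definition concrete, here is a minimal Python sketch (my own illustration, not from the original post) of randomized response, the classic survey technique, which satisfies this definition with ε = ln 3:

```python
import random

def randomized_response(true_answer: bool) -> bool:
    """Report a sensitive yes/no answer with plausible deniability.

    With probability 1/2, report the truth; otherwise report a uniformly
    random bit. Then Pr[output=True | true=True] = 3/4 and
    Pr[output=True | true=False] = 1/4, so the likelihood ratio at every
    output is at most 3 and the mechanism is ln(3)-differentially private.
    """
    if random.random() < 0.5:
        return true_answer            # honest coin: report the truth
    return random.random() < 0.5      # noise coin: report a random bit
```

Whatever bit is reported, the likelihood ratio between the two possible true answers is at most (3/4)/(1/4) = 3, so no single respondent can shift the output distribution by more than a factor of e^(ln 3).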

The Cloud Setting

Properties of the Sanitizer

How to Achieve Differential Privacy

Randomness: added by the randomized algorithm C itself, not by assumptions about the data.
Closeness: the likelihood ratio between the output distributions on neighboring databases is bounded by e^ε at every point.
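As a worked check (my own example, anticipating the Laplace mechanism introduced under “Global Sensitivity Method” below): if C releases f(D) plus Laplace noise with scale b = GS(f)/ε, its output density is p_D(x) ∝ exp(−|x − f(D)|/b), and at every point x

p_D(x) / p_D’(x) = exp((|x − f(D’)| − |x − f(D)|) / b) ≤ exp(|f(D) − f(D’)| / b) ≤ e^ε,

where the first inequality is the triangle inequality and the second uses |f(D) − f(D’)| ≤ GS(f). The likelihood ratio is bounded at every point, exactly as the definition requires.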

Properties of Differential Privacy

Post Processing

Risk doesn’t increase if you don’t touch the data again: any post-processing of a differentially private output, however complex, is itself differentially private with the same ε.

Graceful Composition

Total privacy loss is at most the sum of the individual privacy losses: composing k mechanisms that are ε1-, …, εk-differentially private yields an (ε1 + … + εk)-differentially private mechanism.
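In practice this is tracked as a “privacy budget.” A minimal sketch of basic (sequential) composition accounting, assuming a fixed total budget (class and method names are mine):

```python
class PrivacyBudget:
    """Track cumulative privacy loss under basic (sequential) composition."""

    def __init__(self, total_epsilon: float) -> None:
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        # Basic composition: the epsilons of successive releases add up.
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

budget = PrivacyBudget(total_epsilon=1.0)
for _ in range(10):
    budget.charge(0.1)  # ten releases at epsilon = 0.1 use up the whole budget
```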

How to Achieve Differential Privacy

Global Sensitivity Method
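The post stops at the name, so as a pointer: the global sensitivity method ([Dwork-McSherry-Nissim-Smith '06], cited above) releases f(D) plus Laplace noise scaled to GS(f)/ε, where GS(f) is the most f can change between neighboring databases. A minimal sketch, assuming a simple counting query (function names and data are mine):

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release true_value with Laplace noise calibrated to global sensitivity.

    GS(f) = max |f(D) - f(D')| over neighboring databases D, D'.
    Adding noise drawn from Lap(GS(f) / epsilon) makes the release
    epsilon-differentially private.
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# A counting query ("how many rows satisfy some predicate?") changes by at
# most 1 when a single row changes, so its global sensitivity is 1.
ages = np.array([34, 29, 41, 52, 38])
true_count = int(np.sum(ages > 35))                         # exact answer: 3
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(f"true count = {true_count}, noisy release = {noisy_count:.2f}")
```

Note the tradeoff from earlier: smaller ε (stronger privacy) means a larger noise scale, so accuracy suffers unless the dataset is large enough for the signal to dominate.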
