Differential Privacy: The Need of the Hour

Sayantan Das
Mar 21, 2019


Why do we need private machine learning algorithms?

Machine learning algorithms work by studying a lot of data and updating their parameters to encode the relationships in that data. Ideally, we would like the parameters of these machine learning models to encode general patterns (‘‘patients who smoke are more likely to have heart disease’’) rather than facts about specific training examples (“Jane Smith has heart disease”). Unfortunately, machine learning algorithms do not learn to ignore these specifics by default. If we want to use machine learning to solve an important task, like making a cancer diagnosis model, then when we publish that machine learning model (for example, by making an open source cancer diagnosis model for doctors all over the world to use) we might also inadvertently reveal information about the training set. A malicious attacker might be able to inspect the published model and learn private information about Jane Smith [SSS17]. This is where differential privacy comes in.

The promise of differential privacy

Differential privacy describes a promise, made by a data holder, or curator, to a data subject: “You will not be affected, adversely or otherwise, by allowing your data to be used in any study or analysis, no matter what other studies, datasets, or information sources are available.” [Dwork, 2014]

Ingredients

  1. Input space X (with a symmetric neighboring relation ∼)
  2. Output space Y (with a σ-algebra of measurable events)
  3. Privacy parameter ε ≥ 0
[Dwork et al., 2006, Dwork, 2006]
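To make these ingredients a bit more concrete, here is a minimal Python sketch (my own illustration, not something from the papers cited above) that simply groups them into one object; the names Mechanism, run, and neighboring are made up for this post.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

X = TypeVar("X")  # input space: data sets of records
Y = TypeVar("Y")  # output space: possible results of the computation

@dataclass
class Mechanism(Generic[X, Y]):
    """Hypothetical grouping of the three ingredients (illustration only)."""
    run: Callable[[X], Y]                # the randomized computation M : X -> Y
    neighboring: Callable[[X, X], bool]  # symmetric neighboring relation on X
    epsilon: float                       # privacy parameter ε >= 0
```

Nothing here enforces privacy yet; the definition below says what the run function has to satisfy relative to neighboring and epsilon.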

Another Definition…

We say a randomized computation M provides ε-differential privacy if, for any two data sets A and B, and any set of possible outputs S ⊆ Range(M):

Pr[M(A) ∈ S] ≤ Pr[M(B) ∈ S] · exp(ε · |A △ B|)

Here △ denotes the symmetric difference, so |A △ B| counts the records that appear in one data set but not in the other.

Imagine the input to a randomized computation as a set of records. The formal version of differential privacy requires that the probability the computation produces any given output changes by at most a multiplicative factor when you add or remove one record from the input. The probability derives only from the randomness of the computation; all other quantities (the input data, the record to add or remove, the given output) are taken to be the worst possible case. The largest such multiplicative factor, exp(ε), quantifies the amount of “privacy difference” a single record can make: the smaller ε, the stronger the guarantee.
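To see the definition in action, here is a minimal Python sketch (again my own illustration, ahead of the Laplace mechanism covered in the next post) using randomized response on a single bit: the two data sets A and B are identical except for that one bit, so |A △ B| = 1 and the bound is just exp(ε).

```python
import math

def randomized_response_probs(true_bit: int, epsilon: float):
    """Exact output distribution of randomized response on one bit.

    The true bit is reported with probability e^ε / (e^ε + 1),
    and flipped otherwise. Returns (Pr[output = 0], Pr[output = 1]).
    """
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return (1.0 - p_truth, p_truth) if true_bit == 1 else (p_truth, 1.0 - p_truth)

epsilon = 0.5

# Neighboring data sets A and B: identical except for the one record shown here.
probs_A = randomized_response_probs(0, epsilon)  # the record is 0 in A
probs_B = randomized_response_probs(1, epsilon)  # the record is 1 in B

# Check Pr[M(A) ∈ S] ≤ exp(ε·|A △ B|) · Pr[M(B) ∈ S] in both directions.
# The outputs are discrete, so checking the two singleton sets covers every S;
# the small tolerance absorbs floating-point rounding.
for s in (0, 1):
    assert probs_A[s] <= math.exp(epsilon) * probs_B[s] + 1e-12
    assert probs_B[s] <= math.exp(epsilon) * probs_A[s] + 1e-12

print("ε-differential privacy bound holds for randomized response, ε =", epsilon)
```

For ε = 0.5 the two output distributions are (≈0.62, ≈0.38) and (≈0.38, ≈0.62), and the ratio at each output is exactly exp(ε).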

We’ll dig deeper into this topic with the Laplace mechanism in the next post.

Thank you for reading.
