The Formal Definition of Differential Privacy

Explanation of Cynthia Dwork’s formula:

Archit
Secure and Private AI Writing Challenge
2 min readJun 16, 2019

--

Cynthia Dwork’s Formula

Terms:

M: This would be randomized algorithm i.e query(db) + noise or query(db + noise. Basically, M being the query have some sort of noise

S: All potential output of M that it could predict.

x: Entries in the database, i.e count how many times it was in the database.

y: Entries in parallel database

epsilon: The maximum distance between a query on database (x) and the same query on database (y)

Definition:

This definition does not create differential privacy, instead, it is a measure of how much privacy is afforded by a query M.

Specifically, it’s a comparison between running the query M on a database (x) and a parallel database (y).

Parallel databases are defined to be the same as a full database (x) with one entry/person removed.

Example:

Thanks, Henrique Mello for such a nice example:

  1. Let’s take the sum query as an example. If we have a database with 10 entries and each one of them can be 0 or 1, then S = {0,1,2,3,4,5,6,7,8,9}.
  2. Now let’s add some noise. Consider that each sum can be added or subtracted by 1 with 50% probability.
  3. So, for each sum query we do, our result either goes up by 1 or goes down by 1. In that case, S = {-1,0,1,2,3,4,5,6,7,8,9,10}.
  4. The -1 and the 10 were added to account the cases where the sum equals to 0 but the noise is -1 and when the sum equals to 9 and the noise is +1.
  5. M(x) is the result of our query. If we take the sum with noise query above, M(x) could be any value in S, as our query will always return a value that is in it. This satisfies the part `M(x) in S`. Since the parallel database M(y) can return the same results, this also satisfies `M(y) in S`.
  6. Now, Pr[M(x) in S] is the probability that the returned sum is M(x). In our example, let’s say that our database is [0,1,0,1,1,1,0,1,0,1]. The sum of this database can be `M(x) = 6 + 1 = 7` or `M(x) = 6–1 = 5`. as there are only two options therefore, P(Event 1)+P(Event 2)=1, since Event 1 and Event 2 are equally probable then P(Event 1)=P(Event 2)=0.5. So, for this specific query, `Pr[M(x) = 7] = 0.5` and `Pr[M(x) = 5] = 0.5`, because the value of M(x) without any noise is the same and the probability of adding or subtracting by 1 is 0.5. So either result is equally probable.

If Found Useful Clap it! and Share it! to others in need!!

https://www.linkedin.com/in/garg4/

--

--

Archit
Secure and Private AI Writing Challenge

Interested in Differential Privacy, Deep Learning and Machine Learning