Differential Privacy Note 1: A Powerful Synopsis from the Bible of Privacy

Naveen Manwani
Secure and Private AI Writing Challenge
7 min read · Jul 15, 2019

Because important things go in a case: you’ve got a skull for your brain, a plastic sleeve for your comb, and a wallet for your money. But what about your privacy? In which case will you put that? (Anonymous)

Pinpoint Intention:

Anyone who starts their journey in the field of Machine Learning or Deep Learning is daunted at first.

So, I thought: instead of writing about a code-based implementation or explaining an entire application from head to toe, why not go against the status quo and help young deep learning enthusiasts by writing summaries of chapters from deep learning and machine learning books? And what better place to start than the bible of privacy, “The Algorithmic Foundations of Differential Privacy”.

Therefore, starting with this article, I’ll provide a simplified summary of various sections of the book “The Algorithmic Foundations of Differential Privacy”, which will help young deep learning practitioners tackle their ghosts and feel more welcome in the deep learning space than they used to.

So, enough of intention; let’s get into action.

Section I: A Promising Definition to Delay the Inevitable:

Let’s start by asking ourselves a question: what is differential privacy? Is it a technique, a technology, an algorithm, or a definition?

To answer that question, let’s go through an example:

“Suppose an insurance firm conducted a study over a medical database which allowed it to conclude that “smoking causes cancer”, a conclusion that directly affected the insurance company’s view of a smoker’s long-term medical costs.”

Well, the above scenario might raise a few more questions, such as:

  1. Has the smoker been harmed by the analysis?
  2. Has the smoker’s privacy been compromised?
  3. Was his information “leaked”?

Differential privacy will take the view that it was not, with the rationale that the impact on the smoker is the same independent of whether or not he was in the study. It is the conclusions reached in the study that affected the smoker, not his presence or absence in the data set.

Hence, differential privacy is a definition, not an algorithm. The definition reads as follows:

“Differential privacy” describes a promise, made by a data holder, or curator, to a data subject: “You will not be affected, adversely or otherwise, by allowing your data to be used in any study or analysis, no matter what other studies, data sets, or information sources are available.”

In brief, it addresses the paradox of learning nothing about an individual while learning useful information about a population.

Specifically, differential privacy ensures that any sequence of outputs (responses to queries) is “essentially” equally likely to occur, independent of the presence or absence of any individual. The probabilities here are taken over random choices made by the privacy mechanism (something controlled by the data curator), and the term “essentially” is captured by a parameter, ε. A smaller ε (epsilon) yields better privacy (and less accurate responses).
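Formally, a randomized mechanism M is ε-differentially private if, for any two data sets D and D′ that differ in a single record and any set S of possible outputs, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]. To make that concrete, here is a minimal Python sketch (my own illustration, not code from the book) of the classic Laplace mechanism for a counting query; the helper name `laplace_count` is hypothetical:

```python
import numpy as np

def laplace_count(true_count, epsilon):
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    A counting query ("how many people have X?") changes by at most 1 when
    any single individual joins or leaves the data set, so its sensitivity
    is 1. Adding Laplace noise with scale 1/epsilon makes the released
    answer epsilon-differentially private.
    """
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Smaller epsilon -> more noise -> stronger privacy, less accurate answers.
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps}: released count = {laplace_count(1000, eps):.1f}")
```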

Alright, since we are talking about ε (epsilon), it’s time to pull an interesting fact from the digital archives about the differential privacy parameter epsilon.

“macOS’s implementation of differential privacy used an epsilon of 6, while iOS 10 used an epsilon of 14. As that epsilon value increases, the risk that an individual user’s specific data can be ascertained increases exponentially.”

Rings a bell? To read more about it, check out point 2 of the References section.
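To see what “increases exponentially” means here: ε-differential privacy bounds the ratio between the probabilities of any output, computed with and without your record, by e^ε. A quick back-of-the-envelope check of the two values quoted above (my own illustration):

```python
import math

# epsilon-DP bounds how much one person's data can shift the probability
# of any output: the ratio is at most exp(epsilon).
for eps in (6, 14):
    print(f"epsilon = {eps:2d} -> probability ratio bound ~ {math.exp(eps):,.0f}")

# epsilon =  6 -> probability ratio bound ~ 403
# epsilon = 14 -> probability ratio bound ~ 1,202,604
```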

Section II: Why Differential Privacy Stands Apart in the Race to Address Privacy-Preserving Data Analysis

A. Data Cannot Be Fully Anonymized and Remain Useful:

It’s a truth that the richer the data, the more interesting and useful it is for any sort of data analysis. This has led to notions of “anonymization” and “removal of personally identifiable information”, where the hope is that portions of the data records can be suppressed and the remainder published and used for analysis.

But does suppressing a few fields really hide who you are?

Not quite. The richness of the data enables “naming” an individual through a sometimes surprising collection of fields, or attributes, such as the combination of zip code, date of birth, and sex, or even the names of three movies and the approximate dates on which an individual watched them.

How does this benefit a privacy attacker?

Basically, this “naming” capability can be used in a “linkage attack” to match “anonymized” records with non-anonymized records in a different dataset.

Real-life Examples:

  1. The medical records of the governor of Massachusetts were identified by matching anonymized medical encounter data with (publicly available) voter registration records.
  2. The Netflix subscribers whose viewing histories were contained in a collection of anonymized movie records, published by Netflix as training data for a recommendation competition, were identified by linkage with the Internet Movie Database (IMDb).

But if we talk about our front-runner, differential privacy generally neutralizes linkage attacks: being differentially private is a property of the data-access mechanism, and is unrelated to the presence or absence of auxiliary information available to the adversary (the opponent).

Therefore, access to the IMDb would no more permit a linkage attack to someone whose history is in the Netflix training set than to someone not in the training set.
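To make the linkage attack tangible, here is a toy sketch (my own illustration; the names, attributes, and values are all made up, loosely modeled on the Massachusetts example): joining an “anonymized” medical table with a public voter roll on the shared quasi-identifiers is all it takes.

```python
import pandas as pd

# "Anonymized" medical records: names removed, quasi-identifiers kept.
medical = pd.DataFrame({
    "zip": ["02138", "02139"],
    "dob": ["1945-07-31", "1960-01-02"],
    "sex": ["F", "M"],
    "diagnosis": ["hypertension", "diabetes"],
})

# Public voter registration list: names attached to the same attributes.
voters = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "zip": ["02138", "02139"],
    "dob": ["1945-07-31", "1960-01-02"],
    "sex": ["F", "M"],
})

# The "attack" is nothing more than a join on zip code, date of birth, and sex.
reidentified = medical.merge(voters, on=["zip", "dob", "sex"])
print(reidentified[["name", "diagnosis"]])
```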

B. Queries Over Large Sets are Not Protective:

Here the author clearly states that forcing queries to be over large sets is not a solution, as it may lead to another type of attack, popularly known as the differencing attack.

Let’s understand it through an example:

Suppose it is known that Mr. Chadha is in a certain medical database. Taken together, the answers to the two large queries “How many people in the database have diabetes?” and “How many people, not named Chadha, in the database have diabetes?” yield the diabetes status of Mr. Chadha.
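Here is how trivially the differencing attack plays out in code (a toy sketch of my own; every record except Mr. Chadha’s name is invented):

```python
# A toy medical database.
records = [
    {"name": "Chadha", "diabetes": True},
    {"name": "Arjun",  "diabetes": False},
    {"name": "Meera",  "diabetes": True},
]

# Two aggregate queries, each computed over (what would normally be) a
# large set of people...
with_chadha    = sum(r["diabetes"] for r in records)
without_chadha = sum(r["diabetes"] for r in records if r["name"] != "Chadha")

# ...whose difference exposes exactly one person's diabetes status.
print("Mr. Chadha has diabetes:", bool(with_chadha - without_chadha))
```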

I know that was too much information to gulp down in one breath, so have a cup of coffee or a vegan juice to keep yourself fresh for the next techniques.

C. “Ordinary” Facts are Not “OK”:

Revealing “ordinary” facts, such as purchasing rice, may be problematic if a data subject is followed over time. For example, consider Ms. Shalini, who regularly bought rice, year after year, until suddenly switching to rarely buying rice. An analyst might conclude that Ms. Shalini has most likely been diagnosed with diabetes. The analyst might be correct, or might be incorrect; either way, Ms. Shalini’s privacy is harmed.

D. The “Just a Few” Philosophy:

Under the “just a few” philosophy, the privacy of “just a few” participants may be compromised, while “typical” members of a data set, or more generally “most” members, still receive privacy protection. This sets aside the concern that outliers may be precisely those people for whom privacy is most important. Moreover, claiming that this philosophy can address all privacy-related data analysis issues would be premature, as a well-articulated definition of privacy consistent with the “just a few” philosophy has yet to be developed. Differential privacy, however, provides an alternative whenever the “just a few” philosophy is rejected.

So, that’s it, my fellow readers: with that last technique you have come to the end of this article/summary post.

To further strengthen your understanding, do check the links mentioned in the References section.

References:

  1. The Definition of Differential Privacy, by Cynthia Dwork
  2. How One of Apple’s Key Privacy Safeguards Falls Short
  3. Apple’s ‘Differential Privacy’ Is About Collecting Your Data, But Not Your Data

Gratitude Corner:

A huge shout-out to Udacity and Akshit Jain for providing me this opportunity to grow and learn in the new field of Artificial Intelligence, and a special thanks to Trask for directing learners like me to recent, reliable knowledge resources. Lastly, I’m glad to be a part of this young, vibrant #UdacityFacebookScholar community.

Thank you for your attention

That you use your time to read my work means the world to me. I fully mean that.

If you liked this story, go crazy with the clap (👏) button! It will help other people find my work.

Also, follow me on Medium, LinkedIn or Twitter if you want to! I would love that.


Naveen Manwani
Secure and Private AI Writing Challenge

Electronics Engineer by degree, ML engineer by interest, Hardware tinkerer by choice