Evaluating the Privacy of a Function

After defining Differential Privacy, let’s explore a hands-on example to get a better sense of how it can be applied when working with actual data.

Before moving on to machine learning examples, we’ll look at privacy in the context of a simple database.

tensor([1, 0, 0,  ..., 1, 1, 1], dtype=torch.uint8)

In the above gist, we created a single-column database of ones and zeros. You could think of this column as a private property holding true and false values. For instance, it could represent whether or not a patient has cancer in a dataset for a cancer study.
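In case the gist does not render, here is a minimal sketch of how such a database can be created with PyTorch. The database size of 5,000 matches the numbers used later in the article; generating the values by thresholding uniform random numbers at 0.5 is an assumption.

```python
import torch

num_entries = 5000

# A single-column "database" of random 0/1 values,
# e.g. 1 = has the condition, 0 = does not.
db = (torch.rand(num_entries) > 0.5).type(torch.uint8)

print(db)  # tensor([1, 0, 0, ..., 1, 1, 1], dtype=torch.uint8)
```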

What techniques could we use on this database so that we can run statistical analyses without compromising the privacy of any individual?

Thus, let us describe privacy in the context of our simple database. If we remove a person from the database and this removal does not affect the output of the query, then the query did not leak that individual’s private data.

So, how can we construct a query that does not change when removing any individual?

The first step is to construct parallel databases: num_entries (5,000) databases, each of length num_entries - 1 (4,999), where each database has a single person removed.
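The step above can be sketched as follows. This is a minimal version; the helper names `get_parallel_db` and `get_parallel_dbs` are my own labels for the two operations described, not necessarily the ones used in the original gist.

```python
import torch

def get_parallel_db(db, remove_index):
    # Return a copy of db with the entry at remove_index removed.
    return torch.cat((db[:remove_index], db[remove_index + 1:]))

def get_parallel_dbs(db):
    # Build one parallel database per individual in db,
    # each missing exactly one person.
    return [get_parallel_db(db, i) for i in range(len(db))]
```

For a database of 5,000 entries this yields 5,000 parallel databases of 4,999 entries each.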

Now, we want a way to measure the privacy of a query on this database, by comparing the query’s output on the full database against its output on each parallel database. This tells us how the query changes when we remove an individual.

Measuring the sensitivity of a function

As we said before, we want to apply a query to our example database and see how removing individuals affects the result, thus determining whether the query (function) is private.

The query we are going to apply is a very simple one: a summation over the column.

In the process, we want to measure the maximum amount by which the query’s output could change.
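A minimal sketch of this measurement, assuming the sum query and the parallel-database idea from above (the function names `query` and `sensitivity` are my own labels):

```python
import torch

def query(db):
    # Our simple query: sum the column.
    return db.sum()

def sensitivity(query, db):
    # Maximum change in the query's output when any
    # single individual is removed from the database.
    full_result = query(db)
    max_distance = 0
    for i in range(len(db)):
        # Parallel database with person i removed.
        pdb = torch.cat((db[:i], db[i + 1:]))
        distance = torch.abs(query(pdb) - full_result)
        if distance > max_distance:
            max_distance = distance
    return max_distance
```

For a binary database, removing a person changes the sum by at most 1, so this loop returns 1 whenever the database contains at least one 1.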

From the above gist, the sensitivity is 1. Since our database is binary and our query is a sum, we can tell it will always be 1, even if we change num_entries for our db.

Sensitivity, or L1 sensitivity, is the maximum amount by which a query’s output can change when removing an individual from the database.

Note

I am writing this article as part of Udacity’s Secure and Private AI Scholarship Challenge, as a way to share what I have learned so far.

#60daysofudacity #secureandprivateai


Aisha Elbadrawy
Secure and Private AI Writing Challenge

I am a computer science graduate, who is passionate about problem solving, learning and education. Interested in Software Development and Machine Learning.