#Introducing Local and Global Differential Privacy #Lesson 5

calincan mircea
Secure and Private AI Writing Challenge
31 min read · Jul 18, 2019

#60daysofudacity

Part of:

Secure and Private AI Scholarship Challenge from Facebook

Secure and Private AI by Facebook Artificial Intelligence

In this lesson, we’re going to get into the middle of developing differentially private algorithms.

The main strategy we’re going to take to protect individuals’ privacy is one of noise: we’re going to add random noise to the database and to the queries on the database in order to protect people’s privacy.

Now, there are two different kinds of differential privacy, which refer to the two different places that you can add noise:

Local differential privacy adds noise to each individual data point. You can think of this as adding noise directly to the database or even having individuals add noise to their own data, before even putting it into the database. In this setting, users are most protected as they do not have to trust the database owner to use their data responsibly.

Global differential privacy adds noise to the output of the query on the database. This means that the database itself contains all of the private information and that it’s only the interface to the data which adds the noise necessary to protect each individual’s privacy.

What is the real difference between local and global differential privacy? If the database operator is trustworthy, the only difference is that global differential privacy leads to more accurate results with the same level of privacy protection.

However, this requires the database owner to be trustworthy, namely, that the database owner adds noise properly. In the differential privacy literature, the database owner is called a trusted curator.

#Making A Function Differentially Private

Local differential privacy is where, given a collection of individuals, each individual adds noise to their own data before sending it to the statistical database, so the protection happens at the local level.

The question is, how much noise should we add?

First off, we saw in previous lessons that the basic sum query is not differentially private.

Differential privacy always requires a form of randomness or noise added to the query to protect from things like a differencing attack.

Randomized response is this really amazing technique that is used in the social sciences when trying to learn about the high-level trends for some taboo behavior.

Imagine you are a sociologist and you want to study how many people in a city have committed a certain crime; say you’re studying jaywalking. You sit down with 1,000 people and ask each of them: “Hey, trust me, I’m not going to tell on you. Have you jaywalked, say in the last week?” The trouble is that people are reluctant to divulge this information, because it’s technically a crime in many places, and the sociologist worries that the results will be skewed because some subset of the population will answer dishonestly.

Randomized response adds a certain degree of randomness to the process so that each individual is protected by what’s called plausible deniability.

For example, instead of asking each person the question directly, the sociologist first presents the question and then says: “I need you to flip a coin two times without me seeing it. If the first coin flip is heads, answer my yes-or-no question honestly: did you jaywalk? But if the first coin flip is tails, answer according to the second coin flip.” The idea is that half the time individuals answer honestly, and the other half of the time they answer randomly, with a 50-50 chance of saying yes or no. The interesting thing is that if a person says “Yes, I have jaywalked in the last week,” that person has a degree of plausible deniability: they can always claim they only said yes because of the coin flip.

So each person has a natural level of protection, a localized differential privacy: randomness is applied to their specific data point, locally, and that is what in theory gives them the freedom to answer honestly and produce more accurate statistics at the end of the day.

Perhaps the most extraordinary thing is that, over the aggregate, over the entire population, the person performing the study can remove this random noise. The process takes the true statistic and averages it with a 50-50 coin flip. So in this particular case, let’s say that 70% of people actually jaywalk in the real world. Then when we perform our survey, 60% of our respondents will answer yes. Let’s take this slowly: since 70% of people actually jaywalk, roughly half our participants say yes or no with 50% probability, and the other half say yes with 70% probability. When we average these two together, the result of our survey is 60%.

This is incredible. It means we can take the result of our noised statistic and back out the true distribution: since 60% of people reported that they jaywalk, we know the true answer is centered around 70%, and we can do all of this without knowing whether any individual person jaywalks.

One thing we have to acknowledge is that the added privacy comes at the cost of accuracy. Even though we still get the right statistic on average, we are averaging our true result with random noise. So if people happen to flip coins in a really unlikely way, we might accidentally conclude that 95% of people jaywalk. It is only “in expectation”, i.e., with an infinite number of participants, that this noise disappears and we get the exact true distribution. So we have gained privacy but lost some accuracy, especially if we are only sampling a small number of people. This trade-off holds throughout the entire field of differential privacy.
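To make the arithmetic concrete, here is a minimal simulation sketch (the population size and the 70% “true” jaywalking rate are illustrative assumptions, not data from the lesson): each person answers honestly with probability 0.5 and randomly otherwise, so the reported rate lands near the average of the true rate and 0.5, and we can deskew it with true ≈ 2 * reported - 0.5.

import torch

torch.manual_seed(0)

num_people = 100_000
true_rate = 0.7  # assumed true rate, for illustration only

# each person's honest answer (1 = yes, 0 = no)
truth = (torch.rand(num_people) < true_rate).float()

# first flip: 1 = heads (answer honestly), 0 = tails (answer per second flip)
first_flip = (torch.rand(num_people) > 0.5).float()
second_flip = (torch.rand(num_people) > 0.5).float()

reported = truth * first_flip + (1 - first_flip) * second_flip

reported_rate = reported.mean()                 # ~0.60, i.e. (0.7 + 0.5) / 2
estimated_true_rate = reported_rate * 2 - 0.5   # ~0.70 in expectation

print(reported_rate.item(), estimated_true_rate.item())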

Research in differential privacy can thus be grouped into two main themes:

  1. The main goal of DP is to get the most accurate query results with the greatest amount of privacy, i.e., how can we minimize the amount of noise we add while maximizing the amount of privacy?
  2. The second goal is derivative of this and looks at who does or doesn’t trust whom in a real-world situation. If you add noise to protect two people who do trust each other, that noise was wasted and your query is less accurate than necessary. But if you forget to add noise between two people who don’t trust each other, say a database curator and an individual, then you put one of them at risk. So we want to manage this noise-accuracy trade-off carefully. One strategy is to create flexible differential privacy schemes that fit how people actually do and don’t trust each other in the real world. But enough on that.

#Project Demo: Implement Local Differential Privacy

In this project, we’re going to implement simple local differential privacy using randomized response from scratch in Python.

Specifically, we’re going to implement the coin-flipping example in Python.

If you have a group of people you wish to survey about a very taboo behavior, you give them these instructions for how to answer: flip a coin two times; if the first coin flip is heads, answer honestly; if the first coin flip is tails, answer according to the second coin flip. This gives each person plausible deniability, while half of the answers are simply random with a 50 percent probability of yes or no. Thus, if we survey 1,000 people with this question and around 60 percent of them answer yes, then we know the true distribution is around 70 percent, because the true distribution has been averaged with 50 percent to produce the output of our statistical survey.

The query function we’re going to use is a mean function. Let’s say we have a database of size 100.
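The create_db_and_parallels helper comes from the earlier lessons. If you don’t have it handy, here is a minimal sketch of what it is assumed to do: build a random binary database plus the list of parallel databases, each with one entry removed.

import torch

def get_parallel_db(db, remove_index):
    # the same database with one entry removed
    return torch.cat((db[0:remove_index], db[remove_index + 1:]))

def create_db_and_parallels(num_entries):
    # each entry is 1 or 0 with 50% probability
    db = torch.rand(num_entries) > 0.5
    pdbs = [get_parallel_db(db, i) for i in range(num_entries)]
    return db, pdbs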

db, pdbs = create_db_and_parallels(100)

db

tensor([1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0,

0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0,

1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1,

0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,

0, 1, 0, 0], dtype=torch.uint8)

The true result is the mean of the database.

true_result = torch.mean(db.float())

true_result

tensor(0.4700)

Since we generate our database randomly, with a 50 percent probability of each entry being one or zero, the mean tends to be around 0.5. Re-running the generation gives means like:

tensor(0.4700), tensor(0.4500), tensor(0.4400), tensor(0.5200)

However, we want to noise our dataset using local differential privacy. Local differential privacy is all about adding noise to the data itself. So in our case, adding noise to the data means replacing some of these values with random values:

tensor([1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0,

0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0,

1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1,

0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,

0, 1, 0, 0], dtype=torch.uint8)

If you remember, with randomized response people sometimes answer honestly (those would be the honest values in our example), and sometimes they answer according to a coin flip, i.e., randomly.

The first thing we need to do is flip two coins 100 times, once per entry:

first_coin_flip = (torch.rand(len(db)) > 0.5).float()

This gives a 50-50 chance of heads or tails: each entry comes out of the random number generator as 1.0 or 0.0 with 50% probability.

Here is the first coin flip for every single value in the database tensor:

tensor([0., 1., 1., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 1.,

0., 1., 1., 0., 0., 0., 1., 0., 0., 1., 1., 1., 0., 1., 1., 0., 0., 0.,

1., 1., 1., 1., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 0., 0., 0.,

1., 0., 0., 1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 1., 1., 1., 1., 0.,

1., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 1., 0., 0., 1., 0., 1., 0.,

0., 1., 0., 0., 1., 1., 1., 1., 1., 0.])

Let’s do the same thing and create a second coin flip. The first coin flip will determine whether we use the value that is actually in the database, or the value that was randomly generated by the second coin flip.
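For completeness, the second coin flip is generated exactly the same way (this line also appears in the full query function below):

second_coin_flip = (torch.rand(len(db)) > 0.5).float()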

tensor([0., 1., 1., 1., 0., 1., 1., 0., 0., 1., 0., 1., 0., 1., 1., 0., 0., 1.,

1., 1., 0., 0., 1., 1., 1., 0., 0., 0., 1., 1., 0., 0., 1., 0., 0., 0.,

1., 1., 0., 1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 1., 0.,

1., 0., 0., 1., 0., 0., 1., 0., 1., 0., 1., 1., 1., 0., 0., 0., 1., 1.,

0., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 0., 0., 0., 0., 1., 1., 0.,

0., 0., 1., 1., 0., 1., 0., 0., 1., 1.])

We can use these to create our noisy database; call it a synthetic or augmented database. We build it as follows: where the first coin flip is one (heads, “answer honestly”), we want to use the real database value. We can do this by simply multiplying the first coin flip by the database, because it acts as a mask: multiplying by one leaves the database value in place, and multiplying by zero zeroes it out. So db.float() * first_coin_flip returns a one only in the places where the database originally had a one and the person is answering honestly. The remaining entries are the ones where we need to choose randomly.

db.float() * first_coin_flip

tensor([0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,

1., 0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 1., 1., 0.,

1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 1., 0., 0., 1., 0., 1., 1., 1., 0.])

The complementary mask marks all the places where we instead want to choose randomly:

(1 - first_coin_flip)


Every one in that mask is a place where we want to sample from the second coin flip instead. We can select those randomly generated answers by multiplying the mask by the second coin flip:

(1-first_coin_flip)*second_coin_flip

tensor([0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0., 0.,

1., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0., 0., 0., 0., 0., 1., 1., 0.,

0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1.,

0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 1., 0., 0.,

0., 0., 1., 1., 0., 0., 0., 0., 0., 1.])

So these are all the values that are being sampled randomly.

If we simply add the two pieces together, we get a new augmented database, which is differentially private:

augmented_database = db.float() * first_coin_flip + (1 - first_coin_flip) * second_coin_flip

torch.mean(augmented_database.float())

tensor(0.4400)

So we have augmented the database and could just go ahead and query it. However, something else has happened: only half of our values are honest, and the other half come from a coin flip whose mean is 0.5. This skews the output of our query toward 0.5, because half the time we are using values from the second coin flip, which has a 50/50 chance of being heads or tails. Say that in the original database 70% of people answered true for the taboo behavior, so its mean is 0.7. If we take half the values from that distribution (mean 0.7) and half from the random 50/50 distribution (mean 0.5), the mean of the result lands halfway between them, at 0.6, even though the original distribution’s mean was 0.7.

In other words, the query is now skewed as a result of the noise. Skewing was not what we were after; we only wanted to give each individual plausible deniability.

So we have to do one final step: deskew the result.

Say the true distribution has some average, for example 70%. The average of the coin flip is always 50%, and averaging half of each gives 60%. We can just run that in reverse: if the output of our augmented database has a mean of 60%, it is really telling us 70%, because half of our data is just 50/50 random noise. Let’s deskew the result:

dp_result = torch.mean(augmented_database.float()) * 2 - 0.5

tensor(0.3800)

This is our estimate of the true statistic, because we have removed the skew toward 0.5 introduced by the local differential privacy (the second coin flips). So this is the output of our augmented, differentially private query; we’ll call it dp_result.

We can package all of this as a single query, as in the assignment. The assignment asked us to return the augmented, differentially private result along with the true result, because the point of this lesson is to understand how these two results tend to differ:

def query(db):
    true_result = torch.mean(db.float())
    first_coin_flip = (torch.rand(len(db)) > 0.5).float()
    second_coin_flip = (torch.rand(len(db)) > 0.5).float()
    augmented_database = db.float() * first_coin_flip + (1 - first_coin_flip) * second_coin_flip
    dp_result = torch.mean(augmented_database.float()) * 2 - 0.5
    return dp_result, true_result

query(db)

(tensor(0.5400), tensor(0.5200))

What we want to do now is run this on databases of different sizes. The assignment starts with a database of size 10; then we look at the private result and the true result, printing both so they’re easy to compare. What we’ll see is that with small databases the noise sometimes really throws things off:

db, pdbs = create_db_and_parallels(10)
private_result, true_result = query(db)
print("With noise: " + str(private_result))
print("Without noise: " + str(true_result))

With noise: tensor(-0.1000)
Without noise: tensor(0.5000)

But let’s use a bigger dataset:

db, pdbs = create_db_and_parallels(100)
private_result, true_result = query(db)
print("With noise: " + str(private_result))
print("Without noise: " + str(true_result))

With noise: tensor(0.6000)
Without noise: tensor(0.5400)

Let’s use an even bigger dataset:

db, pdbs = create_db_and_parallels(1000)
private_result, true_result = query(db)
print("With noise: " + str(private_result))
print("Without noise: " + str(true_result))

With noise: tensor(0.4840)
Without noise: tensor(0.4820)

Then, an even bigger dataset:

db, pdbs = create_db_and_parallels(10000)
private_result, true_result = query(db)
print("With noise: " + str(private_result))
print("Without noise: " + str(true_result))

With noise: tensor(0.4834)
Without noise: tensor(0.5016)

What does this tell us about local differential privacy, and about differential privacy in general?

Whenever we add noise to a distribution, we corrupt it, so the statistics our queries compute are sensitive to this noise. However, the more data points we have, the more this noise tends to average out and the less it affects the output of the query: across a large number of people, the noise sometimes pushes the result higher than it should be and sometimes lower, but on average it stays centered around the mean of the true data distribution. A general rule of thumb: local differential privacy is very data hungry. To noise the dataset we add noise to every single value, so with 10,000 entries we are adding 10,000 coin flips’ worth of noise (roughly 5,000 of the answers end up randomized). That is still a lot of noise.

Global differential privacy, in contrast, only adds noise to the output of the query. We will find that this tends to be quite a bit more accurate and less data hungry than local differential privacy. If you want to protect the data at the data level, i.e., protect the dataset itself, you’ll want local differential privacy, but you’ll also want a really large dataset. Whereas if you want a less data hungry algorithm, maybe because your dataset is smaller but you still need to protect it, it’s better to lean toward global differential privacy. Personally, leaning toward global differential privacy seems to be where much of the field is going, although the local variant has its advantages: in theory you could publish the locally noised dataset itself, so you don’t have to trust a curator, which suits lower-trust settings.

Just understand this trade-off between small and large datasets: with a small dataset the noise creates vastly different query outputs, whereas with a dataset large enough to average the noise over, the outputs are only slightly different.
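If you want to see this trade-off in one place, a quick sketch sweeping the database size (reusing the query and create_db_and_parallels functions from above) makes the trend easy to eyeball:

for num_entries in [10, 100, 1000, 10000]:
    db, pdbs = create_db_and_parallels(num_entries)
    private_result, true_result = query(db)
    print(num_entries, "entries -> with noise:", round(private_result.item(), 4),
          "without noise:", round(true_result.item(), 4))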

#Project Demo: Varying the Amount of Noise

In this project, we’re going to take the code we wrote in the last section and add one extra feature: the ability to bias the first coin flip to be arbitrarily more or less likely to come up heads. The thing we’ll need to change is how we deskew the output of our mean query afterwards.

So the first thing we’re going to do is go ahead and copy this query:

def query(db):
    true_result = torch.mean(db.float())
    first_coin_flip = (torch.rand(len(db)) > 0.5).float()
    second_coin_flip = (torch.rand(len(db)) > 0.5).float()
    augmented_database = db.float() * first_coin_flip + (1 - first_coin_flip) * second_coin_flip
    dp_result = torch.mean(augmented_database.float()) * 2 - 0.5
    return dp_result, true_result

We have this first coin flip, and we determine the likelihood of heads or tails with a threshold. The torch code torch.rand(len(db)) outputs uniformly random numbers between zero and one:

torch.rand(len(db))

If we look at the output, it’s a bunch of random numbers, uniformly distributed between zero and one:

tensor([0.8328, 0.7786, 0.8227, …, 0.7056, 0.5253, 0.2505])

We threshold it with (torch.rand(len(db)) > 0.5).float(), so that on average 50% of the values are above the threshold and 50% are below.

What we want to do in this section is make that threshold adjustable: a noise parameter that sets the probability that a person relies on the second, random coin flip (i.e., the probability that the first flip comes up tails) instead of answering honestly:

def query(db, noise=0.2):
    true_result = torch.mean(db.float())
    first_coin_flip = (torch.rand(len(db)) > noise).float()
    second_coin_flip = (torch.rand(len(db)) > 0.5).float()
    augmented_database = db.float() * first_coin_flip + (1 - first_coin_flip) * second_coin_flip
    dp_result = torch.mean(augmented_database.float()) * 2 - 0.5
    return dp_result, true_result

The interesting thing here is that all the other pieces stay the same. The first coin flip has been modified slightly; the second coin flip is still a 50/50 coin flip, we just end up relying on it more or less often depending on how likely the first coin flip is to come up heads or tails. The augmented database is still created the same way, because it is still based on these two coin flips. The part that is different is how we deskew the result of the mean query afterwards.

The deskewing step is based on the idea that we are adjusting the noised result so it has the same mean as the original database. In one example earlier, we said that perhaps 70% of people answered true, so the true distribution is centered around 0.7, and we average it with a distribution centered around 0.5; if the two are weighted evenly, the result is centered around 0.6. So we have a true distribution mean, say 0.7, and a noise distribution mean of 0.5, because it’s a random coin flip:

true_dist_mean = 0.7   # 70% of people said yes
noise_dist_mean = 0.5  # 50/50 coin flip

This means the augmented database mean is going to be:

augmented_database_mean = (true_dist_mean + noise_dist_mean) / 2

augmented_database_mean

output: 0.6

However, we want to remove this random noise on average while still keeping the noise on each individual entry. More importantly, if the mechanism is more or less likely to choose the random distribution, we want to make sure that however we skewed the result, we can properly deskew it according to this noise parameter (noise=0.2). The way we do this is to run the averaging in reverse: we “unaverage” the output of the query.

First, we compute our skewed result, which equals augmented_database.float().mean():

# try this project here!
# def query(db, noise=0.2):

noise = 0.2

true_result = torch.mean(db.float())
first_coin_flip = (torch.rand(len(db)) > noise).float()
second_coin_flip = (torch.rand(len(db)) > 0.5).float()
augmented_database = db.float() * first_coin_flip + (1 - first_coin_flip) * second_coin_flip
sk_result = augmented_database.float().mean()
# return dp_result, true_result

sk_result

output: tensor(0.4800)

In our original example the noise parameter was effectively 0.5: a 50/50 first coin flip, which meant that 50 percent of the time we used the true distribution (with a mean of 0.7) and 50 percent of the time we used a random distribution with a mean of 0.5:

true_dist_mean = 0.7   # 70% of people said yes
noise_dist_mean = 0.5  # 50/50 second coin flip

Another way of thinking about this is that the true distribution mean is weighted by the probability of answering honestly, (1 - noise), and the noise distribution mean is weighted by the probability of answering randomly, noise:

augmented_database_mean = (true_dist_mean * (1 - noise)) + (noise_dist_mean * noise)

augmented_database_mean

output: 0.6

How do we reverse this? With simple algebra we can pull the true distribution mean back out by removing the other terms; it’s a bit of multiplication and subtraction to deskew the value according to the noise parameter.

We take our skewed result and say that our final, private result equals the skewed result divided by noise, minus 0.5 (that 0.5 being the mean of the random coin flip), all multiplied by noise divided by one minus noise. What’s going on is that we are deskewing the result so that our private result has its mean adjusted according to the noise parameter.

When the noise parameter is 0.5, the skewed result sits halfway between the true mean of the distribution and the mean of the 50/50 coin flip, so deskewing amounts to doubling that distance back out, exactly as * 2 - 0.5 did before. For other noise values, this formula generalizes that correction, and it is the correct algebraic deskewing formula.
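Here is the algebra spelled out, with a quick numeric check (assuming, as in the code, that noise is the probability of relying on the second coin flip): the skewed mean is sk = true_mean * (1 - noise) + 0.5 * noise, so solving for true_mean gives (sk - 0.5 * noise) / (1 - noise), which is exactly what ((sk / noise) - 0.5) * noise / (1 - noise) simplifies to.

true_mean = 0.7
noise = 0.2  # probability of answering via the second coin flip

sk = true_mean * (1 - noise) + 0.5 * noise               # 0.66, the skewed mean
deskewed_a = (sk - 0.5 * noise) / (1 - noise)            # 0.7
deskewed_b = ((sk / noise) - 0.5) * noise / (1 - noise)  # 0.7, same formula rearranged

print(sk, deskewed_a, deskewed_b)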

Then we return the same things we did before. Now, we want to run an experiment similar to the one we ran last time.

def query(db, noise=0.2):
    true_result = torch.mean(db.float())
    first_coin_flip = (torch.rand(len(db)) > noise).float()
    second_coin_flip = (torch.rand(len(db)) > 0.5).float()
    augmented_database = db.float() * first_coin_flip + (1 - first_coin_flip) * second_coin_flip
    sk_result = augmented_database.float().mean()
    private_result = ((sk_result / noise) - 0.5) * noise / (1 - noise)
    return private_result, true_result

Experiment 2: instead of varying the size of the database, we vary the amount of noise we’re adding. We keep the database at size 100 and sweep the noise parameter.

Let’s start at 0.1:

db, pdbs = create_db_and_parallels(100)
private_result, true_result = query(db, noise=0.1)
print("With noise: " + str(private_result))
print("Without noise: " + str(true_result))

With noise: tensor(0.5667)
Without noise: tensor(0.5600)

That’s still relatively low noise. Now 0.2:

db, pdbs = create_db_and_parallels(100)
private_result, true_result = query(db, noise=0.2)
print("With noise: " + str(private_result))
print("Without noise: " + str(true_result))

With noise: tensor(0.4625)
Without noise: tensor(0.4400)

Then 0.4:

db, pdbs = create_db_and_parallels(100)
private_result, true_result = query(db, noise=0.4)
print("With noise: " + str(private_result))
print("Without noise: " + str(true_result))

With noise: tensor(0.4333)
Without noise: tensor(0.5500)

And we’re just going to keep doubling and see what happens:

db, pdbs = create_db_and_parallels(100)
private_result, true_result = query(db, noise=0.8)
print("With noise: " + str(private_result))
print("Without noise: " + str(true_result))

With noise: tensor(1.1500)
Without noise: tensor(0.4600)

As we increase the amount of noise, the difference between the private and true results tends, on average, to get quite a bit larger. But if we counter this with a large dataset, they come back together:

db, pdbs = create_db_and_parallels(10000)
private_result, true_result = query(db, noise=0.8)
print("With noise: " + str(private_result))
print("Without noise: " + str(true_result))

With noise: tensor(0.5195)
Without noise: tensor(0.5056)

The size of the dataset allows you to add more noise, i.e., more privacy protection, for the individuals inside it. The counter-intuitive thing is that the more private data you have access to, the easier it is to protect the privacy of the people involved: the larger the corpus, the more noise you can add while still getting an accurate result.
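To see the interaction between database size and noise level directly, you can sweep both (again reusing query and create_db_and_parallels from above); the gap between the private and true results shrinks as the database grows, even at high noise levels:

for num_entries in [100, 1000, 10000]:
    for noise in [0.2, 0.5, 0.8]:
        db, pdbs = create_db_and_parallels(num_entries)
        private_result, true_result = query(db, noise=noise)
        gap = abs(private_result.item() - true_result.item())
        print(num_entries, "entries, noise", noise, "-> gap:", round(gap, 4))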

Now, in society this is probably even more counter-intuitive, because people tend to think of preserving privacy as giving statisticians access to less and less data, but with differential privacy the opposite is true. The intuition behind differential privacy is that we want to learn about an aggregation over a large corpus: we want to learn something that is common to many different people.

So one thing might be, let’s say you’re performing statistical analysis of medical records.

Let’s say you’re going to identify tumors inside MRI scans. You have a collection of, say, 1,000 images with the tumors annotated, and you’re trying to learn how to detect tumors in individuals. When you perform this statistical analysis, you’re not actually interested in whether any one of these people has a tumor; you’re not trying to violate any particular person’s privacy. Instead, you’re trying to do a statistical study to understand what tumors in humans look like. You’re going after information that is fundamentally public but happens to be buried inside individual, private data points.

More importantly, the way this technology works is that differential privacy acts as a very special kind of filter: it looks for information that is consistent across multiple different individuals. Perfect differential privacy, with no information leakage, would in theory block out any information that is unique to a participant in your dataset and only let through information that is consistent across multiple different people, allowing you to learn what tumors in humans look like without learning whether any individual person you studied actually had a tumor. That is the nature of the filter differential privacy gives us. But it is only allowed to look for repeating statistical information inside the dataset. So if you have a really small dataset, the odds of finding the same statistical pattern twice are pretty low, because you only have a few people to look at. If you have a dataset of five images, every image is totally unique, everything looks like private information, and differential privacy has to get rid of all of it; even a small amount of noise will totally corrupt the output of your query.

However, if you have say a million images or maybe, in this case, we have 10,000 entries:

db, pdbs = create_db_and_parallels(10000)
private_result, true_result = query(db, noise=0.8)
print("With noise: " + str(private_result))
print("Without noise: " + str(true_result))

With noise: tensor(0.5195)
Without noise: tensor(0.5056)

then it becomes a lot easier to learn about the general characteristics of the dataset, because you have more data points to look at and compare with each other in search of similar statistical information.

So what is a differentially private mechanism? It is a mechanism that studies a dataset, filters out any information that is unique to individual points in the dataset, and lets through information that is consistent across multiple different people. This is why one of the big takeaways from this project and this lesson is that the larger the corpus of information you can work with, the easier it is to protect privacy, because it is easier for your algorithm to detect that a piece of statistical information occurs in more than one person, and thus is not private, unique, or sensitive to any one of them; it is a general characteristic of humans more broadly.

#The Formal Definition of Differential Privacy

We’re going to create a database with our query and then analyze it with the formal definition of DP. We’re going to add noise to the output of our function, and there are two kinds of noise we can add: Laplacian noise or Gaussian noise.

(The Laplace distribution is similar to the Gaussian/normal distribution, but it is sharper at the peak and has fatter tails. Laplacian noise is also better suited to our applications in differential privacy, because it lets us keep the delta parameter at exactly zero.)
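As a preview, here is a minimal, hedged sketch of what adding Laplacian noise to a query’s output could look like; the scale b is left as a free parameter here, and the formal definition below (together with the notion of sensitivity covered later) is what tells us how large b actually needs to be.

import torch

def sum_query(db):
    return db.float().sum()

def laplace_mechanism(db, query, b):
    # global DP sketch: run the query, then add Laplace(0, b) noise to its output
    noise = torch.distributions.Laplace(0.0, b).sample()
    return query(db) + noise

db, pdbs = create_db_and_parallels(100)
print(laplace_mechanism(db, sum_query, b=1.0))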

In order to know how much noise we should add, we will appeal to the formal definition of DP. This is the definition proposed by Cynthia Dwork. It is the E = mc² of differential privacy, the most important formula in the field. It doesn’t create differential privacy by itself; it is not a method of adding noise per se. Instead, it is a constraint, so that you can analyze a query with noise and know whether or not that query plus noise is leaking too much information. In particular, there are two measures, epsilon and delta, and together they compose a threshold for leakage. Let me unpack what this inequality says.

The method of adding noise we worked with in the last section is called local differential privacy, because we added noise to each data point individually. This is necessary in situations where the data is so sensitive that individuals do not trust that noise will be added later; however, it comes at a very high cost in terms of accuracy. If you remember how we constructed the coin-flipping example: before an individual gave you their true/false value, you would query them (“hey, are you performing some sort of activity?”), and they would flip their coin at that point, adding noise to their own data point before submitting it to you. The term “local” refers to the fact that they add noise locally, before the data ever gets sent to you.

Now, if you’re familiar with the intuitions of dataset anonymization (a broken technique that you should not use), it is probably the closest parallel to local differential privacy, in the sense that it modifies data points individually with the intent of enabling a data release.

But again, don’t do dataset anonymization, and don’t advocate that people do it, because it is a fundamentally broken idea. It does, however, lead us directly into the other class of differentially private algorithms: global differential privacy.

Global differential privacy, instead of adding noise to individual data points, applies noise to the output of a function computed on those data points. The advantage in general (and this is a bit of a sweeping statement) is that you can often add a lot less noise and get a much more accurate result if you wait to add the noise until after you’ve computed the function, because many functions actually reduce the sensitivity involved.

So for example, here we have a query over a database. Local differential privacy would add noise to the individual data points, as when we simulated the coin-flipping example:

db, pdbs = create_db_and_parallels(100)

def query(db):
    # local DP sketch: noise is applied to each data point before the sum
    # (noise here is schematic; later sections formalize how much to add)
    return torch.sum(db.float() + noise)

def M(db):
    return query(db) + noise

query(db)

Whereas global differential privacy would add the noise out here, to the output of the function:

db, pdbs = create_db_and_parallels(100)

def query(db):
    # global DP sketch: noise is applied once, to the output of the sum
    return torch.sum(db.float()) + noise

def M(db):
    return query(db) + noise

query(db)

Now, this doesn’t have to be a single function; it can be a long chain of functions: a sum, then a threshold, then another sum, then a multiplication, or any number of functions chained together, and we can add noise in the middle or at the end, wherever we want, as sketched below.
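For instance, a hedged sketch of such a pipeline might look like the following, with the noise added only at the very end (the threshold and the noise scale are arbitrary, just to show where the noise goes):

import torch

def chained_query(db, noise_scale=1.0):
    # sum, then threshold, then multiply: noise is added once, at the end
    total = db.float().sum()
    above_threshold = (total > 50).float()
    result = above_threshold * total
    return result + torch.distributions.Laplace(0.0, noise_scale).sample()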

As a general rule of thumb, as you process data more and more, it is quite common for sensitivity to go down. So a good place to start, if you’re trying to figure out where to add noise in your pipeline, is to lean toward doing it as late in the chain as possible: the later you go, the more individuals you have likely aggregated over, the more processing you have done, and the more likely you are to have applied thresholds or squashing functions like sigmoid or ReLU or other kinds of post-processing to your data. All of that improves the chances that you reduce the sensitivity, and therefore the amount of noise you have to add, giving you a more accurate result with less privacy leakage.

If the data is so sensitive that people won’t give it to you otherwise, you tend to lean toward local differential privacy: the individual data owners want to protect their data before handing it over to what’s called a trusted curator, the party who actually performs the differential privacy.

That’s the reason people use local differential privacy, whereas people lean toward global differential privacy when they’re more interested in saying:

“Hey, I really need the output of this to be accurate while still having the same level of privacy”.

So there is a trade-off around how much the data owners are willing to trust the person performing differential privacy. If you can facilitate a setup where differential privacy is performed over a large dataset and the data owners can trust that you’re performing it correctly, I would strongly advocate leaning toward global differential privacy, because it can be a much more accurate way of stripping out private information.

How do we actually measure how much privacy is being leaked inside of a differentially private algorithm?

The most well-known definition of differential privacy was proposed by Cynthia Dwork, and other definitions in the literature typically build on top of this one for one purpose or another:

Let’s reread the definition a bit more intuitively. “A randomized algorithm M”: M is a randomized algorithm such as

def M(db):
    return query(db) + noise

This would be a globally differentially private randomized algorithm: some function of a database with some noise added to its output. The noise could also have been inside the db, or applied to the db, in which case it would have been locally differentially private. We don’t know exactly what algorithm M is; we just know that it can query a database and that it is randomized in some way.

That’s what we know about M. It has a domain of N^|X|, meaning its input can be thought of as a histogram over the entries in the database, so everything is discretized in this particular case. We say M is (ε, δ)-differentially private if, for all the potential things it could output, that is, for all sets S in the range of M, and for all pairs of databases x and y such that ||x - y||₁ ≤ 1 (histograms that differ in at most one entry, i.e., parallel databases), a certain constraint holds.

The histogram x simply counts how many times each kind of entry appears in the database. Say it was a database full of marbles, each entry a marble of a specific color: the histogram might count the number of red marbles, blue marbles, and yellow marbles, in which case it would be three natural numbers and |X| would be three. They are natural numbers because we are counting individual things; that is why the definition is posed in terms of histograms over databases. So we have two histograms whose maximum distance is one, meaning they differ in only one entry: they are parallel databases.

Let’s walk through the setup one more time to make sure it’s clear. A randomized algorithm M with domain N^|X| (which, earlier in the paper, is identified as a histogram over the database) is (ε, δ)-differentially private if, for all potential outputs S of M, and for all parallel databases x and y (one the full database, one with a single entry missing), the following constraint holds:

Pr[M(x) ∈ S] ≤ e^ε · Pr[M(y) ∈ S] + δ

So what does this constraint say? The left-hand side, Pr[M(x) ∈ S], is what you get by taking our mechanism and running it over the full histogram x. Pr[M(y) ∈ S] is the same thing run over a parallel database y with one entry missing, and the inequality has to hold for every such parallel database. In other words: what is the distance between the probability distribution over the mechanism’s outputs on the full database and the probability distribution over its outputs on the database with one entry missing?

So we are comparing two random distributions: the distribution over the mechanism’s outputs when it sees the full database, and the distribution over its outputs when it sees the database with one entry missing.

The core question of differential privacy is: how different are these two distributions? How much does the output of my mechanism, the prediction it makes, change when I remove one entry from the database? The maximum amount by which these two distributions can differ is measured by two parameters: epsilon (ε) and delta (δ).

Epsilon constrains how different the two distributions can be; it is the primary constraint, and some algorithms only use ε (with δ set to zero). If ε is zero, then e^ε is one, and the constraint forces the two distributions to be identical. Epsilon zero and delta zero would be perfect privacy: if the constraint is satisfied at ε = 0, δ = 0, then M leaks no information about any individual in x.

However, if epsilon is, say, 1, then some privacy leakage is allowed. An important takeaway is that a mechanism can satisfy differential privacy while still leaking a small, bounded amount of information: the definition bounds the leakage rather than eliminating it.
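To tie this back to the coin-flip mechanism we implemented earlier (stated here as context, not something derived in the lesson): with an unbiased first coin, someone whose true answer is “yes” reports “yes” with probability 0.75, while someone whose true answer is “no” reports “yes” with probability 0.25. The ratio of those probabilities is 3, so standard randomized response satisfies the definition with epsilon = ln(3) ≈ 1.1 and delta = 0.

import math

p_yes_given_yes = 0.5 * 1.0 + 0.5 * 0.5  # honest half says yes, random half says yes 50% of the time
p_yes_given_no = 0.5 * 0.0 + 0.5 * 0.5   # only the random half can say yes

epsilon = math.log(p_yes_given_yes / p_yes_given_no)
print(epsilon)  # ln(3) ≈ 1.0986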

Delta (δ) is the probability, usually very small, that you will accidentally leak more information than epsilon claims you will. Delta is often something like 0.00001, or simply zero; when it is non-zero, it is a very small probability. It is basically saying: most of the time you keep everything underneath the epsilon amount of leakage, and the distributions are at least that close, except with probability delta, when a query hit with just the right random noise might accidentally leak more.

That is the constraint of differential privacy. As you might imagine, a tremendous amount of effort goes into developing randomized algorithms that satisfy this constraint in a useful way, producing the most accurate query outputs possible with the lowest epsilon and delta possible. There are also improvements and modifications of this constraint in the literature that adapt it in various ways, and we’ll leave some pointers to them at the end of the course.
