Differential Privacy for Deep Learning

If you have read the previous articles on differential privacy, you might be wondering what all this querying has to do with deep learning. Those querying techniques actually form the core principles of how differential privacy works in the context of deep learning.

We previously defined a perfect query as one that returns the same value even when any single person is removed from the database before performing it. This means that no individual's information contributes to the output of the query, which preserves that individual's privacy.
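To make that definition concrete, here is a minimal sketch in plain Python (my own illustration, not from the earlier articles; the toy database, the threshold query, and the helper `is_perfect_query` are all assumptions chosen for clarity). We simply re-run the query on every "parallel" database with one person removed and check that the answer never changes.

```python
def query(db):
    # Example query: does the database contain more than 3 entries equal to 1?
    return sum(db) > 3

def is_perfect_query(db, query):
    """Return True if the query gives the same answer on the full database
    and on every 'parallel' database with one person removed."""
    full_result = query(db)
    for i in range(len(db)):
        parallel_db = db[:i] + db[i + 1:]   # database with person i removed
        if query(parallel_db) != full_result:
            return False
    return True

db = [1, 0, 1, 1, 0, 1, 1]          # toy database: one bit per person
print(is_perfect_query(db, query))  # True here: removing any single person
                                    # never changes this threshold query's answer
```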

So now, in a deep learning context, instead of talking about querying a database, we talk about training a model, and the perfect query definition becomes a definition of a perfect AI model.

Training a model on a dataset should return the same model even if we remove any person from the training dataset.

The deep learning context adds two new complexities that were not present in the simple database context:

1- Do we always know where people are referenced in the dataset?

In the case of a database this was simple: every row corresponded to a person, so we could remove a person by removing a row. But suppose we are training a sentiment classifier on a dataset of movie reviews; the dataset is just a collection of text, and we may have no idea who each piece of text belongs to. In cases like this, removing a person's data can be genuinely challenging.

2- Neural models rarely ever converge to the same state twice, even when trained on the same dataset.

Hence, if we train a neural model on the same dataset twice, it won't reach exactly the same state both times, because the training process itself is randomised (for example, the weights are randomly initialised and the data may be shuffled).
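As a quick illustration of this randomness, the following PyTorch sketch (my own toy example; the data, the single linear layer, and the hyperparameters are arbitrary assumptions) trains the same model twice on the same data and shows that the learned weights differ, simply because the initial parameters are drawn at random each time.

```python
import torch
from torch import nn

# Toy data: 100 samples, 3 features, binary labels
X = torch.randn(100, 3)
y = (X.sum(dim=1) > 0).float().unsqueeze(1)

def train_once():
    model = nn.Linear(3, 1)                          # weights are randomly initialised
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(100):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return model.weight.detach().clone()

w1, w2 = train_once(), train_once()
print(torch.allclose(w1, w2))   # almost certainly False: same data, different weights
```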

Ultimately, privacy-preserving technology is about protecting data owners from individuals or parties they don't trust. We therefore want to add as much noise as is necessary to protect these individuals, but we have to be careful: adding too much noise severely degrades the model's accuracy, while not adding enough leaves individuals at a privacy risk.
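The same accuracy-versus-privacy trade-off already shows up in the simplest setting: a differentially private count query. The sketch below uses the standard Laplacian mechanism, where the noise scale is sensitivity divided by epsilon (the toy database, the sensitivity of 1, and the epsilon values are assumptions made purely for illustration): a smaller epsilon gives a stronger privacy guarantee but a noisier, less accurate answer.

```python
import numpy as np

def noisy_count(db, epsilon, sensitivity=1.0):
    """Count query with Laplacian noise of scale sensitivity / epsilon."""
    true_count = np.sum(db)
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

db = np.random.randint(0, 2, size=1000)     # 1000 people, one bit each
for eps in [0.01, 0.1, 1.0, 10.0]:
    # Smaller epsilon -> larger noise -> stronger privacy, worse accuracy
    print(eps, noisy_count(db, eps))
```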

Accordingly, when discussing privacy techniques, we need to be clear about who the parties involved are and whom we do and don't trust, so that we can choose the most appropriate technique.

In the next article, we will discuss how to handle these two newly introduced complexities, and how to achieve differential privacy in a deep learning context.

Note

I am writing this article as part of Udacity's Secure and Private AI Scholarship Challenge, as a way to share what I have learned so far.

#60daysofudacity #secureandprivateai

--

Aisha Elbadrawy
Secure and Private AI Writing Challenge

I am a computer science graduate who is passionate about problem solving, learning, and education. Interested in software development and machine learning.