Differential Privacy for the Rest of Us

by Morten Dahl and Joseph Dureau @ Snips

Jul 29, 2016

Awareness, industry standards, and legislation around privacy are all making steady progress, turning privacy into both a strategic and an ethical differentiator. Large companies such as Google and Apple have taken note of these opportunities and constraints, and are setting high standards in the field.

Although there are pragmatic measures like storing limited amounts of data, removing unique identifiers, and periodically erasing databases, we would all prefer privacy to rest on mathematical guarantees.

Differential privacy (or DP for short) is an interesting solution to this problem. The idea behind it is to add noise to user data so that almost nothing can be learned at the individual level, while the noise largely cancels out at the aggregated population level.

Local vs Global Differential Privacy

Differential privacy comes in two flavours: local and global.

To explain the local model, let us consider a typical setup: a set of users with private inputs, and a data scientist who would like to learn an aggregation of these inputs. In the middle sits an aggregator that receives the inputs from the users, performs the actual computation, and sends the output to the data scientist.

In terms of privacy, what we are after is that no one should be able to learn “too much” about any individual user, so each user adds a bit of noise to their input before sending it to the aggregator. A simple way this can be achieved is by asking users to follow this mechanism (a code sketch follows the list):

  1. Flip a coin
  2. If tails, respond truthfully
  3. If heads, flip another coin and respond yes if tails, no if heads
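
Here is a minimal sketch of that mechanism in Python (the function name is our own):

    import random

    def randomized_response(truth: bool) -> bool:
        """One run of the coin-flip mechanism above."""
        if random.random() < 0.5:     # first coin comes up tails:
            return truth              # respond truthfully
        return random.random() < 0.5  # heads: a second coin decides, yes on tails

A user whose true answer is yes reports yes with probability 3/4, while one whose true answer is no does so with probability 1/4. The ratio between the two is 3, so this simple scheme already satisfies differential privacy with epsilon = ln(3), the same privacy budget we use in the experiments below.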

But there are more elaborate mechanisms, such as the one recently introduced by Apple for its analytics, and the RAPPOR system from Google (which is what we will benchmark against).

One thing to notice is that the small bits of noise added by each user accumulate. As a consequence, the output received by the data scientist can become very noisy, meaning he also needs a lot more data to get a clear signal. This is why we often hear that differential privacy is “privacy in the land of plenty”.

To tackle this problem there is an alternative called the global model, in which the aggregator adds noise instead of the users. Doing so can still give the desired level of privacy towards the data scientist, yet now the users have to trust the aggregator to be an angel. In particular, they have to trust that he will not meet with the data scientist late at night in a dark alley, in order to reveal the inputs without the noise in exchange for a sum of cash.

While this may at first make the local model seem like the clear winner, the fact that so much more data is needed makes it impractical when there simply are not enough users contributing data. Even from the perspective of a larger company, relying on a mechanism that involves less noise can be key when tackling sophisticated machine learning problems.

Fortunately, modern cryptography provides a way out of blindly trusting the aggregator: secure multi-party computation, or MPC for short. Using various techniques such as secret sharing and homomorphic encryption, it turns out that we can actually build a cryptographic system that provides as much privacy as the local model, but assumes much smaller leaps of faith!

The precise details of how we made this work for our setting will be the topic of a follow-up blog post, but the general idea is that there is a way to compute on encrypted values without knowing the decryption key. As such, the single trusted aggregator can be replaced by a group of aggregators, of which a certain fraction may be corrupt without harming the privacy of the users.

In this system, the users encrypt their inputs and send the encryptions to the aggregators. The aggregators then compute the aggregation together without decrypting (they can’t because they don’t know the decryption key) and add noise once at the end. Finally, they enter a decryption process with the data scientist such that only the latter learns the output.
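
The exact protocol is a story for the follow-up post, but a toy version of additive secret sharing illustrates the core idea: no single aggregator learns anything about an individual input, yet the sum can still be reconstructed. All names and the field size below are choices made for this sketch:

    import random

    PRIME = 2**61 - 1  # field size; any sufficiently large prime works

    def share(secret, n_aggregators=3):
        """Split a value into additive shares that sum to the secret mod PRIME;
        any subset of fewer than n_aggregators shares looks uniformly random."""
        shares = [random.randrange(PRIME) for _ in range(n_aggregators - 1)]
        shares.append((secret - sum(shares)) % PRIME)
        return shares

    def aggregate(shares_per_user):
        """Each aggregator sums the shares it received from every user;
        combining these partial sums reconstructs the sum of all inputs."""
        partials = [sum(column) % PRIME for column in zip(*shares_per_user)]
        return sum(partials) % PRIME

    inputs = [10, 20, 30]                         # private user inputs
    shares_per_user = [share(x) for x in inputs]  # one row of shares per user
    assert aggregate(shares_per_user) == sum(inputs)

Note that the toy omits the noise-addition and final decryption steps described above.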

Unsurprisingly, there is a penalty in using cryptographic tools instead of the simpler DP mechanisms, and MPC is by no means a silver bullet. The rest of this blog post is about making sure that the price we pay for cryptography is justified by what we can get in return in terms of accuracy and utility when switching from the local to the global model.

RAPPOR vs. Global DP

In this experiment we ask the following question: for a given application and a fixed level of privacy, how many samples do you need to achieve the same utility with a global DP mechanism as with a local one? Due to its reputation and open implementation, we benchmark against RAPPOR, which implements local DP.

Our experiment involves a classic example from the original RAPPOR paper: the universe consists of 100 candidate strings whose appearance probabilities follow an exponential distribution, plus 100 candidate strings that never appear. Can we classify which ones have a strictly positive probability of appearing, once noise has been added to ensure differential privacy?

We ran the analysis for a number of users ranging from 100 to 1,000,000, each sending one sample. Privacy, as measured by the epsilon parameter in the definition of DP, is set to ln(3). More specifically, p is set to 0.5, q to 0.75, and f to 0; the Bloom filters are 32 bits long, one hash function is used, and there are 64 cohorts. This is one of the standard parameterisations used in the paper, corresponding to what is called one-time basic RAPPOR. For global DP, we opted for the canonical Laplace mechanism, where noise distributed according to Laplace(sensitivity / epsilon) is added to every bin of the histogram. Since each sample comes from a different user, the sensitivity of the histogram to each user’s contribution is 1.
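
As a sketch, the global mechanism with these parameters boils down to a few lines of numpy (the function name is our own):

    import numpy as np

    def laplace_histogram(counts, epsilon=np.log(3), sensitivity=1.0):
        """Release a histogram under global DP by adding independent
        Laplace(sensitivity / epsilon) noise to every bin."""
        scale = sensitivity / epsilon
        return counts + np.random.laplace(loc=0.0, scale=scale, size=len(counts))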

For every number of samples, observations are drawn from the underlying distribution (100 exponentially distributed candidates and 100 candidates with probability 0), yielding the original sample distribution. These samples are then processed through the RAPPOR mechanism, generating the RAPPOR distribution, while the original sample distribution is processed through the Laplace mechanism, giving the Laplace distribution.
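
Such a synthetic dataset can be generated along these lines; the exact decay rate of the exponential is our own assumption, not necessarily the one used in the paper:

    import numpy as np

    n_true, n_fake = 100, 100
    weights = np.exp(-np.arange(n_true) / 10.0)  # exponential decay; rate assumed
    probs = weights / weights.sum()

    n_users = 100000  # one sample per user
    samples = np.random.choice(n_true, size=n_users, p=probs)
    counts = np.bincount(samples, minlength=n_true + n_fake)  # fakes stay at 0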

RAPPOR includes a step that determines whether each candidate has a significantly non-zero probability of appearing. A similar operation is performed on the Laplace distribution by checking whether the estimated frequencies exceed the 95th quantile of the Laplace noise that was added. The classification performance under each mechanism is shown below:
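
In code, the detection rule on the Laplace side might look as follows; the raw counts are made up for illustration:

    import numpy as np
    from scipy.stats import laplace

    epsilon = np.log(3)
    scale = 1.0 / epsilon                       # sensitivity is 1
    threshold = laplace.ppf(0.95, scale=scale)  # 95th quantile of the added noise

    counts = np.array([120, 80, 40, 15, 5, 0, 0, 0])  # hypothetical raw histogram
    noisy = counts + np.random.laplace(scale=scale, size=counts.size)
    detected = noisy > threshold  # candidates classified as actually appearing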

We see that, under the conditions of this experiment, the Laplace mechanism with 1,000 samples offers prediction performance similar to that of RAPPOR with 1,000,000 samples, while guaranteeing the same level of differential privacy. This means DP is as accessible to startups with a few thousand users as it is to large companies with hundreds of millions!

Applying DP to Machine Learning

The comparison with RAPPOR is useful as a reference against the current state of the art in industry. But we wanted to push the analysis a bit further, exploring how many users we would need in one of our own practical situations if we opted for global DP.

We are currently working on an assistant that helps the user whenever he needs to go somewhere. It analyses the places he has interacted with (visited, searched for, mentioned in a message) to provide contextual recommendations of places he could be interested in.

Services like Foursquare are quite good at identifying popular spots and new venues that are trending. But they lack a deep understanding of the user’s context and preferences, due to the limited amount of personal data they have. Although we can rely on Foursquare to get suggestions of new places, we would rather have a model that determines the optimal parameters to pass to Foursquare, such as the type of places the user likes, or the distance he is willing to travel.

This problem can be tackled with a Bayesian network that takes into consideration the popularity of given place categories depending on whether it’s a weekend or not, the period of the day (morning, lunch, afternoon, evening, night), and the presence of adverse weather conditions (heavy clouds, rain, wind, snow).

In each situation, we are interested in the ability to predict the three most popular categories out of 8 high-level place categories (food, shop, outdoors, arts & entertainment, etc.). Once a category is picked, a similar model could then determine the distance the user is willing to travel.
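
The full Bayesian network is beyond the scope of this post, but the core private step can be sketched as follows: for a single context, add Laplace noise to the per-category visit counts and read off the top three. The category names and counts below are illustrative:

    import numpy as np

    CATEGORIES = ["food", "shop", "outdoors", "arts & entertainment",
                  "nightlife", "travel", "education", "residence"]  # illustrative

    def noisy_top3(counts, epsilon=np.log(3)):
        """Return the three most popular categories for one context,
        estimated from a Laplace-noised histogram of reported visits."""
        noisy = counts + np.random.laplace(scale=1.0 / epsilon, size=counts.size)
        top = np.argsort(noisy)[-3:][::-1]
        return [CATEGORIES[i] for i in top]

    counts = np.array([420, 260, 180, 90, 60, 40, 30, 20])  # one context, made up
    print(noisy_top3(counts))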

Below we plot the accuracy of recovering the exact top-three categories in each context, taking into account that each user sends the context around places he visited, and that we aim for ln(3) differential privacy.

We see here, again, that while more data enables finer models, having a few thousand users already enables DP to be used in machine learning applications!

Conclusion

Our intuition was that the global model offers more benefits than the local model when the user base is small. This has been validated on the canonical example on which RAPPOR was benchmarked, where similar utility can be achieved with the global model using 1,000 times fewer users.

Furthermore, we showed that standard machine learning techniques, applied to useful problems, perform well with fewer than 100,000 users while preserving their privacy.

One question that remains is how much of a performance penalty we incur when using cryptography in mobile settings. While there are off-the-shelf MPC solutions, none of them worked well for our use case, and we instead had to develop a tailor-made solution that fits more closely with the overall application. This is something we will cover in a future blog post!

If you enjoyed this article, it would really help if you hit recommend below and try our apps!

You can also follow us on Twitter at @mortendahlcs, @jodureau, and @snips.

If you want to work on Machine Learning and Privacy, check out our jobs page!
