# What is ‘Differential Privacy’ and why should you care

Apr 21

We live in times where we aggregate vast amounts of user-generated data. We use these data sets to gain new insights and build solutions for all kinds of problems. Google uses location information, tracked through smartphones, to infer how busy certain public places are. We use large-scale patient data to study disease trajectories, helping to treat future patients more effectively. Data-driven business models are being developed to deliver new products and services to customers in virtually every industry. However, there is a privacy issue at play here. How can we make sure that no personal information leaks into the final output of these algorithms? How do we evaluate how well the privacy-preserving measures are implemented? This is where differential privacy comes in. It is a mathematical definition of the privacy of personal information used in big-data algorithms, and it enables us to quantify exactly how anonymous the data in a data set really is.

# Explain It Like I’m 5

Your parents know which candy you like. Let’s assume you want nobody else to know exactly which candy you like or dislike. The ability to keep this kind of information secret is called privacy. Now imagine you go to school and the teacher wants to know whether there are children in your class who do not like gummy bears. She does not want to know whether you specifically are one of those kids. It is sufficient if she can tell that there are maybe 6 or 7 children who don’t like gummy bears. This type of privacy is called differential privacy.
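The teacher's survey can be run with a classic trick called randomized response, one of the oldest differentially private mechanisms. The sketch below is illustrative (the function names and class sizes are made up for this example): each child flips a coin before answering, so any individual answer is deniable, yet the teacher can still estimate the true count for the whole class.

```python
import random

def randomized_response(truth: bool) -> bool:
    """Answer truthfully only half the time; otherwise answer at random.

    Any single reported answer is plausibly deniable, yet the
    teacher can still estimate the true count over the whole class.
    """
    if random.random() < 0.5:
        return truth              # honest answer
    return random.random() < 0.5  # random coin flip

def estimate_true_count(reports: list) -> float:
    """Invert the randomization: P(reported yes) = 0.25 + 0.5 * P(true yes)."""
    p_reported = sum(reports) / len(reports)
    return len(reports) * (p_reported - 0.25) / 0.5

# Hypothetical class: 300 of 1000 children dislike gummy bears.
truths = [i < 300 for i in range(1000)]
reports = [randomized_response(t) for t in truths]
print(f"estimated count: {estimate_true_count(reports):.0f}")
```

The estimate is noisy for a single child but converges to the true count as the class grows, which is exactly the trade-off the rest of this article formalizes.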

# Mathematical Privacy Guarantees

Consider the following:

• A is an algorithm that is implemented in such a way that it offers differential privacy;

• D is a data set that contains John’s private information, and D’ is the same data set with John’s information removed;

• O is the output of A when run on D, and O’ is the output of A when run on D’.

Differential privacy offers the mathematical guarantee that anyone seeing either result, O or O’, will learn essentially the same things about John’s private information. In other words, we should not be able to infer whether John’s private information was used in the computation at all. We should learn exactly the same things from O as we do from O’, namely the general information contained in the data set.
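A standard way to achieve this guarantee for counting queries is the Laplace mechanism described in [1]. The sketch below is a minimal illustration, not a production implementation: the data sets and John's record are hypothetical, and the noise is generated as the difference of two exponentials (a stdlib-friendly way to sample a Laplace distribution).

```python
import random

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records: list, epsilon: float) -> float:
    """Counting query released with epsilon-differential privacy.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    """
    return sum(records) + laplace_noise(1.0 / epsilon)

# D contains John's record; D' is identical except John's row is removed.
D = [1, 0, 1, 1]   # hypothetical records; John is the last entry
D_prime = D[:-1]

O = private_count(D, epsilon=0.5)
O_prime = private_count(D_prime, epsilon=0.5)
# O and O' are drawn from distributions that differ by at most a factor
# of e^epsilon, so an observer cannot reliably tell whether John's
# record was included in the computation.
```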

However, differential privacy does not guarantee that John’s general information stays private: information that is publicly available is not protected by a differentially private algorithm. Therefore, when designing such an algorithm, it is important to be clear about what counts as general information (publicly known) and what exactly is private information.

# Why we want Differential Privacy

Differential privacy has roughly four interesting properties that allow us to reason about and protect the privacy of personal information in large data sets [1].

1. It allows us to quantify the loss of privacy. This enables us to compare different data-processing algorithms and to control exactly how much privacy we are willing to lose, choosing a trade-off between the accuracy of the general information in the output and the privacy loss.

2. Privacy loss composes [1]: when several differentially private analyses are run on the same data, the combined privacy loss is bounded by the sum of the individual losses, so we can reason about cumulative risk across multiple queries.

3. It degrades gracefully for groups [1]: the guarantee extends from individuals to small groups of people, with the privacy loss growing in proportion to the group size.

4. It is immune to post-processing [1]: no further computation on a differentially private output can weaken its privacy guarantee.
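The accuracy-versus-privacy trade-off in the first property can be made concrete. In the Laplace mechanism from [1], the noise added to a count has scale 1/epsilon, so the expected absolute error of an answer is exactly 1/epsilon. The sketch below (function names are illustrative) measures that empirically, again sampling Laplace noise as the difference of two exponentials:

```python
import random

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def mean_abs_error(epsilon: float, trials: int = 100_000) -> float:
    """Empirical average error a query answer picks up at a given epsilon."""
    return sum(abs(laplace_noise(1.0 / epsilon)) for _ in range(trials)) / trials

# Smaller epsilon = stronger privacy but noisier answers; the expected
# absolute error of the Laplace mechanism is 1/epsilon.
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps:>4}: average error ~ {mean_abs_error(eps):.2f}")
```

Choosing epsilon is therefore an explicit policy decision: a tenfold stronger privacy guarantee costs a tenfold noisier answer.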

# Conclusion

We went from an intuitive understanding of privacy to a more formal way of defining differential privacy, and saw why it is useful to have. Since this is a LinkedIn article, I won’t go into the actual mathematical notation of differential privacy; it is left as an exercise for the reader via the references. Whenever we design algorithms that compute results on large data sets composed of personal information, we need to ensure that no personal information is leaked. Differential privacy is a young and rapidly developing field, and it will be interesting to see how it fits in and evolves over time.

# References

[1] The Algorithmic Foundations of Differential Privacy. Cynthia Dwork and Aaron Roth. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.

[2] Differential Privacy: A Primer for a Non-Technical Audience. Kobbi Nissim et al. February 14, 2018.
