# What is ‘Differential Privacy’ and why should you care

Apr 21, 2020 · 4 min read

We live in a time where vast amounts of user-generated data are aggregated. These data sets are used to extract insights and build solutions for all kinds of problems: Google uses location data tracked through smartphones to infer how busy certain public places are, and large-scale patient data yields insights into disease trajectories that help treat future patients more effectively. Data-driven business models deliver new products and services in virtually every industry. However, there is a privacy issue at play here. How can we make sure that no personal information leaks into the final output of these algorithms? How do we evaluate how well privacy-preserving measures are implemented? This is where differential privacy comes in: a mathematical definition of the privacy of personal information used in big-data algorithms. The definition lets us quantify exactly how anonymous an individual's data is within a data set.

# Explain It Like I’m 5

Your parents know which candy you like. Let’s assume you want nobody else to know exactly which candy you like or dislike. The ability to keep this type of information a secret is called privacy. Now imagine you go to school and the teacher wants to know whether there are children in your class who do not like gummy bears. She does not want to know whether you specifically are one of the kids who don’t like gummy bears. It is sufficient if she can tell that there are maybe 6 or 7 children who don’t like gummy bears. This type of privacy is called differential privacy.
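The teacher's survey can be made privacy-preserving with the classic randomized-response technique (a sketch of my own for illustration, not something described in this article): each child flips a coin and either answers truthfully or gives a random answer, so no single answer proves anything about that child, yet the teacher can still estimate the class total.

```python
import random

def randomized_response(dislikes_gummy_bears: bool) -> bool:
    """Answer truthfully with probability 1/2, otherwise answer at random."""
    if random.random() < 0.5:
        return dislikes_gummy_bears
    return random.random() < 0.5

def estimate_true_count(answers) -> float:
    # P(answer = yes) = 0.5 * true_fraction + 0.25, so invert that bias:
    # true_count is approximately 2 * yes_answers - n / 2.
    n = len(answers)
    yes = sum(answers)
    return 2 * yes - n / 2

random.seed(42)
# Hypothetical school of 1000 children, 60 of whom dislike gummy bears.
truths = [True] * 60 + [False] * 940
answers = [randomized_response(t) for t in truths]
print(estimate_true_count(answers))  # close to 60, but every child can deny their answer
```

Any individual "yes" could be the result of the coin flip, which is exactly the kind of deniability the teacher's question allows for.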

# Mathematical Privacy Guarantees

Consider the following scenario: we have two data sets that are identical except that one contains John's private record and the other does not. Running the same computation on each produces outputs O and O’ respectively.

Differential privacy offers the mathematical guarantee that anyone seeing either O or O’ will learn essentially the same things about John’s private information. In other words, an observer should not be able to infer whether John’s private information has been used in the computation. We should learn exactly the same things from O as we do from O’, namely the general information that is contained in the data set.

However, differential privacy does not guarantee to keep John’s general information private: information that is already publicly available is not protected by a differentially private algorithm. When designing such an algorithm, it is therefore important to be clear about what counts as general information (publicly known) and what exactly is private information.
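To make the O versus O’ idea concrete, here is a minimal sketch (my own illustration, not taken from this article) of the Laplace mechanism, the textbook way to answer a counting query with differential privacy. The noisy count computed on a data set containing John's record is statistically almost indistinguishable from the one computed without it:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Sample from Laplace(0, scale) via inverse transform sampling.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon: float) -> float:
    # A counting query has sensitivity 1: adding or removing one person's
    # record changes the true count by at most 1, so noise scale = 1/epsilon.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(0)
# Two neighboring data sets: d contains John's record, d_prime does not.
d = [("john", True), ("alice", False), ("bob", True), ("carol", True)]
d_prime = [r for r in d if r[0] != "john"]

likes = lambda r: r[1]
print(dp_count(d, likes, epsilon=0.5))        # noisy count on d  -> output O
print(dp_count(d_prime, likes, epsilon=0.5))  # noisy count on d' -> output O'
```

The smaller epsilon is, the more noise is added and the harder it becomes to tell which of the two data sets produced a given output.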

# Why we want Differential Privacy

Differential privacy has four useful properties that allow us to reason about and protect the privacy of personal information in large data sets [1].

1. Quantifiable privacy loss. It lets us measure and compare the privacy loss of different data-processing algorithms, and to choose a trade-off between the accuracy of the general information contained in the output and the privacy lost.
2. Composition of DP building blocks. Because we know mathematically how much privacy a single differentially private building block loses, we can analyze the cumulative privacy loss over several computations or building blocks.
3. Understanding group privacy. Differential privacy gives us a tool to understand and control the privacy loss incurred by groups (e.g. families, groups of friends, employees).
4. Insensitivity to post-processing. A malicious actor who has no additional information about the private data set cannot compute any function of the output that makes it less differentially private.
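The composition property (item 2) is often operationalized as a privacy budget: each differentially private query spends part of a total epsilon, and under basic sequential composition the per-query epsilons simply add up. A minimal sketch (the class name and API are my own, not from any particular library):

```python
class PrivacyBudget:
    """Track cumulative privacy loss under basic sequential composition."""

    def __init__(self, total_epsilon: float):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon: float) -> None:
        # Basic composition: total loss is the sum of per-query epsilons.
        if self.spent + epsilon > self.total_epsilon:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.4)   # first query
budget.spend(0.4)   # second query
print(budget.spent)  # 0.8; a third query costing 0.4 would be refused
```

Real systems use tighter composition theorems than this simple sum, but the budgeting idea is the same.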

# Conclusion

We went from an intuitive understanding of privacy to a more formal definition of differential privacy and why it is useful to have. I won’t go into the actual mathematical notation here; the interested reader can find it in the references. Whenever we design algorithms that compute results over large data sets containing personal information, we need to ensure that no personal information leaks. Differential privacy is a young and rapidly developing field, and it will be interesting to see how it matures over time.

# References

[1] The Algorithmic Foundations of Differential Privacy. Cynthia Dwork and Aaron Roth. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.

[2] Differential Privacy: A Primer for a Non-Technical Audience. Kobbi Nissim et al. February 14, 2018.


Written by

## Dario Incalza

Dario is a computer scientist, interested in anything related to cybersecurity, cryptography, privacy and artificial intelligence.
