Rethinking Data Privacy: The Impact of Machine Learning

Published in Luminovo

11 min read · Apr 24, 2019

Arguably, 2018 was the most significant year for data privacy since the Snowden leaks in 2013. GDPR came into effect in May, the first extensive rewrite of privacy law in Europe. The year also saw a breathtaking series of scandals: Cambridge Analytica had collected and exploited Facebook user data; vulnerabilities in Google+ had exposed data from half a million users; 25 million unique passport numbers had been stolen in a Marriott International data breach. The value of collecting data has skyrocketed, and debatable partnerships, like GlaxoSmithKline getting access to genetic data from 23andMe, continue to emerge as a consequence.

Privacy breaches have of course provoked counterstrategies, both legal, like GDPR, and technological. To prevent personal information from being exposed in the first place, identities are often masked through data anonymization. Stripping confidential information of its identifiers seems straightforward in principle: fields containing names or social security numbers can easily be removed from a database. How, then, is it still possible to reconstruct an individual’s identity from anonymized data, and why is the issue of data privacy particularly relevant in the context of AI?
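To make the naive approach concrete, here is a minimal Python sketch of stripping identifier fields from a small table. The records, field names, and helper functions are all made up for illustration and do not come from any real dataset or library. The sketch also counts how many of the remaining rows are still one of a kind, which hints at why removing names alone may not be enough.

```python
from collections import Counter

# Hypothetical toy records; every value here is invented for illustration.
records = [
    {"name": "Alice Example", "ssn": "000-00-0001", "zip": "80331", "birth_year": 1985, "sex": "F"},
    {"name": "Bob Example", "ssn": "000-00-0002", "zip": "80331", "birth_year": 1985, "sex": "M"},
    {"name": "Carol Example", "ssn": "000-00-0003", "zip": "80333", "birth_year": 1990, "sex": "F"},
    {"name": "Dave Example", "ssn": "000-00-0004", "zip": "80333", "birth_year": 1990, "sex": "F"},
]

# Assumption: only these two fields are treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "ssn"}


def strip_identifiers(rows, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the rows with the direct identifier fields removed."""
    return [{k: v for k, v in row.items() if k not in identifiers} for row in rows]


def count_unique_rows(rows):
    """Count rows whose combination of remaining attributes occurs exactly once."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(1 for c in counts.values() if c == 1)


anonymized = strip_identifiers(records)
unique_rows = count_unique_rows(anonymized)
print(f"{unique_rows} of {len(anonymized)} anonymized rows are still unique")
```

In this toy table, some rows remain distinguishable by their combination of zip code, birth year, and sex even after names and social security numbers are gone. Anyone who knows those attributes from another source could, in principle, link such a row back to a person.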

First, we will briefly look at the dimensions of datasets, which are relevant to understand the…

Written by Arianna Dorschel