À la recherche du privacy perdu

Paula C.
Nov 4 · 4 min read

As we engage with and move through modern technological societies, we have no choice but to leave data trails behind us. To put this into perspective: the total amount of data collected over the five millennia from the invention of writing up to 2003 is estimated at about 5 exabytes. Since 2013, humans have generated and stored that same amount of data…every day.

Data shadow vs. Data footprint

The data points relating to an individual define that person’s digital footprint. These data can be gathered in two contexts, each of which can be problematic from a privacy perspective.

Data collected without a person’s awareness or consent make up that person’s data shadow. Data that an individual chooses to share, but with little control over how they are used or how they will be shared with and repurposed by third parties, make up the data footprint.

Today, very personal information that we may not want to make public can still be reliably inferred from seemingly unrelated data we willingly post on social media. Using nothing more than the items an individual has liked on Facebook, data-driven models can predict that person’s sexual orientation, political and religious views, intelligence and personality traits, and use of addictive substances such as alcohol, drugs and cigarettes; they can even determine whether that person’s parents stayed together until he or she was 21 years old.

Still… there is hope in recent computational approaches designed to preserve privacy.

Wandering in NYC. Photo Year: 2019

Approach №1: Differential privacy

Differential privacy addresses the problem of learning useful information about a population while learning nothing about the individuals within it. The approach uses a particular definition of privacy:

The privacy of an individual is not compromised by the inclusion of his or her data if the conclusions reached by the analysis would have been the same independent of whether the individual’s data were included or not.

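In practice, differential privacy works by injecting noise into the data-collection process or into the responses to database queries. An intuitive illustration is the randomized-response technique, used for example in a survey that includes a sensitive yes/no question: each respondent’s answer is randomized to some degree, so no single answer can be taken at face value, yet the overall proportion of "yes" answers in the population can still be estimated.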
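
To make the randomized-response idea concrete, here is a minimal Python sketch. It assumes the classic two-coin-flip variant (answer truthfully on heads, otherwise report a second coin flip); the function names and the simulated survey are hypothetical and only illustrate the principle, not how any particular company implements it.

```python
import random
from typing import Sequence


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Return a noisy answer to a sensitive yes/no question.

    Assumed protocol: flip a coin. On heads, answer truthfully.
    On tails, flip again and report that second flip instead.
    """
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(answers: Sequence[bool]) -> float:
    """Recover the population-level 'yes' rate from the noisy answers.

    P(reported yes) = 0.5 * p + 0.25, so p = 2 * (observed - 0.25).
    """
    observed = sum(answers) / len(answers)
    return 2.0 * (observed - 0.25)


def simulate(true_rate: float, n: int, seed: int = 0) -> float:
    """Simulate a survey of n people and return the estimated 'yes' rate."""
    rng = random.Random(seed)
    truths = [rng.random() < true_rate for _ in range(n)]
    answers = [randomized_response(t, rng) for t in truths]
    return estimate_true_rate(answers)


if __name__ == "__main__":
    # Any individual answer may be random, but the aggregate is recoverable.
    print(simulate(true_rate=0.3, n=200_000))
```

Because any single "yes" may have come from the second coin flip, an individual respondent can plausibly deny it, while the analyst still gets a usable estimate for the whole population.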

Apple, for example, uses this approach to protect the privacy of individual users while still learning usage patterns to improve predictive text in its messaging application and to improve search functionality.

Approach №2: Federated learning

When the data used in a project come from multiple disparate sources, such as when a company collects data from a large number of users of a cell phone application, the data do not have to be centralized in a single repository. Instead, the approach is to train separate models on the subsets of data held at the different sources and then to merge the separately trained models.
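
The train-locally-then-merge idea can be sketched as follows. This is only an illustration under stated assumptions: the "model" is a one-variable linear fit, the merge step is a parameter average weighted by each source's data size (one common choice, not necessarily what any given product uses), and all names and the synthetic data are hypothetical.

```python
import random


def local_train(data, w, b, lr=0.3, epochs=200):
    """Fit y ~ w*x + b on one source's data with plain gradient descent."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def merge_models(models, sizes):
    """Merge local models by averaging parameters, weighted by data size."""
    total = sum(sizes)
    w = sum(m[0] * s for m, s in zip(models, sizes)) / total
    b = sum(m[1] * s for m, s in zip(models, sizes)) / total
    return w, b


def federated_fit(clients, rounds=5):
    """Each round: every source trains locally, then only parameters are merged."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        local = [local_train(data, w, b) for data in clients]
        w, b = merge_models(local, [len(data) for data in clients])
    return w, b


def make_client(n, rng):
    """Synthetic data for one source: y = 3x + 1 plus small noise, x in [0, 1)."""
    xs = [rng.random() for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.05)) for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)
    clients = [make_client(n, rng) for n in (50, 80, 120)]
    # The raw (x, y) pairs never leave their client; only (w, b) is shared.
    print(federated_fit(clients))
```

The point of the design is that the raw records stay where they were collected; the only thing that travels to the central party is each source's trained parameters.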

Google uses this federated learning approach to improve the query suggestions made by the Google keyboard on Android.

Toward an Ethical Data Science…

The most broadly accepted principles relating to personal privacy and data are the Guidelines on the Protection of Privacy and Transborder Flows of Personal Data, published by the Organisation for Economic Co-operation and Development (OECD).

The guidelines define eight principles that are designed to protect a data subject’s privacy:

1- Collection Limitation Principle: Personal data should only be obtained lawfully and with the knowledge and consent of the data subject.

2- Data Quality Principle: The collected data should be relevant to the purpose for which they are used; they should be accurate, complete and up to date.

3- Purpose Specification Principle: The data subject should be informed of the purpose for which their data will be used.

4- Use Limitation Principle: The data collected should not be disclosed to third parties except with the data subject’s consent or by authority of law.

5- Security Safeguards Principle: Personal data should be protected by security safeguards against deletion, theft, disclosure, modification or unauthorized use.

6- Openness Principle: Data subjects should be able to acquire information regarding the collection and use of their data.

7- Individual Participation Principle: Data subjects have the right to access and challenge personal data.

8- Accountability Principle: A data controller is accountable for complying with the principles.

The EU’s General Data Protection Regulation (GDPR), which can be broadly traced back to these guidelines, also has implications for flows of personal data outside of the EU. Currently, several countries are developing data protection laws similar to and consistent with the GDPR.

At the moment, public opinion is broadly negative toward both government surveillance and Internet companies. In 2014, a Spanish citizen, Mario Costeja Gonzalez, won a case in the EU Court of Justice against Google, asserting his right to be forgotten. The Court held that an individual could ask an Internet search engine to remove links to webpages returned by searches on the individual’s name, on the grounds that the data were inaccurate or out of date.

There are good business reasons to act ethically in relation to personal data. First, it helps a business maintain a good relationship with its customers; inappropriate practices can cause severe reputational damage and drive customers to competitors. Second, it is the best way to ensure that the data science solutions a business develops do not fall foul of current or future regulations.

Photo by ev on Unsplash

Source: MIT Press Essential Knowledge Series.
