Data Privacy: The stakes are high

Persistent Systems
Jan 28, 2019

Data, the new oil of the digital era, continues to drive economies around the globe. As the processing of personal data keeps raising business expectations, consumers are growing increasingly concerned, and policymakers are reacting. At the same time, new technology, best practices, and a market for digital privacy are taking shape.

1. Great expectations

The business model behind many successful online services revolves around collecting consumers' personal data to drive personalized advertising. Many born-digital businesses are built around refining personal data and trade only in this new digital oil: modern banking, for instance, is essentially a data management operation, with data representing billions in investments. As for traditional companies, most believe they are missing out on a competitive edge by not making the most of their customer data, e.g., to improve customer service.

2. Consumer concerns

Consumers expect that the data they explicitly entrust to companies is protected and remains private. When this expectation is not met, they resent it strongly, which is why recent, highly publicized data breaches have significantly eroded consumers' trust in businesses. This is further compounded by data misuse: businesses collecting people's data (e.g., from online transactions or location data) without telling them how, why, or for what it is actually being used. Most consumers take the view that stronger regulation makes them safer. After the EU's GDPR and California's CCPA, expect more data privacy regulation this year.

3. Understanding privacy risks

Trust is hard to gain but easy to lose, so erosion of trust is a problem for businesses. After cybersecurity, data governance is becoming a board-level issue. It is about knowing what personal data a business holds and how it is used, in addition to protecting it against unauthorized access. That includes understanding privacy risks and using best practices and technology to minimize them. Businesses are not very good at this today. Two common but mistaken beliefs illustrate the point: (1) that to analyze datasets about people without intruding on their privacy, it is enough to anonymize individual records and release aggregates computed from them; and (2) that sharing and linking consumer records across organizations, assembling attributes for the same customer, cannot be done without exchanging PII such as names or phone numbers. Both beliefs are wrong. I'll discuss the first in more detail below. As for the second, privacy-preserving linkage techniques have matured over the years, and we have recently used them to build solutions for our customers; a simplified sketch of one such technique follows.
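
As an illustration only, here is a minimal Python sketch of hash-based privacy-preserving record linkage. It assumes the two organizations have agreed on a secret key out of band; the key, field names, and sample records are hypothetical, and production systems use more robust schemes (e.g., encodings tolerant of typos), not this exact code.

```python
import hmac
import hashlib

# Sketch of hash-based privacy-preserving record linkage: each party
# normalizes its identifiers, keys them with a shared secret agreed out
# of band, and exchanges only the resulting tokens. Matching tokens
# indicate the same person without either side revealing raw PII.

SHARED_KEY = b"agreed-out-of-band"  # hypothetical shared secret

def normalize(value: str) -> str:
    """Canonicalize an identifier so trivial formatting differences
    (case, surrounding whitespace) do not break the match."""
    return value.strip().lower()

def link_token(pii: str) -> str:
    """Keyed hash of a normalized identifier. Using HMAC rather than a
    plain hash prevents dictionary attacks by anyone without the key."""
    return hmac.new(SHARED_KEY, normalize(pii).encode(), hashlib.sha256).hexdigest()

# Each organization tokenizes its own records locally...
org_a = {link_token("Alice@example.com"): {"churn_score": 0.12}}
org_b = {link_token("alice@example.com "): {"lifetime_value": 4200}}

# ...and only tokens cross the boundary; shared tokens are joined.
for token, attrs in org_a.items():
    if token in org_b:
        print("matched record:", {**attrs, **org_b[token]})
```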

4. Differential privacy

The privacy problem can be stated as follows: how to disclose useful information about people, both internally and to third parties, through algorithms (aggregate queries and releases, machine learning models) without revealing information about individuals. Anonymizing PII is not always sufficient: the privacy it provides degrades quickly as attackers repeatedly query the dataset, or a model trained on it, and combine the answers with auxiliary information about individuals represented in the dataset. Differential privacy is a framework that carefully adds random noise to an algorithm's output, offering a mathematical guarantee that only a limited amount of information about any individual is leaked. The first real-world deployments of differential privacy come from companies such as Google, Apple, and Uber.
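
To make this concrete, below is a minimal sketch of the Laplace mechanism, the textbook way to make a counting query differentially private. The dataset, predicate, and epsilon value are illustrative; real deployments like those mentioned above involve considerably more machinery (privacy budgets, composition across queries, etc.).

```python
import numpy as np

def dp_count(data, predicate, epsilon: float) -> float:
    """Differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so adding Laplace noise with scale
    1/epsilon satisfies epsilon-differential privacy.
    """
    true_count = sum(1 for row in data if predicate(row))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Illustrative dataset: each row is one person's record.
people = [{"age": 34}, {"age": 51}, {"age": 29}, {"age": 62}]

# Smaller epsilon means more noise: stronger privacy, less accuracy.
print(dp_count(people, lambda r: r["age"] > 40, epsilon=0.5))
```

Each released answer spends some of the privacy budget, which is why repeated querying must be accounted for rather than answered exactly, as noted above.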
