What Do We Mean When We Talk About Online Privacy?

And why you should care.

Henri Stern
The Startup


This is part 1 of 4 of a series of musings on the topic of online privacy. I don’t pretend to resolve the problem; I’m simply exploring facets of the space and pulling at strings that may make the web a more wholesome place to explore, and may help builders think about the moral valence of their technical decisions. View part 2.

TL;DR — Privacy is about the consumer’s intent. Does a product do what you think it will with your data? Consumption choices are deeply personal, but informed consent requires knowing what you’re signing up for. Forthright stewards of data (i.e. responsible companies) must therefore be both transparent and accountable with regard to how they use users’ data in order to build good privacy experiences.

As the amount of essential information about ourselves that we put on the Web continues to grow, we talk about online privacy more and more. But what does it mean, and why should you care?

You can define privacy as “the state of being free from unwanted public attention or observation.” This is different from security, which is about ensuring a product or service works as its creator intended and does not unwittingly expose or leak information.

The relationship is one-way: bad security means bad privacy, but good security does not necessarily mean good privacy. Secure products can still degrade their users’ privacy. In that sense, security is about creator intent while privacy is about user intent. When you gave your phone number to a service, what did you expect it would do with it? Is it doing anything else with it? Did it even need it in the first place? Security is about what data can be exploited by an outside party through unexpected or nefarious means; privacy is about what data is shared or leaked as a regular part of doing business.

Now, the potential for being watched is a part of living in society. We expect to live with a neighbor’s ability to peer through the hedge, or our government’s ability to ask questions. What’s new about online privacy?
The change is not just porting over the potential for eavesdropping from the physical realm to the digital realm, but rather the creation and online publication of a much more complete accounting of everything you do. Think about this: any interaction you have with an online service leaves a crumb of your physical life in the digital world.

The risk here goes beyond any single large company (think Google) being able to know where you’ve been. The structure of the Internet itself, as built through digital advertising, allows many third parties to cheaply weave together your data crumbs to understand your actions and many of your thoughts on a daily basis. Have you gotten some amazingly well-targeted ad recently you couldn’t completely explain? It’s no coincidence.

A modern panopticon. At least we get wifi…

The sheer mass of data you leave on the web entirely changes the privacy calculus for citizens as mass-targeted surveillance becomes more effective and cheaper. The impact is clear not only for those living under repressive regimes but any target of a well-resourced attacker (examples abound alas). For those of us lucky enough not to be specifically targeted or under a wider surveillance net from our government, the danger remains real.

Are you guilty of some offense you have not reported, like speeding or parking in an illegal spot? Have you been to a protest recently? Have you looked something up online you might be embarrassed to explain? Those things you thought private may not be. The barriers between your professional, familial and personal lives may not really exist. Any encounter you have may be with someone who knows more about you and your family than you could guess. Your credit-worthiness may be assessed using many unknown factors. You might get a call demanding you pay back a loan you never took out, from someone who knows a lot about you. With every data upload, you may unwittingly be giving a stranger halfway around the world the means to blackmail you or steal from you. Things you never gave a second thought to, because they had always been transient, have found permanence as data online. The small, deep secrets you sometimes discover cleaning out a loved one’s home after their death may now be online and open to a well-resourced attacker. And even if you aren’t specifically targeted, you may wind up caught in a dragnet that turns the data you’ve leaked online against you. You’ve certainly gotten spam phone calls at this point: that’s your sold data calling home.

Beyond that, the information you rely on to form opinions and your very beliefs may be getting tuned. Our behavior and opinions are shaped by algorithms feeding us news based on our behavior and other data we have fed them.

Given the above, we define the positive value of privacy (here being free from observation) as follows (thanks to @arcalinea for this):

  • Privacy allows freedom from intervention and judgement
  • Privacy prevents forms of censorship
  • Privacy protects user attention from intrusive messaging and advertising
  • Privacy protects users’ information by restricting access to trusted parties
  • Privacy prevents forms of harassment or abuse

But then, is being online worth the risk? “Being online,” here, is the sum of interactions you have with various service providers/companies, repeated over time. Each involves a data exchange, and with it some risk of data leakage, misuse, etc.
The fact is that going offline entirely has become nearly impossible in today’s world, and that digital tools have improved our lives in deep ways. Accordingly, the answer to “is it worth it?” has to be deeply personal and service-dependent.

Is it worth it?

What do you value?

For instance, I am an avid Google Maps and Netflix user, but I don’t use Facebook or any “smart”/connected devices if I can help it. This is simply based on my personal assessment of risk versus reward: how much do I get out of this, and what am I giving up for it? Is not having to plan a commute or a meal worth telling Google where I am and what I am about to do? Reluctantly, yes. Is being able to tell a speaker or my TV what to play next instead of using a remote worth having Amazon or advertisers listen to snippets of my home conversations or the shows I watch? No. But that’s just me.

As an experiment, think about all of the products and services to which you are giving data on the web. How data-hungry are they? We can try to quantify their data mass as follows:

data mass = data quality × data volume

  • data quality asks how valuable this data is on its own. How harmful is it for someone else to obtain it? Your social security number, the password to your email or bank account, your home address, and other PII (personally identifiable information) or PHI (protected health information) are all high-value; an old movie ticket you bought, a website you visited once, or the rating you gave some movie on Netflix is lower-value data (though it may still be used to identify you).
  • data volume asks how much data of this type a given service collects. For instance, a service may only ask for your mother’s maiden name once, but may track your location at all times. The former is low volume, the latter high volume.
An attempt at categorizing a few apps on my phone by data mass.

High-value data may identify you personally and is likely to be ontologically unique (i.e. the same conclusion about you can’t be reached through some other data). However, high volume can make up for low data quality: knowing you were once on a given street corner is one thing; knowing every street corner you’ve been on in the past week is another entirely.
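To make the heuristic concrete, here is a minimal sketch of the data mass formula in Python. The app names and the 1–10 quality and volume scores are illustrative assumptions, not measurements:

```python
# Toy model: data mass = data quality x data volume.
# Scores are made-up 1-10 ratings for hypothetical apps.
APPS = {
    # app: (data_quality, data_volume)
    "maps": (8, 9),           # precise location, tracked continuously
    "weather": (7, 6),        # coarse location, polled regularly
    "movie_ratings": (2, 4),  # low-value ratings, collected occasionally
}

def data_mass(quality: int, volume: int) -> int:
    """Mass grows with both how sensitive the data is and how much is collected."""
    return quality * volume

# Rank apps from most to least data-hungry.
ranked = sorted(APPS.items(), key=lambda kv: data_mass(*kv[1]), reverse=True)
for app, (q, v) in ranked:
    print(f"{app}: quality={q} volume={v} mass={data_mass(q, v)}")
```

Even this crude tally captures the point above: a service collecting moderately sensitive data at high volume can outweigh one holding a single high-value datum.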

Accordingly, we might expect businesses with high data mass to be held to greater privacy standards, given the relative cost of a leak.

Interesting questions arise: for instance, does a piece of data’s value decrease in proportion to the number of service providers that have access to it? How valuable are your email address and phone number, given how many times they have likely been leaked online already? What’s clear in any case is that once leaked, your data will be sold many times over to unscrupulous businesses and scammers. It is no longer yours to control.

Understanding the precise privacy implications of your every online interaction is impossible. But as users, we ought to know what we’re signing up for. You might rethink using that weather app if you knew it was selling your location data to advertisers.

Ultimately, you are the only one who can decide what convenience is worth what cost. But the choices you make about what products to use in your digital life have to be informed by an understanding of what data that product is taking from you and why. Otherwise, how can you know if the service is worth its cost to your privacy?

Beyond that, how can we gauge service providers’ handling of our data even if they were transparent about their actions? It is about control: is this your data or theirs? Do you control what they do with it, or do you trust them not to do anything too objectionable? Here’s an attempt at a broad categorization of questions you might ask yourself with regard to a company’s use of your data:

  • Known Interaction: Have you ever interacted with this service? Do you even know this data is being collected? (Looking at you, credit bureaus and data-collection SDKs in weather apps.)
  • Transparent Use: Do you know what data is collected? Do you know what data is created about you? Do you know how it is used?
  • Fair Use: Do you understand why this data is needed for the service you’re getting in exchange?
  • Read Access: Can you see what data has actually been collected?
  • Write Access: Can you edit this data?
  • True Ownership: Can you delete this data? Is it legally yours? Can you easily port it over to another service?
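These questions can double as a rough checklist. Below is a hedged sketch of scoring a service against them; the field names mirror the list above, and the example app and its values are hypothetical:

```python
from dataclasses import dataclass, fields

@dataclass
class DataPractices:
    """One boolean per question in the checklist above."""
    known_interaction: bool  # do you know this data is being collected?
    transparent_use: bool    # do you know what is collected and how it's used?
    fair_use: bool           # do you understand why the data is needed?
    read_access: bool        # can you see what was collected?
    write_access: bool       # can you edit it?
    true_ownership: bool     # can you delete or port it?

    def score(self) -> int:
        # Crude 0-6 tally: higher means more user knowledge and control.
        return sum(getattr(self, f.name) for f in fields(self))

# Hypothetical example: an app you knowingly use, but whose data use is opaque.
opaque_app = DataPractices(True, False, False, False, False, False)
print(opaque_app.score())  # prints 1
```

The tally is deliberately simplistic; in practice you might weight True Ownership far more heavily than Known Interaction.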

From the above, we define a forthright company, a responsible data handler, as one that respects its users and meets them at eye level. This service provider is both

  • Transparent: has clear, digestible communication around what data is collected, why, and what power users have over it;
  • Accountable: is answerable to users with regard to data use and any breaches of the above-stated policies.

Both of the above imply that:

  • Your data is not exposed to a third party without your consent
  • Your data is not used by the service that acquired it in a way you do not consent to

Being both transparent and accountable is no easy feat for a company, least of all one whose business model depends on your data. The many examples of bad actors in online privacy show the deep gap in consumer and creator incentives online. We explore this complex relationship in the next post.


