My dad studied labor relations, earning his degree in the late '70s just in time to watch the colossal collapse of organized labor in the 1980s. Recently, while talking politics on a six-hour drive, he proposed his solution to the privacy problem: companies should pay users for their data.
I was surprised by how opposed I was. The idea seemed to contradict a fundamental principle of the internet: that it is free and open. My first instinct was to think of how difficult such a system would be to implement: micropayments, perhaps, based on the market value of data. But how is the market value of data determined?
I fell back on the idea of a “social contract of the internet,” under which users expect the owners of a website to record usage statistics. But that expectation seems right only because the recording is technically possible (although prohibitively difficult for any project an advanced beginner like me has done). The “social contract of the internet” is not well defined, and it is dangerous to use it to justify a transgression of privacy.
I pointed out that secure alternatives to Gmail exist, but they are paid, and even they do not compensate users for surrendering their data to a company’s control. My dad seemed to view users as un-unionized employees, with no official legal advocate for their rights. But even though such an advocate would push for distributing payments to users, payments don’t seem to make sense. I don’t want to receive intermittent cash from Google for the use of my data; I want more storage space and better design. Google doesn’t want to pay me, and I do not want Google’s money.
My dad also proposed restrictions on how companies could use data, or how long they could keep it. This was anathema to me: it seemed like burdensome regulation that would be difficult to enforce and easy to circumvent. The conversation ended without any agreement, though I was again surprised to find myself taking the corporations’ side, which may just be my own contrariness.
Afterwards, I thought about how I had come down in support of the large-scale collection of user data. It makes sense given the way the internet is designed, but it leads to Gmail ads, as well as to the NSA obtaining all the metadata it wishes via FISA warrants.
But why do the companies want data? The desires seem to fall along two lines.
- Companies want to record how users interact with their sites in order to improve the user experience (often called A/B testing, after the technique of showing two different designs to two groups of users and tracking their engagement).
- Companies want to sell user engagement to advertisers. If the companies can present the advertisers with additional information about the user (such as demographic data or purchasing history), the advertisers will pay more so that they can better target their ad campaigns to those likely to respond.
Neither of these core desires revolves around the identity of the user. For A/B testing, the company is only interested in the aggregated actions of many users. When selling an account to an advertiser, the company is only interested in the features of the account, not its name.
In contrast, user anxieties are all connected to their identity.
- Users are afraid that their data will be stored forever, without their consent.
- Users are afraid that their data will be exploited if the company is bought.
- Users are afraid that their personal data and usage statistics will be stolen in the event of a data breach.
If there were a way to completely decouple the identity of the user from the data collected about that user, almost all of these anxieties would be significantly alleviated.
A user model would contain attributes, not history. It could contain a numerical estimate of the probability of clicking on advertisements, or numerical measures of interest in various topics. What it would not contain is any specific history.
The user models and the user actions must be kept separate from each other. A user should be able to see her user model, but an advertiser who purchases clicks or views from a specific user model should have no way of connecting that user model to a specific person, set of posts, or actions.
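A user model of this kind might look like the following minimal Python sketch. All of the field names and values here are my own illustration, not any company's real schema:

```python
from dataclasses import dataclass, field

@dataclass
class UserModel:
    """Attributes only: no name, no event log, no browsing history."""
    model_id: str
    click_probability: float = 0.0  # estimated chance of clicking an ad
    topic_interest: dict = field(default_factory=dict)  # e.g. {"cycling": 0.8}

# An advertiser buying from this model sees scores, not a person:
model = UserModel(model_id="m-1024",
                  click_probability=0.12,
                  topic_interest={"cycling": 0.8, "cooking": 0.3})
```

The point of the structure is what it omits: there is simply no field in which a visit, a post, or a purchase could be stored.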
Once the website receives any kind of data about a user, it should use it to update the user model and then immediately delete it. At that point the company has already extracted the information it wants: a further refinement of the user model. This lets it sell targeted ads without preserving unwanted information. Unless there is a reasonable expectation that the user will want to access the data later, there is no reason to store it. This technique is called ephemeral data.
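The update-then-delete step could be sketched like this. The moving-average update rule and all names here are illustrative assumptions, not a real system's API; the essential property is only that the raw event is folded into the model and never stored:

```python
def absorb_event(model: dict, event: dict, alpha: float = 0.1) -> dict:
    """Fold one raw event into the model, then let the event disappear."""
    clicked = 1.0 if event.get("clicked") else 0.0
    # Exponential moving average: the model keeps a tendency, not a record.
    model["click_probability"] = (
        (1 - alpha) * model["click_probability"] + alpha * clicked
    )
    for topic in event.get("topics", []):
        model["topics"][topic] = model["topics"].get(topic, 0.0) + alpha
    return model  # the event dict is never written to storage

model = {"click_probability": 0.5, "topics": {}}
model = absorb_event(model, {"clicked": True, "topics": ["cycling"]})
# click_probability nudges toward 1.0; the click event itself is gone
```

After the call returns, nothing anywhere records that this particular click on this particular page ever happened, only that the model's estimates shifted slightly.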
Because the raw data is deleted, it would not be vulnerable to subpoena. Connecting a user model to a specific person should also be difficult; even if a subpoena compelled the link, the model would expose only abstract attributes, not the kind of specific history useful to spies.
Implementing an ephemeral data collection system at large companies would be a good first step toward addressing the systemic violations of privacy rampant among technology companies. There could be a standard for ephemeral-data certification, attesting that a user model cannot easily be linked back to a user account. For example, the id of the user model could be a keyed hash of the id of the user account. This would make the two difficult to link without creating too much of an inconvenience for companies.
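The keyed-hash derivation might be sketched with HMAC, as below. The key matters: with a plain unkeyed hash, anyone could hash a known account id and compare, so it is the secret key (held, or later destroyed, by the company) that makes the mapping hard to reverse. The key value and id formats here are hypothetical:

```python
import hashlib
import hmac

def model_id_for(account_id: str, secret_key: bytes) -> str:
    """Derive a stable model id from an account id via a keyed hash.

    Without secret_key, an outsider cannot recompute the mapping
    from account ids to model ids.
    """
    return hmac.new(secret_key, account_id.encode(), hashlib.sha256).hexdigest()

key = b"kept-in-a-vault"               # hypothetical company secret
mid = model_id_for("account-42", key)  # 64 hex characters, stable per account
```

The derivation is deterministic, so the company can always find the right model to update, while the link back to the account depends entirely on the key staying secret.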
Ideally, certified-ephemeral companies would lose less in data breaches, and their users would be less anxious. Once one large company adopts the standard, the rest would follow suit, much as film and game studios have voluntarily adopted content-rating standards.
Data should not be saved; it benefits nobody. Data should be ephemeral.