Why Search Logs are not Quite Like Dead Trees

Alex O'Connor
Jul 3, 2015

This piece was inspired by the keynote ‘Rear-Window Privacy’, delivered at UMAP 2015 by Prof. Eoin O’Dell of the School of Law at Trinity College Dublin. His blog is at http://www.cearta.ie/

“Data is the New Oil!”

It’s an appealing image. Oil is the great enabler of our modern civilisation; we live in the oil age. It allows individuals unprecedented freedom of movement, it enables comfortable, stable environments, and it means we can afford to live in what previous generations would have regarded as unimaginable prosperity.

Oil is concentrated potential energy: it’s the residual power of ancient plants, lying buried for millions of years in the heart of the Earth, just waiting to be extracted, refined and used for the benefit of the consumer. Critically, it’s a commodity: we can trade oil, and add value at each stage of processing. Refined, it can fuel an airplane or become an ingredient in our medicine. The downsides are carbon emissions into the atmosphere and the damage done to extraction sites.

Data is the current focus of both industry and research. Data-driven approaches to machine learning have revolutionised our day-to-day lives, permitting us unprecedented access to knowledge, better awareness of our surroundings and a previously unimagined transparency of communication with our social networks and the global community. All of this power is built on data and builds data. The companies that provide services also trade a refined, commoditised data product that adds value. The downsides are aggressive advertising and the damage done to individuals caught up in data breaches, intentional or otherwise.

Another commonality is this: both advertisers and oil companies have (largely) got pretty bad reputations. A lot of people don’t like extraction, but they are perfectly happy to live with the benefits of that extraction. Similarly, the free online services would not exist without clicks and banners, but we don’t like to be reminded of that.

One side of the privacy issue is the oil slick: the data breach. Each company these days can be considered as an oil tanker, which can founder on rocks, or be subject to piracy. A more interesting challenge is the question of how intentional use can have consequences.

To carry the analogy to its absolute extreme: if oil is made from dead trees, what happens when the forest wants to retain control over where the oil is used?

One aphorism that keeps appearing is that “If you’re not paying for the product, you are the product.” It can be read in a few ways, but it should not be an excuse for limitless exploitation in exchange for a service. Users need to be aware that their interactions are the trees that make the oil, and that the oil has both positive and negative uses: it powers advertising and potentially insurance or other actuarial calculations; it is also the reason that Google’s search engine works so well, and it drives smarter interfaces in maps and email and better design of many apps and websites.

In some senses, the worries about privacy are overblown. I have a personal, anecdotal feeling that there is a bit of (albeit unintentional) priming implicit in directly asking people about their privacy rights. People also confuse use with abuse, and surveillance with stalking. Add in the fear around shadowy intelligence agencies, and it becomes difficult to separate emotion from reason.

I'm willing to believe that most people don’t suffer any real disadvantage from sharing more than they might intend to, albeit with some spectacular individual counter-examples. As awareness increases, and people become habituated, social norms and behaviour co-evolve.

If privacy is the new green, is Google the new Exxon?

Big Data is oil: it’s potential knowledge in unrefined form; it’s a potentially toxic substance, and needs to be handled with care. It’s critical that we balance the concerns of information ecology with keeping the lights on.

The key difference between oil and data is that personal data remains linked (or linkable) to individual users. It’s virtually impossible to strip the individuality from the data without also stripping its information value. Rights endure the refinement, and there is a possible harm to people (and in the case of genetics, their descendants) from unintended and unimagined use of personal data in the future. When one contemplates abuse, the picture is even murkier.

All sorts of nightmare scenarios have been imagined. What’s ethical in theory, and what’s likely in the hard world of commercial pressure? The latter is especially difficult when users behave differently in practice to how they say they would act in theory.

A CRONUT, which is itself subject to a trademark http://www.newyorktrademarkattorneyblog.com/2014/02/24/controversy-surrounding-cronut-trademark-registration/

If we are information ecologists, we might hope that empowering and educating users will encourage them to make theoretically better (i.e. more prudent and long-term) choices. As a counter-example, privacy cynics will point to the enduring popularity of fried food and doughnuts.

The problem could be one of parameterisation. Prof. O’Dell suggests the example of the Creative Commons licenses: they clearly define a small set of licenses built from a few distinct choices. I think this is part of the value: what are the options for privacy rights, beyond a binary ‘share or not’? The second value is that those who want to try and take control of how their data is used have something interpretable by humans as well as by implacable machines and lawyers (but I repeat myself).
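To make the idea of parameterisation a little more concrete, here is a minimal sketch of what a CC-style privacy license might look like as data. Everything in it is my own assumption for illustration: the class name `PrivacyLicense`, the three yes/no choices and the abbreviations (`NA`, `NT`, `NR`) are invented here, and none of them come from Prof. O’Dell’s talk or from any existing standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacyLicense:
    """Hypothetical CC-style privacy license: a few independent yes/no choices."""

    allow_advertising: bool    # may the data be used to target adverts?
    allow_third_parties: bool  # may the data be passed on to other companies?
    allow_research: bool       # may the data be used for research?

    def code(self) -> str:
        """Short machine-readable tag, in the spirit of 'CC BY-NC-SA'."""
        parts = []
        if not self.allow_advertising:
            parts.append("NA")  # invented abbreviation: No Advertising
        if not self.allow_third_parties:
            parts.append("NT")  # invented abbreviation: No Third parties
        if not self.allow_research:
            parts.append("NR")  # invented abbreviation: No Research
        return "-".join(["PL"] + parts)

    def summary(self) -> str:
        """One-sentence, human-readable version of the same choices."""
        def verb(allowed: bool) -> str:
            return "may" if allowed else "may not"

        return (
            f"My data {verb(self.allow_advertising)} be used for advertising, "
            f"{verb(self.allow_third_parties)} be shared with third parties, "
            f"and {verb(self.allow_research)} be used for research."
        )


# Example: research is fine, advertising and onward sharing are not.
choice = PrivacyLicense(
    allow_advertising=False,
    allow_third_parties=False,
    allow_research=True,
)
print(choice.code())
print(choice.summary())
```

Even three yes/no parameters give eight distinct licenses rather than a binary ‘share or not’, and the same object produces both the short tag a machine could check and the sentence a person could read.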

Like CC, it’s clearly open to abuse and to the problem of defaults. We’re already experiencing both of those challenges in privacy anyway. The Right to be Forgotten has led to confusion about whether freedom of expression has been restricted, and cookie-consent requirements have led to countless UI anti-patterns, with consequent backlash.

The vision of giving users awareness of the choices they can make about their privacy rights, and a reasonable expectation of knowing how a service provider will use their data, is compelling.
