A Masked Ball in Bohemia ~1748

Privacy as Technology Business

Quick Data and Analytics Meet Trust and Tradition

Consider this about present-day technology:

  • Quick data sharing. Words and numbers in digital form are weightless, quick to search and, once access is granted, fast to read.
  • Cheap data collection. Sensor hardware is becoming smaller and cheaper, and tasks conducted online, as transactions of digital commands, are easy to track as well.
  • Versatile data sense-making. Statistics and algorithms tuned to generalize from large data volumes are improving, so the meaning of a chunk of data can grow after its publication.

There are many benefits to this state of affairs. Risks and conflicts are likely as well. The particular aspect I consider here is privacy.

The definition of privacy in current debate is vague. For the sake of clarity, I start with a discussion of what privacy can be understood to be, and what its greater value is. A philosophical prelude, let’s say, to flesh out what the stakes are.

I will bring this discussion to a number of practical points about the business of privacy technology, and its engineering. I believe principles are necessary to this venture, and in this domain they draw on history, tradition, law and human shortcomings. So prepare for musings on history and evolution as well as algorithms. Numerous points are touched upon in the process, and the curious reader can refer to the selected references for detail.

What is Privacy and Why Care?

Definitions of privacy are confused. In legal debate it has even been argued that the concept is redundant and better reduced to other rights, such as liberty or property rights. Most scholars, however, consider privacy to be distinct, though there is less agreement on what exactly it entails.

A definition of privacy that has been used relates it to the individual control of how, when and to what extent personal factual information is disclosed. Loss of control therefore implies some loss of privacy. The definition is operational as it defines privacy as a process with respect to information.

This definition does not relate privacy to a value, or in short, what is at stake. Human dignity has been used to fix privacy to a social concept with arguably intrinsic value. Dignity in this context should be understood as control over reputation, name and public image, such that the individual is protected from public indignity. Proponents argue this is needed for the individual to become a whole person with purpose in life. That in turn carries intrinsic value worth protecting against infringement. This view is prominent in Continental Europe and has by some observers been linked to the old traditions of honour and dueling that were practiced by European aristocracy up until the 19th century.

The typical North American perspective on privacy places less emphasis on dignity and control of one’s public image. Instead it is closer to principles of liberty, the sanctity of one’s home and the right to be left alone. To be free from state interference in the home is a recurring theme in North American culture and politics. These differences, regardless of exact origin, can be helpful in appreciating practical or operational differences in how societies approach privacy.

Another perspective is the value of privacy beyond the individual. The varying degrees of information or personal secrets we disclose to other persons have been argued to affect our ability to be intimate and form close, caring relations with them. Bonds between spouses or between members of a family would, according to these arguments, degrade if our secrets and colourful lists of personal embarrassments were knowable by all. Another aspect where individual privacy protects a social value is democracy and the secret ballot. Not all that long ago, the popular vote in many countries was conducted viva voce; in other words, the vote for a candidate was spoken in an open assembly. This lack of privacy came to be abused to pressure individuals to vote against their wishes, and once that is true for a sufficient number of voters, the moderating qualities that democratic rule affords to social practice, including the legitimacy of power, are in part lost.

The exact value we attach to privacy, as individuals or as members of a society, certainly matters to how we approach privacy and the events and technology that challenge it. As the outline above suggests, these values can vary. In what follows I will work with the operational definition, control of personal information disclosure, and not attempt to evaluate or critique the larger debate about privacy values. That debate will inform the discussion with respect to context and the trade-offs a business dealing in privacy may encounter.

Information Disclosures and Good Morals

There is value in privacy, as seen above. There is value in information disclosure as well. This section does not yet address the business value of personal data. Instead I consider greater social values.

Economists have for five decades studied the adverse impact of information asymmetries on markets. The sale of used cars is a good way to illustrate the concept, and how information disclosure makes a difference.

Imagine a market where used cars are sold, some really good ones and some really shoddy ones. Let us assume the seller of a used car knows the quality of the car, while the potential buyers can’t tell. The textbook finding is that this information asymmetry makes the really good used cars harder to sell at a fair price, because buyers know that sellers of shoddy cars are motivated to falsely represent them as good ones, which makes buyers hesitant to pay top price. As a consequence, sellers of good cars exit the market, and it gradually degrades into a market that only trades in low quality. That is an undesirable outcome.

Online markets for used cars, eBay for example, have been found to exhibit an interesting relation: sellers of quality cars voluntarily disclose much more information than minimally required. The number of photos of the car in fact correlates with earning a fairer, and higher, sales price. In addition, eBay punishes sellers who after a purchase are found to have seriously misrepresented the car. These information disclosures are voluntary and arguably involve no loss of privacy. However, in a market where we trade ourselves, say the job market, or where we seek social favours, say online dating, the information disclosures that are good for that market or community may require more personal data to become public. If the organizer of the market or community, in their interest to keep quality high, requires certain disclosures for participation, the trade-off can easily become contentious.

Another interesting twist to the disclosure of private information comes from findings about pro-social stimuli. One fun finding is that putting up images of eyes in a public area makes people more prone to engage in public-good behaviour, such as removing garbage. Other behaviour modulations towards the welfare of the group have been observed in primates, our evolutionary relatives, when actions are taken in the presence of an audience, particularly one of individuals that are socially close. Humans appear to be profoundly social creatures. Even the mere appearance of being observed, in a sense having private actions and information seemingly exposed, nudges us by some ingrained process in a more collectively beneficial direction.

China is my next example where privacy is, at least ostensibly, traded against social trust. One observation of contemporary Chinese society is the lack of trust, with odious cases of cons, corruption and fear of getting involved typically used as illustrations. The central government of China recognizes this as a problem, and its solution is a social credit system. A credit score on steroids, perhaps: every citizen graded on how honest, trustworthy and moral they are as inferred from their online purchases, social media engagement, frequency of visits to older relatives and so on. The stated intent is to engineer a public metric of trust to be used by social actors in China, such that overall behaviour becomes more efficient. Some reports indeed find that many citizens are supportive. I will not evaluate the merits or feasibility of this project. I include it as an extreme example of privacy balanced against another social value, and a reminder that privacy is not valued for the same reasons everywhere.

The Three Policy Objectives of Privacy Engineering

In engineering for information security there is a neat triad of concepts to guide policy and implementation: the CIA triad.

  1. Confidentiality: The information is hidden from actors who should not be able to read it.
  2. Integrity: The information is correct and meaningful to the legitimate user, specifically it is protected against malicious alterations.
  3. Availability: The information remains accessible when and where it is needed by the legitimate user.

For privacy engineering there is not yet anything as established. The National Institute of Standards and Technology (NIST) of the United States proposed in 2017 the following triad of privacy engineering objectives:

  1. Predictability: A user can make reliable assumptions about what the information they disclose will be used for.
  2. Manageability: A user can dynamically manage which specific information is disclosed to whom, or what to delete or alter.
  3. Disassociability: A user’s information can be disassociated from the individual, such that it cannot be traced to the user as a person.

I think these objectives relate well to the greater discussion of privacy above. They envision a case where individuals can be, and become, a person in a social context, market or community, where data disclosure carries value, but without loss, or at least a predictable loss, of privacy. The vaunted non-zero-sum ethos of engineering on display!

I will dig deeper into these objectives, how they can be embodied in practice and why it can be a tough business. There may be better policy objectives yet to be formulated, so in that respect I offer a provisional view of the privacy landscape. Then again, that is the sinuous path to knowledge.

Limited Rationality and Cross-Referencing as Hurdles

The objective of predictability is not new. In healthcare there is informed consent. Before a treatment is given to a person, the person must in most countries give the doctor consent to go ahead, and the consent must be informed: the person must be told what is at stake, since treatments often involve trade-offs, risks and rewards. Doctors and hospitals operate under laws of mandated disclosure.

The intent is good: persons should be able to make reliable assumptions about what will happen to them. But there is evidence to doubt that the good intentions of mandated disclosure translate into good practice, in healthcare or in other areas where it is required by law, such as money lending, insurance, police arrests and food labeling.

It has been found that few people understand the information they are given, since it can be technical. Well, let us add more steps and time to the process. No improvement is observed, since both the discloser and the disclosee become increasingly confused. In addition, the human ability to rationally process evidence exhibits some less than flattering features, especially under time pressure. Tendencies to inaccurately weigh short-term against long-term rewards and costs (hyperbolic discounting), and to attribute greater importance to early observations than to more meaningful subsequent input (primacy bias), speak to the limits of what a heap of information can do to inform a typical individual.

Another hurdle to predictability is that what can be done with the data may not be known at the time of its disclosure. First, inferential technology is evolving as machine-learning methods improve. Second, what can be derived from a data set is not limited to what is contained in it, but extends to what can be inferred from it combined with all other available data. Massachusetts published in the 1990s a data set of supposedly anonymized healthcare records for all state employees. A crafty computer-science student cross-referenced, or combined, the data with other public data and was able to identify the health record of the then Governor. Do not write this off as an error of bumbling politicians, because the tech company Netflix went through a similar hullabaloo some years ago.
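The cross-referencing step itself is mundane: a join on quasi-identifiers such as ZIP code, birth date and sex. Here is a toy sketch; all records and names are invented for illustration, not taken from the actual case.

```python
# A toy linkage attack: join an "anonymized" medical table with a
# public roll on the quasi-identifiers (zip, birth date, sex).
medical = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "M", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1962-03-14", "sex": "F", "diagnosis": "asthma"},
]
voters = [
    {"name": "A. Smith", "zip": "02138", "dob": "1945-07-31", "sex": "M"},
    {"name": "J. Doe",   "zip": "02144", "dob": "1971-11-02", "sex": "F"},
]

def link(medical, voters):
    """Re-identify medical records whose quasi-identifiers match a voter."""
    key = lambda r: (r["zip"], r["dob"], r["sex"])
    by_key = {key(v): v["name"] for v in voters}
    return [(by_key[key(m)], m["diagnosis"])
            for m in medical if key(m) in by_key]

print(link(medical, voters))  # [('A. Smith', 'hypertension')]
```

No piece of either table names a patient, yet the combination does, which is exactly why "anonymized" releases can fail.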

In short: as the complexity of analysis or task increases, there are limits to what any mandated disclosure can do to inform users in general, and as additional data becomes available, new ways to extract meaning from previously “harmless” information disclosures may falsify previous assumptions.

How to Disassociate Each Piece but Preserve the Aggregate

Imagine you wish to determine the prevalence of use of a certain illegal drug. Individuals who use it have very good reasons to protect their privacy by giving untruthful answers. Without complete trust between you and the participants in this imaginary scenario, can we proceed?

Yes. This is how the conflict actually has been addressed in the past. Each participant is given the following instructions:

  1. Flip a coin. If it is heads, answer the question truthfully.
  2. If it is tails, flip the coin again. If the second time it is heads, answer the question ‘Yes’, and if it is tails, answer the question ‘No’, regardless of the truth.

Let us break down this process. First, any single data point tells us nothing about the corresponding person. A ‘Yes’ answer may still mean the person is not using. A ‘No’ answer may still mean the person is using. Therefore, participants are granted plausible deniability, and their privacy on this matter is preserved. Second, with a large enough number of participants, we know the expected outcome of the coin flips and can infer how many answered ‘Yes’ because they indeed use the drug: half the answers are expected to be random, and half of those random answers are ‘Yes’, so subtracting that quarter and doubling the remainder recovers the true rate. For example, if 75 out of 200 answers are ‘Yes’ (37.5%), the frequency of use is estimated as 2 × (37.5% − 25%) = 25%.
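The two coin flips and the estimation step can be sketched in a few lines of Python; `randomized_response` and `estimate_prevalence` are illustrative names, not an established API.

```python
import random

def randomized_response(truth):
    """Answer per the two-coin protocol: heads -> truth, tails -> random."""
    if random.random() < 0.5:         # first flip: heads, answer truthfully
        return truth
    return random.random() < 0.5      # tails: second flip decides the answer

def estimate_prevalence(answers):
    """Invert E[yes fraction] = 0.5 * p + 0.25 to recover the true rate p."""
    yes_fraction = sum(answers) / len(answers)
    return 2 * yes_fraction - 0.5

# Worked example from the text: 75 'Yes' out of 200 total answers.
print(estimate_prevalence([True] * 75 + [False] * 125))  # 0.25
```

Note that the estimator can go slightly negative or above one for small samples; the precision of the aggregate grows with the number of participants.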

The key is that a random component is added to each individual entry, which sufficiently obscures what the truthful value is, while the random component is constructed such that its aggregate effect is predictable within a known precision limit. This idea has been generalized and mathematically formulated with a great deal of sophistication under the name differential privacy.

I will not describe the method in detail here. Suffice to say it is a clever way to ensure disassociability at the root of the data. Even if security measures fail and the individual data ends up in unscrupulous hands, the truth about any given individual is obscured. In cases where the value of the data comes from aggregate properties like averages, trends, statistical baselines or models, rather than the individual pieces, differential privacy is an elegant way to get the best of two worlds. A caveat: the mere fact that a person is part of a database may itself say something, which requires separate masking.
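To give a flavour of the generalized formulation: a common construction, the Laplace mechanism, adds noise calibrated to the query's sensitivity. The sketch below assumes a simple counting query (sensitivity 1, since one person changes the count by at most 1); `private_count` is an invented helper for illustration, not a production implementation.

```python
import math
import random

def laplace_noise(scale):
    """Sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))

def private_count(records, predicate, epsilon):
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes the result by at most 1), so the noise scale is 1/epsilon.
    Smaller epsilon means stronger privacy and noisier answers.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon)

ages = [23, 35, 41, 52, 67, 29, 38]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5)
```

Each released answer is random, so `noisy` hovers around the true count of 3 without revealing whether any particular person is over 40.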

Break Centralization with Privacy Mediators

With the Internet of Things (IoT), sensors are becoming more prevalent. Much of their value comes from time series: variations and correlations over time for temperature, motion, heart rate and so on. Aggregate trends can be analyzed to inform action, or a set of encoded rules can initiate an action to automate or simplify some task. Again, inference from this data, in good or bad faith, can lead to a loss of control of information disclosure. Surveys find that the perception of such risk leads to reluctance to adopt in the home, or to support in the public sphere, technology that could otherwise be beneficial.

The typical architecture of IoT software is that most data are sent to and stored on a remote server, usually, but not always, controlled by the company that sold the IoT product. It is a centralized structure, opaque to the typical user with respect to what actually ends up on that remote server. A proposal to improve manageability is to add a privacy mediator as an intermediate software layer between the gathering, and the storage and analysis, of the data. The mediator runs as a separate service, supplied by a third party and under the control of the person whose data is sensed. It implements a set of policies: how frequently temperature is sensed, when during the day motion is tracked, or whether video images should have faces automatically redacted. The user can “see” how their data is transformed before it is sent to the central server.
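One way to picture such a mediator is as a user-configured pipeline of policies applied before anything leaves the home. The function and policy names below are invented for illustration; the proposal itself does not prescribe an API.

```python
# A hypothetical privacy-mediator layer: user-set policies transform
# sensor readings before the result is sent to the remote server.
def downsample(readings, keep_every):
    """Policy: reduce temporal resolution by keeping every Nth sample."""
    return readings[::keep_every]

def suppress_window(readings, start_hour, end_hour):
    """Policy: drop events that fall inside a do-not-track window."""
    return [r for r in readings if not (start_hour <= r["hour"] < end_hour)]

def mediate(readings, policies):
    """Apply each policy in order; only the final result goes upstream."""
    for policy in policies:
        readings = policy(readings)
    return readings

# One motion event per hour of a day, as a stand-in for real sensor data.
motion = [{"hour": h, "moved": True} for h in range(24)]
released = mediate(motion, [
    lambda r: suppress_window(r, 22, 24),   # no tracking late evening
    lambda r: downsample(r, 2),             # halve the resolution
])
```

The point is that the transformation is inspectable and under the user's control, rather than buried inside the vendor's firmware.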

The key to the proposal is to bring transparency to certain aspects of the larger data effort, and to empower the individual. This is not to say that IoT companies will exploit the data for questionable purposes unless directly prevented from doing so. Even good, honest companies, especially small unknown ones, may lack the public image needed to convince a potential customer of their good faith and care for the customer’s privacy. Apple has had plenty of time and dollars to build and brandish its credentials with respect to user privacy. If an emerging, inventive startup could integrate its products with a trusted privacy mediator and earn the right to use its trademark, then at least one hurdle to market adoption would be removed.

To my knowledge this is still only a proposal, and it is too early to evaluate its feasibility. The problems of predictability and conflicting business interests might limit how well it can solve the privacy problem of centralized IoT. Still, I think it is worthy of consideration.

Think Output not Input, and the Fiduciary Duty

Finally, time to take a step back again and consider the bigger perspective on getting privacy right in a technology business.

Privacy protection is not the first time in history trust is the subject of law, edict or mission statement. The management of other people’s money and property is another domain. The fiduciary duty is a traditional and legal formalization of trust. As one commentator puts it, the fiduciary obligation of loyalty is a “kind of prophylactic prohibition on self-dealing.” For example, if you are trusted with another person’s money to invest and care for, the fiduciary duty requires you to act in the best interest of that person, not your own direct interest.

It has been proposed that transactions of personal data be given a similar formalization. As far as I know this is still only a proposal. I do not want to venture further into this legal and political debate; the reader can find more in the references. The concept I find noteworthy is the notion of placing the requirements, restrictions and aspirations on the output of actions (the goals), relative to a more or less enduring social standard, rather than on the specific input to actions (the means).

Especially in the domain of technology, where the time from groundbreaking to passé can be painfully short, standards with respect to privacy may be more beneficial to everyone if embodied in a fiduciary-like perspective, along with critical awareness of the general values at stake. Add the three objectives above, and, to my mind at least, a promising structured approach emerges for synthesizing present technology with the tradition and social context of privacy.

Claps are currency. If you found this text worth reading, press clap and I will smile a little.

Selected References