The Metaphysics of Personal Data: A Sketch of Competing Views

Travis Greene
Published in The Startup
Mar 29, 2020
From http://www.sloveniatimes.com/govt-adopts-bill-on-personal-data-privacy

The collection and processing of personal data is big business. The Association of National Advertisers (ANA) named “personalization” its word of the year for 2019. For reference, the 2017 word of the year was “artificial intelligence.” Marketers and advertisers love the idea of personalization: they can precisely target those most likely to purchase their products or engage with their advertisements. New buzzwords such as “micro-targeting” and “precision marketing” have taken over.

From the consumer’s and application user’s point of view, personalization means reduced information and choice overload. Think of Netflix’s movie recommendations or Youtube’s recommended videos. Beyond that, economists, such as Google’s Hal Varian and technologists such as Wired’s Chris Anderson (“The Long Tail”), have written about how personalized services and products will revolutionize business models. Products and services no longer need to be created for the mythical “average” person, but can be tailored to the specific needs and interests of unique customers.

Personalization is possible in part thanks to the broad category of numerical algorithms and statistical models known as machine learning (ML). The collection of techniques used in personalization applications is often called “recommender systems,” and each type of recommender system takes a different approach to the problem of recommendation: e.g., do we make item-based or user-based recommendations? Do we use implicit feedback or explicit ratings?

Take a basic recommender system based on the idea of collaborative filtering. By looking at what other users similar to you bought, we can guess that you might also want to buy anything they bought that you haven’t bought yet. Collaborative filtering leverages a basic insight about social contagion in humans and is probably the most widely used and most intuitive approach to recommendation. Sometimes these collaborative filtering techniques are combined with other supervised techniques to create “hybrid” systems. Yet without the collection of vast amounts of behavioral data, such as browsing history and device information, many of these techniques wouldn’t work. Galit Shmueli refers to these data as Behavioral Big Data (BBD).
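As a concrete sketch of this idea, user-based collaborative filtering can be written in a few lines of Python. The tiny interaction matrix, the cosine-similarity measure, and the neighborhood size below are all invented for illustration, not taken from any production system:

```python
import numpy as np

# Toy user-item interaction matrix (1 = bought, 0 = not bought).
# Rows are users, columns are items; all values are invented for illustration.
interactions = np.array([
    [1, 1, 0, 1, 0],  # user 0
    [1, 1, 1, 0, 0],  # user 1 (tastes overlap with user 0)
    [0, 0, 1, 0, 1],  # user 2
])

def recommend(user, interactions, k=1):
    """Recommend items the user's k nearest neighbors bought that the user hasn't."""
    norms = np.linalg.norm(interactions, axis=1)
    # Cosine similarity between the target user and every user.
    sims = interactions @ interactions[user] / (norms * norms[user] + 1e-9)
    sims[user] = -1.0  # never count the user as their own neighbor
    neighbors = np.argsort(sims)[::-1][:k]
    # Score unseen items by how many neighbors bought them.
    scores = interactions[neighbors].sum(axis=0) * (1 - interactions[user])
    return [int(i) for i in np.argsort(scores)[::-1] if scores[i] > 0]

print(recommend(0, interactions))  # → [2]
```

Here user 1 is the nearest neighbor of user 0, so item 2, which user 1 bought and user 0 hasn’t, is recommended.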

Here are the main questions explored in this article: We all know we’re supposed to care about privacy and the protection of our personal data, but what exactly are personal data, and how do they relate to actual persons? Why would people claim the collection of our personal data is eroding our agency and autonomy? What do personal data have to do with free will? And finally, how exactly are we to understand this idea of personalization, which is increasingly integral to the functioning of apps, websites, and social media platforms?

To at least start to answer these questions, I believe we need to be clearer about the relation between persons — those things like you and me that buy things and search for things on the Internet — and their personal data.

As Justin Garson has noted, philosophical thinking can add value to debates in myriad fields through its focus on clarification, connection, and caution. For example, philosophical thinking can clarify the conceptual foundations of personal data, make connections between certain data mining practices and their ethical and social implications, and warn us when certain new techniques and technologies ought to be viewed with caution. Ultimately, a clearer understanding of ML personalization will also contribute to current debates on algorithmic bias, transparency, and fairness in machine learning.

Defining Personal Data

At first blush, defining personal data is straightforward. We only need to look to the relevant definitions in law. The most influential law regarding personal data at the moment is the General Data Protection Regulation (GDPR), which governs the use of personal data in the EU. The GDPR states that personal data are any data that could reasonably be linked to a specific, living individual. So your browser cookies could reasonably be linked back to your name which of course could be used to identify you; or your GPS locations could be reasonably linked back to your device and you as the owner of the device, for example. These data would be considered personal data and would need to be processed and stored according to the procedures laid out in the GDPR.

On top of all this, we also need to distinguish between two different but related problems concerning personal data: protecting the identity of a user, and protecting what can be inferred about a user. We might use anonymous networks and cryptographic techniques for the first, and statistical techniques based on information theory for the second. For example, a situation in which someone submits a Google search that Google cannot link to a particular person is different from one in which Google cannot infer that person’s intentions, beliefs, plans, or motives from the query. We will be primarily concerned with situations of the latter type, though the distinction between these situations can become blurry.
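As one hedged illustration of the second, statistical kind of protection, a data holder might release only noisy aggregates rather than raw records. The sketch below adds Laplace noise in the style of differential privacy; the visit counts, sensitivity bound, and parameter choices are my own invented example, not anything drawn from the GDPR or the works discussed here:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-user counts of visits to a sensitive page.
visits = np.array([3, 0, 7, 2, 5])

def dp_sum(values, epsilon, sensitivity=10):
    """Release the sum with Laplace noise of scale sensitivity/epsilon,
    assuming each user contributes at most `sensitivity` visits."""
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(values.sum() + noise)

print(dp_sum(visits, epsilon=1.0))  # true sum is 17; the released value is noisy
```

The larger epsilon is, the less noise is added and the weaker the protection; the point is simply that an individual’s exact record never leaves the database.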

Nevertheless, attempting to understand the nature of personal data by pointing to its legal definition doesn’t solve the much deeper philosophical problem we are interested in. That is, what is the nature of the relation between living persons and their personal data and how does this relate to machine learning personalization (MLP)? This is the metaphysical problem of personal data that I argue needs more attention.

Two Competing Views: Realist and Performativist

I can identify at least two coherent views about the relation between persons and their personal information in the literature. On the one hand, we have what I’ll refer to as the “realist” or “constitutionalist” view endorsed by the Oxford philosopher of technology and information, Luciano Floridi. He claims, for example, that “‘You are your information,’ so that anything done to your information is done to you, not to your belongings.” On this view, personal data constitute you as you: your personal identity and your personal data are one and the same thing. Rights to informational privacy are necessary to safeguard the constitution of one’s identity, because if a data controller or processor were to alter or delete personal data about you, it would simultaneously alter and delete aspects of your identity.

A competing notion is what I would call the “constructivist” or “performativist” position. You can find academics espousing these views primarily in marketing journals. They write journal articles with titles like “The Dividualized Consumer: Sketching the New Mask of the Consumer.” Oftentimes, these thinkers come from a humanities background, with a strong emphasis on the thought of French sociologists and philosophers such as Baudrillard, Foucault, Deleuze, and others. They make claims such as: a “person becomes performed as a ‘consumer’ within these relations [of a supermarket]” and “the contemporary consumer is performed by modern marketing devices and theories… they do not wear a mask of any kind — they are constituted by the way they are performed in data traces.”

Actor Network Theory proposes reality as a shifting web of agents connected through semiotic relations. Photo by Alina Grubnyak on Unsplash

Performances and Semiotic-Material Networks

As far as I can tell, they are suggesting that socio-linguistic acts — performances — result in the creation of a new entity derived from a living, breathing person. This new entity we arbitrarily denote as a “consumer,” according to the “shifting semiotic and material relations between a wide variety of actors in networks.” More specifically, in the terminology of Deleuze, a “dividual” is a “cybernetic subject made up of data points, codes, and passwords.” This new subject might be considered to exist in what Baudrillard calls a simulacrum or neoreality. The traditional distinction between the artificial and the real is no longer clear. The “you” of your social media profile is just as real as you yourself. Representation and reality have become blurred as digital technology has changed how we interact with one another.

Here’s a quick summary of the two views at this point. The “realist” views your personal data as literally constituting you. Your personal identity and your personal data refer back to one and the same thing. If I change your personal data, I change you. The “performativist,” at least as I interpret things, believes that a new entity, ontologically distinct from its originating source, results from the social-linguistic act of classifying and categorizing persons according to database schemas with other-chosen and defined notions of possible relations, procedures, transformations, aggregations, and so forth. This new entity exists in a kind of digital reality where simulation and reality have become indistinguishable.

Problems with these views

Before moving on, I will briefly point out some problems with both views. First, Floridi’s realist conception is problematic in that personal data cannot literally have rights. Persons have rights, data do not. Data can be nearly infinitely cloned and copied, persons cannot. David Shoemaker says that if Floridi is right, then the notion of “identity theft” would become absurd, akin to a literal kidnapping. Strangely enough, Floridi himself accepts this consequence.

Moreover, Floridi’s view overstates the importance of the classifications and categorizations of others in forming our personal identity. Our personal identities are a function of how society classifies us, but they are also a function of whether we accept such a classification. There is no single “truth” of the matter; rather we must somehow incorporate both inner and outer aspects of personal identity if we want a complete account of personal identity. Floridi’s view misses the importance of self-determination as a counterbalance to society’s influence on us.

Problems With Performativism

The performativist position has even greater problems, in my view. As to their claim that personal data act to “divide” up consumers and transform persons from individuals into “dividuals,” I would say the following. Personalization is not dividing up consumers; it is dividing up representations of consumers. This is a huge difference. When I take a photo of you on my phone, I’m not miniaturizing you and transporting you inside my phone; I’m creating a (digital) representation of you. You are represented as a collection of pixel values whose essential structure can be described using binary digits in coding schemes of varying efficiency. I can then duplicate and copy these codes with very little effort and cost. Through a miraculous process, enough of your essential structure, in the form of information, has been communicated such that you can be identified through a digital image.

Similarly, personalization doesn’t create new objects; it creates new representations of objects, objects fundamentally rooted in reality (it is the thing itself, the person, that clicks the mouse or buys the product, and it is here that the GDPR applies). The performativist views wrongly focus on the genesis of this new “consumer” entity and lose sight of the fact that this entity is derived from a real person.

The performativists are correct in pointing out that there are social, performative aspects of social and gender identity, but those are merely parts of what makes up a real, living and breathing person. As George Herbert Mead noted long ago, our identities are formed through an interplay of private and public experiences (the “I” and the “me”); they do not exist independently of one another. Similarly, machine learning personalization (MLP) can indeed influence one’s gender or social identity by nudging people towards different behaviors, beliefs, and identities. If our identities as persons are related to our behaviors, beliefs, and social ascriptions, as many people hold, then personalization could be a powerful force in shaping individuals and society. Consequently, we might want to regulate the use of MLP so this process unfolds according to the social and moral values enshrined in our laws. The GDPR can be viewed as one response to this problem of balancing the progress of technology with social and cultural values.

Notions of Truth: Correspondence vs. Constructivist

Each of these views, I believe, rests on a more fundamental understanding of the nature of truth. The question of how personal data relate to a person will necessarily involve some kind of notion of what makes some statement about some person true.

A realist, such as Floridi, may be interpreted as implicitly holding a correspondence theory of truth, in which truth is a function of the degree of accuracy between our beliefs and reality. Reality is objective and public, and this relation between our beliefs and reality accounts for our abilities to cooperate together and discover truths about the world through science. Science is the systematic process through which our understanding of the world gradually comes to mirror the underlying nature of the universe.

The performativist, however, would say truth is socio-linguistically constructed and historically contingent. It is impossible to distinguish between our beliefs about reality and reality itself. There exists no a priori method or system that can distinguish what appears true from what actually is true. As Paul Cilliers, a philosopher of complex systems, writes, our understanding of reality is inherently bounded by these social and linguistic constraints, and thus truth becomes more a “strategic” consideration depending on our goals and aims at the time. The performativists draw on the philosophical insights of Wittgenstein, who famously argued that we cannot use language to get outside of language, as all thoughts must be expressed in some language, and this language we know is bounded and influenced by our history and culture. We can’t escape the “language game.” Accordingly, philosophers such as Derrida have harped on the self-referential nature of language. The late Richard Rorty managed to put a very convincing spin on these arguments under the guise of pragmatism.

A Third View: Truth as Disclosure

At this point, I’d like to introduce a third view of truth that I think may be helpful in understanding the relations of persons with personal data. In my opinion, it’s the best of the three. A major reason is that it explicitly accounts for a major issue in MLP: uncertainty about the quality and accuracy of various data sources, an issue known in the literature as “data fusion.” In reality, data controllers and processors can receive conflicting attribute values regarding some entity in a database. At some point these conflicts must be resolved before any predictive modeling can begin.
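To make the fusion step concrete, here is a toy sketch. The sources, attributes, and majority-vote rule are my own invented example; real data-fusion methods weigh source reliability and are far more sophisticated:

```python
from collections import Counter

# Hypothetical records describing the same person, drawn from three sources
# whose attribute values conflict and must be reconciled before modeling.
records = [
    {"source": "crm",    "city": "Taipei", "age": 34},
    {"source": "web",    "city": "Taipei", "age": 35},
    {"source": "vendor", "city": "Tainan", "age": 34},
]

def resolve(records, attrs):
    """Naive fusion rule: per-attribute majority vote (ties go to the value seen first)."""
    fused = {}
    for attr in attrs:
        votes = Counter(r[attr] for r in records)
        fused[attr] = votes.most_common(1)[0][0]
    return fused

print(resolve(records, ["city", "age"]))  # → {'city': 'Taipei', 'age': 34}
```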

Perhaps most importantly, Heidegger’s view of truth as disclosure highlights how MLP applications like “context-aware” recommender systems may be a pipe dream. The computer scientist Paul Dourish has explored this problem in his work on context-aware computing. Heidegger’s approach also has implications for the way in which “database discourse” constrains and limits objects, forcing them into a Procrustean bed of logical relations and thereby closing off the potential for them to change, adapt, and grow. But these are ideas I’ll need to develop in another post.

The truth as disclosure view is perhaps most prominently espoused by Heidegger, his student Gadamer, and others in the hermeneutic and phenomenological traditions, such as Husserl. Heidegger himself traces it back to Ancient Greece. This is the view of truth as aletheia.

We can gloss this as the idea that truth consists in a gradual revelation, disclosure, or presencing of a being’s Being. Applied to MLP, we can understand truth as disclosure roughly as follows. MLP, as a new technology, permits us to reveal that which “lets what presences come forth into appearance.” MLP, and the personal data derived from our behaviors in apps and devices, are thus a new “mode of revealing” the person.

What is the general form of Being in which all beings participate?

Heidegger is attempting to generalize the notion of Being, which takes its forms in various particular beings. He draws on the Greek etymology of various words to support his interpretation. He says the Being (with a capital B) of a being unfolds in unconcealment. Being is constant presence, but the Being of a being can only be apprehended through appearance or seeming, as a kind of revealed presence. Being means, in effect, stepping into the light, appearing, unconcealment. Seeming or appearance is associated with distortion and concealment. Unconcealment occurs through measurement or observation, but we must remember that our observations are always conditioned by our “thrownness” into a particularly configured state of the world, a basic feature of what Heidegger calls Dasein. Logos, the basis for the word “logic” and translated as “discourse,” is a kind of gathering of what emerges or presences to us as a unity.

We can think of this process roughly as an inductive process of conceptual formation, whereby individual observations are linked through a unity of self-sameness of the Being of some being. Yet the very act of speaking and using language reifies or makes seem definite and final what originally was a process or emergence, a “gathering of the happening of unconcealment.” Truth properly belongs to Being, not to language as logos, which merely codifies and abstracts from the particulars of a unity of seeming and appearing.

The Hermeneutic Circle brings us an iteratively clearer understanding. Photo by Alex Knight on Unsplash

Essentially, Heidegger is saying that we cannot clearly separate the observer’s act of perception from the object of perception itself. They work in tandem. When we use language to speak of these objects, we artificially separate the two, as if the Being of a being could be separated from the apprehension of its Being. What Heidegger is trying to get across is vaguely reminiscent of Heisenberg’s uncertainty principle in quantum physics and the Hawthorne effect in social science. In social science experiments, for example, it is often the case that mere observation of the subjects can change and affect their behavior, thus biasing the results.

If you’ve never read Heidegger, here’s a short taste from his Introduction to Metaphysics:

Being unfolds as phusis. The emerging sway is an appearing. As such, it makes manifest. This already implies that Being, appearing, is a letting-step-forth from concealment. Insofar as a being as such is, it places itself into and stands in unconcealment, aletheia (pg. 107).

Another major advantage of this view of truth is that it accounts for both the subjectivity and objectivity of human experience. This is because a phenomenon may reveal or disclose itself to an observer in a multitude of ways, each way perhaps revealing a different aspect of the unity of the phenomenon. On the other hand, this view supposes that there is in fact some underlying objective phenomenon causally responsible for the interaction between the subject and object. This interaction, Heidegger and others maintain, is always mediated in some way and never quite allows us to capture the entirety of the phenomenon in question. Through an abductive process laid out in the hermeneutic circle, we can reach progressively better interpretations of not just behavior, but the intentions causally responsible for behavior.

An Example of the Disclosure of Being

Imagine looking at an apple on a table. We recognize the apple as an apple, yet we only see the object from one angle and under one source of lighting. Nevertheless, we can somehow relate this experience to the much broader concept of “apple.” Similarly, our measuring instruments capture just some narrow aspect(s) of the phenomenon, and these aspects are just one of many possible dimensions of the phenomenon. I can then represent the phenomenon by referring to these measured dimensions.

For example, I can record your clickstream data on my website, but these data are just one of many ways of representing and understanding your essence (and not a particularly good one, either). In machine learning, we commonly use dimension-reduction techniques to reduce computational complexity during optimization. In many cases, these reduced or “latent” representations can be more accurate because they have effectively removed “noise” from our measurements of the phenomenon of interest.
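Here is a small sketch of that noise-removal intuition using truncated SVD on synthetic data; the rank-1 “taste” model, the matrix sizes, and the noise level are all invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic behavioral data: a rank-1 "signal" (one latent taste dimension)
# plus measurement noise.
users, items = 20, 10
taste = rng.normal(size=(users, 1))
appeal = rng.normal(size=(1, items))
signal = taste @ appeal
noisy = signal + 0.3 * rng.normal(size=(users, items))

# Keep only the top singular component: a 1-dimensional latent representation.
U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
denoised = s[0] * np.outer(U[:, 0], Vt[0])

# The low-rank reconstruction is closer to the true signal than the raw data is.
err_noisy = np.linalg.norm(noisy - signal)
err_denoised = np.linalg.norm(denoised - signal)
print(err_denoised < err_noisy)  # → True
```

Discarding the smaller singular components throws away mostly noise, which is why the one-dimensional latent representation tracks the underlying “taste” better than the raw measurements do.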

The Hermeneutic Circle

Gadamer’s concept of the hermeneutic circle illustrates this principle of iterative revelation, observation, and abduction quite well. We can converge on a fuller understanding of the nature of the phenomenon as we increasingly understand its parts. Yet the meaning of the parts depends on their relation to the phenomenon as a whole. As we accumulate data on some phenomenon, we tend to reduce our uncertainty as to the nature or essence of the phenomenon. The reduction of uncertainty is a very important concept in both science and statistics, which relies on the mathematics of probability to quantify the uncertainty of our inferences. For hermeneutic philosophers such as Paul Ricoeur, this process represents a viable way of understanding the intention behind literary and “behavioral” texts. As I will try to explain below, the hermeneutic circle parallels ideas from information theory that formalize the notion of a reduction in uncertainty.
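That information-theoretic parallel can be made concrete with Shannon entropy, which measures uncertainty in bits; the “intention” categories and probabilities below are invented for illustration:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A prior belief over four hypothesized intentions behind a user's clicks;
# the categories and numbers are purely illustrative.
prior = [0.25, 0.25, 0.25, 0.25]      # maximal uncertainty over 4 options
posterior = [0.70, 0.10, 0.10, 0.10]  # belief after observing more behavior

print(entropy(prior))      # → 2.0 bits
print(entropy(posterior))  # ≈ 1.36 bits: uncertainty has been reduced
```

Each pass around the hermeneutic circle plays a role loosely analogous to the observation that moves us from the prior to the sharper posterior.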

The Hermeneutic Circle. (Adapted from Bontekoe, 1996 and uploaded by Anders Hofman.)

The Future: Humanities and Data Science

As more and more social commentators and authors, such as Shoshana Zuboff — who recently wrote the influential book The Age of Surveillance Capitalism — write about the “erosion” of free will and the precariousness of our “right to a future tense,” we need a deeper and clearer understanding of how our personal data relate to us as persons. Just repeating the same phrases about free will and autonomy over and over doesn’t help. These are complex issues. Without understanding how MLP works, and without understanding the role of personal data within this process, it’s not clear how these new machine learning technologies threaten our freedom or autonomy.

I’ve tried to point out that MLP focuses solely on predicting your behaviors within a very narrow space of possibilities. I’ve also tried to convince you that you’re much more than just your behaviors. As such, commentators such as Zuboff overstate their case about the death of free will. But we should be worried about advances in MLP that predict behaviors using your inner life of psychological and emotional states. And if and when MLP progresses to the successful prediction of your thoughts, plans, and beliefs, then we should really worry. Neuromarketing is an example of an emerging area that comes uncomfortably close to this grey area.

Philosophers such as Luciano Floridi have begun to work out how informational privacy relates to personal data. Marketing professors have analyzed the use of “database discourse” and its one-sided approach to self-identity, representing a form of power that MLP wields over data subjects. These are good first steps, but I believe there’s still much more to say about the metaphysics of personal data. In short, we need to spend more time investigating how our personal data relate to us as persons if we are to carefully regulate a technology such as MLP, which clearly has so much economic potential.

Experts from many disciplines can contribute to this investigation, from sociologists and psychologists to machine learning engineers. Likewise, data scientists, computer scientists, and engineers have a responsibility to consider the effects of MLP at both the individual and societal levels. They must begin to consider questions about personal identity, values, and the extent to which persons may vary in their preference for convenience over agency. I suggest we begin to look for ways to optimize for agency, narrativity, and identity rather than just accuracy, AUC, or cross-entropy loss.


Thinking about the future of persons, personal data, and personalization. Connect with me @ https://www.linkedin.com/in/travis-greene/