In another post I gave a short overview of the philosophical problem of induction and pointed out why it’s a problem for capturing moral growth in persons using machine learning. As machine learning increasingly draws its predictions from Behavioral Big Data (BBD), the line between data science and social science becomes blurrier and blurrier. Consequently, as BBD-powered data science gradually inches toward becoming a quantitatively oriented, technology-centric social science (see Alex Pentland’s “social physics”), it will eventually need to confront the same philosophical questions faced by social scientists.
The Role of Social Science: Description or Prescription?
These major philosophical questions concern the role of social science as either a descriptive or prescriptive enterprise. In other words, is the purpose of social science to merely “report the facts” or to give us clues for how to better structure our social worlds?
Those who favor a descriptive role tend to take their inspiration from the research models of the natural sciences, such as physics and biology, while those who favor a prescriptive role follow in the footsteps of Marx and see a major task of social science as furthering the moralistic goal of human emancipation. For Marx and later Critical Theorists, human emancipation meant “greater freedom, greater equality, greater material and spiritual well-being not for the few but for the many.”
Competing Methods: Positivism & Interpretivism
These ideological differences about the proper role of social science are roughly reflected in two major methodological camps. Positivist social scientists tend to espouse a strict division between facts and values, while interpretivist social scientists tend to claim that facts and values are inextricably mixed.
You can identify positivist research by its use of quantitative modeling and statistical hypothesis testing. In positivist research, we are interested in intangible “constructs” that are “operationalized” as tangible but error-prone measurements. Valid, reliable measurements of these constructs are crucial to good research. Interpretivist social science, on the other hand, relies on ethnography, narrative analysis, and iterative (not deductive) theory building. A main goal of interpretivist research is to understand the process of meaning-making in individuals and groups. Understanding the meaning behind our social activities and behaviors, however, can get messy, as I’m sure we’re all aware.
Positivism takes its philosophical underpinnings in part from the logical empiricism of the Vienna Circle (whose members and close associates included such luminary figures as Rudolf Carnap, Kurt Gödel, Carl Hempel, and the early Karl Popper, though Popper was more critic than member), while interpretivism draws from the hermeneutic and phenomenological theories of Wilhelm Dilthey, Martin Heidegger, Hans-Georg Gadamer, Edmund Husserl, and others. To this day, the structure of German universities reflects this ideological and methodological divide between the Naturwissenschaften (roughly, the “natural sciences”) and the Geisteswissenschaften (literally, “spirit sciences,” but more commonly glossed in the USA as the “humanities”).
With this bit of necessary background now behind us, we can turn to machine learning and explore how aspects of the fact-value debate apply to modern, BBD-based data science.
This post will focus on the so-called fact-value distinction as applied to machine learning and relate it to cases of algorithmic bias and injustice.
What is Algorithmic Bias?
Algorithmic bias purportedly surfaces everywhere, from criminal sentencing, recidivism prediction, policing, and hiring, to Google search results. Implicit in referring to these cases as instances of bias is the belief that the world should be otherwise. In other words, the predictions of these algorithms reflect a mistaken conception of the world. These predictions assume a world where, for example, women are not engineers, or black suspects need more policing to prevent future crimes.
When we say these predictions are unjust, we express a moral claim that individuals or groups have not received what is properly due to them. These claims have moral content because they ascribe properties of goodness or badness to the state of affairs they describe. A concept like justice is what the Cambridge philosopher Bernard Williams would call a “thick concept.” A thick concept not only describes a property of a possible state of the world, but it also includes an evaluative attitude towards that state.
In our case, when we say an algorithm is biased, we are essentially saying it distributes benefits or burdens in society in some way and this way is wrong. Benefits may be things like high-paying engineering jobs or low prices; burdens might be things like prison sentences or police frisking. These distributions are typically operationalized in ML during test set evaluation when we compute various performance metrics.
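To make this concrete, here is a minimal sketch (the group labels, outcomes, and predictions are entirely made up) of how a distribution of burdens can be surfaced at test-set evaluation time by computing a metric such as the false positive rate per group rather than in aggregate:

```python
# Hypothetical test-set evaluation: compare false positive rates by group
# rather than reporting a single aggregate accuracy. All data is invented.

def false_positive_rate(y_true, y_pred):
    """FPR = false positives / actual negatives."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    n_neg = sum(1 for t in y_true if t == 0)
    return fp / n_neg if n_neg else 0.0

# (group label, true outcome: 1 = reoffended, model prediction)
records = [
    ("A", 0, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 1),
    ("B", 0, 0), ("B", 0, 0), ("B", 1, 1), ("B", 0, 0),
]

# split the test set by group and compute the metric per group
by_group = {}
for g, t, p in records:
    by_group.setdefault(g, ([], []))
    by_group[g][0].append(t)
    by_group[g][1].append(p)

fpr = {g: false_positive_rate(ys, ps) for g, (ys, ps) in by_group.items()}
print(fpr)  # group A bears far more false "high-risk" labels than group B
```

A single overall accuracy number would hide exactly this asymmetry: the burden of false positives (say, unwarranted detentions) falls entirely on group A.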
Summaries Hide Important Moral Complexity
For now, we should be cognizant that for issues of fairness or justice, we must look at distributions of benefits and burdens and not simply summary statistics that can hide distributional complexities. For example, when doing a counterfactual analysis of the efficacy of a treatment or intervention, which particular individuals might have relatively benefited the most? Perhaps those individuals should be the focus of our optimization procedure.
Further, we might draw on various philosophical principles of justice, such as the maximin principle of John Rawls, which aims to rank distributional outcomes through their effects on the worst-off in society. Thorough data visualization, not just summarization, is crucial for such approaches to work. This is a major argument against the use of “automated ML” in cases involving Behavioral Big Data that can be tied back to living individuals.
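As a toy illustration of the maximin idea (the model names and benefit numbers below are hypothetical), ranking two candidate models by the welfare of their worst-off group can look like this:

```python
# Sketch of Rawls's maximin rule: rank candidate policies (here, models)
# by how their worst-off group fares, not by the average outcome.

def maximin_choice(policies):
    """Pick the policy whose minimum group-level benefit is largest."""
    return max(policies, key=lambda name: min(policies[name].values()))

# made-up expected benefit per group under two candidate models
policies = {
    "model_1": {"group_a": 9.0, "group_b": 1.0},  # higher average, worst-off suffers
    "model_2": {"group_a": 5.0, "group_b": 4.0},  # lower average, worst-off better
}

print(maximin_choice(policies))  # model_2: its minimum (4.0) beats model_1's (1.0)
```

Note that an average-maximizing criterion would pick model_1; the maximin criterion reverses the ranking precisely because it attends to the distribution rather than a summary.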
What’s important to notice is that any claim of bias or injustice is an inherently moral claim about one state of affairs being relatively better or worse than another. And we can better evaluate claims of bias or injustice by looking at distributional outcomes, not simply summary statistics. Yet in media reporting of algorithmic bias, these “other” states of affairs are often not explicitly stated but are assumed to be clear from context. It’s as if we all shared the same moral vision for society. But do we?
The Fact-Value Distinction
Let’s take a break from the issue of algorithmic justice and return to the fact-value debate I mentioned earlier. The philosopher David Hume was arguably the first to point out this tension between statements of fact (what is the case) and statements of value (what ought to be the case). Hume argued that any attempt to bridge this gap commits what was later dubbed the “naturalistic fallacy”: the illicit jump from description to prescription. (Strictly speaking, the term comes from G. E. Moore and names a related but distinct error; the two are often conflated.) According to Hume, there’s no way to derive “ought” from “is.” Anyone who claims to have done so is mistaken.
The fact-value distinction is a very useful philosophical tool to have in your conceptual toolbox. By making the distinction, you can effectively accept the conclusion of an interlocutor within the realm of “facts,” but then argue that in the realm of “values” such a conclusion is inadequate.
For example, when I was growing up, I would often complain to my mom that something was “unfair.” She would respond, “Well, Travis, life is unfair. Get used to it.” Using the fact-value distinction, I could say, “OK, sure, life is unfair. That accurately describes our reality. But the real question is whether life ought to be unfair. Do we wish to live in such a world?” What is stopping us from imagining a place where injustice is not the norm? Invoking the fact-value distinction allows you to turn the tables on your debate partner, shifting the discussion from description to prescription.
Nevertheless, Hume’s characterization of the fact-value distinction as a “problem” subsequently influenced generations of philosophers and scientists. Swayed by Hume’s arguments, they sought to clearly distinguish between science, whose sole purpose was to describe the world in terms as objective and absolute as possible (cf. Thomas Nagel’s “view from nowhere” or Bernard Williams’s “absolute conception of the world”), and the arts and humanities, which aimed to give voice to the various ways the world should be. Two distinct and largely independent academic cultures (the sciences vs. the arts) were thus created and have been sustained to this day, reflecting this logically unbridgeable gap between what is and what ought to be.
Feminist Responses to the Fact-Value Distinction
Of course not everyone agrees that the role of science is or should be to simply report the facts. Indeed, what we believe to be “the facts” may simply be a reflection of the ruling class’ preferred epistemology, put in place to justify their bourgeois status. Marx’s notion of an ideological superstructure, reflecting the values and material interests of the ruling class, has steered much later thinking about the fact-value distinction, especially in feminist philosophy of science.
Feminist philosophers such as Elizabeth Anderson have argued that facts must always be interpreted under various background assumptions, regardless of whether those assumptions are explicitly voiced. Likewise, “raw data is an oxymoron,” according to Lisa Gitelman. Even the venerable philosopher-logician W. V. O. Quine claimed that scientific theories are always underdetermined by their empirical evidence (the Quine-Duhem thesis). Given certain experimental results, it is always possible to give an alternative account that would also produce the same results. For example, we might try to save apparently disconfirming results of an experiment by saying our measurement device was simply not sensitive enough to detect a difference.
Similarly, two correlated observations might be explained in a near-infinity of ways. While we can rule out some explanations for these correlations, we cannot categorically state what the actual cause is with complete certainty, for there may be other alternative theoretical explanations for the same empirical evidence. Maximum likelihood estimation can be viewed as a mathematical formalization of such an approach to scientific reasoning, where we assume the observed data were generated by some probabilistic model and attempt to find the model’s parameter values which would maximize the likelihood of such an observation.
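A minimal sketch of that formalization, using a Bernoulli (coin-flip) model and a crude grid search over candidate parameter values rather than the closed-form solution:

```python
import math

# Maximum likelihood estimation for a Bernoulli model: assume the observed
# 0/1 data came from flips of a coin with unknown bias p, then find the p
# that makes the observed data most likely (via log-likelihood).

def log_likelihood(p, data):
    return sum(math.log(p if x == 1 else 1 - p) for x in data)

data = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # 3 successes in 10 trials

# crude grid search over candidate parameter values in (0, 1)
candidates = [i / 1000 for i in range(1, 1000)]
p_hat = max(candidates, key=lambda p: log_likelihood(p, data))

print(round(p_hat, 3))  # 0.3, the sample proportion: the Bernoulli MLE
```

The point mirrors the underdetermination worry: the MLE names the parameter value under which the data would be least surprising, given the assumed model, but it cannot certify that the assumed model is the true causal story.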
In any case, the gist of the arguments given by some philosophers of science is that there is no clear distinction between facts and values. Our values, whether scientific — e.g., concerning various standards of evidence — or moral — do in fact play a role in scientific reasoning from observations. In short, there are no such things as “value-free” facts. At some level, we must rely on our values (which don’t seem to have clear or widely-agreed upon logical or rational grounding) to interpret empirical observations and relate these observations back to our scientific theories.
The Limits of Mathematical and Scientific Representation
I hope the following is not a controversial claim: mathematical representations of a state of affairs can only describe what is or what is likely to be, not what should be. Probability, for example, is used to represent states of affairs in which outcomes cannot be determined precisely in advance. In mathematics, 2 + 2 either does or does not equal 4; the law of non-contradiction is part and parcel of our mathematical foundations. It would be a category error to assert something like “2 + 2 should equal 4.” It’s not surprising, then, that science uses the tools of mathematics to build descriptive models of the world. We collect facts and, on the basis of these facts, build theories that constitute what we believe to be the best description of reality. That’s one approach to understanding the link between science and math, at least; as described earlier, feminist philosophers would dispute this account.
ML is Not M(ora)L
Problems arise when data scientists blindly use machine learning to build mathematical models of reality and forget that ML does not make the moral distinction between what exists (or possibly exists) and what should exist (because it is more valuable either intrinsically or instrumentally).
There is no fact-value distinction for ML. There exist only facts.
For the ML algorithm, what exists is fully coextensive with what is encountered in the training data. If it doesn’t happen in the world of the training data, then it doesn’t exist. Assumptions like this are why, when we train Naive Bayes models, for instance, we often need to apply Laplace smoothing to avoid assigning zero probability to categories that appear in the test set but not in the training set. Sometimes the reality captured in the training data isn’t really representative of reality. This is where good training on the part of the data scientist hopefully comes in.
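A minimal sketch of add-one (Laplace) smoothing, with made-up categories, showing how a category never seen in training still receives a small nonzero probability instead of an impossible zero:

```python
from collections import Counter

# Laplace (add-one) smoothing: pretend every category in the vocabulary
# was seen `alpha` extra times, so unseen categories get probability > 0.

def smoothed_probs(observations, vocabulary, alpha=1):
    counts = Counter(observations)
    total = len(observations) + alpha * len(vocabulary)
    return {cat: (counts[cat] + alpha) / total for cat in vocabulary}

training = ["cat", "cat", "dog", "cat"]
vocab = ["cat", "dog", "ferret"]  # "ferret" appears only at test time

probs = smoothed_probs(training, vocab)
print(probs["ferret"])  # 1/7, small but no longer zero
```

Smoothing is, in effect, a small confession that the training data is not the whole world: we deliberately reserve probability mass for things the data never showed us.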
To put things slightly differently, and hopefully not belabor the point too much, the inductive learning that takes place during model training can only tell us what is/what is likely to be, not what should be. Induction operates by assuming the future will resemble the past — why? Because in the past such an assumption always seemed to work out for us pretty well. If it’s not clear yet, we are dealing with the fact-value distinction again, but this time in the context of machine learning.
ML uses inductive inference to make its predictions, but these predictions may conflict with our moral values concerning various states of the world.
The simplistic empiricism behind inductive machine learning unfortunately prevents algorithms from doing otherwise. When an algorithm is trained on data where only 5% of engineering roles are filled by women, the algorithm has no way of determining whether this is because 5% of engineers are actually women in the population of interest, OR, because there are systematic social and cultural barriers preventing women from filling these roles. In other words, empiricist ontology equates what is observed with reality and leaves out any possibility of there being some deeper structure or mechanism responsible for what we observe.
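A deliberately trivial sketch of this empiricist blind spot (the sample below is invented): a purely frequency-based estimator reproduces the skew in its training data, and nothing in the data itself tells the estimator why the skew exists:

```python
# A purely empirical estimator just echoes its training data. Whether the
# 5% rate reflects the population "as it must be" or systemic barriers is
# invisible to the model: both worlds would produce identical data.

training_roles = ["engineer_m"] * 95 + ["engineer_f"] * 5  # made-up skewed sample

def p_female_engineer(data):
    return data.count("engineer_f") / len(data)

print(p_female_engineer(training_roles))  # 0.05, the skew faithfully reproduced
```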
It is therefore the responsibility of the data scientist to make the causal determination mentioned above. You can now see why a good (activist) data scientist will possess considerable domain and socio-cultural knowledge. This knowledge is crucial for supplying the causal reasoning behind the observed training data. The activist data scientist does what the computer cannot, meaning her work cannot be automated away, despite what “Automated ML” vendors may claim.
Human Imagination and Abductive Inference
Unlike computers, humans possess the capacity for imagination. We have the ability to imagine not only what is the case but what could be the case. Counterfactual reasoning is arguably a distinctly human capacity. When applied to scientific thinking, abductive inference is an expression of the human capacity for imagination and creativity. Simply put, abductive inference is reasoning that goes backwards from an observation to the cause of that observation: from effects to the causes of observed effects. Abductive reasoning effectively allows us to reason about potential realization independently of actual realization. Judea Pearl’s do-operator comes close to capturing this ability in mathematical notation, but as of yet, machine learning algorithms have no way to abduce the causes of the data on which they train.
Abduction in Science
Applied to scientific thinking, abduction takes us from mere observation (science at the level of bare empiricism) to the assertion of the conditions of possibility for such observations (a transcendental science, in the words of Immanuel Kant). As such, our capacity for abductive reasoning allows us to model reality using three levels: the empirical (what we observed), the actual (what could have been observed but wasn’t), and the real (the true causal structures and mechanisms responsible for our empirical experience which are the conditions of possibility for our experience).
The absence of abductive reasoning is one major reason why even reinforcement learning will not take us from narrow AI to general AI. No matter how much training data we acquire and use to reinforce our learning in some context, the Quine-Duhem underdetermination thesis tells us that there could always be alternative explanations for our observations. As Marvin Minsky once put it, pulling on a string a billion times will never help us learn that pushing on the same string will produce a different effect.
Even more, Nelson Goodman’s example of Grue (explained in another post here) reveals how inductive inference can fail to capture unobserved properties, properties which might be temporally contingent and unobserved until some period of time has passed. Imagine trying to predict the emergence of a butterfly after only having seen various caterpillars.
An Example of ML-supported Injustice
In the diagram below we can see the role of data scientists in this process of perpetuating injustice via machine learning. Because ML relies on inductive inference, its predictions can only assume the future will resemble the past.
A fundamental task for activist data scientists, I argue, is to ask themselves the following questions:
Should the future resemble the past? Which data science choices can I make in order to change the future and create a more just state of affairs?
What Would An Ideal World Look Like?
Before it’s possible to do any kind of data science for social good, we need to be clear about what the good is. The predictive models data scientists work on generate predictions, which in turn are acted on by humans, which then affect future states of affairs, which are measured and turned into future training data. There is a feedback loop at work here. The question is about which direction we wish this loop to go in. Without a clear moral vision in mind, we are apt to fall into nihilism, relativism, or Ayn Rand’s Objectivism.
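Here is a toy simulation of that feedback loop (all numbers are invented): a model that recommends hires in proportion to historical rates simply hands the past forward, year after year:

```python
# Toy simulation of the prediction -> action -> data feedback loop:
# a model recommends candidates in proportion to historical hire rates,
# those recommendations become the next round's "historical" data,
# and the initial skew perpetuates itself. All numbers are made up.

def next_hire_rate(historical_rate, n_hires=100):
    # the model recommends, and the firm hires, in proportion to past data
    hired_f = round(historical_rate * n_hires)
    return hired_f / n_hires

rate = 0.05  # 5% of past engineering hires were women
for year in range(5):
    rate = next_hire_rate(rate)

print(rate)  # still 0.05: without intervention, the past reproduces itself
```

Breaking out of the loop requires an exogenous choice (a different objective, different data, or a different decision rule), which is exactly the moral intervention the model itself cannot supply.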
Without reflection and moral intervention by activist data scientists, inductive learning will continue to assume the future will resemble the past, for better or worse. Consequently, we must be thoughtful about how our models are used and what our “ideal” state of affairs would look like. In short, data scientists have the power to choose whether the future should resemble the past.