Is Statistics Racist?
A number of the most important progenitors of modern statistics were also passionate advocates of eugenics. In large part their interest in statistics was motivated by a desire to promote the eugenics agenda. However, in today’s science, we use the statistical tools developed by figures such as Francis Galton, Karl Pearson and Ronald Fisher without worrying about their ideological roots. We assume that the mathematics itself is unbiased. This may well be true, but in applying mathematics to the real world we need to make philosophical assumptions that can carry bias. In the statistics of Pearson and Fisher, there is a strong bias against causal explanation. In this article I argue that such a bias would tend to support a eugenics agenda. I suggest that classical statistical methods have a politically conservative bias, and that modern developments in causal inference represent the other side of the socio-political coin.
Francis Galton was the patriarch of a remarkably interconnected tribe of early 20th century scientists who established the fundamental basis and techniques of modern statistics. In particular, Galton was a key influence for both Karl Pearson and Ronald Fisher. Pearson, the father of mathematical statistics, founded the world’s first statistics department at University College London (UCL), whereas Fisher has been called the most important figure in 20th century statistics. Galton, Pearson and Fisher were united by more than simply a love of numbers, however. They were also all committed eugenicists. When Galton endowed the Galton Chair in National Eugenics at UCL, it was first held by Pearson alongside his Professorship in statistics. When Pearson retired, his two roles were split. His son, Egon, became Chair of the Department of Statistics, and Fisher was appointed the second Galton Professor of National Eugenics.
What is important to realise is that eugenics was far more than just a side interest for Galton, Pearson and Fisher. In large part, their desire to promote the eugenics agenda provided the motivation for their interest in statistics in the first place.
But what is eugenics and why does it have such a strong link with statistics? In order to understand this we need to look no further than another of Galton’s relatives, his cousin, Charles Darwin. In particular, Darwin’s theory of natural selection ignited the nature-nurture debate. The eugenicists were firmly in the nature camp. This meant that they believed that a person’s behaviour and abilities are largely innate, inherited from their parents. For instance, in “On The Laws of Inheritance in Man”, Pearson once wrote that “intelligence can be aided and be trained, but no training or education can create it. You must breed it”. The motivation behind eugenics is clearly encapsulated within this statement: the eugenicists advocated a kind of “social Darwinism”, in which selective breeding practices would allow a race of superhumans to be bred. In fact, it was Francis Galton himself who invented the term “eugenics”, which means literally “well-born”.
If all of this sounds reminiscent of the programmes of racial hygiene conducted in Nazi Germany, this should be no surprise. Such ideas were prevalent throughout the world in the early 20th century. Pearson himself spoke approvingly of Nazi plans during his retirement speech in 1934. In fact, in the same speech he also made the link between eugenics and statistics unambiguously explicit, when he suggested that if Hitler’s programme of racial hygiene were to fail, “it will not be for want of enthusiasm, but rather because the Germans are only just starting the study of mathematical statistics in the modern sense!”.
The basis of eugenic studies was to show the primacy of inheritance in the development of human potential. To this end, the eugenicists sought to use statistical techniques to establish that the lion’s share of the variance in characteristics like intelligence could be attributed to one’s parentage. However, they also frequently compared across arbitrary racial groups in search of racial differences. Because they worked under the assumption that any differences between racial groups are a result of innate factors, this often led them to conclusions that, in 21st century terms, are shockingly racist. For example, in “Hereditary Genius”, Galton wrote that “the average intellectual standard of the negro race is some two grades below our own”.
It is no secret that the founders of modern statistics were deeply racist. Despite this, the methods that they developed are ubiquitous in modern scientific research. It is surprising that this does not provoke more concern. The tools that we use on a day-to-day basis to interrogate data and understand the world were developed by white supremacists for the express purpose of demonstrating that white men are better than other people.
Why don’t scientists worry about statistics’ sordid roots? The main reason is that scientists have fully bought into the myth of objectivity that was promoted by Pearson and his ilk. In “The Problem of Alien Immigration into Great Britain”, in which Pearson concluded that, “the standard of the Jewish aliens in the matter of personal cleanliness is substantially below that of even the poor Gentile children”, he also wrote about, “the cold light of statistical inquiry”, and had the audacity to claim that, “we firmly believe that we have no political, no religious and no social prejudices … we rejoice in numbers and figures for their own sake … to find out the truth that is in them”. As an aside, Pearson’s claim to a lack of prejudice is laughable when evaluated against his body of work. For instance, 25 years earlier in an address to the Literary and Philosophical Society in Newcastle, he had said, “if you bring the white man into contact with the black … you get superior and inferior races living on the same soil … they naturally sink into the position of master and servant”.
However, despite the fact that Pearson was clearly prejudiced, isn’t his claim as to the cold light of statistical inquiry at least true? Can the mathematics itself be biased?
The key issue here is that the tools of statistics are not employed in an abstract mathematical space; rather, their purpose is to tell us something about the real world. In order to do this we need to make assumptions to connect the mathematics to reality. These assumptions range from the way in which we describe the world in mathematical terms, through the types of questions that we ask and the methods we use to answer them, to the way in which we then interpret the real-world meaning of our statistical results. There is plenty of opportunity to embed bias in these assumptions. For instance, in “Superior”, Angela Saini exposes the ridiculousness of classifying highly diverse and continuously varying populations into distinct categories based upon the colour of their skin.
The nature of the link between the mathematics and the real world is a profoundly philosophical question. In fact, it was precisely this question which created the bitter feud between Fisher and Egon Pearson’s collaborator, Jerzy Neyman. Whereas Neyman was critical of Fisher’s sloppy mathematics, Fisher attacked Neyman’s ability to abstract from the real world: “it would be a mistake to think that mathematicians as such are particularly good at the inductive logical processes which are needed in improving our knowledge of the natural world, in reasoning from observational facts to the inferences which those facts warrant”.
It is in these philosophical questions where the eugenicists really betray their biases. One particular question is immensely important in understanding the bias in modern statistics: the well-known maxim that correlation does not imply causation. Of course, there is a lot of truth in this statement. The fact that two variables are correlated is not sufficient to conclude that there is causality, since there may be confounding variables that explain the relationship, or it could simply be the result of chance. However, as Judea Pearl argues in “The Book of Why”, if there is an appropriate amount of evidence of an association, and all reasonable confounders have been controlled for, why shouldn’t we infer causality from correlation?
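What "controlling for a confounder" means can be made concrete with a small simulation. The following is only a minimal sketch under stated assumptions, not a method taken from Pearl's book: the variable names `z`, `x` and `y`, the helper functions, and the choice of a linear model with Gaussian noise are all hypothetical and chosen purely for illustration. Here `z` influences both `x` and `y`, while `x` has no effect on `y` at all.

```python
import random


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def residuals(target, predictor):
    """Residuals of target after a least-squares straight-line fit on predictor."""
    n = len(target)
    mean_t = sum(target) / n
    mean_p = sum(predictor) / n
    slope = (
        sum((p - mean_p) * (t - mean_t) for p, t in zip(predictor, target))
        / sum((p - mean_p) ** 2 for p in predictor)
    )
    intercept = mean_t - slope * mean_p
    return [t - (intercept + slope * p) for t, p in zip(target, predictor)]


def simulate(n=5000, seed=0):
    """Return (raw, adjusted) correlations between x and y.

    Assumed data-generating process (hypothetical):
    z is a confounder; x and y each equal z plus independent noise,
    so x does not cause y and y does not cause x.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [zi + rng.gauss(0, 1) for zi in z]
    raw = pearson_r(x, y)
    # "Controlling for z": correlate what is left of x and y once the
    # part linearly explained by z has been removed.
    adjusted = pearson_r(residuals(x, z), residuals(y, z))
    return raw, adjusted


raw, adjusted = simulate()
print(f"raw correlation of x and y:      {raw:.3f}")
print(f"correlation after adjusting for z: {adjusted:.3f}")
```

Under these assumptions the raw correlation is substantial, while the adjusted correlation falls close to zero, which is what we would expect when the whole association is produced by the confounder. The interesting cases for causal inference are the opposite ones, in which an association persists after every plausible confounder has been adjusted for in this way.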
The above suggestion would be anathema to Fisher or Pearson, who were fanatical believers in the “correlation does not imply causation” maxim. Fisher was a well-known opponent of the theory that smoking causes lung cancer. Although he accepted that there was a correlation, he argued that this could be due to innate factors. As the weight of evidence for a causal relationship mounted, he resorted to nitpicking with increasingly tenuous statistical arguments. Pearson engaged in a similar battle. He refused to accept that tuberculosis was simply an infectious disease. Rather, he argued that the resistance to tuberculosis was inherited, and that those people who succumbed to the disease did so because of their genetic heritage.
The extreme nature of Fisher’s and Pearson’s hereditarian stance is in evidence in the above anecdotes. They refused to accept that environment could be a cause of lung cancer (in the form of smoking) or tuberculosis (in the form of bacteria), and exhibited an almost pathological antipathy towards causal explanations.
But doesn’t the rejection of causality also weaken the arguments that eugenicists made in support of white superiority? If correlation doesn’t imply causation, is it even meaningful if Europeans can be shown to be more intelligent than Africans? Again, this question is philosophical, but for a believer in nature over nurture the answer is no. Being relatively more stupid is simply a part of an African’s blackness, part of the essence of being black. For an essentialist, no causal relationship is needed.
For the eugenicists, what was the benefit in denying that correlation can imply causation? Well, it denies the tool to their opponents. The other side of the nature-nurture debate is the environmentalists’ camp. Environmentalists believe that differences in human abilities are created by upbringing and environment. They are caused. The last thing the eugenicists wanted was environmentalists using correlational analyses to look for causal explanations.
In “The Book of Why”, Pearl describes his battles against the influence of the creators of modern statistics. He frames this as a “Causal Revolution” and suggests that the statistical dogma of Pearson and Fisher has retarded scientific progress. In particular, modern science is based on looking for patterns in data, without reference to the underlying causal models that create the data. The legacy of Fisher and Pearson is a bias against causal explanations.
In the previous sentence, I use the term bias mindfully, exactly because the belief in the objectivity of statistics is so prevalent. However, does a bias against causality reflect a lack of objectivity? I would argue that it does. As we have just seen, a denial of the possibility of demonstrating causality will tend to benefit the nature camp.
In his 1949 book, “The Nature-Nurture Controversy”, Nicholas Pastore argues that a bias towards nature or nurture is reflective of a person’s sociopolitical outlook. In particular, a hereditarian stance is a conservative perspective, whereas environmentalists tend to have liberal or radical beliefs. This should make sense. Right-wing politics are based around the preservation of the status quo, and the belief that existing hierarchies are fair inasmuch as they reflect the differing innate abilities of the population. Conversely, the left wing believes in equality of opportunity. This is clearly consistent with the assumption that a person’s abilities are the product of their environment, and that anyone can thrive if given the opportunity.
So where does all of this leave us? Is statistics racist or not? I think it is probably too much to claim that statistics has a racist bias. However, I think it is credible to argue that existing statistical tools and approaches do have a right-wing bias. This is of particular concern given the increasing prevalence of Big Data approaches to science. As has been argued in depth by other commentators (see, for instance, Cathy O’Neil’s excellent “Weapons of Math Destruction”), models based upon Big Data tend to reinforce the status quo, exacerbating social injustices. This is at least partly due to the fact that the tools employed are biased towards essentialist (nature-based) explanations. Correspondingly, this tendency could be largely mitigated by causal modelling. In this respect, the Causal Revolution is well named: like other revolutions, it seeks to shift the basis of sociopolitical authority.