The cherry-picked science in Vox’s Charles Murray article

After widespread complaints about the quality of its unscientific critique of The Bell Curve, Vox has now published a takedown of “race science” with a provocative title: “Charles Murray is once again peddling junk science about race and IQ.” The authors are Eric Turkheimer, Kathryn Paige Harden, and Richard E. Nisbett (hereafter: THN), and the first two are widely respected researchers in behavioral genetics, so I suspect this article will be the go-to reference on this subject for many readers, replacing James Heckman’s widely cited review. [Update: Richard Nisbett responds to this piece here. I have responded to a few of his points in this piece, so keep reading if you’re coming here from Vox.]

As Timothy B. Lee writes, “This debunking of Charles Murray doesn’t appear to include a single quote of Murray’s arguments in his own words.” (Lee is a reporter at Vox, but presumably he was not involved in editing the article.) Even though much of the article is fairly measured, Vox’s editors couldn’t help but take the title more seriously than the substance:

Matthews was part of the effort to destroy Jason Richwine’s career over this very issue, and so his reaction is not too surprising. Others on social media have noted that the authors explicitly replace Murray’s argument with a weaker one:

At this point it is important to emphasize just how mainstream Murray’s views are in the field of intelligence research. The most controversial sentence in The Bell Curve attributed about half of the black-white IQ gap to genetics, the rest to environment. In 2013, a survey of 228 intelligence researchers found that the typical scientist in this field agrees:

Source: Survey of Expert Opinion on Intelligence: Causes of International Differences in Cognitive Ability Tests, summarized on James Thompson’s blog.

[Update: Nisbett makes a basic comprehension error in his response, writing,

Still, in both the Snyderman and Rothman book and in the more recent survey, more than half of respondents selected one of two response categories that included zero (one option was “0 percent of [black-white] differences due to genes” and the other was “0–40 percent of differences due to genes”).

The mistake is assuming the “0–40 percent” category reported above excluded the respondents who marked 0%; clearly, the middle three responses are exclusive, since 42%+18%+39% = 99%, while the first (0%) and last (100%) are part of the 2nd and 4th fractions.]

The facts known to most experts in the field of intelligence

The authors present a series of five facts that are “known to most experts in the field of intelligence” and that weaken the case for genetic group differences in IQ. While the studies they cite are indeed known to experts, most would probably disagree with the weight the authors give them. Let’s go through them one by one:

1. The gap has not narrowed in the last 25 years

The authors write that the gap has substantially narrowed over the last few decades:

The black-white IQ gap is decreasing, and is now closer to 10 points than the widely cited one standard deviation (15 points), which is the erroneous value Murray cites in the interview. Academic achievement of blacks has also improved by about one-third standard deviation in recent decades.

In the first sentence, the authors are presumably citing the Dickens and Flynn (2006) article, which discusses a narrowing of the black-white IQ gap between 1972 and 2002. Interestingly, they fail to mention that Murray himself wrote an article in response that demonstrated no narrowing over several decades:

As the article mentions, even when we look at school achievement tests like NAEP (which are highly correlated with IQ) with massive sample sizes, as opposed to a single sample of several hundred subjects in 2002, we see no narrowing since the late 1980s, years before The Bell Curve was published.

Source: Rindermann and Thompson (2013).

While it is true that the gap has narrowed a bit more for younger students, it is important to understand the Wilson effect: test scores are much more influenced by genetics at age 17, when shared environment only explains about 15% of the variance in cognitive ability, than at age 10, where shared environmental influences explain about double that.

The Wilson effect: genetic influences overwhelm environmental influences on IQ.

The stagnation of the ethnic score gap is not peculiar to the NAEP test; it appears in PISA scores and SAT scores as well. The gap between white and black SAT scores remains roughly one standard deviation, with little change in the last twenty years.

If one actually believes that the gap has narrowed at all in the last 25–30 years just because of one sample from 2002, one has to square that with the stagnation in every other psychometric test. That is a big challenge indeed, but the authors don’t even bother addressing it.

[Update: Nisbett adds more data to the discussion, which you can read in his response. He again cites Dickens-Flynn to argue the black-white IQ gap is at 9.5 points. However, consider this graph from the paper:

Source: Dickens and Flynn 2006

Looking only at adults (remember the Wilson effect), the graph shows the black-white IQ gap narrowing from about 1.3 standard deviations to about 1 standard deviation (15 points), not the 9.5-point gap he cites. But this narrowing happened in the 1970s, not in the last 25 years. This is all in the paper.

Nisbett also argues that because more African Americans are taking the SAT, the stagnation is expected and could mask a roughly 0.3 standard deviation gain relative to whites. For this story to make sense, Nisbett must assume that the new black SAT takers score roughly 0.6 standard deviations below the already low average; since he has provided no evidence in support of this and no data are available, I will let the reader decide its plausibility.]
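
The arithmetic behind this kind of composition argument can be made explicit. What follows is a minimal sketch, not anything taken from Nisbett's response or from SAT data: it treats the overall mean as a weighted average of original and new test takers, and asks what mean the new takers would need for a given gain among the original takers to be fully masked. The function name, the `new_share` parameter, and the shares tried in the loop are all hypothetical; this post does not say what share of new test takers the 0.6 figure presumes.

```python
def required_new_taker_mean(gain_sd: float, new_share: float) -> float:
    """Mean of the new test takers, in SD units relative to the OLD group mean,
    that would leave the overall mean unchanged even though the original
    takers improved by `gain_sd`.

    `new_share` is the (assumed, hypothetical) fraction of all current
    test takers who are new to the pool.
    """
    if not 0.0 < new_share < 1.0:
        raise ValueError("new_share must be strictly between 0 and 1")
    # Overall mean = (1 - p) * gain + p * m_new.
    # Setting that to 0 (no visible change) and solving for m_new:
    return -(1.0 - new_share) * gain_sd / new_share


# Hypothetical shares of new test takers; none of these comes from real data.
for share in (0.25, 1 / 3, 0.5):
    m_new = required_new_taker_mean(gain_sd=0.3, new_share=share)
    print(f"new-taker share {share:.2f}: required mean {m_new:+.2f} SD")
```

Under this simple model the required new-taker mean depends heavily on the assumed share. For instance, a one-third share implies new takers averaging about 0.6 standard deviations below the old mean, while a one-half share implies about 0.3. Any specific figure is therefore only as good as the share assumption behind it.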

2. The Flynn effect has little relevance to racial IQ differences

The Flynn effect, named for the political scientist and IQ researcher James Flynn, is the term many scholars use to describe the remarkable rise in IQ found in many countries over time. There has been an 18-point gain in average IQ in the US from 1948 to 2002. One way to put that into perspective is to note that the IQ gap between black and white people today is only about half the gap between America as a whole now and America as a whole in 1948. Murray’s hand-waving about g does not make that extraordinary fact go away.

Since the authors are reacting to Murray’s podcast appearance with Sam Harris, they should have noted for readers who didn’t listen to the two-hour interview that Murray does not “hand wave” about the Flynn effect: he specifically cites the work of Dutch researcher Jelte Wicherts, whose use of multigroup confirmatory factor analysis (MGCFA) Murray admits is beyond his level of understanding:

As can be seen in the 2nd highlighted sentence, Wicherts is not a “race scientist” by Turkheimer et al.’s standards.

As Dutch researcher Jan te Nijenhuis writes, it is striking how secular gains in IQ subtests (the Flynn effect) correlate negatively with their “g-loadedness” (roughly how strong a proxy they are for the latent variable that explains most of the variance in IQ), while racial differences in IQ subtests show precisely the opposite trend:

Source: Is the Flynn effect on g?: A meta-analysis
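
For readers unfamiliar with this kind of comparison, it boils down to correlating two per-subtest vectors: each subtest's g loading and the size of some effect on that subtest. Below is a bare-bones sketch of that calculation, assuming a plain Pearson correlation with none of the reliability or range corrections that published analyses may apply. The function name and every number are hypothetical placeholders, chosen to be perfectly linear so the result is easy to check; they come from no study and say nothing about any real test battery.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Artificial placeholder values (NOT from any study), one entry per subtest.
g_loadings = [0.2, 0.4, 0.6, 0.8]
effect_rising = [0.1, 0.2, 0.3, 0.4]   # effect grows with g loading
effect_falling = [0.4, 0.3, 0.2, 0.1]  # effect shrinks with g loading

print(pearson_r(g_loadings, effect_rising))   # close to +1 by construction
print(pearson_r(g_loadings, effect_falling))  # close to -1 by construction
```

With real data the vectors would have one entry per subtest of an actual battery, and the correlation would fall somewhere between these two artificial extremes.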

Wicherts is a critic of using this “method of correlated vectors” as a demonstration that the ethnic IQ gap is a difference in the latent g factor, and argued in a 2004 paper that the gap/g-loading correlation masks a bias that significantly underestimates the latent ability of Dutch ethnic-minority test takers, many of whom came from non-Dutch-speaking families. But a 2015 paper by Craig Frisby and Alexander Beaujean found much less dramatic results on the US black-white IQ gap:

Source: Frisby and Beaujean 2015

So while we may need to be cautious about interpreting the meaning of group differences in cross-country or cross-population IQ comparisons, there are signs that differences in black and white IQ are in fact differences on the all-important g factor. But the science of what we might learn from MGCFA in resolving the nature of group IQ differences is still in dispute.

Perhaps the most authoritative voice on the relevance of the Flynn effect to this debate is James Flynn himself, after whom the effect is named, who says:

“The magnitude of white/black IQ differences on Wechsler subtests at any given time is correlated with the g loadings of the subtests; the magnitude of IQ gains over time on subtests is not usually so correlated; the causes of the two phenomena are not the same.”

His explanation for race and IQ gaps is this:

Go to the American suburbs one evening and find three professors. The Chinese professor’s kids immediately do their homework. The Jewish professor’s kids have to be yelled at. The black professor says: ‘Why don’t we go out and shoot a few baskets?’ The parenting is worse in black homes, even when you equate them for socio-economic status.

Scandalous. But if Murray can cite the research of those who disagree with him, surely the Vox writers could too.

3. The cited adoption studies overstate the positive influence of an upper middle class environment

Murray’s assertion that it is hard to raise the IQs of disadvantaged children leaves out the most important data point. Adoption from a poor family into a better-off one is associated with IQ gains of 12 to 18 points.

Here the authors seem to be citing a meta-study of adopted and non-adopted siblings, in particular the six studies, with a total of 253 subjects, in which such a difference was analyzed. One example is a study of French half-siblings, one raised in a working-class environment and the other in an upper-middle-class environment. These studies have significant limitations, as discussed in James J. Lee’s review of Vox author Richard E. Nisbett’s book on intelligence:

Source: Lee reviews Nisbett

Even ignoring these confounds, as we have seen above, the positive (and negative) effects of environment continue to fade well into young adulthood, so IQ gains measured at age 14 should not be assumed to last. Studies with larger samples and testing at later ages show much smaller effects, such as this one from Turkheimer himself on adopted Swedish children, or this study showing about a 7 IQ point boost going from a low SES to a high SES environment.

Interestingly, like the Flynn effect, and unlike racial group differences in IQ, adoptees show gains in IQ on the subtests least associated with the g factor:

Source: Are adoption gains on the g factor? A meta-analysis

4. Head Start does not raise IQ much, if at all

THN explain the benefits of environmental interventions such as Head Start:

It is true (and unsurprising) that poor children exposed to special educational programs such as Head Start tend to regress once the program ends and environmental disadvantages reassert themselves. But the gain in social and intellectual capital from the best available early childhood education can result in an increase of one-third in the likelihood of graduating from high school, can triple the rate of college attendance, can produce a two-year advantage in reading ability of young adults, and can result in a two-thirds increase in the likelihood that they will be either gainfully employed or enrolled in higher education. The best available K-12 programs also result in substantial gains in intellectual and social capital.

This is a strange straw man. I doubt Murray disagrees that the best K-12 programs could raise “social capital.” He doubts that they raise adult IQ, a doubt that this meta-analysis bears out:

Source: Duncan 2013

The conditions of African-Americans in the 1960s South were truly appalling, and it’s possible some of the larger effect sizes were real in those conditions, simply from things like better nutrition during formative years.

The claim that it provides a permanent two-year reading advantage should be treated with considerable skepticism, since no source is cited for it and, as we have seen, there are almost no IQ gains. And like adoption gains, IQ gains from Head Start are disproportionately on subtests that have lower correlations with the g factor.

Vox’s claim that the fadeout is “unsurprising” is revealing. As blogger Spotted Toad notes, the fadeout effect is common to almost all educational interventions and is obviously consistent with the model that genetic effects overwhelm the positive effects from an educationally enriched environment by adulthood. What model from Turkheimer et al. predicts the fadeout in environmental interventions? Why is there no “dosage effect” of environmental disadvantage?

One last thing to note about this subject is that the highly cited critique of The Bell Curve from James Heckman suggests educational interventions might be a way of closing racial IQ gaps, though he already admits Head Start tends to fail at that.

Today, Heckman invites speakers such as Gregory Cochran and Henry Harpending to speak at the University of Chicago. Does that second name sound familiar? If you read the SPLC’s list of “pseudoscientific racists” you might remember him.

5. Heritability of IQ is more or less the same across social class

The heritability of intelligence, although never zero, is markedly lower among American children raised in poverty. Several interpretations of this fact are possible. The one we find most persuasive is that children raised in those circumstances are unable to take full advantage of their genetic potential because they do not have access to the high-quality environments that could support it.

This is yet another overstated “fact” where Turkheimer’s research is an outlier. It would have been more charitable to cite the research of Elliot Tucker-Drob and Timothy Bates who found much smaller gene-by-socioeconomic-status interactions in the direction Turkheimer et al. found.


Here we see that even in the United States, poor individuals have roughly the same heritability for IQ as those who come from middle class families.

Even if we assume Turkheimer’s study is representative of these small Gene by SES interaction effects, what might that tell us about race? Nothing, at least in his sample:

The data is from Beaver et al., which uses the same sample as Turkheimer 2003 and finds no racial differences in heritability. So to the best of our knowledge, Gene x SES interactions are a dead end for this purpose.

[Update: Nisbett cites the meta-analysis and writes:

So despite the misleading impression given by the critics, the meta-analysis was a confirmation of the reduction in heritability among poor Americans. This is important, because it undermines the hereditarian argument that twin studies show family environment doesn’t matter for IQ: For poor children in the US, in particular, the family environment seems to matter quite a bit.

On Twitter, Emil Kirkegaard pointed out a more comprehensive analysis of heritability by race:

My question to Nisbett is therefore this: if poverty substantially lowers the heritability of intelligence and black Americans (who are on average poorer than whites) show no signs of this, what are we to make of this result? And what is its relevance to the debate? Emil also sends further evidence of publication bias driving these gene x SES effects in the United States:

Researchers now understand that gene-by-environment interactions are tough to detect and require large samples to estimate.]


The science on this subject is hardly settled, and I agree with the authors of the Vox article that the kind of demagoguery on it commonly found on the Internet is toxic and has the potential to harm real people. Unfortunately, Murray is a poor target for their rage: he is a careful, gracious and intellectually honest scholar.

I strongly suggest that anyone interested in this topic read the summary of the state of knowledge by James J. Lee of the University of Minnesota. He is highly critical of the kinds of arguments leveled at Murray, among others.

I leave with a question for the authors of this piece. This July, behavioral geneticists will announce over 600 SNPs statistically associated with educational attainment — and IQ by proxy. Are you ready to come back to this topic with that data in hand?

