Cornell’s Alternative Statistics
The current flavor of the week for bad science is the work of the Cornell Food and Brand Lab.
Their research head freely admitted to questionable research practices in a blog post. My colleagues and I found an unbelievable number of errors in 4 of their publications, and I found errors in 6 more of their papers. Andrew Gelman detailed problems with their methodology. We learned the lab has been using incorrect statistics, making errors, and sweeping them under the rug for years. The story has been covered by Retraction Watch, Slate, and New York Magazine, with numerous other media organizations chomping at the bit to get a story out.
I’m tired of talking about this story, as is Gelman. I assure you I don’t enjoy going through their research or reading their papers. But my colleagues and I were asked if we found any problems with work by the lab that impacted public policy. Because you see, the lab’s Center for Behavioral Economics in Child Nutrition Programs division has had an impact on school lunchrooms.
I really didn’t want to find any more errors, but as a public service my colleagues and I took a look at some of their work funded by the USDA. Many studies either didn’t contain any means or SDs for us to check, or contained sample sizes too large for us to apply our methods. However, my colleagues did flag this paper:
“Attractive names sustain increased vegetable intake in schools”
Google Scholar citations: 97
Below I reproduced some of the key information in the first table.
The table seems innocent enough. There aren’t enough decimal places to apply granularity testing. But my colleagues noticed that the text claims this study involved 113 students.
32+38+45 = 115
Here we go again…
Take a good look at that table. Take it in like a tall glass of water.
Number eaten + number uneaten != number taken
11.3 + 6.7 = 18.0, they reported 17.1
4.7 + 10.3 = 15.0, they reported 14.6
6.8 + 13.2 = 20.0, they reported 19.4
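For anyone who wants to redo this arithmetic, here is a minimal sketch of the check. The three tuples are just the numbers quoted in the lines above; the function and variable names are my own and are not from the paper.

```python
# Values copied from the three rows quoted above:
# (number eaten, number uneaten, number taken as reported).
rows = [
    (11.3, 6.7, 17.1),
    (4.7, 10.3, 14.6),
    (6.8, 13.2, 19.4),
]

def check_row(eaten, uneaten, reported_taken):
    """Return (eaten + uneaten, gap to the reported total), rounded to 1 decimal."""
    computed = round(eaten + uneaten, 1)
    return computed, round(computed - reported_taken, 1)

results = [check_row(*row) for row in rows]
for (eaten, uneaten, reported), (computed, gap) in zip(rows, results):
    print(f"{eaten} + {uneaten} = {computed}, reported {reported}, gap {gap}")
```

The gaps are far larger than anything rounding to one decimal place could produce, since two rounded inputs can only shift a sum by about 0.1.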
I’m afraid to do it, but let’s move on to the second table.
This table is a complete disaster.
Just a quick glance reveals that 7/8 of the % changes don’t make any sense.
(.054-.018)/.018*100 = 200.0%, they report 99.0%
(.073-.021)/.021*100 = 247.6%, they report 109.4%
(.033-.002)/.002*100 = 1550.0%, they report 176.9%
(.062-.086)/.086*100 = -27.9%, they report -16.2%
(.018-.120)/.120*100 = -85.0%, they report -73.3%
(.099-.047)/.047*100 = 110.6%, they report 35.7%
(.046-.030)/.030*100 = 53.3%, they report 41.5%
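The calculations above can be reproduced with a few lines of code. This is only a sketch of the standard percent change formula applied to the fractions quoted above; the names are mine, and the pairing of each fraction with a reported value follows the order of the lines above.

```python
def percent_change(old, new):
    """Standard percent change from old to new: (new - old) / old * 100."""
    return (new - old) / old * 100

# (new fraction, old fraction, percent change reported in the paper),
# in the same order as the lines above.
cases = [
    (0.054, 0.018, 99.0),
    (0.073, 0.021, 109.4),
    (0.033, 0.002, 176.9),
    (0.062, 0.086, -16.2),
    (0.018, 0.120, -73.3),
    (0.099, 0.047, 35.7),
    (0.046, 0.030, 41.5),
]

computed = [round(percent_change(old, new), 1) for new, old, _ in cases]
for (new, old, reported), value in zip(cases, computed):
    print(f"({new} - {old}) / {old} * 100 = {value}%, reported {reported}%")
```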
Addendum to the Addendum of the Addendum 20170216
Through email discussions and a lively discussion on Facebook, I am now able to determine how the percent changes in this table were calculated.
It is a clusterfuck, here we go.
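The authors are not using the percentage change formula; they are using the percentage difference formula, which divides the absolute difference between two values by their average:
% difference = |a - b| / ((a + b) / 2) * 100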
The percentage difference formula will reproduce the first column of values, but not the second column of values. It is impossible to get negative results with the percentage difference formula if both of your percents are positive.
The first two rows of the second column of percent changes are calculated with this formula:
% difference * -0.5
The third row of the second column of percent changes is calculated with:
% difference * 0.5
The fourth row of the second column of percent changes is just % difference.
Therefore, the first “% change” column label should say “% difference”, and the second “% change” column label should say “WTF”. I also have concerns about whether percentage difference is what should be used here. My other criticisms still stand.
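The recipe described in this addendum can be checked against the fractions quoted earlier. The sketch below assumes the usual percentage difference definition (absolute difference divided by the mean of the two values, times 100) and assumes that the first three calculations listed earlier belong to the first column and the last four to the second. Because only the rounded three-decimal fractions are available, the results should land near the reported values rather than match them exactly.

```python
def percent_difference(a, b):
    """Percentage difference: |a - b| divided by the mean of a and b, times 100."""
    return abs(a - b) / ((a + b) / 2) * 100

# (fraction a, fraction b, multiplier described above, value reported in the paper)
cases = [
    # first column: plain percentage difference
    (0.018, 0.054, 1.0, 99.0),
    (0.021, 0.073, 1.0, 109.4),
    (0.002, 0.033, 1.0, 176.9),
    # second column: rows 1 and 2 use -0.5, row 3 uses 0.5, row 4 uses 1
    (0.086, 0.062, -0.5, -16.2),
    (0.120, 0.018, -0.5, -73.3),
    (0.047, 0.099, 0.5, 35.7),
    (0.030, 0.046, 1.0, 41.5),
]

computed = [percent_difference(a, b) * mult for a, b, mult, _ in cases]
gaps = [abs(value - reported) for value, (_, _, _, reported) in zip(computed, cases)]
for (a, b, mult, reported), value in zip(cases, computed):
    print(f"a={a}, b={b}, multiplier={mult}: {value:.1f} vs reported {reported}")
```

Compare these gaps with the plain percent change formula, which misses the reported values by tens or hundreds of points.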
And the problems with this table are far from over.
One thing that stands out is the fraction for “All hot vegetables” is not the max value in each column. This is actually explainable if you assume that each vegetable is not served on each day. For example, if a popular vegetable such as broccoli is only served 70% of the time you could easily get the numbers they report.
What is not explainable are the standard deviations they report.
I previously made a historical discovery for variances and standard deviations, so I’m something of a standard deviation connoisseur.
The first thing that jumps out is that the standard deviations are far larger than the fractions. At first I thought maybe they took the fraction for each day (there were 20 days per cell) and found the standard deviations with those 20 values. But below the table it clearly states: “Each child-day is treated as a single observation.”
Hmm, okay, so how do you get a standard deviation for a fraction? When you have a fraction, i.e. a recording of two possible outcomes, that is basically a binomial distribution.
Wikipedia tells us the variance of a binomial distribution is:
Var(X) = n*p*(1-p)
where n is the number of trials and p is the probability of success on each trial.
But this is the variance for the counts. We are interested in the variance for the fractions. To turn a count into a fraction you just divide by n.
What we are looking for is:
Var(X/n)
Wikipedia tells us:
Var(a*X) = a² * Var(X)
Sweet. So all we have to do is divide the variance by n². And to get the standard deviation we just take the square root of the variance.
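If you don't want to take the scaling rule on faith, it can be verified directly. The sketch below computes the variance of X/n exactly by summing over the binomial probabilities and compares it with Var(X) divided by n². The values n = 50 and p = 0.018 are arbitrary choices of mine for illustration, not cell sizes from the paper.

```python
from math import comb

def variance_of_fraction(n, p):
    """Exact variance of X/n for X ~ Binomial(n, p), summed over the pmf."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    return sum(((k / n) - mean) ** 2 * w for k, w in enumerate(pmf))

n, p = 50, 0.018                      # illustrative values only
exact = variance_of_fraction(n, p)    # brute-force Var(X/n)
shortcut = n * p * (1 - p) / n**2     # Var(X) divided by n², i.e. p*(1-p)/n

print(exact, shortcut)
```

The two numbers agree up to floating-point error, which is all the rule Var(a*X) = a² * Var(X) with a = 1/n is saying.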
Okay, so what’s the n? Of course they don’t tell us the n for each cell.
In the text they say the “study included 40,778 total child-day observations, with roughly half in the treatment group”. So I guess we can suppose each cell has around 10,000 observations, since there are two months in each group. As I mentioned above, though, it seems not every vegetable was served every day, so the number of observations for the bottom 3 rows could be fewer than 10,000. Indeed, some rows must have fewer observations than the first row for the fractions to make any sense.
Using this formula:
SD = root(Var(X/n)) = root(n/n²*p*(1-p)) = root(1/n*p*(1-p))
with n=10,000 we can show that the standard deviations reported are off by a factor of 100!
Let me say that again. Off by 100X!
The fact that we assumed the n is 10,000, which happened to provide numbers that were 100X smaller than their numbers, makes it easy to see the mistake they made. It is clear they are using this formula for their standard deviations:
SD = root(p*(1-p))
That formula reproduces all of their gobbledygook, except for Control group, Month 2, Broccoli. The fraction reported there is .018, and the same fraction is reported in the first column and first row, and yet the SDs are different. They can’t even consistently report incorrectly calculated values.
Perhaps conveniently for them, their values are off by a factor of around 100, which could theoretically allow them to claim their standard deviations are standard deviations for percents instead of fractions. However, as I said before, the rows must have different sample sizes for the fractions to make sense, and as a result the SDs should not be consistently 100X larger than the SD obtained for an n of 10,000.
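To see why a constant factor of 100 is suspicious, here is a sketch comparing the SD of a fraction, root(1/n*p*(1-p)), with an expression that leaves n out entirely, root(p*(1-p)). Treating the n-free expression as the source of the reported values is an assumption on my part, based on it landing exactly 100X above the fraction SD when n = 10,000. The alternative cell sizes 2,500 and 5,000 are hypothetical.

```python
from math import sqrt

def sd_of_fraction(p, n):
    """SD of an observed fraction built from n independent yes/no observations."""
    return sqrt(p * (1 - p) / n)

def sd_without_n(p):
    """Hypothetical n-free expression: root(p*(1-p))."""
    return sqrt(p * (1 - p))

p = 0.018  # one of the fractions reported in the table
ratios = {n: sd_without_n(p) / sd_of_fraction(p, n) for n in (2500, 5000, 10000)}

for n, ratio in ratios.items():
    print(f"n = {n}: fraction SD = {sd_of_fraction(p, n):.5f}, ratio = {ratio:.1f}")
```

The ratio is just root(n), so it only equals 100 when a cell has exactly 10,000 observations. Rows with fewer observations would have a different ratio, which is why a uniform "these are percents" explanation does not hold up.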
Unfortunately, we’re still not done. Interestingly they mark all the percent changes as statistically significant except for the last row. They state: “Significance based on an F-statistic of differences in percent”.
I’m not quite sure what statistical test they are using, but you would think that with sample sizes around 10,000 per group any difference would be statistically significant. I ran some simulations, and it seems pretty clear the changes in the last row should be statistically significant regardless of what test they are using. The only way they might not be significant is if the sample sizes actually aren’t that large, which could only occur if carrots were rarely served.
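As a rough illustration of that point, here is a sketch using a pooled two-proportion z-test. To be clear about the assumptions: this is not the authors' F-statistic and not the simulation I ran, it treats child-days as independent just as the paper does, it takes the .030 and .046 fractions from the last calculation listed earlier as the last row, and both the n of 10,000 and the small n of 500 are guesses.

```python
from math import sqrt, erfc

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return z, p_value

# Assumed cell size of about 10,000 child-days per group.
z_large, p_large = two_proportion_z(0.030, 10_000, 0.046, 10_000)

# Hypothetical much smaller cell, as if carrots were rarely served.
z_small, p_small = two_proportion_z(0.030, 500, 0.046, 500)

print(f"n = 10,000: z = {z_large:.2f}, p = {p_large:.2g}")
print(f"n = 500:    z = {z_small:.2f}, p = {p_small:.2g}")
```

Under these assumptions the difference is overwhelmingly significant at 10,000 observations per group and only stops being significant when the cells shrink to a few hundred.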
In this post I only focused on the mathematical impossibilities in the two tables of an important paper funded by the USDA. Not surprisingly, the text of the paper contains numerous other inconsistencies. For example, in the abstract it is stated that the number of children in Study 2 is 1,017, but in the text the number changes to 1,552.
In addition to these errors, an entire post could be written just on the inappropriate methodology and statistical tests used in this paper, such as assuming independent observations when in fact the same students are having their choices recorded each day, or employing a high school student to carry out Study 2, who presumably was not blinded to the expected outcome of the study. (I can’t help but wonder if this lab also gets high school volunteers to do their stats for them.)
This is the 11th paper from this group that we have found with mathematical inconsistencies. Many errors appear to be caused by incompetence. But can incompetence really explain all of the errors? Let’s take another look at the first table.
Does anyone else find it strange that “number eaten” and “number uneaten” always add up to whole numbers?
Of course it’s impossible to know how this happened, because they never provide any raw data or code, and when you request data they deny your request.
How many more papers with problems do we have to find before something is done? If this is the type of work Cornell endorses, then at a minimum I suggest we take any work that comes out of Cornell with a grain of salt. Even Cornell News agrees:
P.S. I have to acknowledge Nicholas Brown, Eric Robinson, Tim van der Zee, and James Heathers for their contributions to the presented investigation.
Hopefully this is the last paper from this group I have to look at; consuming these papers gives me indigestion.