Cornell’s Alternative Statistics, part deux
My colleagues and I were recently asked to tabulate all the problems we had found with work from the Cornell Food and Brand Lab, which at this point is quite the buffet of errors, poor methods, and data recycling. Not surprisingly, while looking over what we had found, I noticed yet more problems. One morsel was particularly juicy because it tied in nicely with a previous post of mine.
The new entree turned up in a paper I covered in my “The Donald Trump of Food Research” post, whose title has aged extremely well (suck it, tone police).
The paper in hot water?
“The Flat-Rate Pricing Paradox: Conflicting Effects of ‘All-You-Can-Eat’ Buffet Pricing”
The data in question?
What’s the problem?
They reported a standard deviation for percents again!
If you recall, in “Cornell’s Alternative Statistics” I tried to understand the SDs they reported for percents. I came to the conclusion that they seemed to be using an incorrect equation for a binomial distribution. My colleague, however, concluded that they were coding their data as 1’s and 0’s and reporting the SD of that. That is silly, but I agreed it was the more likely explanation for these authors.
But now we can find out for sure! Isn’t that exciting?
Now, I’m not a statistician, but I learned back in high school biology that when you have a categorical variable you should use a Chi-squared test. But the table clearly says an F-Test was done, which would suggest the authors actually think SDs for percents are a thing, and that they used those to calculate their F-statistic.
Be that as it may, let’s be generous and start from the assumption that the authors are not silly enough to perform a one-way ANOVA with percents. Let’s assume they did perform a Chi-squared test, but just forgot to put a note in the table. The article doesn’t mention Chi-square anywhere, and if a Chi-squared test was done there would be no reason to report SDs, but hey, we’re being generous.
Okay, so let’s go to a Chi-square calculator and get our Chi-square on! All we need to do is type in the number in each category…
Houston, we have a problem. There’s a granularity error, so we don’t know how many men and women there were for the regular-priced buffet!
There are 35 diners for that group.
29/35 = .83
30/35 = .86
There is no whole number of men that results in the reported 85%, unless we assume someone was only 3/4 a man.
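The granularity check above is easy to automate. Here is a minimal sketch, assuming the percent was rounded to the nearest whole number in the usual half-up way; the function name is my own invention for illustration.

```python
import math


def counts_matching_percent(n, reported_percent):
    """Return every whole-number count k (0..n) whose percent of n
    rounds (half-up) to the reported whole-number percent."""
    matches = []
    for k in range(n + 1):
        # floor(x + 0.5) is half-up rounding; avoids Python's banker's rounding
        rounded = math.floor(100 * k / n + 0.5)
        if rounded == reported_percent:
            matches.append(k)
    return matches


# 35 diners, 85% reported as men: is any head count consistent with that?
print(counts_matching_percent(35, 85))
# For comparison, what 30 men out of 35 would be reported as
print(counts_matching_percent(35, 86))
```

An empty list means no head count is consistent with the reported percent under that rounding assumption.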
Okay…so now what? We clearly can’t take the percents and resulting SDs at face value. So I guess we’ll just assume they calculated the test statistic correctly — a big assumption with this group, but we have to start somewhere.
If they did a Chi-squared test, does their statistic match their p-value? I don’t know, let’s ask Python. We’ve got a 2 × 2 table, so we’ve got (2 − 1) × (2 − 1) = 1 degree of freedom.
Ha! The p-value is 0.24 instead of 0.25.
But wait, we didn’t take into account rounding error, let’s see if we can raise that guy up a bit.
Nope, we can’t get to 0.25.
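For anyone who wants to repeat this kind of check, here is a rough sketch using only the standard library. It relies on the fact that, with 1 degree of freedom, the Chi-squared tail probability equals erfc(sqrt(x/2)). The function names are mine, and the statistic in the usage line is a placeholder, not the value from the paper's table.

```python
import math


def chi2_p_value_df1(stat):
    """Upper-tail p-value for a Chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def p_value_bounds(reported_stat, decimals=2):
    """Smallest and largest p-value consistent with a statistic that was
    rounded to `decimals` places before being reported."""
    half_step = 0.5 * 10 ** (-decimals)
    largest_stat = reported_stat + half_step
    smallest_stat = max(reported_stat - half_step, 0.0)
    # A larger statistic gives a smaller p-value, so the order flips
    return chi2_p_value_df1(largest_stat), chi2_p_value_df1(smallest_stat)


# Placeholder statistic for illustration only
low, high = p_value_bounds(3.84)
print(f"p is somewhere between {low:.4f} and {high:.4f}")
```

If the reported p-value falls outside that interval even after rounding it, the statistic and the p-value do not go together.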
Okay, so does the F-statistic check out?
Yes, yes it does. These authors legit took a categorical variable, coded it as 1’s and 0’s, then ran a one-way ANOVA. If that isn’t alternative statistics, I don’t know what is.
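To make concrete what "code it as 1’s and 0’s and run a one-way ANOVA" amounts to, here is a minimal sketch that computes the F-statistic straight from group counts. The group sizes and counts in the usage line are invented placeholders, not the paper's numbers, and this is my reconstruction of the calculation, not the authors' code.

```python
def anova_f_binary(groups):
    """One-way ANOVA F-statistic for 0/1-coded data.

    groups: list of (n, ones) pairs, where n is the group size and
    ones is how many members of the group were coded as 1.
    """
    total_n = sum(n for n, _ in groups)
    total_ones = sum(ones for _, ones in groups)
    grand_mean = total_ones / total_n

    # Between-group sum of squares: n * (group mean - grand mean)^2
    ss_between = sum(n * (ones / n - grand_mean) ** 2 for n, ones in groups)
    # Within-group sum of squares for 0/1 data reduces to n * p * (1 - p)
    ss_within = sum(ones * (1 - ones / n) for n, ones in groups)

    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up example: 5 of 10 coded as 1 in one group, 8 of 10 in the other
print(anova_f_binary([(10, 5), (10, 8)]))
```

Turning the F-statistic into a p-value requires the F distribution with (groups − 1, N − groups) degrees of freedom, which I have left out to keep the sketch dependency-free.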
As mentioned in my addendum below, and as Matt Williams points out in the comments, apparently using an ANOVA for a binary variable is not as silly as I make it seem. It is not hard to find people who agree with me, but it is also not hard to find literature supporting the use of an ANOVA under certain conditions. I think most people would agree that logistic regression or Chi-square would be better, but using an ANOVA test is not completely ridiculous, and may be common in certain fields.
It is interesting to point out that this group decided to use a Chi-squared test in “How descriptive food names bias sensory perceptions in restaurants”, so it is unclear what their stance is on ANOVAs with binary variables — sometimes they use it, sometimes they do nothing, and sometimes they do a Chi-squared test. My feeling is that I just spent more time thinking about the appropriate test to use on a binary variable than the authors ever did, and that they just accidentally calculated an ANOVA here, which happens to be an okay approximation of better statistical methods, and might be acceptable in certain fields.
This post was not meant to contain statistical advice (as I said, I’m not a statistician), but if you decide to use an ANOVA on a binary variable, be aware that it may raise eyebrows among people in fields where this is not common. It may also cause readers to take a closer look at your statistics and find other, more serious problems, such as the impossible percent of men reported in this paper.
It might not be apparent why this is so silly, because when you have two categories, the SD will be the same regardless of which one you arbitrarily label 1 and which one you arbitrarily label 0. But once you have three categories and you decide to label them 0, 1, 2, how you label them will change the SD. You are basically playing God with the statistics. I don’t remember being taught that you can play God with numbers, but maybe that’s the kind of advanced stuff you learn at a prestigious university like Cornell.
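The relabeling point is easy to see with a few made-up observations. This is just an illustrative sketch; the data and the helper name are invented.

```python
import math
from statistics import stdev


def sd_under_coding(labels, coding):
    """Sample SD of categorical labels after mapping them to numbers."""
    return stdev([coding[label] for label in labels])


# Two categories: swapping which one is called 1 does not change the SD
two_cat = ["m", "m", "m", "f", "f"]
sd_a = sd_under_coding(two_cat, {"m": 1, "f": 0})
sd_b = sd_under_coding(two_cat, {"m": 0, "f": 1})
print(sd_a, sd_b, math.isclose(sd_a, sd_b))

# Three categories: the SD now depends on the arbitrary numbering
three_cat = ["a", "a", "a", "b", "c"]
sd_c = sd_under_coding(three_cat, {"a": 0, "b": 1, "c": 2})
sd_d = sd_under_coding(three_cat, {"a": 1, "b": 0, "c": 2})
print(sd_c, sd_d, math.isclose(sd_c, sd_d))
```

Same data, different arbitrary labels, different SD in the three-category case.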
The funny thing is these authors know better, or at least the Turkish grad student does. In “Lower Buffet Prices Lead to Less Taste Satisfaction” they did not report SDs for percents and knew to not calculate an F-statistic.
Actually, that can’t be it, because as Wansink implied, the Turkish student came from a “lessor[sic] school” and a “lessor[sic] background”, what could she know? Actually, what could I know? I only went to a public university, and not a prestigious private university like Cornell that represents the pinnacle of scientific investigation. And my family isn’t educated, so how could I be?
Clearly we should just trust everything the Stanford-educated professor at the Ivy League institution says, just like we should just trust our president. Even though numbers might appear to be mathematically impossible, our maths can’t compare to their maths, they have the best maths. They have the best maths ever, and best data ever. Data so good it can’t be shared. We would not believe how good it is. It is too good to be true.
I’m trying to get out of the pizzagate business, but business is booming.
ANOVAs have several assumptions about the underlying data, and a categorical variable clearly violates these assumptions. In the specific case where you have a binary outcome, such as gender, my colleague points out that there has been some research on when it is okay to use an ANOVA instead of a more appropriate test such as the Chi-squared test. Perhaps the Cornell Food and Brand Lab is aware of this work, and decided an ANOVA was appropriate for their data. However, if that is the case, then it is unclear why they decided to not report an ANOVA value in “Lower Buffet Prices Lead to Less Taste Satisfaction” for the exact same measure (gender percent) when they had a larger sample size and more normal-like distribution of values.