Worst correction ever

Jordan Anaya
Jul 5, 2017


If you’ve been following Cornell’s pizzagate, you may know that corrections are starting to appear for a few of the dozens of papers we’ve flagged.

Andrew Gelman recently discussed one, and Retraction Watch followed up with a report. However, the paper in question wasn’t one of the pizza papers; it was just a paper I flagged in this post and subsequently contacted the journal about.

Sure, it looks bad that the authors can’t find the data for the paper, but it’s an older paper and contained relatively minor issues compared to some of Wansink’s other work. It is to the journal’s credit that they performed a prompt and thorough review of my concerns.

***Of course, given that the original paper did not contain data or code, it is impossible to know how serious the problems are. When I say the problems are minor, I am referring only to the problems that can be identified without access to the underlying data, which aren’t many, although that is likely just because the paper didn’t contain many numbers. It is entirely possible that the issues with the paper are extremely serious, even retraction-worthy, but we’ll never know because the original data no longer exist.***

So what about the pizza papers? Those papers had hundreds of numbers that didn’t make any sense, so surely those corrections must be extremely extensive, right? Haha, don’t be silly, I wouldn’t be blogging about this if they were.

Of the four pizza papers, two have had corrections issued, and the other two journals have indicated that corrections are under way. The first correction came in the form of an Editorial Note by Todd Shackelford. That note was an insufficient response, to say the least, and resulted in us writing an open letter. I do not know whether another, hopefully longer, correction will be issued, since the journal has not replied to my emails.

The other correction comes in the form of a corrigendum attached to the paywalled pizza paper, “Peak-end pizza: prices delay evaluations of quality”. This correction confirms all of my concerns about what Cornell would send to journals.

What were these concerns? I suspected Cornell would only address the bare minimum of issues that we raised in our original critique, and try to make it appear as if the problems were minor and didn’t affect the conclusions of the articles.

Where did this pessimistic view come from? Well, Cornell released a response to our initial critique that set the tone. Although the lab provided the data set and STATA code for the reanalysis, the response only showed corrections for the tables we flagged, and some response tables didn’t even reflect what was done in the reanalysis, opting instead to just list the sample sizes used in the original analyses.

The question then became: what would they send to the journals? Would they just provide the journals with the sample sizes used in the original analyses, or would they provide the output of the STATA code, which for the most part correctly excludes problem diners? It appears we now have our answer, and it fits with everything else we’ve seen from this lab.

Luckily, I recently performed my own reanalysis of the released data set, which I confirmed to be accurate by comparing my results to the STATA output (which had to be provided by colleagues, since Cornell made their scripts, but not the output, available). So I’m in the unique position to measure just how tall this stack of bullshit is.

This correction itself needs a correction, in fact several. It states:

A full script and log file can be found here: https://doi.org/10.6077/J5CISER2783

That link just takes you to the data set and a setup script. The link they should have provided is: http://ciser.cornell.edu/ASPs/search_athena.asp?IDTITLE=2778

However, even then the statement wouldn’t be accurate, because all that is available there is the STATA script, with no log file. And as we’ll see, the output of this STATA script isn’t even reflected in the correction. As a result, the code used for the correction isn’t available anywhere, despite the correction saying it is.

Interestingly, the correction states:

the authors have sought the independent feedback of a researcher at Mathematica Policy Research who has in turn reviewed the text, tables, and Stata output contained in this correction for consistency.

If this researcher was looking at the STATA script on Cornell’s website and the tables contained in this correction, this researcher needs to be fired. I have made the STATA output publicly available, and Tables 2 and 3 of the STATA output are different from the tables in the original article and, consequently, the correction.

The correction also states:

These analyses focused on pizza, therefore diners who did not report eating at least one piece of pizza were not included in the analyses.

That seems like a fairly innocent statement to make; the article does rely on pizza ratings, so these people must have eaten pizza, right? Well, let’s take a look at the distribution of pizza slices in the released data set:
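For anyone with the released data set in hand, the tabulation is straightforward. Here’s a minimal sketch of the idea in Python; the file name and column name are placeholders, not the actual CISER file/variable names:

```python
# Minimal sketch: tabulate the reported number of pizza slices.
# "pizza_data.csv" and the column name "pieces" are placeholders,
# not the actual CISER file/variable names.
import pandas as pd

df = pd.read_csv("pizza_data.csv")
print(df["pieces"].value_counts().sort_index())   # diners per reported slice count
print(df["pieces"].isna().sum(), "diners did not report a slice count")
```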

There are people who claimed to eat 0 pieces (and 6 diners not shown who did not report how many pieces they ate), which is fine as long as they weren’t included in the study, but were they?

Let’s take a look at Table 1:

This actually isn’t the table in the correction; it’s my reproduction of the authors’ Response Table from their response to our original critique. I don’t want to screen cap a copyrighted article, what with Elsevier suing for $150,000 per article. After all, this article does cost $32.00 to view:

***Note: the Response Table has some incorrect values which are correct in the correction, so kudos to them for that, I guess.***

Okay, so why did I bring up this table? Well, if you look at the sample sizes you will notice that one row has 136 diners in it. There are only 139 diners in the study. Given the distribution of pizza slices, some of these 136 diners had to have eaten less than 1 piece of pizza, which conflicts with the correction’s statement that every diner ate at least 1 piece. In fact, all 139 diners are included in this table.

Okay, maybe Table 1 is meant to be just an overview of the entire data set, not an overview of the diners that are actually included in the study (which would be odd, but it’s best to always give the authors the benefit of the doubt, and these ones need it).

To resolve this we can look at the other tables. Here are the sample sizes for Table 2:

The sample sizes are smaller than in Table 1, so theoretically this table could exclude diners who ate less than 1 slice. In fact, the STATA code removes diners who reported eating 0 slices (but not diners who did not report how many slices they ate). So let’s take a look at the STATA output sample sizes:

Ah, they are different.

Okay, so the STATA code did more than remove diners who reported eating 0 slices, so not all of these differences are proof that diners who ate 0 slices, or did not report their number, were included in the original table (and consequently the correction table). However, I wrote my own code to reproduce the original statistics and STATA output, and it is clear the original statistics (and correction tables) included diners who ate 0 slices. So the statement that this study only included diners who ate at least 1 slice is contradicted by the data set.
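If you want to see the discrepancy for yourself, the check is simple. Here’s a sketch of the idea, not my actual reanalysis code or the lab’s STATA script; the file and column names are placeholders:

```python
# Sketch of the check: compare group sizes with and without the
# diners the correction says were excluded. "pizza_data.csv",
# "pieces", and "price_condition" are placeholder names, not the
# actual CISER variables.
import pandas as pd

df = pd.read_csv("pizza_data.csv")

all_diners = df.groupby("price_condition").size()
ate_at_least_one = df[df["pieces"] >= 1].groupby("price_condition").size()  # 0-slice and unreported diners drop out

# If the published cell sizes match all_diners rather than
# ate_at_least_one, the supposedly excluded diners weren't actually excluded.
print(pd.concat({"all diners": all_diners, "ate at least 1 slice": ate_at_least_one}, axis=1))
```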

The correction briefly touches on Tables 2 and 3, stating:

Minor differences in rounding were found for Tables II and III.

It is interesting that only rounding issues for these tables were addressed, when the STATA script they provide gives results completely different from those in the reported tables.

This gets back to my hypothesis that the authors would only try to address the bare minimum of the problems. In our original critique we didn’t flag any problems with these two tables because we couldn’t check them without access to the data.

To the authors’ credit, they did release the data set, and did provide STATA scripts for reproducing what they believe are the correct analyses, but it is troubling that they didn’t make any mention that the original analyses were in error, or make any effort to correct them.

My initial suspicion was that the authors knew that we couldn’t check Tables 2 and 3 with granularity testing, so they thought they could get away with not pointing out the problems in the tables. After all, they didn’t provide the output of the STATA script, so the only way for someone to discover the issues with these tables would be to either run the scripts themselves or write their own code. Unfortunately for the authors, I did both (technically I got other people to run the STATA script).

So what are the problems with the tables? As discussed in detail in my critique of their response, the original analyses included diner responses which were impossible, such as diners who reported ratings for pizza but reported they ate 0 slices.

However, if my initial suspicion was correct, why did the authors correct rounding errors in these tables? And how did they notice the rounding errors? The STATA script doesn’t reproduce the numbers in these tables, so what script are they using that does reproduce the original numbers?

Every time I think I have pizzagate figured out there’s a new loose end.

The “corrected” article also still contains numerous other inaccuracies that the authors didn’t bother to correct, point out, or perhaps didn’t even notice.

Here are a few examples:

  • The article states the study took place in spring, but the data release states the study took place from October 18 to December 8
  • The article states the study was only two weeks
  • The article states the modal number of slices taken was 3
  • The article states 8 diners ate alone
  • The article states 52 ate in pairs
  • The article states only the consumption of pizza was measured

Another thing I noticed while performing my reanalysis of this paper was a problem with the references. Most self-references are not actually cited in the text of the article.

For example, all these references occur in the reference list but are not cited in the text:

  • Kniffin, Kevin M., Sigirci, Ozge and Wansink, Brian (2016), “Eating Heavily: Men Eat More in the Company of Women”
  • Just, David R. and Wansink, Brian (2011), “The Flat-Rate Pricing Paradox: Conflicting Effects of ‘All-You-Can-Eat’ Buffet Pricing”
  • Just, David R., Sigirci, Ozge and Wansink, Brian (2014), “Lower Buffet Prices Lead to Less Taste Satisfaction”
  • Shimizu, Mitsuru, Payne, Collin R. and Wansink, Brian (2010), “When Snacks Become Meals: How Hunger and Environmental Cues Bias Food Intake”
  • Van Kleef, Ellen, Shimizu, Mitsuru and Wansink, Brian (2013), “Just a Bite: Considerably Smaller Snack Portions Satisfy Delayed Hunger and Craving”
  • Wansink, Brian, Payne, Collin R. and Shimizu, Mitsuru (2010), “‘Is This a Meal or Snack?’ Situational Cues That Drive Perceptions”

I’m not sure how this sort of thing happens. Every paper I’ve ever written used either EndNote or LaTeX, which makes it impossible to have a reference in the reference list that isn’t cited in the text.

Okay, so perhaps these authors are just sloppy with their references; they’ve displayed questionable practices before. So I checked the other references, the ones that weren’t self-references, and I didn’t see any issues.
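This kind of check is easy to script, by the way. Here’s a rough sketch; the file names are placeholders, and a real article would need smarter matching than a bare surname search:

```python
# Sketch: flag reference-list entries whose first author's surname
# never appears in the article body. "body.txt" and "references.txt"
# are placeholder file names; a real check would need to handle
# initials, "et al.", and authors who share a surname.
from pathlib import Path

body = Path("body.txt").read_text().lower()
for entry in Path("references.txt").read_text().splitlines():
    entry = entry.strip()
    if not entry:
        continue
    surname = entry.split(",")[0].strip().lower()  # e.g. "kniffin"
    if surname and surname not in body:
        print("possibly uncited:", entry)
```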

Did they just add a bunch of self-references that needed a boost in citation count and forget to cite them in the text? How did the copy editors not notice this? Why does it cost $32.00 to view this article when the reference list wasn’t even checked for accuracy? Where is that money going?

This whole saga has started to make me wonder if paywalls are actually a good thing. If Wansink’s work is indicative of the field of food research, then the output of that field should be behind as many layers of paywalls as possible. I’m talking Attack on Titan levels of paywalls:

It should cost $32.00 to view a web page that then charges $320.00 to view a page that then charges $3,200.00 to finally view the article. No one should ever have to read this work.
