Pale And Wan, We Sink
The last brick in a long road?
PART ONE: RECAP
There are a few finches, living in the highlands of Northern Papua, who are yet to hear that Professor Brian Wansink has resigned.
It is, however, just these finches. All other sentient creatures have been informed, due to blanket coverage of this event in Time, Science, WaPo, MoJo, Fake News, The Pedantic, Ars, The Chron, National Public Whitevoice, and — of course — Buzzfeed (who broke a crucial aspect of the story as it was developing).
The chimps and the butterflies and the people from Kentucky and the kangaroos and the whippets are all very well aware at this point. Some of them may have even had time to forget, as a great deal seems to happen in the meantime these days.
It is just those blissful, solitary finches, scratching stones and pecking grubs* as usual, who remain unaware that this particular donnybrook is taking place.
As you are not a finch, I will not waste time with the minor details of the case unless I have to. Any of the above links will do if you need to check the ballistics. Let’s get straight into why I’d bother writing about it, and then perhaps we can talk about what it means.
My initial reaction to this case was to deny there was any kind of problem. This would be late ’16.
Yes. I was a skeptic. I was on Team Wanfloat.
Because I knew the research involved, somewhat. It was unfamiliar to everyone else (that is, Nick, Jordan, and Tim), but I’ve spent enough time in and around nutrition and dietetics to recognise some of it instantly.
I simply did not have the imagination to believe it could go so wrong.
In the age of computational biology, multi-level modelling deployed to explain anything more complicated than a teacup, big data rapidly marching towards massive data, and frequently feeling scared of the computational skills of people who are ten years younger than me, it had never even crossed my mind to think that research whose results were akin to “Group X eats 34% more than Group Y under Condition Z” could be done problematically.
At end-2016, of course, this discussion was subsequent to what might go down as the most spectacular self-own in scientific history, where Prof. Wansink wrote a blog post that straightforwardly outlined the way to get ahead in science for young researchers: take a big stick to a dataset until it gives you something that looks right.
This is, of course, dreadful scientific practice. It was sufficiently bad that a lot of people thought it was a parody. The shit hit the fan immediately. A variety of words were said, all of them negative, and most of them at a volume sufficient to shake fillings loose from molars.
This posed an immediate question, which the Human Scum duly took up — if this blog post was a sincere description of the lab’s research practices, how reliable could the research itself be? There were four papers central to that blog post. Nick, Jordan, and Tim read them, and found the most extraordinary array of inconsistencies. The figure that was bandied around a lot at the time was that “150 errors” were located in four papers.
That’s not the whole truth; it’s far more accurate to say ya bois stopped looking once they got to 150. There were probably many more errors, but it became impossible to keep count — the studies discussed in the blog post reported layers upon layers of astronomically incorrect numbers, huge rolling waves of inconsistencies. Like the Old Testament, mistake begat incorrect sample size begat impossible mean. They were a complicated inter-connected web of oddities, fractal bollocks.
The number-checking procedure itself was conducted, in part, via an extremely simple test Nick and I published for determining if reported summary statistics are consistent with sample sizes.
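For the curious, the arithmetic behind that test (now generally known as GRIM) is genuinely tiny. A minimal sketch, not the published implementation (the function name is mine, and Python’s float rounding is waved at rather than handled rigorously):

```python
def grim_consistent(mean, n, decimals=2):
    """Check whether a reported mean is possible for n integer responses.

    The sum of n integers is itself an integer, so the true mean must be
    (some whole number) / n. The grid point closest to the reported mean
    is round(mean * n) / n; if even that value doesn't round back to the
    reported mean, no integer sample of size n can produce it.
    """
    closest_sum = round(mean * n)
    return round(closest_sum / n, decimals) == round(mean, decimals)

# A mean of 3.44 from 25 integer responses is fine (86 / 25 = 3.44)...
print(grim_consistent(3.44, 25))   # True
# ...but 3.45 from 13 responses is impossible (44 / 13 = 3.38, 45 / 13 = 3.46)
print(grim_consistent(3.45, 13))   # False
```

That really is the whole trick: no statistics beyond the observation that integer data constrain which means are even possible.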
Then The Fun** Started
On the back of this initial effort, in which I played only a very minor role, I realised that (a) my initial conception was really, really wrong, and (b) Something Was Indeed Up.
At this point, I became a lot more interested not in whether summary stats were internally consistent, but in what they amounted to — what distributions or gremlins they hid. This was the impetus behind what became SPRITE.
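The core idea is simple enough to sketch: given only a reported mean, SD, sample size, and scale bounds, hunt for integer samples that could have produced those numbers. Here is a toy version to show the shape of the thing; the greedy search and the function name are mine, and this is emphatically not the published SPRITE implementation:

```python
import statistics

def sprite_sample(mean, sd, n, lo, hi, decimals=2, max_steps=10000):
    """Search for one integer sample on the scale [lo, hi] whose mean and
    sample SD round to the reported values. Returns a sorted list of n
    integers, or None if the mean fails GRIM or the search stalls."""
    total = round(mean * n)
    if round(total / n, decimals) != round(mean, decimals):
        return None  # the mean itself is impossible; no sample can exist
    # Start as tightly packed around the mean as the sum allows.
    values = [total // n] * n
    for i in range(total % n):
        values[i] += 1
    target = round(sd, decimals)
    for _ in range(max_steps):
        if round(statistics.stdev(values), decimals) == target:
            return sorted(values)
        # Greedy step: among all mean-preserving moves (take 1 from one
        # slot, give 1 to another), keep the one that best narrows the
        # gap between the current SD and the reported SD.
        gap = abs(statistics.stdev(values) - sd)
        best = None
        for i in range(n):
            for j in range(n):
                if i == j or values[i] <= lo or values[j] >= hi:
                    continue
                values[i] -= 1
                values[j] += 1
                d = abs(statistics.stdev(values) - sd)
                if d < gap:
                    gap, best = d, (i, j)
                values[i] += 1
                values[j] -= 1
        if best is None:
            return None  # stuck: no move improves the fit
        values[best[0]] -= 1
        values[best[1]] += 1
    return None
```

Fed a plausible mean and SD, it hands back a sample that could have produced them; fed numbers that cannot coexist on the stated scale, it hands back nothing, and the nothing is the whole point.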
As an introduction to this technique, I wrote a lot. The following covers 5 papers with Prof. Wansink as an author, with a general introduction to SPRITE, some other fresh number muggin’, and a quick explanatory note.
And, without any reference to SPRITE, there was later a curious amount of fuss centered on a Wansink-authored Archives of Internal Medicine letter that claimed (utterly without basis, in my opinion) that the Joy of Cooking — the American gold-standard cookbook — was a standard-bearer in the obesity epidemic through its promiscuous increase in calories per serving since the 1930s.
I’m rushing. The above took about a year.
And you better believe that publishing any one of the above requires a serious commitment to public accuracy. They aren’t potshots. As much as the language might be casual, and there may be more than the occasional horse pun, they’re serious, reasoned, justifiable criticism.
Which means it took me ages.
But that’s not even close to the bulk of the work, which took place elsewhere.
(Link: “Brian Wansink is professor at Cornell University and is a high-profile researcher with an impressive track record…” — www.timvanderzee.com)
I think I analysed… perhaps two other papers out of the FIFTY mentioned here. It was primarily Jordan and Nick who picked up the slack on the remaining forty-eight, and Tim somehow got the unenviable task of organising it all into the above. If you’ve ever tried checking every comma, every number, and every test in a paper, you’ll know it’s a garbage job which takes forever.
Doing it 50 times, in the middle of the night, on your own, for no money and no reward, while frequently being told (sometimes by people who should know better) that you’re a bully, a ruiner, and a scoundrel is bordering on pathological.
So why bother in the first place?
For me, because it was like a sore tooth.
Scientists make compromises all the time. We understand publication inflation, but we feel like we have to keep pace with it. We understand bad incentives, but we are subject to them. We understand that super-journals are problematic, but we agitate like hell to get our students to publish in them. We understand publishers are landlords in greasy topcoats, but we shovel their coal. We understand that funding is more useful spread equitably, but we continually ask for more money to do our own research.
We’re all hypocrites. Me, you, a dog named Assistant Professor Boo.
But more by circumstances than by design, occasionally we come across a compromise we can’t make.
There was a documentary about ten years ago on the birth of American hardcore punk. In it, Vic Bondi (the vocalist from Articles of Faith) talked about hardcore as a violent reaction to growing up as a poor working-class scumbag in early Reagan America.
I can’t find a video, so I have to produce this from memory. However, it will be accurate enough because when I heard it, the phrase became instantly welded to my cortex.
Somebody has to say ‘this isn’t right’. Because everyone else is saying “It’s morning in America!”. Somebody’s gotta say “IT’S FUCKING MIDNIGHT, MAN!”
That’s what happened to me. [***]
So, in the aftermath of all that, very little visible happened. The work was done, the points were made, and the usual vacuum descended.
Then, to my lasting surprise, Cornell conducted their own entirely independent investigation into this body of work which ended recently. Like our own investigation, it took about a year, and they reached what I’d describe as a ‘similar’ conclusion (including a few other curly-looking bits and pieces of misconduct we’d never even considered). That, in particular, was a surprise.
Why the surprise? Because universities often treat senior professors like assets, and move to protect them under a variety of circumstances (research malfeasance, sexual harassment, financial irregularities), bloodlessly calculating a risk/return ratio on the problem in question.
In such cases, the desired outcome balances the following:
- some idea of integrity (custom more honoured in the breach etc. etc.)
- the reputational loss to the school of said academic staying or going
- the loss of grant money/awards/other funds
- the hole created by removing a dean/lab head/committee XYZ member
- the legal basis they have on which to act (you can’t just hoof out a tenured faculty member if they pocket a box of paperclips; you need to have cause)
- the litigious/difficult nature of the faculty member (and BOY does this vary), and
- the demands of employees/staff/funders etc. with regard to the issues in question.
As the circumstances vary, so do the eventual outcomes. A bloodless calculus is performed, and a decision is made by crisp chaps in new cardigans. Then, something approximating action sometimes happens. Academics can be given ‘the quiet word’ and unofficially asked to leave, or put in an untenable situation from which they will inevitably have to resign (for instance, if you’re barred from receiving NIH money but it funds your work, or you’re barred from meeting students but you have a teaching position), or very occasionally actually really properly removed. Other punitive measures also vary.
And — of course — investigations ending quietly and concluding that everything was in fact not that bad, and handing out the academic equivalent of 100 hours of community service, that happens too.
That’s what happened in Cornell’s first investigation, actually.
So, points of wrap-up. More accurately, dot points of wrap-up. There’s probably a narrative to stitch them together, but I am utterly unable to bring it to bear at present. Sorry.
- What’s not been discussed so far is the role of the government and in particular the USDA, from whom Prof. Wansink received a great deal of money and support. Governmental organisations that fund research (DARPA, NASA, DHHS, USDA, et al.) have different funding structures, different record-keeping requirements and different oversight measures to grant bodies. In my limited experience, these are all more stringent than those of grant agencies. They also do NOT tolerate people sodding about with their money. So far, we haven’t heard a single muttered word from the USDA. However, (A) I sincerely doubt they’re ignoring this issue and (B) they’re generally even slower than university administration. Perhaps there are further steps yet to happen.
- The resignation of a named chair with a massive public profile subsequent to a misconduct investigation is a big step. Without knowledge of what conversations took place — we’ll never know how much this step was ‘encouraged’ — it’s hard to say what message was delivered. I wonder if the phrase ‘jump or be pushed’ was mentioned.
- While Cornell has obviously conducted a serious process here and reached a conclusion of misconduct, they’re hardly blameless. Certainly they have had no contact with any of us whatsoever, and we are not mentioned at any point in any official document. They also conducted an initial investigation which concluded that everything was just fine, thank you very much (although it undoubtedly examined different issues to its subsequent longer brother).
- The above media releases mention, in part, a 160-page investigative report which has been written and not released. Instead, we got a rather anaemic summary statement which only partially overlaps with the issues we discovered. There is a certain irony to a secret investigation which finds a variety of problems that could have been solved by a lot more transparency. Perhaps we will yet see this document. GOD, I want to see it.
- The forcing function here was the JAMA journal group. Their own internal investigation, which resulted in six expressions of concern followed by six retracted papers, landed like a comet about three weeks back. JAMA obviously took this problem seriously and acted, with no external prompting. Seems like they were aggressively proactive. This doesn’t happen often.
- And, the thing I really want to talk about…
PART TWO: STRATEGY
Why take an aggressive approach to error detection?
I’ve been asked some version of this … maybe four times. And I mean really seriously asked.
“Why do you do what you do? I see all these positive developments in making science more open, reliable, transparent, and so on. But what you do seems destructive and aggressive in ways that other projects don’t. Why?”
Easy enough to answer, actually. In increasing order of importance.
- How I’m wired (less important). Finding errors is interesting to me at some level, as much as it’s often incredibly boring to do. I have six to ten puzzle games on my phone at any given point in time. I do crosswords with my wife. I will ruin you at Scrabble.
- What I believe (important). The best way for science to be seen as trustworthy is to be trustworthy. You might have noticed we have a minor image problem. Part of it is that we are never seen to clean house when the academic process goes wrong, and we rarely reckon with the consequences when it does. I honestly think this is an issue you can get in front of — you hang out a big shingle under a bright light, and you say WE’RE GOING TO FIX THIS, AND THERE WILL BE CONSEQUENCES. This work is central to keeping science in the public understanding as an enterprise which improves.
This is why I loathe the ‘science is self-correcting’ jeremiads, forced between smug lips, from someone who never corrected a single question mark. I’ve heard a hundred whispers of cynical, ruinous scientific processes which feed into a global enterprise that runs a very real risk of undermining itself. We love talking about trust in the publication process, trust between scientists. Well, the trust of everyone else and the people who run the country is more important.
- How I plan (most important).
^ This gets its own section.
I don’t know if I’ve ever written this down before, but specific investigations into specific problems have one very very good thing going for them:
I don’t have to sum it up neatly, because Andrew Althouse managed to fit it into a tweet.
And now that the dust has settled, you might be able to see the center of all this work, to see beyond the error detection techniques and the endless documents and fuss. The oxygen. Sauron’s Eye turns in full focus towards scientific reform.
And the Eye does not like what it sees. What happened in this case is not a narrative you can avoid, because it should not be possible. It’s like finding out someone without an amygdala can have a panic attack — OK, if that’s possible, then our model of understanding how panic works cannot be accurate. We have one case that makes us seriously question our whole model.
In other words, if it’s possible for someone to reach the absolute pinnacle of the academic pyramid, only to turn out to be as reliable as an unrestored 1977 Yugo, what does that say about our ability to tell good science from bad? What does it say about our priorities in rewarding people?
This is the point where we hear the ‘bad apple’ argument.
None other than Diederik Stapel, who no doubt caught a few unflattering comparisons to the quality/produce continuum, went in hard on the bad apple argument.
It was just one guy. What could it possibly mean about everyone else if we’re talking about the actions of just one guy?
The answer to this is simple.
(1) see above — the fact that it could happen at all, apple or otherwise, is deeply problematic. It’s not one guy, it’s ten thousand decisions or assessments or checks which failed to point out obvious problems with one guy.
(2) what would you say if it isn’t one guy?
Thought experiment time: say we could find a dozen Wansinks, people at the top of the academic ladder whose whole careers were littered with mistakes that invalidate their work, sitting in plain sight, their lives utterly unaffected by the extremely questionable quality of their scholarship.
Say we could find thirty.
Say we could find a hundred.
Do you want to bet that won’t happen?
When does the rain of bad apples become a problem with the tree?
Even our original dozen cases above would generate a heat haze visible from low orbit. And a similar level of misconduct in other fields would be worse. More consequential, uglier, wasting more money, killing more people.
Say the Wansink case, but in ecology. Dead whales.
Say the Wansink case, but in cancer biology. Dead people.
Say the Wansink case, but in economics. I don’t even want to think about it.
These cases would explode like satchel charges across the academic landscape, carrying with them the undeniable message SOMETHING HAS GONE WRONG, something at the center of how we assess and reward science. We have built a system for establishing and accumulating information about the world that allows not just mistakes and garden-variety sloppiness to infest it, but bare naked incompetence and active dishonesty.
You can build better systems and encourage incremental reform, of course, and you should. You’ll do better at it than I will. You can change funding systems; you can address the problems with the publication process that are more widespread and far more benign.
But, like all of us, you are a small voice in a long hallway. You need people to look, and to care. You need OXYGEN, breathing space, interest, attention, outrage, inrage, momentum. You need eyes.
Last note: remember a while back when we were called “vindictive little bastards” and “human scum” in a Chronicle piece, by an anonymous tenured coward? I do. It was hilarious.
Although I object to being called ‘little’. I am many things, but I am NOT little.
Well, the most consequential thing our tiny-hearted friend said didn’t get enough attention. It wasn’t all the half-baked insults, and the hyperbole about how stringent criticism is just a terrible thing (we might have to agree to disagree on that one). It was this:
Basically, someone who hates us still thinks error detection is valuable. Now, I’ll take that, head and shoulders, over any endorsement from someone who agrees with me. “You’re the worst person in the whole world, but I agree you’re doing something useful.”
Thank you, anonymous little coward. I love you too.
The black flag is still up.
The black flag stays up.
[*] I do not know what finches eat, and I’m not looking it up. They don’t look trustworthy and the less I know about them, the better.
[**] No actual fun was had at any point.
[***] Well, maybe it isn’t midnight. But, considering what we’re starting to find out about the health of the general scientific enterprise, it is at least a lot later in the day than we previously thought.