Learning From Our (Statistical) Mistakes
Chunyang Ding | Jan. 24, 2016
Everyone makes mistakes; the only difference is in the size of the mistake. For a student, a mistake on a final exam can lead to a lower score but a stronger understanding of the material. For a doctor, a mistake in the operating room can lead to permanent harm to another person and a subsequent malpractice suit. And for NASA engineers, a mistake on the launchpad can lead to the explosion of a spacecraft. While some errors serve to teach, others can lead to damaged reputations or financial loss. Typically, the effect of a mistake depends on who made it, and each profession deals with the repercussions through its own procedures. But what happens when a researcher, whose job is to experiment and push the boundaries of knowledge, makes an honest mistake in a scholarly paper? How should they, and the rest of the academic community, react?
Christopher Thorstenson, a researcher at the University of Rochester in New York, published a fascinating study last September in Psychological Science, one of the most prestigious journals in psychology. Quickly picked up by news outlets like NPR and the Huffington Post, Thorstenson’s research found that people are worse at perceiving blue colors if they have recently witnessed a sad event. He conducted the research by first showing subjects a video clip: either a comedian’s routine, to induce laughter, or the scene from The Lion King in which young Simba watches his father die in a stampede, to induce sadness. Afterwards, Thorstenson asked the subjects to complete a color perception exam.
This research captured the imagination of many people, as “feeling blue” is a common expression for sadness or depression. However, shortly after the paper was published, online academic commentators began to dissect the statistical analysis of the original data and found that the authors had committed a statistical error. Facing criticism, Thorstenson quickly published a retraction notice acknowledging the error, with a pledge to “conduct a revised experiment that more directly tests … and improves the assessment of BY [Blue-Yellow] accuracy”. Still, many academics openly mocked the journal, asking how the flagship journal of the Association for Psychological Science could let such a simple statistical error slide.
Before we tackle the messy question of what ought to be done about academic retractions, let’s examine Thorstenson’s error and the retraction that followed. In his experiment, Thorstenson assessed both the Blue-Yellow and Red-Green bands of color. He found a significant decrease in Blue-Yellow sensitivity for subjects in the “sad video” group, but no significant decrease in Red-Green sensitivity. The error was to treat this contrast as proof that sadness affects the two color axes differently: showing that one effect is significant while the other is not does not show that the two effects differ from each other, a claim that requires a direct statistical comparison the paper never made. As psychologist Tom Stafford puts it, “If you can prove that one suspect was present at a crime scene, but can’t prove the other was, that doesn’t mean that you have proved that the two suspects were in different places.” In addition, many of the subjects gave unreliable answers on the baseline color-sensitivity trials. Even after those subjects were excluded, the experiment showed the same pattern, but the statistical problems were serious enough to warrant retraction.
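To see why this reasoning fails, consider a small simulation (a hypothetical sketch with invented numbers, not Thorstenson’s actual data). Even if sadness impaired Blue-Yellow and Red-Green perception by exactly the same amount, sampling noise alone could push one group comparison below the p < .05 threshold while leaving the other above it. The valid test, sketched at the end, is to compare the sadness effect on one axis directly against the sadness effect on the other:

```python
# A minimal, hypothetical simulation (invented numbers, not Thorstenson's
# data) of the fallacy. Each simulated subject completes both a Blue-Yellow
# (BY) and a Red-Green (RG) test; the two groups watched different videos.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 40  # subjects per group

# Suppose sadness lowers accuracy by the SAME small amount on both axes.
sad_by = rng.normal(0.76, 0.08, n)  # sad group, Blue-Yellow accuracy
sad_rg = rng.normal(0.76, 0.08, n)  # sad group, Red-Green accuracy
ctl_by = rng.normal(0.80, 0.08, n)  # control group, Blue-Yellow accuracy
ctl_rg = rng.normal(0.80, 0.08, n)  # control group, Red-Green accuracy

# The flawed analysis: two separate group comparisons. Noise alone can push
# one p-value below .05 while leaving the other above it.
print("BY, sad vs. control: p =", stats.ttest_ind(sad_by, ctl_by).pvalue)
print("RG, sad vs. control: p =", stats.ttest_ind(sad_rg, ctl_rg).pvalue)

# The valid analysis: test whether the sadness effect DIFFERS between the
# axes by comparing each subject's BY-minus-RG score across groups (an
# interaction test). With equal true effects, it is rarely significant.
print("interaction: p =",
      stats.ttest_ind(sad_by - sad_rg, ctl_by - ctl_rg).pvalue)
```

Results of this sketch will vary with the random seed, which is exactly the point: the gap between “significant” and “not significant” in two separate tests is often just noise.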
Thorstenson’s errors are rather subtle, and it’s easy to understand why they were overlooked. Academics who do not specialize in statistics typically learn most of their statistical methods from mentors or in other informal settings. In fact, a 2011 article published in Nature Neuroscience surveyed experiments published in reputable journals and found that, of 513 experiments, 78 used this kind of comparison correctly while 79 made a mistake similar to Thorstenson’s (the remaining experiments did not call for this kind of analysis). This raises troubling questions. Statistical analysis is a major tool that allows scientists to discover correlations and interesting patterns, but it can also lead to hasty, half-baked conclusions. It is thus crucial both for scientists to study statistics more carefully and for journal reviewers to catch commonly made mistakes before articles are published.
These statistical errors are especially disappointing because the public expects perfection from the scientific community. When mistakes do happen, scientists need to know how to rectify them properly. In fact, Thorstenson’s handling of his mistake should be praised, not condemned. After learning of the error, he published the retraction notice in Psychological Science, notified the journal’s editors, and accepted full responsibility. And because he had made his data openly available, outside commentators were able to find the error in the first place. In the retraction, Psychological Science’s editor, Stephen Lindsay, added a note stating: “Although I believe it is already clear, I would like to add an explicit statement that this retraction is entirely due to honest mistakes on the part of the authors.”
It is clear that Thorstenson did not purposefully skew the research one way or another. It was an honest mistake, born of the intricacies of statistical analysis, and Thorstenson quickly owned up to it. Thanks in part to the changing nature of peer review and the rise of open data, the scientific community was able to identify and correct the mistake in a matter of weeks, rather than letting it go unnoticed and feed the scientific world incorrect information.
What does Thorstenson’s journey tell us about the future of retractions in scientific journals? First, we must recognize that graduate students are frequently pressured to publish positive results. In the “publish or perish” environment, a master’s or doctorate degree can hinge on how many papers a student publishes in prestigious journals. It has also grown increasingly clear that these journals are biased toward positive results over inconclusive findings. In David Freedman’s seminal article “Lies, Damned Lies, and Medical Science,” he remarks on the financial and institutional pressure for medical scientists to find positive drug results and to suppress negative ones. As Freedman writes, “being wrong in science is fine, and even necessary — as long as scientists recognize that they blew it, report their mistake openly instead of disguising it as a success, and then move on to the next thing, until they come up with the very occasional genuine breakthrough”. If scientists fully adopted this mindset, and if journals created fairer standards for publication, then perhaps we would see more honest scholarly work and a greater willingness to tackle difficult problems. Since negative results are sometimes just as informative as positive ones, we could also see a more unbiased pursuit of truth.
Chunyang Ding is a freshman in Saybrook College. Contact him at chunyang.ding@yale.edu.
(Featured image courtesy of Wikimedia Commons.)