The Process is Failing the Goal

Eric del Rio
Published in Human Systems Data
Mar 15, 2017 · 7 min read

Hi Folks,

In my last post I shared insights from An Introduction to Statistical Learning: with Applications in R and my journey through a first attempt at regression using R (James et al., 2013).

This post is a little more philosophical in nature and looks at how certain parts of the scientific process might be undermining the basic goal we have as scientists.

The first reading is “Statistical tests, P values, confidence intervals and power: a guide to misinterpretations” (Greenland et al., 2016). This article discusses widespread problems in hypothesis testing and provides a very useful list of common misinterpretations of the statistics involved. It also suggests ways to avoid misusing statistics and misinterpreting what your data mean.

The second reading is an interview of Andrew Gelman and Eric Loken, by Alison McCook from Retraction Watch, about their recent paper “Measurement error and the replication crisis” (2017). The interview discusses the assumption that if an effect is found despite a lot of “noise” or uncontrolled-for factors, the underlying relationship must be even stronger, a misinterpretation that is another contributor to the replication crisis. (If you are not familiar with Retraction Watch, it is a website that tracks retractions of journal articles as “a window into the scientific process”.)

The third reading is a blog post by Andrew Gelman, “What has happened down here is the winds have changed”. It is a criticism of Susan Fiske’s article “Mob Rule or Wisdom of Crowds?” (in press), which argues that social media has had a negative impact on the scientific community because it bypasses peer-review protocols and standards. Gelman’s post provides a very comprehensive history of the problems with research in the social sciences, and his criticism of Fiske’s piece is that many of these problems resulted from research happening in the vacuum of political peer review. According to the post, it is better to have criticism on social media than the lack of transparency of traditional scientific publication.

Why These Problems Matter

These readings get very specific about the problems that are occurring in scientific research. While I was reading, I got caught up in (and daunted by) the specific issues, and it was only afterwards that I started to connect those problems to the general goals of the scientific pursuit.

Before I go into more depth on the readings, let me muse a bit on the “goal of science”. In a very broad sense, we use science to better understand things in our environment. Our reasons for wanting to do this range from simple curiosity (why do we dream?) all the way to survival needs (will this medicine cure cancer?). For answers from science to truly lead to better understanding, we need to interpret those answers objectively. Unfortunately, our desire as researchers to answer the really interesting questions has led to biases in the way we conduct research. The problems discussed in the readings relate to how the scientific process has evolved in ways that let us, as researchers, get in the way of the very answers we are trying to find.

Hypothesis Testing is a Big Misunderstanding

Greenland’s paper is all about issues and misinterpretations related to hypothesis testing. Hypothesis testing is when you predict (or hypothesize) that your input or manipulation will produce a different outcome than the same situation would produce without it.

Say you are racing cats, and you predict that giving some cats a candy bar will make them run faster than the other cats. You give them the candy, race them, and time them. You then test the recorded times against the null assumption that all the cats run the same speed: your model says there is no difference, and the analysis checks whether the observed difference between the two groups is larger than that model would lead you to expect.
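As a minimal sketch of what that analysis looks like, here is the cat-racing example as a two-sample t-test in R. The data, group sizes, and spreads below are invented purely for illustration:

```r
# Minimal sketch: the cat-racing example as a two-sample t-test.
# All numbers here are made up for illustration.
set.seed(42)

candy_times    <- rnorm(20, mean = 11.5, sd = 2)  # race times (seconds) for cats given a candy bar
no_candy_times <- rnorm(20, mean = 12.0, sd = 2)  # race times for the cats without candy

# The null model says both groups run at the same average speed.
# The test asks whether the observed difference is larger than that model would predict.
t.test(candy_times, no_candy_times)
```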

This sort of method is used throughout the scientific realm, and it has been found to have a variety of problems. First, according to the paper (and many others), professional scientists systematically misinterpret the meaning of the statistical values they use to provide a yes-or-no answer to the question “do the candy bar cats run faster?”. Greenland provides 25 examples of common statements that misinterpret these statistical constructs (p-values, confidence intervals, statistical power). The dismay of the writers is tangible, and each example of a misinterpretation is followed by a definitive “No!”.

The essence of the problem is that hypothesis testing looks for an easy yes-or-no answer to a question, and in doing so may overlook the real value of the data for improving our understanding of our environment. Because we design the experiment, and are focused on answering our question by confirming or debunking our prediction, we may miss what else the data have to say; the data may answer other questions. The paper suggests that researchers should be more thorough, and should attempt to understand the data beyond whether a p-value suggests that the difference is “significant” or not. For example, the answer to the question “are candy bar cats faster?” is a lot more useful if you are sure that it applies to all cats, and if you can really describe how differently cats will race if they are given a candy bar.
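To make that concrete, here is a hedged sketch (again with invented cat data) of describing the size and uncertainty of the effect rather than only reporting whether p crossed 0.05:

```r
# Sketch with invented data: describe the effect, don't just report significance.
set.seed(7)
candy_times    <- rnorm(20, mean = 11.5, sd = 2)
no_candy_times <- rnorm(20, mean = 12.0, sd = 2)

test     <- t.test(candy_times, no_candy_times)
est_diff <- mean(candy_times) - mean(no_candy_times)

cat(sprintf("Estimated difference: %.2f s, 95%% CI [%.2f, %.2f], p = %.3f\n",
            est_diff, test$conf.int[1], test$conf.int[2], test$p.value))
# A wide interval that barely excludes zero tells a very different story from a
# narrow one, even when both happen to give p < 0.05.
```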

Because misinterpretation of hypothesis testing analyses has led to so much published research that attaches erroneous “yes or no” answers to some very interesting questions, it has undermined the “goal of science”. While I will definitely be using this paper to make sure that I am not misinterpreting or misrepresenting my own findings, it leaves me unsure of what scientific practices provide a useful replacement for hypothesis testing.

Measurement Error Can’t Be Repeated

The interview by Alison McCook talks about how measurement error (uncontrolled-for factors that are not observed or measured precisely in a study) acts as noise in psychology studies. “Noise” in this context means anything that interferes with our ability to observe the real relationship between our intended manipulation and the outcome. It takes many forms and is often unavoidable when you are studying people.

According to the authors of the paper, there is an assumption in psychology that if extraneous factors are present that can’t be controlled for, and you still find a “significant” result, then the relationship between your manipulation and the outcome must be even stronger. They show through a simulation that this is not the case, and that the inflated effect does not carry over when you use more participants. The problem is that many researchers publish findings that may have been heavily influenced by noise.
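Here is a rough toy version of that idea in R. It is my own sketch, not the authors’ simulation: a small true effect is measured with a lot of noise, and we look at the estimates that happen to clear the significance filter in a small study versus a large one. The true effect size, noise level, and sample sizes are all assumptions for illustration.

```r
# Toy sketch (my own, not the paper's code): a small true effect measured with noise,
# filtered through "statistical significance", in small vs. large samples.
set.seed(123)

simulate_once <- function(n, true_effect = 0.2, noise_sd = 1.5) {
  # outcome = true signal + measurement noise, for n subjects per group
  treated <- true_effect + rnorm(n) + rnorm(n, sd = noise_sd)
  control <- rnorm(n) + rnorm(n, sd = noise_sd)
  c(estimate = mean(treated) - mean(control),
    p        = t.test(treated, control)$p.value)
}

summarise_significant <- function(n, reps = 5000) {
  sims <- replicate(reps, simulate_once(n))
  sig  <- sims["estimate", sims["p", ] < 0.05]   # keep only the "significant" estimates
  cat(sprintf("n = %g per group: %4.1f%% significant, mean significant estimate = %.2f (true effect = 0.20)\n",
              n, 100 * length(sig) / reps, mean(sig)))
}

summarise_significant(20)    # small noisy study: the few estimates that reach significance
                             # overstate the true effect several-fold (some even flip sign)
summarise_significant(2000)  # large study: significant estimates cluster near the true effect
```

The point is that the noise does not make the small-sample finding more convincing; it just means that only exaggerated estimates survive the significance filter, and those inflated estimates are exactly the ones that later fail to replicate.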

On the bright side? A major replication crisis has recently blown a big hole in the social sciences and called into question a lot of research that uses confirmatory data analysis for hypothesis testing. This means that researchers will be motivated to conduct studies with less noise, for fear of replication studies. The replication crisis refers to studies that were designed to replicate previous studies but did not come up with the same results. If a result was found because of noise, it is possible that a replication will come up with a different result because the noise is different (even when the manipulation is the same). The reasons for this crisis relate to many issues in research and statistics, such as p-hacking, the file drawer problem, and the fact that p-values from significance tests shouldn’t be expected to replicate exactly anyway. There are whole organizations devoted to dealing with these problems (Retraction Watch being one of them). Brian Nosek has started a huge collaborative replication campaign called the Open Science Collaboration. They have found major problems when trying to reproduce studies in psychology, and published a very influential paper in Science (2015).

Experimental “noise” undermines the goal of science because it results in experiments that claim to answer questions that they may not have answered. If the answer was actually a result of noise, we now have an incorrect answer and are misinformed about our environment.

Don’t Hide Behind Your Title

The third reading, “What has happened down here is the winds have changed”, is as hilarious as it is informative. It is a response to Susan Fiske’s article about how people on social media are attacking published scientific work, unrestrained by courtesy, reputation, or academic peer review. Gelman basically outlines all of the problems with psychological research that have culminated in the criticisms it faces today: that it is not reproducible (the replication crisis), that it is often erroneous, and that its research and publication methods need a drastic overhaul. He then argues that Fiske, because her career has been rooted in this flawed system, wrote her piece to resist the coming flood of change (his flood analogy). A career’s worth of her research is being called into question, she must learn new methods, and she and her colleagues who have been practicing these methods could see their reputations damaged.

I strongly agree with Gelman’s position: a large part of the problem is that the verification of many published findings has occurred behind closed doors, with questionable care, and with an emphasis on publishing alarming findings with “significant” p-values. I also do not think that the way we criticize science should depend on whether somebody will lose their tenure. However, I am resistant to the idea that scientific findings should be validated or invalidated, in a non-methodical way, by non-scientists on social media. That leads to other problems (global warming denial, etc.).

This is yet another area where, as humans, researchers are incidentally obstructing the real goal of science: to understand more about our environment. The problems these papers raise are so deeply entrenched in the way and context in which research is conducted that it is difficult to see how to change things.

Uncertainty

As a graduate researcher, I am in a position where my research could either contribute to or avoid many of these problems. While readings such as these help me be more conscious of mistakes I could make in designing studies and interpreting my data, much of my knowledge, and many of my biases, were formed around the very methods now under question. In what other ways can I avoid being a part of these problems?

For readers who are interested in resources that address some of these problems:

Addressing the Replication Crisis: Open Science Collaboration

Addressing Publication Bias: Archives of Scientific Psychology

Skeptical Commentary on Issues in Science: Neuroskeptic

Tracking Retractions: Retraction Watch

References

Fiske, S. T. (in press). Mob rule or wisdom of crowds? APS Observer.

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: With Applications in R. Springer. (Ch. 3)

Loken, E., & Gelman, A. (2017). Measurement error and the replication crisis. Science, 355(6325), 584–585.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
