Age of Miss America correlates with Murders by steam?

Saliha Akca-Hobbins
Published in Human Systems Data
Mar 15, 2017

Sounds weird, right? But statistically speaking, they are correlated :). Here is the data to prove it.

This image is taken from Tyler Vigen’s website.

I don't know why, but this week's reading assignments reminded me of Tyler Vigen's website and his awesomely ridiculous graphs. (One of them inspired the title of this post.) Vigen has a very clever way of criticizing how statistical correlation or significance can be achieved with nonsense variables and statistical manipulation. After comparing perhaps 1,000 unrelated measurements, it is possible to find a statistically significant correlation between two ridiculously unrelated variables. Maybe this is an extreme example to start the discussion with, but some misinterpretations in social science can create similar effects when the research does not use a correct statistical methodology.

The misuse of statistical values has been discussed for many years, especially when replication studies did not match the original research. In the last couple of years, the replication crisis has been one of the hot topics among social scientists. Gelman (2016) traced the historical evolution of the problem with a detailed timeline. In the article, he highlighted some of the controversies among social scientists. He showed how tenured professors and individual researchers with no intention of publishing in a peer-reviewed journal can approach the replication crisis very differently.

His 2017 interview/blog post was also about the replication crisis. This time, he focused on replication issues and the problem of noise in human sciences studies. He defined noise as random error that can interfere with objective research results. In the interview, the suggested solutions focused on rethinking the design of experiments and controlling variation.

The last article of the week was an essay aimed at giving researchers a resource for avoiding misunderstandings of statistics. Greenland et al. (2016) listed 25 misinterpretations and misuses of P values and their relatives, such as power and confidence intervals, and tried to clarify each one with a short explanation. They emphasized that these misunderstandings can be eliminated through careful statistical attention and patience on the part of scientists.

Self-critique: I have to admit that as a student, especially in graduate-level physics labs, I felt the pressure to get the best data. Sometimes I spent months in the lab to obtain the smoothest dataset to present. The academic and emotional pressure of group meetings and the stress of regular progress reports with my mentor pushed me toward using only the best part of the data. Maybe that was cherry-picked data analysis, and it was wrong. At the time, it made sense to be a bit picky. What I want to say is: yes, there is a problem with statistical methods. But there are also other issues in the system that push researchers toward cherry-picking. I was only a grad student, and I felt it strongly. I am sure this pressure is at its maximum for someone trying to get tenure.

References:

Gelman, A. (2016). What has happened down here is the winds have changed. Retrieved from http://andrewgelman.com/

Gelman, A. (2017). Measurement error and the replication crisis. Retrieved from http://andrewgelman.com/

Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350.

Vigen, T. (n.d.). Spurious correlations. Retrieved from http://www.tylervigen.com/
