Statistics Done Wrong: The Woefully Complete Guide

by Alex Reinhart

Paperback: 176 pages

Publisher: No Starch Press; 1 edition (March 16, 2015)

Language: English

ISBN-10: 1593276206

ISBN-13: 978-1593276201

Product Dimensions: 8.1 x 5.9 x 0.5 inches

My Rating: 3/5

**Introduction**

Statistics Done Wrong: The Woefully Complete Guide by Alex Reinhart, a graduate student in statistics at Carnegie Mellon University, is a guide to common errors in the statistical analyses found in scientific research papers, with examples drawn mostly from the biology and medical research literature. There is also a companion Statistics Done Wrong web site. The book is aimed primarily at graduate students, research scientists, and other professional data analysts with some background in probability and statistics, roughly equivalent to a good one-year first course in probability and statistics in college.

Although *Statistics Done Wrong* is not highly technical, with few formulas or calculations, medical patients, policy makers, and others who need to make sense of the many statistical analyses now used to market drugs, medical procedures, policies, and many other goods and services will likely find the book slow going unless they already have some background in probability and statistics, whether through formal training or personal study. The book is weak at defining technical terms such as ANOVA (analysis of variance) that are introduced at various points; I found myself looking up definitions, or more precise definitions, on Wikipedia (not an ideal source) or in my collection of books and articles on probability and statistics.

*Statistics Done Wrong* paints a rather dismal picture of the quality of statistics in the scientific literature, especially in the fields of biology, medicine, and psychology, somewhat in the spirit of John Ioannidis’s claims. Ioannidis’s works, such as “Why Most Published Research Findings Are False,” are cited a number of times.

Overall, I would recommend the book, but with some important reservations. In my previous reviews of Joel Best’s books on the misuse of statistics, I concluded that Best gave little guidance for readers seeking to evaluate complex statistical claims, as opposed to simple but misleading or false numbers such as “one million missing children” or “three million homeless people” frequently encountered in mass media coverage of social problems. These complex statistical claims assert an effect, such as global warming, that is comparable to or smaller than the normal variation in the measured quantity, derived by averaging over a large number of highly variable measurements, often combined with fitting a mathematical model to the data or applying abstruse, advanced statistical methods. Such claims may be real, but they can also easily be produced by conscious, unconscious, or accidental biased sampling of the highly variable data, or by other subtle errors or manipulations. They are also difficult or impossible to confirm or deny from personal experience, because the variability of the measured quantity is high compared to the size of the alleged effect. *Statistics Done Wrong* directly addresses these more complex and difficult cases.

These complex statistical claims include many contentious and emotional issues, such as the effectiveness and safety of vaccines, global warming (climate change), the effectiveness of chemotherapy and other cancer treatments (for example, the Whipple surgical procedure for pancreatic cancer), and laboratory parapsychology. It is common for advocates of these claims to discuss them as if they were not statistical in nature but rather “hard” facts, such as my near-absolute certainty that I cannot walk through the walls of my apartment, or that if I hold out a rock in my hand and let go, it will with great certainty fall to the ground. Skeptics are increasingly labeled Statistical Claim Deniers or Statistical Claim Denialists, in analogy to Holocaust Deniers or Denialists, an *ad hominem* tactic that has little to do with rational analysis and is highly questionable at best.

On the other hand, skeptics always seem able to find substantive issues, such as those discussed in *Statistics Done Wrong* and Joel Best’s books, that call any purely statistical claim into deep question. What has been described as an “infinite regress” occurs: if a particular criticism is conclusively shown to be false (in itself a very difficult achievement), skeptics simply find yet another potential problem with the statistical analysis. Skeptics, it seems, are rarely if ever able to replicate purely statistical claims, while advocates almost always can. *Statistics Done Wrong* is unlikely to fully fix this mushy quality of statistics in the real world.

Fraud is often implied, and either an actual conflict of interest (the research in question was funded by Colossal Pharmaceuticals — see these twelve-billion-dollar settlements with the Department of Justice for inaccurate marketing of failed wonder drugs X, Y, Z, etc., which nonetheless admit no fault) or a potential conflict of interest (alternative medical “experts” always seem to have a book or video you can buy, one that might become a bestseller even if it is not one right now) can usually be asserted to support suggestions of fraud or unconscious bias.

**The Much Maligned P-Value**

The book devotes a chapter and many sections to the many problems with the p-value, one of the most commonly cited statistics in scientific research. Loosely, the *p-value* is the probability that results at least as extreme as those observed could have been produced by pure chance. Scientists often say that the results of an experiment are statistically significant if the *p-value* is less than or equal to 0.05 (five percent), a threshold chosen more or less arbitrarily by the pioneering statistician Ronald Fisher. This seemingly straightforward concept hides a plethora of difficulties that have become increasingly well known in recent years, leading some scientific journals to ban the *p-value* altogether. This may be a case of throwing the baby out with the bath water.
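The book itself contains no code, but the idea behind the *p-value* can be made concrete. Here is a minimal Python sketch (my own, not from the book, with made-up data) that estimates a p-value for the difference in means between two groups by a permutation test: shuffle the group labels many times and count how often pure chance produces a difference at least as large as the one observed.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in means
    between two groups by randomly shuffling the group labels."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical measurements with no real difference between groups:
control = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 5.1, 4.7]
treated = [5.2, 5.0, 4.9, 5.3, 5.1, 4.8, 5.0, 5.2]
p = permutation_p_value(control, treated)
# With no real effect, p typically lands well above 0.05.
```

The point of the sketch is only that a p-value measures surprise under the assumption of pure chance; it says nothing about whether the sampling itself was biased, which is exactly the loophole discussed above.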

In some respects, *Statistics Done Wrong* blames the empirically mushy quality of statistics in the real world on the limitations of the *p-value*, and it argues for the use of confidence intervals on the putative effect size as a solution. I agree with the author that quoting confidence intervals on effect sizes in addition to the *p-value* is an improvement in statistical practice, but confidence intervals in no way solve the “infinite regress” problem. Indeed, all a skeptic need do is ask whether the confidence interval is too narrow or the estimated effect systematically biased, so that the alleged effect is in fact consistent with no effect. For statistical claims where the alleged effect is comparable to or smaller than the typical variation in the measured quantity, there are many ways biased measurement, biased sampling, or other subtle issues can produce a small effect and an incorrect confidence interval.
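Again with made-up data and my own code rather than anything from the book, a rough normal-approximation confidence interval for an effect size (here, a difference in means) takes only a few lines of Python:

```python
import math
import statistics

def diff_of_means_ci(group_a, group_b, z=1.96):
    """Approximate 95% confidence interval for the difference in means
    (group_a minus group_b), using the normal approximation."""
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return diff - z * se, diff + z * se

# Hypothetical data where the "effect" is small relative to the noise:
treated = [7.1, 6.8, 7.4, 7.0, 6.9, 7.2, 7.3, 6.7]
control = [6.9, 7.0, 6.8, 7.1, 6.7, 7.2, 6.9, 7.0]
low, high = diff_of_means_ci(treated, control)
# For data like this, the interval straddles zero: "no effect"
# is consistent with the measurements.
```

If the interval comfortably includes zero, the data are consistent with no effect; but as argued above, a correctly computed interval is still only as good as the sampling behind it.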

**Other Common Problems with Statistics in Scientific Research**

*Statistics Done Wrong* has chapters and sections on a number of other common problems with statistics in scientific research, several of which overlap with the weaknesses of the *p-value*. One chapter covers *statistical power*: loosely, the probability that a statistical test will correctly reject the *null hypothesis* — the hypothesis that the results of the experiment are due to pure chance — when a real effect is present. The statistical power of an experiment increases toward 1.0 with the sample size, the number of independent measurements in the experiment. *Statistics Done Wrong* argues that many scientific papers fail to compute their statistical power and in fact have low power — too few measurements to reach reliable conclusions. In most cases, the book is discussing the statistical power of a *p-value* test such as the standard *p* < 0.05 test. The book covers several other common problems, including “pseudo-replication,” “the base rate fallacy,” and “torturing the data until it confesses.”
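The relationship between power and sample size is easy to see by simulation. The following Python sketch (mine, not the book's; the effect size, noise level, and sample sizes are invented for illustration) estimates the power of a simple two-sample test at *p* < 0.05 as the fraction of simulated experiments that reject the null hypothesis:

```python
import random
import statistics

def estimated_power(effect, sigma, n, trials=2000, seed=1):
    """Estimate the power of a two-sided, two-sample z test at
    alpha = 0.05 by simulating many experiments with a real effect."""
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided critical value for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, sigma) for _ in range(n)]      # control
        b = [rng.gauss(effect, sigma) for _ in range(n)]   # treated
        se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
        z = abs(statistics.mean(a) - statistics.mean(b)) / se
        if z > z_crit:
            rejections += 1
    return rejections / trials

# For a fixed effect, power grows markedly with the sample size:
low_n = estimated_power(effect=0.5, sigma=1.0, n=10)
high_n = estimated_power(effect=0.5, sigma=1.0, n=100)
```

With ten measurements per group, most simulated experiments fail to detect the (real) effect; with a hundred, most succeed. This is exactly the underpowered-study problem the book describes.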

**Conclusion**

In conclusion, I recommend *Statistics Done Wrong* both for those seeking to evaluate complex statistical claims and for researchers trying to improve their research, which seems to be the book’s target audience. A reader without some background in probability and statistics will probably need to get up to speed by studying introductory probability and statistics at the college level. Even a reader with that background will likely need to look up some terms and jargon to understand some sections of the book.

*Statistics Done Wrong* is unlikely to fully fix the empirically mushy quality of the purely statistical claims in the real world. Even if researchers follow the suggestions in the book, the “infinite regress” problem is likely to continue for contentious statistical claims. Historically, purely statistical claims have mostly graduated to “hard” facts when it has become possible to isolate the causes and effects and demonstrate a strong unequivocal effect on demand. We don’t have heated emotional debates about whether we can walk through solid walls because the effect (“OW! THAT HURT!”) is strong, unequivocal, not statistical, and easily reproduced by most people. Statistics can mostly show us the way to find new “hard” facts but it cannot provide the “hard” facts. An experiment or machine that isolates the causes and effects and demonstrates a strong reproducible effect with negligible statistical variation is needed. Rarely, if ever, is a statistical “fact” (scare quotes on fact intentional) a “hard” fact.

© 2015 John F. McGowan

**About the Author**

*John F. McGowan, Ph.D.* solves problems using mathematics and mathematical software, including developing gesture recognition for touch devices, video compression and speech recognition technologies. He has extensive experience developing software in C, C++, MATLAB, Python, Visual Basic and many other programming languages. He has been a Visiting Scholar at HP Labs developing computer vision algorithms and software for mobile devices. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at [email protected].


If we (formerly) needed “a Good Housekeeping seal” for consumer packaged goods, a Consumer Reports for general consumer goods, and an Angie’s List for consumer services, I think we also need an independent, objective review panel for those who want a “seal of approval” on the design of their experiments and the wording of their conclusions. I am exhausted to the point of oblivion on trying to bird-dog nutrition claims. I understand statistics, but nearly every claim is refuted by other credible sources. Is there such a panel that vets designs and conclusions, other than the peer-review journals?

The Cochrane Collaboration (http://www.cochrane.org) is an attempt to do this for medical research. I have not been able to evaluate how well it achieves its goals — a good subject for a Math Blog article, perhaps.

John

