Book Review: Computer Age Statistical Inference

Dan Saunders
Jun 22, 2018

It’s time for another book review!


Computer Age Statistical Inference, by Bradley Efron and Trevor Hastie, is an effort to explain the development of statistics, in theory and practice, from the end of the 19th century to today. It was published in 2016 by Cambridge University Press. Both Efron and Hastie are professors of statistics and biostatistics at Stanford University, and are extremely prolific writers on their subjects.


The authors make a strong distinction between the algorithmic and inferential aspects of statistical analysis. The former refers to how data is processed; i.e., what procedures we apply to data to produce estimates of the quantities in question. The latter is concerned with assessing the “goodness” of those procedures. For example, averaging is a statistical algorithm for estimating the mean of data, while the standard error (the square root of the estimate’s variance) is a typical way to assess its accuracy. This hints at an important theme throughout the book: “…the same data that supplies an estimate can also assess its accuracy.” However, the computation of the standard error is an algorithm itself, which is subject to inferential analysis concerning its accuracy!
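To make the averaging example concrete, here is a minimal sketch in Python. The function name and the simulated data are my own illustration, not something from the book; the standard error of the mean is estimated in the usual way, as the sample standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics


def mean_and_standard_error(data):
    """Return the sample mean and its estimated standard error."""
    n = len(data)
    mean = statistics.fmean(data)
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    standard_error = statistics.stdev(data) / math.sqrt(n)
    return mean, standard_error


# Hypothetical data: 100 draws from a normal distribution (mean 5, sd 2).
random.seed(0)
sample = [random.gauss(5.0, 2.0) for _ in range(100)]

mean, standard_error = mean_and_standard_error(sample)
print(f"mean = {mean:.3f}, standard error = {standard_error:.3f}")
```

The same list of numbers produces both the estimate and the measure of its accuracy, which is the point of the quote above.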

The algorithmic aspect of analysis is unreliable without strong inferential justification. Averaging seems intuitively correct, but without the standard error, it would be difficult to know precisely how much data to collect in order to get an accurate estimate of the mean. Mathematics is required to understand the properties of estimators, such as efficiency, bias, or variance. For example, it is easy to show that the naive sample variance (which divides the sum of squared deviations by n) is biased, so it must be corrected (by dividing by n - 1 instead) to be unbiased.
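The bias of the naive variance estimate can also be checked numerically. The following is a rough simulation sketch, not something taken from the book: the function names, the sample size of 5, and the standard normal population are all assumptions of mine. With a true variance of 1, the naive estimator should average out near (n - 1) / n = 0.8, while the corrected one should average out near 1.

```python
import random


def naive_variance(xs):
    """Divide the sum of squared deviations by n (biased)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def corrected_variance(xs):
    """Divide by n - 1 instead (Bessel's correction, unbiased)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def average_estimates(n=5, trials=20000, seed=0):
    """Average both estimators over many samples from a N(0, 1) population."""
    rng = random.Random(seed)
    naive_total = 0.0
    corrected_total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        naive_total += naive_variance(xs)
        corrected_total += corrected_variance(xs)
    return naive_total / trials, corrected_total / trials


naive_avg, corrected_avg = average_estimates()
print(f"naive: {naive_avg:.3f}, corrected: {corrected_avg:.3f}")
```

A simulation like this only suggests the bias; the mathematical argument is what establishes it, which is the kind of inferential justification the authors have in mind.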

In recent years, there has been an unbalanced development of the two aspects in favor of algorithmic progress. There has been a proliferation of interesting datasets and computing power, making it easy to apply simple compute-intensive methods instead of the complicated and restrictive ideas from classical statistics.

For example, the bootstrap algorithm (and others like it) resamples a dataset many times in order to assess the accuracy of a statistic. Resampling here means that many “fake” datasets are created by sampling with replacement from the original, “real” dataset. The statistic in consideration is estimated from each “fake” dataset, and the spread of these estimates indicates how variable the original estimate is; the estimates can also be averaged together to provide a less variable estimate overall (as in bagging). The catch is that hundreds or thousands of bootstrap resampled datasets may be needed to get accurate results; this became practical only thanks to the advancement of computing power.
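The resampling procedure described above fits in a few lines of Python. This is a minimal sketch under my own assumptions (the function name, the choice of the median as the statistic, and the simulated data are all hypothetical), not the authors' implementation. It returns the estimate from every “fake” dataset, so both their average and their spread can be inspected.

```python
import random
import statistics


def bootstrap_estimates(data, statistic, n_resamples=1000, seed=0):
    """Compute `statistic` on many resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # A "fake" dataset: n draws with replacement from the real data.
        resample = rng.choices(data, k=n)
        estimates.append(statistic(resample))
    return estimates


# Hypothetical data: 50 draws from a normal distribution (mean 10, sd 3).
random.seed(1)
data = [random.gauss(10.0, 3.0) for _ in range(50)]

estimates = bootstrap_estimates(data, statistics.median)
print("median of original data:", statistics.median(data))
print("mean of bootstrap estimates:", statistics.fmean(estimates))
print("bootstrap standard error:", statistics.stdev(estimates))
```

The median is a convenient statistic to try here because, unlike the mean, it has no simple textbook formula for its standard error.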

There is good inferential justification for the bootstrap, but for many prediction-oriented methods in machine learning, this is lacking. For example, it’s widely accepted that we don’t understand why deep neural networks work, and a mature inferential theory of such methods doesn’t seem likely to materialize any time soon. The book discusses neural networks in Chapter 18 (without a single mention of inference), giving a high-level description of their construction, training, and relationship to simpler prediction methods.

Book structure

The book is split into three parts: “Classical Statistical Inference”, “Early Computer-Age Methods”, and “Twenty-First-Century Topics”.

In the first part (roughly 1900–1950), a distinction between frequentist, Bayesian, and Fisherian inference is made, and their properties are described and compared. There’s also material on parametric models, important across all approaches to statistical inference, culminating in a discussion of the general construction of exponential families.

In part two (roughly 1950–1995), statisticians were free to develop algorithms that could be carried out by early computers (rather than by mechanical calculator or hand!). This led to methods such as the jackknife, the bootstrap, ridge regression, cross-validation, and more, all potentially infeasible before computers were widely available.

In part three (roughly 1995–present), inference is largely set aside as powerful prediction algorithms take center stage. The book makes an effort to showcase recent inferential efforts towards justifying these methods, but concedes that several aren’t yet well understood. The last two chapters are on the advanced topics of “Inference After Model Selection” (combining discrete model selection and continuous regression analysis) and “Empirical Bayes Estimation Strategies” (using indirect evidence in practice, and “learning the equivalent of a Bayesian prior distribution from ongoing statistical observations”).


On a personal note, this book was tough. It’s recommended for graduate students in statistics, and I’m a student of computer science. However, I think it’s crucial to study in order to do machine learning research. Really, it’s an important subject for anyone who does quantitative work! Some general notes about the text:

  • The notation in the book is strange. This could simply be statisticians’ notation, but it took some getting used to.
  • Too much background knowledge is sometimes assumed. I frequently had to read other material to understand certain parts.
  • Some of the more advanced topics went completely over my head! I struggled especially with the final two chapters. This is a strong motivator to learn more.

Written by Dan Saunders

MSc student in computer science at UMass Amherst. Likes machine learning and brain analogies.