Where does the ‘regression’ in linear regression come from?

Brendan Gilroy
Sep 3, 2019


The name of linear regression comes from Francis Galton’s 1885 study of parents’ and children’s heights.

Francis Galton 1822–1911

The method of least squares existed long before Galton: in the early 1800s, Legendre and Gauss had a priority dispute over its discovery. Astronomy and geodesy (the measurement of the Earth) have been great sources of mathematical discovery throughout history, and the method of least squares has a similar origin. Scientists and mathematicians such as Daniel Bernoulli in the 18th century and Carl Friedrich Gauss in the early 19th century distinguished between error due to chance and systematic error [1]. The systematic error was what they were really interested in, because it revealed the grand forces at play in their object of study, such as gravitation or electromagnetism. Analysis of the error due to chance was an early example of mathematicians working with the normal distribution, but they treated it as a nuisance factor to be minimized.

“Study mathematics like a house on fire”

In 1836, Charles Darwin returned from his famous voyage on the Beagle. His young cousin, Francis Galton, was failing out of medical school because of his weak stomach. Galton wrote to his cousin for advice, and in his reply Darwin told him to “Study mathematics like a house on fire” [2]. Galton received a large inheritance and spent it conducting studies of the weather, of criminal behavior, and, most importantly, of the inheritance of human traits.

Galton was a bit of an odd duck. As part of his study of human variability, he made a map of Great Britain based on women’s attractiveness: he would stand on street corners and make notes on the women he could see through his opera spectacles [3]. He also founded eugenics, and he held great hopes of breeding humanity toward perfection through the study of biometrics and government policy. Nevertheless, he enjoyed a great deal of influence: he was the director of the Kew Observatory from 1858 onward, and he developed the practice of fingerprinting with Scotland Yard.

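Galton had streams of people visiting his London office to have their measurements taken; he even measured the head of the Prime Minister. He plotted parents’ heights against the heights of their children, hoping that taller parents would yield equally tall children. Instead he found that the children of unusually tall parents, while still taller than average, tended to be shorter than their parents, and the children of unusually short parents tended to be taller than theirs: offspring drifted back toward the population average.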

Image taken from Stephen Senn’s 2011 article in Significance Magazine [4]

He dubbed the slope of the best-fit line the rate of ‘reversion to mediocrity’, or of ‘regression to the mean’ as we now say. Even though eugenics has fallen out of favor, we still use the term regression to describe fitting lines to data points.
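To see why a least-squares slope below 1 captures Galton’s ‘regression’, here is a minimal Python sketch. It uses simulated heights, not Galton’s data: the mean, spread, and true slope below are hypothetical values chosen only for illustration. Each child’s expected deviation from the mean is a fixed fraction of the parent’s deviation, plus random noise sized so that children vary as much as parents overall. The line is then fitted with the ordinary least-squares formulas.

```python
import random

# Hypothetical parameters for illustration only (not Galton's measurements).
MEAN_HEIGHT = 68.0      # assumed population mean height, inches
SD_HEIGHT = 2.5         # assumed standard deviation of height, inches
TRUE_SLOPE = 0.65       # assumed parent-to-child slope, deliberately below 1


def simulate_families(n, seed=0):
    """Return (parent, child) height pairs where the child's expected
    deviation from the mean is TRUE_SLOPE times the parent's deviation."""
    rng = random.Random(seed)
    # Noise SD chosen so children have the same overall spread as parents:
    # TRUE_SLOPE**2 * SD**2 + noise_sd**2 == SD**2
    noise_sd = SD_HEIGHT * (1 - TRUE_SLOPE ** 2) ** 0.5
    pairs = []
    for _ in range(n):
        parent = rng.gauss(MEAN_HEIGHT, SD_HEIGHT)
        child = (MEAN_HEIGHT
                 + TRUE_SLOPE * (parent - MEAN_HEIGHT)
                 + rng.gauss(0.0, noise_sd))
        pairs.append((parent, child))
    return pairs


def least_squares_line(pairs):
    """Ordinary least-squares fit of child on parent; returns (intercept, slope)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


pairs = simulate_families(10_000)
intercept, slope = least_squares_line(pairs)
print(f"fitted slope: {slope:.2f}")

# Children of unusually tall parents: taller than average,
# but on average less extreme than their parents.
tall = [(p, c) for p, c in pairs if p > MEAN_HEIGHT + SD_HEIGHT]
avg_parent = sum(p for p, _ in tall) / len(tall)
avg_child = sum(c for _, c in tall) / len(tall)
print(f"tall parents average {avg_parent:.1f} in; their children {avg_child:.1f} in")
```

The fitted slope lands near the assumed 0.65, and the tall parents’ children come out between the population mean and their parents’ average. Nothing in this setup makes children shrink as a group; the overall spread is held constant, and the pull toward the mean appears only when you condition on extreme parents.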

[1] O. B. Sheynin, “C. F. Gauss and the Theory of Errors.” 1979.

[2] Theodore M. Porter, The Rise of Statistical Thinking, 1820–1900. Princeton University Press, 1988.

[3] Jim Holt, “Measure for Measure.” The New Yorker, 2005.

[4] Stephen Senn, “Francis Galton and Regression to the Mean.” Significance, Royal Statistical Society, 2011. https://rss.onlinelibrary.wiley.com/doi/epdf/10.1111/j.1740-9713.2011.00509.x
