
As an aspiring data scientist, I count regression among the most important data analysis tools in my tool kit. I have a strong understanding of the subject now, but it took several attempts to get here, and there is still more I can learn. For this blog I wanted to look into how this important method was discovered.
The discovery of statistical regression was actually extremely contentious. Adrien-Marie Legendre was the first to officially publish on the subject, but nowadays most of the credit for the discovery is given to the legendary mathematician Carl Friedrich Gauss. Gauss did in fact develop the method years earlier than Legendre, but he considered his findings “trivial” and assumed that someone else had already published on it. It was only after Legendre published on the method that Gauss decided to seek credit.
Regression was discovered at the turn of the 19th century, when sea travel was of utmost economic importance. However, ocean navigation was still riddled with dangers and inaccuracies. Improving the accuracy and precision of navigation would lead to increased profits for the monarchs and noblemen shipping goods across the world. To better understand the shape of the Earth, scientists tracked the movements of other planets and comets relative to it. This was the context in which Legendre and Gauss were working.
The regression method first appeared in print in 1805, in Legendre’s paper titled “New Methods for the Determination of the Orbits of Comets”. In this paper Legendre explained how he was able to predict the orbits of comets and wrote “Of all the principles which can be proposed for [making estimates from a sample], I think there is none more general, more exact, and more easy of application, than that of which we have made use … which consists of rendering the sum of squares of the errors a minimum.”
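To make Legendre's principle concrete, here is a sketch in modern notation. The symbols are my own assumption and not Legendre's: the data are points $(x_i, y_i)$, the model is a straight line $y = a + bx$, and $S$ is the sum of squared errors being made "a minimum".

```latex
% Sum of squared errors for a candidate line y = a + b x
S(a, b) = \sum_{i=1}^{n} \bigl( y_i - a - b x_i \bigr)^2

% Setting both partial derivatives of S to zero gives the minimizing values
\hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{a} = \bar{y} - \hat{b}\,\bar{x}
```

Here $\bar{x}$ and $\bar{y}$ are the sample means. This is only the simplest one-variable case; the orbit calculations Legendre and Gauss worked on involved more unknowns, but the idea of minimizing a sum of squared errors is the same.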
Four years later, Gauss released his treatise “Theory of the Motion of the Heavenly Bodies Moving About the Sun in Conic Sections”, which laid out the methods he had used to predict when and where the asteroid Ceres would reappear in the sky. No other scientist had been able to make this calculation. Gauss accomplished this feat through his use of complex geometry, which included the method of least squares. Gauss also went as far as to write “Our principle, which we have made use of since 1795, has lately been published by Legendre … where several other properties of this principle have been explained.”
Legendre was obviously angered when he read what Gauss had written about him, and he decided to write Gauss a letter. In it Legendre wrote “There is no discovery that one cannot claim for oneself by saying that one had found the same thing some years previously; but if one does not supply the evidence by citing the place where one has published it, this assertion becomes pointless and serves only to do a disservice to the true author of the discovery.” He also wrote “You have treasure enough of your own, Sir, to have no need to envy anyone.”
Gauss was regarded as a mathematical genius, and though Legendre was certainly no intellectual lightweight, he simply could not stack up to the man known as the “Prince of Mathematicians”. There is much evidence to suggest that Gauss did in fact discover the method of least squares first. His colleagues recalled and corroborated that Gauss had explained the concept of least squares to them, and certain calculations in his notebooks could not have been done by any other method. Furthermore, Gauss’s publication on the subject of least squares was much more detailed than Legendre’s, going “far beyond Legendre in both conceptual and technical development, linking the method to probability and providing algorithms for the computation of estimates.”
Perhaps the funniest part of this dispute is Gauss’s conflicting feelings about his discovery. On the one hand he was absolutely adamant in his belief that he should be given credit for discovering least squares, but on the other hand he seemed apathetic about the discovery, describing it as “not the greatest of my discoveries.” Gauss even looked down on his predecessors for not making the discovery sooner. In a letter he wrote to a colleague he talked about how embarrassed he was for previous mathematicians and about how he did not want to publicize his discovery so as not to “urinate on the ashes of my ancestors”.
Although Gauss and Legendre laid the foundation for statistical regression through their discovery of least squares, the term “regression” was not coined until 1886, by Francis Galton, while he was studying how the traits of offspring relate to those of their parents. Galton found that large parent seeds tended to produce offspring seeds that were smaller than their parents, while small parent seeds tended to produce offspring that were larger than their parents; in both cases the offspring drifted back toward the average size. Karl Pearson, a colleague of Galton’s, plotted the size of parent seeds on the x-axis and the size of their offspring on the y-axis and used least squares to draw a line of best fit, which he dubbed the “regression line”. R.A. Fisher further expanded upon the properties of least squares estimation and is responsible for regression analysis as we know it today. Because of Fisher, regression can be used not only for prediction but also to make inferences about the relationship between a factor and an outcome.
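To get a feel for what a least squares line through parent and offspring seed sizes looks like, here is a minimal sketch in plain Python. The seed sizes are numbers I made up for illustration, not Galton's measurements, and `fit_line` is a hypothetical helper name of my own; it just applies the standard closed-form solution for a one-variable least squares fit.

```python
# Illustrative only: the seed sizes below are invented, not Galton's data.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical parent and offspring seed diameters (arbitrary units).
parent_sizes = [15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0]
offspring_sizes = [16.2, 16.4, 16.9, 17.0, 17.3, 17.5, 17.9]

intercept, slope = fit_line(parent_sizes, offspring_sizes)
print(f"offspring = {intercept:.2f} + {slope:.2f} * parent")
```

With made-up data like this, the fitted slope comes out between 0 and 1: offspring of larger parents are still larger on average, but by less than their parents were, which is the pull back toward the average that the word "regression" originally described.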
