Anscombe’s Quartet: The Importance of Data Visualization
People often believe that “numerical calculations are exact, but graphs are rough,” yet this is simply wrong. I believed it too before I learned data analytics.
If you are new to data science or one of its sub-fields, Anscombe’s Quartet is a good first step towards understanding why data visualization matters alongside statistical results.

Anscombe’s Quartet is the model example of the importance of data visualization. The statistician Francis Anscombe developed it in 1973 to demonstrate both the importance of plotting data before analyzing it and the effect of outliers on statistical properties. It comprises four data-sets, each consisting of eleven (x, y) points. The key observation is that all four share nearly identical descriptive statistics (mean, variance, standard deviation, etc.) but have very different graphical representations. Each plot reveals behavior that the summary statistics alone do not show.
Applying the standard statistical formulas to each of the four data-sets gives approximately the same values:
Average Value of x = 9
Average Value of y ≈ 7.50
Variance of x = 11
Variance of y ≈ 4.125
Correlation Coefficient ≈ 0.816
Linear Regression Equation: y = 0.5x + 3
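These figures can be checked with a few lines of Python. The sketch below is only an illustration: the data values are the standard published values of Anscombe’s quartet, typed in here because the table itself is not reproduced in the text, and the names `ANSCOMBE` and `summarize` are made up for this example. It uses only the standard library and computes the sample variance (dividing by n - 1).

```python
from statistics import mean, variance

# Standard published values of Anscombe's quartet (assumed here, since the
# original table is not part of this text). Data-sets I-III share the same x.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics discussed in the article for one data-set."""
    mx, my = mean(x), mean(y)
    # Sums of squares and cross-products around the means
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx  # least-squares slope of y on x
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(x),  # sample variance (n - 1 denominator)
        "var_y": variance(y),
        "corr": sxy / (sxx * syy) ** 0.5,  # Pearson correlation coefficient
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarize(x, y) for name, (x, y) in ANSCOMBE.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four printed rows should agree with one another, and with the values listed above, to about two decimal places.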
The summary statistics of these four data-sets are therefore nearly identical. But when we plot them on the x-y coordinate plane, each picture shows different behavior:
- Data-set I: a set of (x, y) points that follow a linear relationship with some variance.
- Data-set II: the points follow a smooth curve rather than a straight line (possibly quadratic).
- Data-set III: a tight linear relationship between x and y, except for one large outlier.
- Data-set IV: x is constant for every point except one outlier.
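To see these four shapes without any plotting library, here is a rough text-based scatter plot. It is a sketch under some assumptions: the data values are the standard published quartet, the helper `ascii_scatter` and its default axis ranges are invented for this illustration, and in practice you would more likely use a plotting library such as matplotlib. Every panel uses the same axis ranges so the shapes can be compared directly.

```python
# Standard published values of Anscombe's quartet (assumed here, since the
# original table is not part of this text). Data-sets I-III share the same x.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def ascii_scatter(x, y, width=40, height=12, x_range=(0, 20), y_range=(0, 14)):
    """Render (x, y) points as a character grid (hypothetical helper)."""
    grid = [[" "] * width for _ in range(height)]
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    for a, b in zip(x, y):
        # Scale each coordinate to a grid cell; nearby points may share a cell
        col = round((a - x_lo) / (x_hi - x_lo) * (width - 1))
        row = round((b - y_lo) / (y_hi - y_lo) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width  # bottom axis


for name, (x, y) in ANSCOMBE.items():
    print(f"Data-set {name}")
    print(ascii_scatter(x, y))
    print()
```

Even at this coarse resolution, the curve in data-set II and the single vertical column of points in data-set IV are easy to spot.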
Python code on GitHub!

Data-sets that are identical over a number of statistical properties, yet produce dissimilar graphs, are frequently used to illustrate the importance of graphical representations when exploring data. This isn’t to say that summary statistics are useless. They can simply be misleading on their own, so it’s important to treat them as just one tool in a larger data analysis process. Visualizing our data allows us to revisit our summary statistics and re-contextualize them as needed.
“Visualization gives you answers to questions you didn’t know you had.” (Ben Shneiderman)
Reference Research Paper: https://www.autodeskresearch.com/publications/samestats
