Unveiling Insights with Scatterplots and Correlation

Prasan N H
3 min read · Oct 30, 2023


In the realm of data analysis, understanding the relationships between variables is often a key objective. Two valuable tools in this quest are scatterplots and correlation measures. In this blog post, we’ll explore how these tools help us uncover insights from data.

Scatterplot: Visualizing Relationships

A scatterplot is a visual representation of bivariate numerical data: each observation is drawn as a point whose X and Y coordinates are its values for the two variables. Here’s what it can reveal:

  • Clusters: You can spot groupings or clusters in the data, indicating that certain values tend to occur together.
  • Outliers: Scatterplots make outliers stand out. Outliers are data points that deviate significantly from the rest of the data, and they may carry valuable information or point to data errors.
  • Correlations: The overall pattern of the scatterplot can give you insights into the relationship between the two variables.
  • Types of Correlation: You can identify the type of correlation: positive (rising), negative (falling), or null (uncorrelated).
Figure: Various types of correlation visualized.
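
To make the idea concrete, here is a minimal, dependency-free sketch that draws a scatterplot as a grid of characters. The function name `text_scatter` and the sample data (hours studied vs. exam score) are made up for illustration; in practice you would most likely reach for a plotting library instead.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Render paired (x, y) values as a character-grid scatterplot."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when every value on an axis is identical.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 is the top of the plot, so larger y values get smaller row indices.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = "*"
    return "\n".join("".join(line).rstrip() for line in grid)


# Hypothetical sample: a rising trend with one point that breaks the pattern.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9]
scores = [52, 55, 61, 64, 70, 74, 79, 83, 30]
plot = text_scatter(hours, scores)
print(plot)
```

In the printed grid, the first eight points climb from left to right (a positive correlation), while the last point sits alone in the bottom-right corner, which is how an outlier tends to show up in a scatterplot.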

Correlation: Measuring Relationships

Correlation, in statistical terms, is a numerical measure of the strength and direction of the relationship between two variables. It’s like quantifying the connection between them. Here’s what you need to know:

  • Range of Values: Correlation values range from -1 to +1. A correlation of +1 means the variables have a perfect positive relationship, while -1 indicates a perfect negative relationship. A correlation of 0 indicates no linear relationship, although a non-linear one may still exist.
  • Pearson’s Correlation Coefficient: Pearson’s ‘r’ measures the strength and direction of the linear relationship between two variables. It’s normalized, always falling within the range of [-1, 1].
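  • Spearman’s Rank Correlation Coefficient: This measure applies Pearson’s formula to the ranks of the values rather than to the values themselves. It captures monotonic relationships, including non-linear ones, and is less sensitive to outliers than Pearson’s ‘r’.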
Figure: Correlation values range from -1 to +1 (a perfect positive relationship, a perfect negative relationship, and no relationship).
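
The difference between the two coefficients is easier to see with numbers. The sketch below implements both from scratch using only the standard library; the helper names (`pearson_r`, `ranks`, `spearman_rho`) and the sample data are illustrative choices, not part of any particular library.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: strength and direction of the linear relationship."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return co_dev / math.sqrt(ss_x * ss_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical data: y always rises with x, but along a curve, not a line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r    = {r:.3f}")
print(f"Spearman rho = {rho:.3f}")
```

Because `y` increases every time `x` does, the ranks line up perfectly and Spearman's coefficient comes out as 1. Pearson's coefficient is high but stays below 1, since the points follow a curve rather than a straight line.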

Covariance: The Precursor to Correlation

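Covariance, a measure of how two variables change together, serves as a precursor to correlation: dividing the covariance by the product of the two standard deviations gives Pearson’s ‘r’. Its sign gives the direction of the linear relationship, but its magnitude depends on the units of the variables, so it is hard to interpret on its own. Key points to remember: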

  • Positive and Negative Covariance: Positive covariance indicates that the variables tend to change in the same direction, while negative covariance means they change in opposite directions.
  • Covariance and Independence: If two variables are independent, their covariance is zero. The converse does not hold: a covariance of zero only means there is no linear relationship, and the variables can still be related in a non-linear way.
Figure: Covariance between two variables visualized (positive covariance, negative covariance, and no relationship).
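
A small numeric sketch can illustrate these points. It assumes the sample covariance (dividing by n - 1); the function name `sample_covariance` and the data are made up for the example.

```python
import statistics


def sample_covariance(xs, ys):
    """Sample covariance: average co-deviation from the means (n - 1 divisor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


x = [-2, -1, 0, 1, 2]
same_direction = [2 * v + 1 for v in x]   # rises when x rises
opposite_direction = [-3 * v for v in x]  # falls when x rises
squared = [v ** 2 for v in x]             # fully determined by x, but not linearly

cov_pos = sample_covariance(x, same_direction)
cov_neg = sample_covariance(x, opposite_direction)
cov_zero = sample_covariance(x, squared)

# Correlation is covariance rescaled by the two standard deviations.
r_pos = cov_pos / (statistics.stdev(x) * statistics.stdev(same_direction))

print(cov_pos, cov_neg, cov_zero)
print(round(r_pos, 6))
```

The first two covariances are positive and negative respectively, matching the direction of each relationship. The third is zero even though `squared` is computed directly from `x`, which shows why zero covariance should not be read as independence. Rescaling the first covariance by the standard deviations recovers a correlation of 1 for the perfectly linear case.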

Scatterplots and correlation measures are powerful tools for delving into the relationships within your data. They provide a visual and numerical lens through which you can make informed decisions, identify patterns, and gain valuable insights. These tools are essential for understanding the story your data wants to tell.

Prasan N H

Currently pursuing an MS in Information Science at the University of Arizona (2023-2025)