Understanding Spearman Correlation: Formula, Usage, and Implementation in Python

DataScience-ProF
Mar 20, 2024

Correlation is a fundamental statistical concept used to measure the strength and direction of the relationship between two variables. While the Pearson correlation coefficient is the most common measure, the Spearman correlation coefficient offers an alternative that suits variables whose relationship is not linear or whose data are not normally distributed. This article covers Spearman correlation, its formula, its typical uses, and its implementation in Python.

What is Spearman Correlation?

Spearman correlation, named after Charles Spearman, is a non-parametric measure of statistical dependence between two variables. Unlike Pearson correlation, Spearman correlation does not assume that the variables are normally distributed or have a linear relationship. Instead, it is computed on the ranks of the values rather than the values themselves, so it evaluates the monotonic relationship between the variables. In simpler terms, Spearman correlation assesses whether the variables tend to increase or decrease together, without necessarily following a straight line. The coefficient ranges from −1 (one variable always decreases as the other increases) to +1 (both always increase together).
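To make the difference between "monotonic" and "linear" concrete, here is a minimal sketch using made-up data in which y doubles every time x increases by one. The column names x and y are purely illustrative.

```python
import pandas as pd

# Hypothetical data: y grows exponentially with x, so the relationship
# is perfectly monotonic but far from a straight line.
df = pd.DataFrame({"x": range(1, 9)})
df["y"] = 2.0 ** df["x"]

pearson = df.corr(method="pearson").loc["x", "y"]
spearman = df.corr(method="spearman").loc["x", "y"]

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Because the ordering of y matches the ordering of x exactly, Spearman reports a perfect relationship, while Pearson comes out noticeably lower because the points do not lie on a line.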

Formula for Spearman Correlation Coefficient:

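The Spearman correlation coefficient, denoted by the symbol ρ (rho), is calculated using the following formula:

ρ = 1 − (6 Σdᵢ²) / [n(n² − 1)]

Where:

  • dᵢ is the difference between the ranks of the i-th pair of observations of the two variables.
  • n is the number of observations.

This shortcut formula is exact only when there are no tied ranks. When ties are present, ρ is computed as the Pearson correlation of the two sets of ranks, with tied values given the average of the ranks they occupy.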
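As a rough illustration of the formula, the sketch below applies it step by step to a small made-up data set with no tied values (the names hours_studied and exam_score are invented for this example) and compares the result with pandas.

```python
import pandas as pd

# Hypothetical example data with no tied values.
df = pd.DataFrame({
    "hours_studied": [2, 4, 5, 7, 9, 10],
    "exam_score": [50, 65, 55, 80, 78, 92],
})

# Rank each variable separately (1 = smallest value).
ranks = df.rank()

# d_i: difference between the paired ranks.
d = ranks["hours_studied"] - ranks["exam_score"]
n = len(d)

# Shortcut formula: rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))
rho_formula = 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))

# Same quantity computed by pandas for comparison.
rho_pandas = df.corr(method="spearman").loc["hours_studied", "exam_score"]

print("Sum of squared rank differences:", (d ** 2).sum())
print("rho from the formula:", round(rho_formula, 4))
print("rho from pandas:     ", round(rho_pandas, 4))
```

Here the squared rank differences sum to 4, so ρ = 1 − 24/210 ≈ 0.8857, and both approaches agree.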

Usage of Spearman Correlation:

Spearman correlation is widely used in various fields such as psychology, economics, biology, and sociology, where the relationship between variables may not be linear or when the assumption of normality is violated. Some common applications include:

  • Analyzing the relationship between ranks or ordinal data.
  • Comparing the performance of different ranking methods.
  • Assessing the correlation between subjective rankings or ratings.
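Ratings data usually contains ties, which is worth seeing in practice. The sketch below uses invented 1-to-5 ratings from two hypothetical reviewers; it assumes the usual convention (pandas' default) of giving tied values the average of the ranks they span.

```python
import pandas as pd

# Hypothetical ratings (1-5 scale) given by two reviewers to seven items.
ratings = pd.DataFrame({
    "reviewer_a": [5, 3, 4, 4, 2, 1, 3],
    "reviewer_b": [4, 3, 5, 4, 1, 2, 2],
})

# Tied ratings receive the average of the ranks they occupy.
ranks = ratings.rank()

# Spearman's rho equals the Pearson correlation of the ranks.
rho_from_ranks = ranks.corr(method="pearson").loc["reviewer_a", "reviewer_b"]
rho_direct = ratings.corr(method="spearman").loc["reviewer_a", "reviewer_b"]

# The shortcut formula based on squared rank differences ignores ties,
# so it is only an approximation for this data.
d = ranks["reviewer_a"] - ranks["reviewer_b"]
n = len(d)
rho_shortcut = 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))

print("Pearson on ranks:", round(rho_from_ranks, 4))
print("pandas spearman: ", round(rho_direct, 4))
print("Shortcut formula:", round(rho_shortcut, 4))
```

The first two values match, while the shortcut value is slightly different because of the tied ratings. For data with many ties, the rank-based computation is the safer choice.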

Implementation in Python:

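Several Python libraries can calculate Spearman correlation, including SciPy (scipy.stats.spearmanr) and pandas. NumPy has no dedicated Spearman function, although it can be used to compute the coefficient manually from ranks. Let’s see how to compute Spearman correlation using pandas: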

import pandas as pd

# Sample data
data = {'Variable1': [2, 3, 5, 7, 9],
        'Variable2': [1, 4, 6, 8, 10]}

# Create a DataFrame
df = pd.DataFrame(data)

# Compute Spearman correlation
spearman_corr = df.corr(method='spearman').iloc[0, 1]

print("Spearman correlation coefficient:", spearman_corr)

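In this example, we created a DataFrame with two variables, Variable1 and Variable2, and then computed the Spearman correlation coefficient using the pandas corr() method with the argument method='spearman'. The corr() method returns a correlation matrix, so .iloc[0, 1] picks out the entry for Variable1 versus Variable2. Because both variables increase together in every observation, their ranks match exactly and the coefficient is 1, a perfect monotonic relationship.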
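If a p-value is also needed, SciPy's spearmanr function returns one alongside the coefficient. The following is a minimal sketch on made-up data (the variable names are illustrative only); with so few observations the p-value should be read with caution.

```python
from scipy.stats import spearmanr

# Hypothetical example data with no tied values.
hours_studied = [2, 4, 5, 7, 9, 10]
exam_score = [50, 65, 55, 80, 78, 92]

# spearmanr returns the coefficient and a two-sided p-value for the
# null hypothesis that the two variables are uncorrelated.
rho, p_value = spearmanr(hours_studied, exam_score)

print(f"Spearman correlation coefficient: {rho:.4f}")
print(f"p-value: {p_value:.4f}")
```

The coefficient here is the same one pandas would report for these values; the extra p-value is the main reason to reach for SciPy.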

Conclusion:

Spearman correlation provides a valuable tool for assessing the relationship between variables, particularly when the assumptions of normality and linearity are not met. Its simplicity and robustness make it suitable for a wide range of applications across various domains. By understanding the concept, formula, and implementation in Python, researchers and practitioners can effectively analyze and interpret relationships in their data without relying solely on linear correlations.
