Spearman’s Correlation

Published in

Analytics Vidhya

Mar 7, 2021

Spearman’s correlation is a rank-based statistic that can be used as a feature selection (filter) method.
It measures the strength and direction of the monotonic relationship between two variables, on a scale from -1 (perfectly decreasing) to +1 (perfectly increasing).

What is a Monotonic Relationship?

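In a monotonic relationship, as the value of one variable increases, the value of the other variable either consistently increases or consistently decreases, but not necessarily at a constant rate. A linear relationship is just one special case of a monotonic one.
Look at the image below for a clearer picture.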
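A quick numerical illustration, using made-up data rather than anything from the article: y = x³ always rises with x and y = -eˣ always falls, but neither follows a straight line. Spearman’s coefficient is therefore +1 and -1 respectively (up to floating-point rounding), while the Pearson (linear) coefficient of x and x³, computed here with `np.corrcoef`, stays below 1.

```python
import numpy as np
from scipy.stats import spearmanr

x = np.arange(1, 11)   # 1, 2, ..., 10
y_up = x**3            # always increases with x, but not along a straight line
y_down = -np.exp(x)    # always decreases with x, also non-linearly

rho_up = spearmanr(x, y_up)[0]      # Spearman: +1 for a perfectly increasing relationship
rho_down = spearmanr(x, y_down)[0]  # Spearman: -1 for a perfectly decreasing relationship
r_up = np.corrcoef(x, y_up)[0, 1]   # Pearson (linear) correlation: about 0.93 here

print(rho_up, rho_down, r_up)
```

Because Spearman’s coefficient only looks at ranks, any strictly increasing transformation of a variable leaves it unchanged.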

Mathematics Behind Spearman’s Correlation

Example:

import numpy as np
import pandas as pd

english = np.array([67, 89, 88, 90, 95])
maths = np.array([77, 86, 98, 95, 87])

d = {'english': english, 'maths': maths}

data = pd.DataFrame(d)
data

Next, we rank the values within each column in ascending order: the smallest value gets rank 1, the next smallest gets rank 2, and so on.
This gives us a rank column for each variable.

# Ranks assigned by hand (1 = smallest value in the column)
english_rank = np.array([1, 3, 2, 4, 5])
maths_rank = np.array([1, 2, 5, 4, 3])

data['english_rank'] = english_rank
data['maths_rank'] = maths_rank
data
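Instead of typing the ranks by hand, pandas can compute them with `Series.rank()`. A minimal sketch rebuilding the same table; note that `rank()` returns floats, and with its default `method='average'` any tied values would share the average of their ranks.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'english': np.array([67, 89, 88, 90, 95]),
    'maths': np.array([77, 86, 98, 95, 87]),
})

# rank() ranks each column in ascending order, starting at 1
data['english_rank'] = data['english'].rank()
data['maths_rank'] = data['maths'].rank()
print(data)
```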

See the table above to understand how we have ranked the variables.
The dataset we are using has no duplicate (tied) values, so we can use formula 1.

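1. If there are no duplicate values, we use the following formula:

$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

where $d_i$ is the difference between the two ranks of the i-th observation and $n$ is the total number of observations.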

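2. If there are duplicates in the dataset, tied values receive the average of the ranks they span, and we use the Pearson correlation of the ranks:

$$\rho = \frac{\sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}{\sqrt{\sum_{i=1}^{n} (R_i - \bar{R})^2 \sum_{i=1}^{n} (S_i - \bar{S})^2}}$$

where $R_i$ and $S_i$ are the ranks of the i-th observation in the two variables, and $\bar{R}$ and $\bar{S}$ are their means.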
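To see why ties call for a different formula, here is a small sketch on made-up toy data (not from the article). `scipy.stats.rankdata` gives tied values the average of their ranks by default; applying the shortcut formula to those ranks then gives a slightly different number from the Pearson correlation of the ranks, which is what `spearmanr` reports.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Toy data with a tie: the value 2 appears twice in x
x = np.array([1, 2, 2, 3])
y = np.array([1, 2, 3, 4])

# Tied values share the average of the ranks they occupy (ranks 2 and 3 -> 2.5)
rx = rankdata(x)
ry = rankdata(y)

# Shortcut formula applied to tied ranks: only an approximation here
n = len(x)
d = rx - ry
shortcut = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

# Pearson correlation of the (average) ranks
exact = np.corrcoef(rx, ry)[0, 1]

print(shortcut)            # about 0.95
print(exact)               # about 0.9487
print(spearmanr(x, y)[0])  # agrees with `exact`
```

With a single tie the gap is small, but it grows as the number of ties increases.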

Calculating Spearman’s correlation:

# d = difference between the two ranks, d2 = its square
data['d'] = data['english_rank'] - data['maths_rank']
data['d2'] = data['d']**2

data
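Working the first formula through by hand with the ranks from the table (n = 5):

$$
\begin{aligned}
d &= (1-1,\ 3-2,\ 2-5,\ 4-4,\ 5-3) = (0,\ 1,\ -3,\ 0,\ 2) \\
\sum_{i=1}^{5} d_i^2 &= 0 + 1 + 9 + 0 + 4 = 14 \\
\rho &= 1 - \frac{6 \cdot 14}{5\,(5^2 - 1)} = 1 - \frac{84}{120} = 0.3
\end{aligned}
$$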

Here, we calculate Spearman’s correlation using the first formula.

n = len(data.index)  # total number of observations
sc = 1 - (6 * data['d2'].sum()) / (n * (n**2 - 1))

# sc is the Spearman correlation between the ranks of the two features.
print(sc)

output: 0.30000000000000004
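The same calculation can be wrapped in a small function that starts from the raw scores instead of hand-made ranks. A minimal sketch: `spearman_no_ties` is a hypothetical helper name, and the double-`argsort` trick it uses produces correct ranks only when there are no ties, which matches the assumption behind formula 1.

```python
import numpy as np

def spearman_no_ties(x, y):
    """Hypothetical helper: Spearman's rho via the no-ties formula."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    # argsort of argsort gives 0-based ascending ranks (valid only without ties)
    rank_x = np.argsort(np.argsort(x)) + 1
    rank_y = np.argsort(np.argsort(y)) + 1
    d = rank_x - rank_y
    return 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

english = [67, 89, 88, 90, 95]
maths = [77, 86, 98, 95, 87]
print(spearman_no_ties(english, maths))
```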

Implementation using the SciPy library:

Spearman’s correlation is available in SciPy as the spearmanr function.
We import spearmanr from the scipy.stats module, and the SelectKBest class from scikit-learn’s sklearn.feature_selection module.

# SelectKBest selects the k best-scoring features.

from sklearn.feature_selection import SelectKBest
from scipy.stats import spearmanr
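spearmanr can also be called directly on the two columns, without SelectKBest. A short sketch: besides the coefficient, it returns a p-value, which the manual formula does not provide (with only five observations that p-value is not very informative).

```python
import numpy as np
from scipy.stats import spearmanr

english = np.array([67, 89, 88, 90, 95])
maths = np.array([77, 86, 98, 95, 87])

# spearmanr ranks both inputs internally, then correlates the ranks
rho, p_value = spearmanr(english, maths)
print(rho, p_value)
```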

SelectKBest keeps the k features with the highest scores returned by its score_func. Here, the scoring function is spearmanr.
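One caveat: passing spearmanr directly works in the example below only because X has a single column. With several columns, spearmanr(X, y) returns a full correlation matrix instead of one score per feature. A minimal sketch of a per-feature scoring function, on made-up toy data; `spearman_per_feature` is a hypothetical helper, and scoring by the absolute value of rho (so strongly decreasing features also count as informative) is a design choice, not part of the original article.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.feature_selection import SelectKBest

def spearman_per_feature(X, y):
    """Hypothetical helper: score each column of X by |Spearman's rho| with y.

    Returns (scores, p_values), a form SelectKBest accepts from score_func.
    """
    X = np.asarray(X)
    scores = np.empty(X.shape[1])
    p_values = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        rho, p = spearmanr(X[:, j], y)
        scores[j] = abs(rho)  # treat strong negative trends as informative too
        p_values[j] = p
    return scores, p_values

# Made-up toy data: 8 samples, 3 features
y = np.arange(1, 9)
X = np.column_stack([
    y**2,                      # monotonic increasing in y
    -y,                        # monotonic decreasing in y
    [3, 1, 4, 1, 5, 9, 2, 6],  # only weakly related to y
])

skb = SelectKBest(score_func=spearman_per_feature, k=2)
skb.fit(X, y)
print(skb.scores_)        # one score per feature
print(skb.get_support())  # mask of the selected features
```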

skb = SelectKBest(score_func=spearmanr, k=1)

X = data[['english']]
y = data['maths']

skb.fit(X, y)
output: SelectKBest(k=1, score_func=<function spearmanr at 0x7f3d563c15f0>)

skb.scores_
output: array(0.3)

The score we calculated earlier with the formula (0.30000000000000004) matches the score from spearmanr (0.3); the tiny difference is only floating-point rounding.
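As one more cross-check, pandas computes the same coefficient with `DataFrame.corr(method='spearman')`, which returns a correlation matrix for all column pairs. A short sketch on the article’s data:

```python
import pandas as pd

data = pd.DataFrame({
    'english': [67, 89, 88, 90, 95],
    'maths': [77, 86, 98, 95, 87],
})

# Spearman correlation matrix for every pair of columns
corr = data.corr(method='spearman')
print(corr)
print(corr.loc['english', 'maths'])
```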
Here is my complete notebook on Spearman’s Correlation.

Summary:

Written by Swapnilbobe