Relationship between Cosine Similarity and Euclidean Distance.

Tanveer Khan
Published in AI For Real
2 min read · Mar 10, 2020


Many of us are unaware of the relationship between Cosine Similarity and Euclidean Distance. Knowing this relationship is extremely helpful when we need to use one in place of the other indirectly. One application of this idea is converting the K-Means clustering algorithm into Spherical K-Means clustering, where cosine similarity is used as the measure to cluster data.

Use Case:-

We often want to cluster text documents to discover certain patterns. K-Means clustering is a natural first choice for this use case. The scikit-learn implementation of K-Means uses Euclidean distance to cluster similar data points.

It is also well known that cosine similarity gives a better measure of similarity than Euclidean distance when we are dealing with text data.
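To see why, here is a toy sketch with made-up term-count vectors: a document concatenated with itself points in the same direction as the original, so cosine similarity treats them as identical, while Euclidean distance penalises the difference in length.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

# Hypothetical term-count vectors: doc_b has exactly double the counts
# of doc_a, as if the same document were concatenated with itself.
doc_a = np.array([[2, 1, 0, 3]])
doc_b = np.array([[4, 2, 0, 6]])

cos = cosine_similarity(doc_a, doc_b)[0, 0]
dist = euclidean_distances(doc_a, doc_b)[0, 0]
print(cos)   # 1.0 -- same direction, so cosine sees them as identical
print(dist)  # nonzero -- Euclidean distance penalises the length difference
```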

So we may want to run K-Means using cosine distance, which is not possible with the scikit-learn implementation.

We can use a hack: if we can somehow express Euclidean distance as a proportionate measure of cosine distance, then this can be achieved.

Mathematics
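For unit vectors, the relation follows directly from expanding the squared Euclidean distance (after L2 normalisation, u·u = v·v = 1):

```latex
\|u - v\|^2 = \|u\|^2 + \|v\|^2 - 2\,u \cdot v
            = 1 + 1 - 2\cos(u, v)
            = 2\,(1 - \cos(u, v))
```

So for normalised vectors, the squared Euclidean distance is exactly twice the cosine distance.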

Proof with Code

import numpy as np
from sklearn import preprocessing
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics.pairwise import euclidean_distances

np.random.seed(42)
test_array = np.random.rand(3, 100)
for item in range(test_array.shape[0]):
    element = test_array[item]
    print(element.dot(element))  # squared L2 norm of each row vector

output

30.868488161326475
33.289148886695116
35.31491104309238

Normalizing Vectors

X_normalized = preprocessing.normalize(test_array, norm='l2')
euclidean_dist = euclidean_distances(X_normalized)
squared_euclidean = np.square(euclidean_dist)
print (squared_euclidean)

output

[[0.         0.55794124 0.54552104]
 [0.55794124 0.         0.56962493]
 [0.54552104 0.56962493 0.        ]]

Computing the distance from cosine similarity

adjusted_cosine_distance = 2 - 2*cosine_similarity(X_normalized)
print (adjusted_cosine_distance)

output

[[6.66133815e-16 5.57941240e-01 5.45521039e-01]
[5.57941240e-01 2.22044605e-16 5.69624926e-01]
[5.45521039e-01 5.69624926e-01 6.66133815e-16]]
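As a quick sanity check (a minimal sketch reproducing the steps above), we can assert that the two matrices agree numerically:

```python
import numpy as np
from sklearn import preprocessing
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

np.random.seed(42)
test_array = np.random.rand(3, 100)
X_normalized = preprocessing.normalize(test_array, norm='l2')

# Squared Euclidean distance between normalised rows
squared_euclidean = np.square(euclidean_distances(X_normalized))
# The same quantity derived from cosine similarity
distance_from_cosine = 2 - 2 * cosine_similarity(X_normalized)

print(np.allclose(squared_euclidean, distance_from_cosine))  # True
```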

We can see from the above that when the vectors u and v are normalised, there exists a direct relationship between cosine similarity and Euclidean distance.

For Normalised Vectors:

Squared Euclidean Distance (u,v) = 2 * (1 - Cosine Similarity(u,v))

Squared Euclidean Distance (u,v) = 2 * Cosine Distance(u,v)

Hack :- So in algorithms that only accept Euclidean distance as a parameter, when you want to use cosine distance as the measure, normalise the input vectors first; the resulting Euclidean distances are then a monotonic function of the cosine distances, so you will get results as per cosine distance.
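Putting the hack together, a minimal sketch (with made-up random data) of "spherical" clustering via the stock scikit-learn KMeans:

```python
import numpy as np
from sklearn import preprocessing
from sklearn.cluster import KMeans

np.random.seed(42)
X = np.random.rand(30, 10)

# L2-normalise rows so that Euclidean distance tracks cosine distance
X_normalized = preprocessing.normalize(X, norm='l2')

# Ordinary (Euclidean) KMeans on the normalised data now clusters
# points by direction rather than by magnitude
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_normalized)
print(km.labels_)
```

Note that this only approximates true spherical k-means: the centroid of a set of unit vectors is generally not itself a unit vector, so an exact implementation would also renormalise the centroids after each update.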

I hope the above explanation has clarified the relationship between Euclidean distance and cosine similarity.

Tanveer Khan
AI For Real

Sr. Data Scientist with strong hands-on experience in building real-world Artificial Intelligence based solutions using NLP, Computer Vision and Edge Devices.