Distances in Machine Learning

Namratesh Shrivastav

Published in

Analytics Vidhya

4 min read, Jan 5, 2020

There are many ways to measure distance in machine learning. Here we are going to discuss some of the most common ones.

Euclidean distance
Manhattan distance
Minkowski distance
Hamming distance
Cosine distance & Cosine Similarity

Euclidean Distance

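It is the straight-line distance between two points in n-dimensional space. Here, we calculate the distance d between two data points p1 and p2:

d(p1, p2) = sqrt( (p1_1 - p2_1)^2 + (p1_2 - p2_2)^2 + ... + (p1_n - p2_n)^2 )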

code:

from sklearn.metrics.pairwise import euclidean_distances
X = [[0, 1], [1, 1]]
# distance between rows of X
euclidean_distances(X, X)
# get distance to origin
euclidean_distances(X, [[0, 0]])
output:
array([[1.        ],
       [1.41421356]])
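
For intuition, the same numbers can be reproduced without scikit-learn. The following is only a minimal sketch: the helper name euclidean is chosen here for illustration and is not part of any library.

```python
import math

def euclidean(p1, p2):
    """Straight-line distance between two points of equal dimension."""
    if len(p1) != len(p2):
        raise ValueError("points must have the same number of coordinates")
    # Square each coordinate difference, add them up, take the square root
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

X = [[0, 1], [1, 1]]
origin = [0, 0]

# Distance of each row of X to the origin
to_origin = [euclidean(row, origin) for row in X]
print(to_origin)  # [1.0, 1.4142135623730951]
```

The two values match the array returned by euclidean_distances(X, [[0, 0]]) above.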

Manhattan distance

It is the sum of the absolute differences of all coordinates. Suppose we have to tell someone how far it is from A to B in a city laid out in blocks. If we say "go 3 blocks straight and then 3 more to the left", the distance is 6 blocks.

Note that we can't travel diagonally here.

Equation

d(p1, p2) = |p1_1 - p2_1| + |p1_2 - p2_2| + ... + |p1_n - p2_n|

code:

p1 = [4, 0]
p2 = [6, 6]
# sum of absolute coordinate differences
distance = abs(p1[0] - p2[0]) + abs(p1[1] - p2[1])
print(distance)
output:
8
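
The city-block idea generalizes to any number of coordinates. Below is a small sketch under that assumption; the function name manhattan is made up for this example. It checks the "3 blocks straight, 3 blocks left" walk described earlier and the points (4, 0) and (6, 6), whose coordinate differences are 2 and 6.

```python
def manhattan(p1, p2):
    """Sum of absolute coordinate differences (city-block distance)."""
    if len(p1) != len(p2):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(a - b) for a, b in zip(p1, p2))

# 3 blocks straight ahead and 3 blocks to the left
walk = manhattan([0, 0], [-3, 3])
print(walk)  # 6

# Points (4, 0) and (6, 6): |4 - 6| + |0 - 6|
blocks = manhattan([4, 0], [6, 6])
print(blocks)  # 8
```

SciPy offers the same metric as distance.cityblock in scipy.spatial.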

Minkowski distance

It is a generalized distance between two points A and B in a normed vector space. Two terms need a brief explanation here: vector space and normed vector space.

Vector space- It is a collection of vectors that can be added together and multiplied by numbers (scalars).
Normed vector space- It is a vector space over the real or complex numbers on which a norm is defined, that is, a space in which every vector has a length.

The formula is

D(X, Y) = ( |x_1 - y_1|^p + |x_2 - y_2|^p + ... + |x_n - y_n|^p )^(1/p)

Looking at the formula, two special cases stand out:

if p = 1 it becomes Manhattan distance
if p = 2 it becomes Euclidean distance

X1 = [0,1,1,0,1,0,1,1,1]

X2 = [1,1,1,0,1,0,0,1,0]

code:

from scipy.spatial import distance
distance.minkowski([0,1,1,0,1,0,1,1,1], [1,1,1,0,1,0,0,1,0], 1)
output:
3.0
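
To see how the parameter p switches between the two special cases, here is a hedged pure-Python sketch. The function name minkowski mirrors the SciPy call above but is written from scratch for illustration.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    # Raise each absolute difference to the power p, sum, take the p-th root
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

X1 = [0, 1, 1, 0, 1, 0, 1, 1, 1]
X2 = [1, 1, 1, 0, 1, 0, 0, 1, 0]

d1 = minkowski(X1, X2, 1)  # p = 1: same value as the Manhattan distance
d2 = minkowski(X1, X2, 2)  # p = 2: same value as the Euclidean distance
print(d1)  # 3.0
print(d2)  # square root of 3, roughly 1.732
```

X1 and X2 differ in three positions, so the p = 1 result is 3 and the p = 2 result is the square root of 3.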

Hamming Distance

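It counts the positions at which two sequences of equal length differ, and it is often used to compare text strings. Here we use boolean vectors to illustrate it. Let's say we have two boolean vectors, X1 and X2.

Hamming distance(X1, X2) = no. of locations where binary values differ

Note that scipy's distance.hamming returns this count divided by the length of the vectors, so the result is a fraction between 0 and 1.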

code:

from scipy.spatial import distance
distance.hamming(['a','b','c','d'], ['d', 'b','c', 'd'])
output:
0.25
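
The difference between the raw count and the fraction reported by SciPy can be sketched as follows. Both helper names, hamming_count and hamming_fraction, are invented for this example.

```python
def hamming_count(u, v):
    """Number of positions at which two equal-length sequences differ."""
    if len(u) != len(v):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(u, v))

def hamming_fraction(u, v):
    """Share of differing positions, the form scipy's distance.hamming reports."""
    return hamming_count(u, v) / len(u)

X1 = [0, 1, 1, 0, 1, 0, 1, 1, 1]
X2 = [1, 1, 1, 0, 1, 0, 0, 1, 0]
count_bits = hamming_count(X1, X2)
print(count_bits)  # 3

letters_a = ['a', 'b', 'c', 'd']
letters_b = ['d', 'b', 'c', 'd']
frac_letters = hamming_fraction(letters_a, letters_b)
print(frac_letters)  # 0.25
```

Only the first letter differs in the second pair, so one out of four positions gives 0.25.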

Cosine distance & Cosine Similarity

Cosine Similarity measures how similar two documents are irrespective of their size. Cosine distance is derived from it.

The cosine similarity is defined as

cos(θ) = (A · B) / (||A|| ||B||)

and

Cosine Distance = 1 − Cosine Similarity

Mathematically, it measures the cosine of the angle between two vectors projected in a multi-dimensional space.

So, which value means what? (angles in degrees)

cos(0) = 1, cos(360) = 1 (the vectors point in the same direction: maximum similarity)

cos(90) = 0, cos(270) = 0 (the vectors are orthogonal: negligible similarity)

cos(180) = -1 (the vectors point in opposite directions: no similarity at all)

code:

from scipy.spatial import distance
distance.cosine([1, 0, 0], [0, 1, 0])
output:
1.0
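
As a rough illustration of the definition, cosine similarity and cosine distance can be written in a few lines of plain Python. The function names below are assumptions made for this sketch, not SciPy's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity."""
    return 1 - cosine_similarity(a, b)

orthogonal = cosine_distance([1, 0, 0], [0, 1, 0])
print(orthogonal)  # 1.0, the angle is 90 degrees

# Same direction, different length: similarity is approximately 1
scaled = cosine_similarity([1, 2, 3], [2, 4, 6])

# Opposite directions: the angle is 180 degrees
opposite = cosine_similarity([1, 0], [-1, 0])
print(opposite)  # -1.0
```

The second case shows why the measure ignores document size: scaling a vector changes its length but not its direction.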

notebook attached here.

Thanks for reading, suggestions are welcome!!!

References:

Cosine Similarity - Understanding the math and how it works? (with python), www.machinelearningplus.com

Minkowski distance explained, www.mikulskibartosz.name

How to measure distances in machine learning, towardsdatascience.com

https://towardsdatascience.com/importance-of-distance-metrics-in-machine-learning-modelling-e51395ffe60d
