Euclidean and Manhattan distance metrics in Machine Learning.

Gaurav Rajesh Sahani
Published in Analytics Vidhya · 4 min read · Jul 24, 2020

Many supervised and unsupervised machine learning models, such as K-Nearest Neighbors and K-Means, depend on the distance between two data points to produce their output. Therefore, the metric we use to compute these distances plays an important role in these models.

A distance metric uses a distance function, which provides a measure of the relationship between each pair of elements in the dataset.

A good distance metric significantly improves the performance of classification, clustering, and information retrieval. In this blog, we will look at two such metrics, Euclidean and Manhattan distance, in depth and see how they are used in machine learning models.

Euclidean Distance Metric:

Euclidean Distance represents the shortest distance between two points.

The “Euclidean Distance” between two objects is the distance you would expect in “flat” or “Euclidean” space; it’s named after Euclid, who worked out the rules of geometry on a flat surface.

Euclidean distance is often the “default” metric used in, for example, K-Nearest Neighbors (classification) or K-Means (clustering) to find the “k closest points” to a particular sample. Here, “closeness” is measured by the differences (“distances”) along the scale of each variable, combined into a single distance value.

It is only one of many available options for measuring the distance between two vectors/data objects. However, many classification algorithms, as mentioned above, use it either to train the classifier or to decide the class membership of a test observation, and clustering algorithms (e.g., K-Means, K-Medoids) use it to assign data objects to clusters.

Mathematically, it is calculated using Pythagoras’ theorem: the square of the total distance between two objects is the sum of the squares of the distances along each perpendicular coordinate.
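The Pythagorean calculation above can be sketched in plain Python (the function name `euclidean_distance` is my own, not from any library):

```python
import math

def euclidean_distance(p, q):
    # Sum the squared differences along each coordinate (Pythagoras),
    # then take the square root to get the straight-line distance.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0 — the classic 3-4-5 right triangle
```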

Manhattan Distance Metric:

Manhattan Distance is the sum of absolute differences between points across all the dimensions.

Manhattan distance is a metric in which the distance between two points is the sum of the absolute differences of their Cartesian coordinates. Put simply, in two dimensions it is the sum of the absolute differences between the x-coordinates and between the y-coordinates.
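This sum of absolute coordinate differences can be written as a small helper (again, `manhattan_distance` is an illustrative name of my own):

```python
def manhattan_distance(p, q):
    # Sum of absolute per-coordinate differences — the "city block" walk
    # along the grid rather than the straight line.
    return sum(abs(a - b) for a, b in zip(p, q))

print(manhattan_distance((0, 0), (3, 4)))  # 7
```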

This Manhattan distance metric is also known as Manhattan length, rectilinear distance, L1 distance or L1 norm, city block distance, Minkowski’s L1 distance, or the taxicab metric.

Applications of Manhattan distance metric include,

  1. Regression analysis: it is used in linear regression to find a straight line that fits a given set of points.
  2. Compressed sensing: when solving an underdetermined system of linear equations, the regularisation term for the parameter vector is expressed in terms of the Manhattan distance. This approach appears in the signal-recovery framework called compressed sensing.
  3. Frequency distributions: it is used to assess the differences between discrete frequency distributions.

Now, let’s calculate the Euclidean and Manhattan distance for the example below, which should give an intuition for both.

Considering the figure given below, our aim for both distance metrics is to calculate the distance between points A and B.

Let’s first look at the Euclidean approach to calculating the distance AB.

Figure 1: Euclidean Approach

Now, consider the Manhattan approach for the same points.

Figure 2: Manhattan Approach

The approach we just saw was the mathematical way to find Euclidean and Manhattan distances.
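The figures’ calculations can be reproduced in code. Since the original figure is not shown here, the coordinates of A and B below are hypothetical stand-ins chosen so the arithmetic is easy to follow:

```python
import math

A, B = (1, 1), (4, 5)  # hypothetical coordinates; not taken from the original figure

dx, dy = B[0] - A[0], B[1] - A[1]
euclidean_ab = math.sqrt(dx ** 2 + dy ** 2)  # hypotenuse of the right triangle
manhattan_ab = abs(dx) + abs(dy)             # walking along the grid

print(euclidean_ab, manhattan_ab)  # 5.0 7
```

Note that the Manhattan distance is never smaller than the Euclidean distance: the straight line is always the shortest path.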

Let’s jump into the practical side: how we can implement both of them in Python, using the famous Sklearn library.
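A minimal sketch using scikit-learn’s pairwise distance functions (the example points are the same hypothetical A and B as above):

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances, manhattan_distances

# Each point is a row; sklearn's pairwise functions expect 2-D arrays.
A = np.array([[1, 1]])
B = np.array([[4, 5]])

print(euclidean_distances(A, B))  # [[5.]]
print(manhattan_distances(A, B))  # [[7.]]
```

The same functions accept many rows at once and return the full pairwise distance matrix, which is how KNN and K-Means use them internally.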

Apart from these, other popular distance metrics include:

  1. Hamming distance: used to calculate the distance between binary vectors.
  2. Minkowski distance: a generalisation of the Euclidean and Manhattan distances.
  3. Cosine distance: based on cosine similarity, which measures the similarity between two vectors of an inner product space; the distance is one minus the similarity.
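The three metrics listed above can be sketched in plain Python as well (function names are my own, chosen for illustration; note how Minkowski with r=1 reduces to Manhattan and r=2 to Euclidean):

```python
import math

def hamming_distance(u, v):
    # Number of positions at which two binary vectors differ
    return sum(a != b for a, b in zip(u, v))

def minkowski_distance(p, q, r):
    # r=1 gives Manhattan distance, r=2 gives Euclidean distance
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

def cosine_distance(p, q):
    # One minus the cosine similarity of the two vectors
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1 - dot / norm

print(hamming_distance([1, 0, 1], [0, 0, 1]))      # 1
print(minkowski_distance((0, 0), (3, 4), 1))       # 7.0 (Manhattan)
print(minkowski_distance((0, 0), (3, 4), 2))       # 5.0 (Euclidean)
print(cosine_distance((1, 0), (0, 1)))             # 1.0 (orthogonal vectors)
```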

This was all from my side. If you liked the blog, please give it a “Clap”; it motivates me to come up with new blogs as part of my contribution to the Data Science community.
