Natural Language Processing (Part 27)-Euclidean Distance

Coursesteach
5 min read · Jan 21, 2024

📚Chapter 3: Vector Space Model

Introduction

Many machine learning algorithms need to measure the distance between data points. For example, a supervised algorithm such as K-Nearest Neighbours selects the nearest data points using a distance metric, and unsupervised algorithms such as K-Means Clustering and Hierarchical Clustering use a distance metric to determine the similarities and differences between data points [1]. Looking at where the data sits in a dataset gives only a rough idea of which points are nearest and farthest; a distance metric makes that comparison precise.

Several methods are available for this. We will learn about some of the most popular and widely used distance metrics:

  • Euclidean distance
  • Manhattan distance
  • Cosine distance
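
As a quick illustration of how the three metrics differ, the sketch below computes each one for the same pair of vectors using only NumPy. The vectors `a` and `b` are made-up values, not data from this series, and cosine distance is taken here as 1 minus cosine similarity.

```python
import numpy as np

# Hypothetical 2-D vectors, chosen only for illustration.
a = np.array([5.0, 3.0])
b = np.array([2.0, 4.0])

# Euclidean distance: length of the straight line between a and b.
euclidean = np.sqrt(np.sum((a - b) ** 2))

# Manhattan distance: sum of the absolute differences per dimension.
manhattan = np.sum(np.abs(a - b))

# Cosine distance: 1 - cosine similarity (depends only on the angle).
cosine = 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print("Euclidean:", euclidean)
print("Manhattan:", manhattan)
print("Cosine:", cosine)
```

Euclidean and Manhattan distance both grow with the size of the differences, while cosine distance only reflects the direction of the vectors.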

In this tutorial, you’re going to learn about Euclidean distance, a distance metric that can also serve as a measure of similarity: the smaller the distance, the more similar the two items. This metric tells you how far apart two points or two vectors are. You will first compute the Euclidean distance between two document vectors like the ones from the previous tutorial, and then generalize that notion to vector spaces in higher dimensions.

Sections

Euclidean distance
Euclidean Distance for n-dimensional Vectors
Implementation of the Euclidean distance in Python

Section 1- Euclidean distance

Euclidean distance is calculated using the Pythagorean theorem. To find the distance between two points on a plane, you measure the length of the straight line connecting them.
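
Written out for two points in the plane, the Pythagorean relationship gives the formula below. The symbols A = (a_1, a_2) and B = (b_1, b_2) are notation assumed here for illustration.

```latex
d(A, B) = \sqrt{(b_1 - a_1)^2 + (b_2 - a_2)^2}
```

The horizontal and vertical differences are the two legs of a right triangle, and the distance is the length of its hypotenuse.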

Let’s use two of the corpus vectors you saw previously. Remember that in that example there were two dimensions: the number of times the word "data" and the word "film" appeared in each corpus. Corpus A is the entertainment corpus and Corpus B is the machine-learning corpus.

Now let’s represent those vectors as points in the vector space. The Euclidean distance is the length of the straight line segment connecting them. To get that value, take the square root of the sum of two terms: the first term is their horizontal distance squared, and the second term is their vertical distance squared. This formula is a direct application of the Pythagorean theorem. If you work out each of the terms, you get a Euclidean distance approximately equal to 10,667.
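
To make the arithmetic concrete, here is a minimal sketch of that calculation. The word counts below are assumed values, since the original figure is not reproduced in this text; they were picked because they are consistent with the result of roughly 10,667 quoted above.

```python
import math

# Assumed (data, film) word counts; illustrative, not taken from the text.
corpus_a = (500, 7000)    # entertainment corpus
corpus_b = (9320, 1000)   # machine-learning corpus

# Horizontal (data) and vertical (film) differences.
dx = corpus_b[0] - corpus_a[0]
dy = corpus_b[1] - corpus_a[1]

# Pythagorean theorem: square both differences, add them, take the root.
d = math.sqrt(dx ** 2 + dy ** 2)
print(f"Euclidean distance: {d:.0f}")
```

With these assumed counts the differences are 8820 and -6000, and the printed distance rounds to 10667.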

Section 2- Euclidean Distance for n-dimensional Vectors

With more dimensions, the Euclidean distance is not much more difficult. Let’s walk through an example using a co-occurrence matrix. Suppose that you want to know the Euclidean distance between the vector v of the word "ice cream" and the vector w of the word "boba". To start, get the difference between the two vectors in each dimension. Then square those differences, sum them up, and take the square root of the result. This process generalizes the two-dimensional one from the previous section, and it gives the formula for the Euclidean distance between vector representations in an n-dimensional vector space. If you remember from linear algebra, this quantity is known as the norm of the difference between the vectors that you are comparing.
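
In symbols, the steps above can be written as follows, where v_i and w_i denote the i-th components of the two vectors (notation assumed here for illustration):

```latex
d(\vec{v}, \vec{w}) = \sqrt{\sum_{i=1}^{n} (v_i - w_i)^2} = \lVert \vec{v} - \vec{w} \rVert
```

For n = 2 this reduces to the Pythagorean formula from Section 1.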

Section 3- Implementation of the Euclidean distance in Python

Let’s take a look at the implementation of the Euclidean distance in Python. If you have two vector representations like the ones from the previous example, you can use the linalg module from NumPy to get the norm of the difference between them. The norm function works for n-dimensional spaces.
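
A minimal sketch of that call is shown below. The three-dimensional vectors `v` and `w` are assumed values standing in for the co-occurrence counts, which are not reproduced in this text; the step-by-step line is included only to confirm that the norm matches the formula.

```python
import numpy as np

# Assumed co-occurrence counts; illustrative values only.
v = np.array([1, 6, 8])   # e.g. vector for "ice cream"
w = np.array([0, 4, 6])   # e.g. vector for "boba"

# Norm of the difference between the two vectors.
d = np.linalg.norm(v - w)

# Same quantity computed step by step: difference, square, sum, square root.
d_manual = np.sqrt(np.sum((v - w) ** 2))

print("The Euclidean distance between v and w is:", d)
print("Step-by-step result:", d_manual)
```

With these assumed values the squared differences are 1, 4, and 4, so both lines report a distance of 3.0.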

There are several ways to measure Euclidean distance using Python [1]:

Using the SciPy Library

from scipy.spatial import distance
# Two points in the plane
A = (5, 3)
B = (2, 4)
# Euclidean distance between A and B
d = distance.euclidean(A, B)
print('Euclidean Distance:', d)
OUTPUT:
Euclidean Distance: 3.1622776601683795

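Using the NumPy Library

import numpy as np
# Two points in the plane, stored as NumPy arrays
A = np.array((5, 3))
B = np.array((2, 4))
# Norm of the difference vector = Euclidean distance
d = np.linalg.norm(A - B)
print("Euclidean Distance:", d)
OUTPUT:
Euclidean Distance: 3.1622776601683795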
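
If you would rather avoid third-party dependencies, the standard library offers another route. The sketch below assumes Python 3.8 or later, where `math.dist` is available, and reuses the same two points.

```python
import math

A = (5, 3)
B = (2, 4)

# math.dist returns the Euclidean distance between two points
# given as sequences of the same length (Python 3.8+).
d = math.dist(A, B)
print("Euclidean Distance:", d)
```

The value should agree with the SciPy and NumPy results up to floating-point rounding.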

Summary

The primary takeaways are that the Euclidean distance is the length of the straight line that connects two vectors, and that to get the Euclidean distance you calculate the norm of the difference between the vectors you are comparing. By using this metric, you can get a sense of how similar two documents or words are. Now that you have learned about Euclidean distance, in the next tutorial I’ll show you a different type of similarity function. Concretely, I’ll show you the cosine similarity function, which is one of the most popular similarity functions.


Stay tuned for our upcoming articles where we will explore specific topics related to NLP in more detail!

Remember, learning is a continuous process. So keep learning and keep creating and sharing with others!💻✌️

Note: If you are an NLP expert and have suggestions for improving this blog, please share them in the comments.

If you would like more NLP updates and want to contribute, follow and enroll in the following:

👉Course: Natural Language Processing (NLP)

👉📚GitHub Repository

👉 📝Notebook

Do you want to get into data science and AI and need help figuring out how? I can offer you research supervision and long-term career mentoring.
Skype: themushtaq48, email:mushtaqmsit@gmail.com

Contribution: We would love your help in making the Coursesteach community even better! If you want to contribute to a course, or if you have suggestions for improving any Coursesteach content, feel free to get in touch and follow.

Together, let’s make this the best AI learning Community! 🚀


Source

[1] Natural Language Processing with Classification and Vector Spaces