Natural Language Processing (Part 29)-Cosine Similarity

Coursesteach
4 min read · Feb 4, 2024


📚Chapter 3: Vector Space Model

Introduction

In this tutorial, you will learn how to compute the dot product between two vectors and how to compute a vector's norm. Once you know how to do these two things, you will be able to compute the cosine similarity score. You already have the intuition for using the cosine of the angle between two vector representations as a similarity metric; now I'll go deeper into the explanation and show you how to calculate it. In this section, you will compute the cosine of the inner angle between two vectors, and then see how the value of the cosine similarity relates to how closely the directions of the two vectors align.

Sections

Vector norm and Dot product.
Cosine Similarity
Summary

Section 1- Vector norm and Dot product.

First, recall some definitions from algebra. The norm of a vector, also called its magnitude and written ‖v‖, is defined as the square root of the sum of its squared elements: ‖v‖ = √(Σᵢ vᵢ²). The dot product between two vectors is the sum of the products of their corresponding elements: v · w = Σᵢ vᵢwᵢ.
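These two definitions can be sketched directly in Python. NumPy is an assumption here (the article itself shows no code), and the vector values are arbitrary examples:

```python
import numpy as np

# Hypothetical example vectors
v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])

# Norm: square root of the sum of squared elements
norm_v = np.sqrt(np.sum(v ** 2))   # equivalent to np.linalg.norm(v)

# Dot product: sum of the element-wise products
dot_vw = np.sum(v * w)             # equivalent to np.dot(v, w)

print(norm_v)   # sqrt(1 + 4 + 9) ≈ 3.7417
print(dot_vw)   # 4 + 10 + 18 = 32.0
```

In practice you would reach straight for `np.linalg.norm` and `np.dot`; the explicit sums are spelled out only to mirror the definitions above.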

Section 2-Cosine Similarity

Let’s take another look at two of the corpora from the last section. Remember that in this example, you have a vector space where each corpus is represented by the number of occurrences of the words disease and eggs.

The angle between those vector representations is denoted by β. The agriculture corpus is represented by the vector v, and the history corpus by the vector w. The dot product between those vectors gives you the key relationship: the cosine of the angle β is equal to the dot product between the vectors divided by the product of their norms, cos(β) = (v · w) / (‖v‖ ‖w‖). Replacing the actual values from the vector representations gives the full expression: in the numerator, you have the products of the occurrence counts of the words disease and eggs, and in the denominator, you have the product of the norms of the agriculture and history vectors. Ultimately, you should get a cosine similarity of 0.87.
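Putting the two pieces together, here is a minimal sketch of the calculation. The word counts below are assumed example values for (disease, eggs), chosen to illustrate the formula, not data from real corpora:

```python
import numpy as np

def cosine_similarity(v, w):
    """Cosine of the angle between vectors v and w."""
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

# Hypothetical occurrence counts of (disease, eggs)
agriculture = np.array([20.0, 40.0])  # vector v
history = np.array([30.0, 20.0])      # vector w

print(round(cosine_similarity(agriculture, history), 2))  # 0.87
```

With these counts the numerator is 20·30 + 40·20 = 1400 and the denominator is √2000 · √1300 ≈ 1612.5, giving roughly 0.87.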

But what does this metric tell you about the similarity of two different vectors?

Consider the case where two vectors are orthogonal. In the vector spaces you have seen so far, every dimension can only take positive values, so the maximum possible angle between a pair of vectors is 90 degrees. In that case, the cosine is equal to 0, which means that the two vectors have orthogonal directions, i.e., they are maximally dissimilar.

Now let’s look at the case where the vectors point in the same direction. Here, the angle between them is 0 degrees and the cosine is equal to 1, because the cosine of 0 is just 1. As the cosine of the angle between two vectors approaches 1, their directions become more closely aligned. Now you know how to get the cosine similarity between any pair of vectors.
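Both limiting cases are easy to verify numerically. The helper function and vectors below are illustrative, reusing the same cosine formula as before:

```python
import numpy as np

def cosine_similarity(v, w):
    """Cosine of the angle between vectors v and w."""
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

# Orthogonal vectors (90-degree angle): maximally dissimilar
ortho = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(ortho)  # 0.0

# Same direction (one vector is a scaled copy of the other): cosine ≈ 1
same_dir = cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0]))
print(same_dir)  # ≈ 1.0

# A vector compared with itself also gives ≈ 1
v = np.array([3.0, 5.0])
print(cosine_similarity(v, v))  # ≈ 1.0
```

Note that the score ignores magnitude entirely: scaling a vector by any positive constant leaves its cosine similarity with every other vector unchanged.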

Summary

An important takeaway is that

  • this metric is proportional to the similarity between the directions of the vectors you are comparing, and
  • for the vector spaces you have seen so far, the cosine similarity takes values between 0 and 1.

To recap, you have learned to compute the cosine similarity score between two vectors. For the positive vectors you have seen so far, the score is bounded between 0 and 1. Note that if you take the cosine similarity of a vector with itself, you get 1, and if the vectors are perpendicular, you get 0. Similar vectors have higher scores.

Please follow and 👏 clap for Coursesteach to see the latest updates on this story.

If you want to learn more about these topics: Python, Machine Learning, Data Science, Statistics for Machine Learning, Linear Algebra for Machine Learning, Computer Vision, and Research.

Then log in and enroll in Coursesteach to get fantastic content in the data field.

Stay tuned for our upcoming articles where we will explore specific topics related to NLP in more detail!

Remember, learning is a continuous process. So keep learning and keep creating and sharing with others!💻✌️

Note: if you are an NLP expert and have suggestions to improve this blog, please share them in the comments and contribute.

If you want more updates about NLP and would like to contribute, then follow and enroll in the following:

👉Course: Natural Language Processing (NLP)

👉📚GitHub Repository

👉 📝Notebook

Do you want to get into data science and AI and need help figuring out how? I can offer you research supervision and long-term career mentoring.
Skype: themushtaq48, email:mushtaqmsit@gmail.com

Contribution: We would love your help in making the Coursesteach community even better! If you want to contribute to a course, or have suggestions for improving any Coursesteach content, feel free to contact us and follow.

Together, let’s make this the best AI learning Community! 🚀

👉WhatsApp

👉 Facebook

👉Github

👉LinkedIn

👉Youtube

👉Twitter

Source

1- Natural Language Processing with Classification and Vector Spaces
