Natural Language Processing (Part 38): Approximate Nearest Neighbors

Coursesteach
4 min read · Apr 7, 2024


📚Chapter 4: Machine Translation and Document Search

Introduction

In the last blog you learned about locality-sensitive hashing; now it is time to put that knowledge to use. You'll build an algorithm that computes the k nearest neighbors much faster than brute-force search.

Sections

Random Planes
Multiple Sets of Random Planes
Multiple Sets of Random Planes in Code

Section 1- Random Planes

So far, we've seen that a few planes, such as these three, can divide the vector space into regions. But are these planes the best way to divide up the vector space? What if, instead, you divided the vector space differently? In fact, you can't know for sure which set of planes is the best way to divide up the vector space, so why not create multiple sets of random planes? That way you can divide the vector space into multiple independent hash tables. You can think of it like creating multiple copies of the universe, a multiverse, if you will. You can make use of all these different sets of random planes to help find a good set of friendly neighborhood vectors; that is, a set of k nearest neighbors.

Natural Language Processing with Classification and Vector Spaces

Section 2- Multiple Sets of Random Planes

So back to our multiple sets of random planes. Over here, for instance, say you have a vector space, and this magenta dot in the middle represents the transformation of an English word into a French word vector. You're trying to find other French word vectors that may be similar. Maybe one universe of random planes helped us determine that this magenta vector and these green vectors are all assigned to the same hash bucket. An entirely different set of random planes helped us determine that these blue vectors are in the same hash bucket as the magenta vector. A third set of random planes helped us determine that these orange vectors are in the same hash bucket as the magenta vector. By using multiple sets of random planes for locality-sensitive hashing, you have a more robust way of searching the vector space for vectors that are possible candidates to be nearest neighbors. This is called approximate nearest neighbors, because you're not searching the entire vector space, just a subset of it. So it's not the absolute k nearest neighbors, but approximately the k nearest neighbors: you sacrifice some precision in order to gain efficiency in your search.
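The multiverse idea above can be sketched in code. This is a minimal illustration under my own assumptions (the seed, table layout, and function names are not from the course notebook): each universe of random planes hashes every vector once, and the candidate set for a query is the union of the query's buckets across all universes.

```python
import numpy as np

np.random.seed(0)

N_DIMS = 2        # dimensionality of the word vectors
N_PLANES = 3      # planes per universe -> 2**3 = 8 buckets
N_UNIVERSES = 5   # independent sets of random planes

# One random plane matrix per universe (each row is a plane's normal).
planes_per_universe = [np.random.normal(size=(N_PLANES, N_DIMS))
                       for _ in range(N_UNIVERSES)]

def hash_vector(planes, v):
    """Combine the sign of v against each plane into one bucket id."""
    signs = np.dot(planes, v) >= 0            # which side of each plane
    return int(sum(2**i for i, s in enumerate(signs) if s))

# Build one hash table per universe.
vectors = np.random.normal(size=(100, N_DIMS))
tables = []
for planes in planes_per_universe:
    table = {}
    for idx, v in enumerate(vectors):
        table.setdefault(hash_vector(planes, v), []).append(idx)
    tables.append(table)

def candidates(query):
    """Union of the query's bucket across all universes: the
    approximate-nearest-neighbor candidate set."""
    ids = set()
    for planes, table in zip(planes_per_universe, tables):
        ids.update(table.get(hash_vector(planes, query), []))
    return ids

query = np.array([1.0, 1.0])
cands = candidates(query)
print(len(cands), "candidates out of", len(vectors))
```

A true k-nearest-neighbor search would then score only the candidates, which is exactly where the speedup over brute force comes from.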

Natural Language Processing with Classification and Vector Spaces

Section 3- Multiple Sets of Random Planes in Code

So let's see how to make a set of random planes in code. Assuming that your word vectors have two dimensions and you want to generate three random planes, you'll use np.random.normal to generate a matrix with three rows and two columns.

Natural Language Processing with Classification and Vector Spaces
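A minimal sketch of that step (the seed and variable name are my own assumptions):

```python
import numpy as np

np.random.seed(1)
# 3 random planes in a 2-D vector space: one plane per row,
# each row holding that plane's normal vector.
random_planes_matrix = np.random.normal(size=(3, 2))
print(random_planes_matrix.shape)  # (3, 2)
```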

As you see here, you'll create a vector v, and for each random plane, check which side of the plane the vector is on. So you'll find out whether v is on the positive or negative side of each of these three planes. Notice that instead of using a for loop to work on one plane at a time, you can use np.dot to do this in one step. Calling the function, the result is that vector v is on the positive side of each of the three random planes. You've already seen how to combine these intermediate hash values into a single hash value. As you can see, locality-sensitive hashing allows you to compute k nearest neighbors much faster than naive search. This powerful tool can be used for many tasks related to word vectors, and I will show you how it can be applied to search in the next blog.
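That dot-product step and the final hash combination can be sketched as follows (the function and variable names are assumptions of mine, not necessarily those of the course notebook):

```python
import numpy as np

np.random.seed(2)
P = np.random.normal(size=(3, 2))   # 3 random planes in 2-D
v = np.array([[2.0], [2.0]])        # column vector to hash

def side_of_plane_matrix(P, v):
    # A single np.dot handles all three planes at once: the sign of
    # each entry says which side of the corresponding plane v lies on.
    return np.sign(np.dot(P, v))

sides = side_of_plane_matrix(P, v)

# Combine the three signs into one hash value: treat "positive side"
# of plane i as bit i, with weight 2**i.
hash_value = 0
for i, s in enumerate(sides.flatten()):
    hash_value += 2**i * (1 if s >= 0 else 0)
print(hash_value)  # an integer in [0, 7], i.e. one of 2**3 buckets
```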

Please follow and 👏 clap for the story Coursesteach to see the latest updates on this story

🚀 Elevate Your Data Skills with Coursesteach! 🚀

Ready to dive into Python, Machine Learning, Data Science, Statistics, Linear Algebra, Computer Vision, and Research? Coursesteach has you covered!

🔍 Python, 🤖 ML, 📊 Stats, ➕ Linear Algebra, 👁️‍🗨️ Computer Vision, 🔬 Research — all in one place!

Enroll now for top-tier content and kickstart your data journey!

Natural Language Processing course

Stay tuned for our upcoming articles where we will explore specific topics related to NLP in more detail!

Remember, learning is a continuous process. So keep learning and keep creating and sharing with others!💻✌️

Note: if you are an NLP expert and have good suggestions to improve this blog, please share them in the comments and contribute.

👉📚GitHub Repository

👉 📝Notebook

Ready to dive into data science and AI but unsure how to start? I’m here to help! Offering personalized research supervision and long-term mentoring. Let’s chat on Skype: themushtaq48 or email me at mushtaqmsit@gmail.com. Let’s kickstart your journey together!

Contribution: We would love your help in making the Coursesteach community even better! If you want to contribute to some courses, or if you have any suggestions for improving any Coursesteach content, feel free to contact us and follow.

Together, let’s make this the best AI learning Community! 🚀


👉WhatsApp

👉 Facebook

👉Github

👉LinkedIn

👉Youtube

👉Twitter

Source

1- Natural Language Processing with Classification and Vector Spaces
