Natural Language Processing (Part 34)-K-nearest neighbors

Coursesteach
Mar 10, 2024

📚Chapter 4: Machine Translation and Document Search

Introduction

One key operation needed to find a matching word in the previous tutorial was finding the k nearest neighbors of a vector. We will focus on this operation in the next few tutorials, as it’s a basic building block for many NLP techniques.

Sections

Finding the translation
Nearest neighbor
Hash Table

Section 1- Finding the translation

Notice that the vector you get after transforming a word embedding with the R matrix lies in the French word vector space. However, it will not necessarily be identical to any of the word vectors already in that space. You need to search through the actual French word vectors to find a French word that is similar to the vector you created with the transformation. You may find words such as salut or bonjour, which you can return as the French translation of the word hello.
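The search described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 2-D vectors, the matrix R, and the helper names transform, cosine_similarity and nearest_word are all made up for this example, and cosine similarity is assumed as the similarity measure.

```python
import math

def transform(vec, R):
    """Multiply a row vector by the matrix R (given as a list of rows)."""
    return [sum(vec[i] * R[i][j] for i in range(len(vec)))
            for j in range(len(R[0]))]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_word(query, embeddings):
    """Return the word whose vector is most similar to the query vector."""
    return max(embeddings, key=lambda w: cosine_similarity(query, embeddings[w]))

# Toy data, invented for illustration only (real embeddings have many dimensions).
R = [[0.0, 1.0],
     [1.0, 0.0]]                 # hypothetical transformation matrix
english_hello = [1.0, 0.2]       # hypothetical embedding of "hello"
french_vectors = {               # hypothetical French embeddings
    "bonjour": [0.25, 0.9],
    "chat": [0.9, -0.3],
    "maison": [-0.8, 0.1],
}

projected = transform(english_hello, R)          # lands in the French space
translation = nearest_word(projected, french_vectors)
print(projected, translation)
```

Note that the projected vector does not equal any stored French vector, yet the most similar one can still be returned as the translation.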

So the question is, how do you find similar word vectors? To understand how to find similar word vectors, let’s look at a related question.

Section 2- Nearest neighbor

How do you find your friends who are living nearby? Let’s pretend that you are in San Francisco in the United States, visiting your dear friend Andrew. You also want to visit your other friends over the weekend, preferably those who live nearby. One way to do this is to go through your address book and, for each friend, get their address and calculate how far they are from San Francisco. One friend is in Shanghai, another is in Bangalore, and a third is in Los Angeles. You can then sort your friends by their distance to San Francisco and rank them by how close they are. Notice that if you have a lot of friends, which I’m sure you do, this is a very time-intensive process, because you have to compute a distance for every single friend. Is there a more efficient way to do this? Notice that two of these friends live on another continent, while the third friend lives in the United States. Could you have just searched a subset of friends who live in the United States?
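The address-book approach is a brute-force nearest neighbor search: compute every distance, sort, and keep the closest. Here is a minimal sketch of it. The coordinates are rough approximations added for illustration, the great-circle (haversine) distance is an assumed choice of distance measure, and the function names are hypothetical.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def k_nearest(query, points, k):
    """Brute-force k-NN: measure the distance to every point, sort, keep the k closest."""
    ranked = sorted(points, key=lambda name: haversine_km(query, points[name]))
    return ranked[:k]

# Approximate (latitude, longitude) values, for illustration only.
san_francisco = (37.8, -122.4)
friends = {
    "Shanghai": (31.2, 121.5),
    "Bangalore": (13.0, 77.6),
    "Los Angeles": (34.1, -118.2),
}

nearest = k_nearest(san_francisco, friends, k=1)
print(nearest)
```

Every query touches every entry in the address book, which is why this linear scan becomes slow as the number of friends (or word vectors) grows.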

Section 3- Hash Table

You might have realized that it may not have been necessary to go through all of the friends in your address book in order to find the ones closest to you. If you could somehow filter for the friends who are in a general region, such as North America, then you could search just within that subgroup. In other words, if there is a way to slice up the geographic space into regions, you only need to search within the relevant regions.

When you think about organizing subsets of a dataset efficiently, you may think about placing your data into buckets. And if you think about buckets, then you’ll definitely want to think about hash tables. Hash tables are useful tools for any kind of work involving data. You’ll learn about hash tables next.

In this tutorial, I showed you how, using k-nearest neighbors, you can translate a word even if its transformation doesn’t exactly match a word embedding in the desired language. I also introduced you to hash tables, a useful data structure that you will learn about in the next tutorial. Great. Now you are ready to learn about hashing, an effective technique that lets you answer lookup queries much faster than a simple linear search.
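As a small preview of the bucket idea from this section, here is a sketch that groups friends by region using a Python dictionary, which is itself a hash table. The region labels and variable names are assumptions made for this example, not part of the original material.

```python
from collections import defaultdict

# (city, region) pairs; the region labels are assigned by hand for illustration.
friends = [
    ("Los Angeles", "North America"),
    ("Shanghai", "Asia"),
    ("Bangalore", "Asia"),
]

# Build the buckets: each region key maps to the friends who live there.
buckets = defaultdict(list)
for city, region in friends:
    buckets[region].append(city)

# Search only the bucket for the region you are in, not the whole address book.
candidates = buckets.get("North America", [])
print(candidates)
```

The distance calculation from the previous section would then run only over the candidates in the matching bucket, rather than over every friend.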

Source

1- Natural Language Processing with Classification and Vector Spaces
