Natural Language Processing (Part 36)-Locality sensitive hashing

7 min read · Mar 24, 2024

📚Chapter 4: Machine Translation and Document Search

Introduction

A pivotal technique for minimizing computational expenses when identifying neighbors in high-dimensional spaces is locality-sensitive hashing. This tutorial will elucidate the concept of hashes and their applications.

In the vast realm of machine translation, where the goal is to bridge the linguistic gap between languages, the pursuit of efficiency and accuracy remains perpetual. With the ever-growing volume of digital content across multiple languages, the demand for high-quality translation systems has escalated. Amidst this challenge, technological innovations continually emerge to refine translation processes. One such innovation making waves in the field is Locality Sensitive Hashing (LSH).

Locality Sensitive Hashing, often abbreviated as LSH, is a technique originating from the field of computer science and data mining. Initially developed to solve the problem of approximate nearest neighbor search efficiently, LSH has found applications in various domains, including machine translation.

Sections

Locality sensitive Hashing
Planes
Planes in two dimensions
Planes in three dimensions
Which side of the plane
What the dot product is doing

Section 1- Locality sensitive Hashing

Natural Language Processing with Classification and Vector Spaces

To start thinking about locality sensitive hashing, let’s first assume that you’re using word vectors with just two dimensions. I’ll depict each vector as a circle instead of an arrow. Let’s say you want to find a way to know that these blue dots are somehow close to each other, and that these gray dots are also related to each other. First, divide the space using these dashed lines, which I’ll call planes. I’ll explain why I call them planes in a bit. Notice how the blue plane slices up the space into vectors that are above it or below it. The blue vectors all happen to be on the same side of the blue plane. Similarly, the gray vectors all happen to be above the gray plane. It looks like the planes can help us bucket the vectors into subsets based on their location, and this is exactly what you want.

What you want is a hashing function that is sensitive to the location of the items that it’s assigning into buckets. You’re working your way towards locality sensitive hashing.
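The bucketing idea can be sketched in a few lines. This is only an illustration under stated assumptions: the 2-D coordinates, the vector names, the `normal` vector, and the helper `bucket_by_side` are all hypothetical, and the sketch uses a single dividing line together with the dot-product sign test that is explained later in this article.

```python
import numpy as np
from collections import defaultdict

def bucket_by_side(vectors, normal):
    """Group named vectors by the side of the dividing line they fall on.

    The bucket key is the sign of the dot product with the line's normal:
    +1 for one side, -1 for the other, 0 for vectors exactly on the line.
    """
    buckets = defaultdict(list)
    for name, vec in vectors.items():
        side = int(np.sign(np.dot(normal, vec)))
        buckets[side].append(name)
    return dict(buckets)

# Hypothetical 2-D "word vectors" (not taken from the article's figure).
vectors = {
    "blue_1": np.array([2.0, 1.0]),
    "blue_2": np.array([3.0, 0.5]),
    "gray_1": np.array([-1.0, 2.0]),
    "gray_2": np.array([-2.0, 1.5]),
}
normal = np.array([1.0, -1.0])  # hypothetical normal of the dividing line

buckets = bucket_by_side(vectors, normal)
print(buckets)
```

With these assumed values the two blue vectors share one bucket and the two gray vectors share the other, which is the location-sensitive behavior described above.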

Section 2- Planes

Now, let’s see why I’m calling these dashed lines planes. A plane would be this magenta line in two-dimensional space, and it actually represents all the possible vectors that would be sitting on that plane. In other words, they would be parallel to the plane, such as this blue vector or this orange vector. You can define a plane with a single vector: this magenta vector is perpendicular to the plane, and it’s called the normal vector to that plane. The normal vector is perpendicular to any vectors that lie on the plane.
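To make "a plane defined by a single vector" concrete, here is a minimal sketch in two dimensions. The normal vector's coordinates and the helper `direction_on_line` are assumptions made for illustration. It relies on the fact that two vectors are perpendicular exactly when their dot product is zero.

```python
import numpy as np

def direction_on_line(normal):
    """Rotate a 2-D normal by 90 degrees to get a vector lying on the line."""
    return np.array([-normal[1], normal[0]])

normal = np.array([1.0, 1.0])           # hypothetical normal vector
direction = direction_on_line(normal)   # a vector that lies on the "plane"

# Every multiple of `direction` also lies on the plane,
# so each one should be perpendicular to the normal.
dots = [float(np.dot(normal, t * direction)) for t in (-2.0, 0.5, 3.0)]
print(dots)
```

Each entry of `dots` is zero, so the normal alone is enough to pin down every vector that lies on the plane.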

Section 3- Planes in two dimensions

Section 4- Planes in three dimensions

It might help to think about this in three dimensions. Find a sheet of paper and a pencil. Place the paper on the table and draw some vectors on it, then hold the pencil vertically over the paper. Any vectors on the paper are perpendicular to the pencil.
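The paper-and-pencil picture can be checked numerically. In this sketch the table is assumed to be the x-y plane, the pencil is assumed to point along the z axis, and the vectors drawn on the paper are made-up examples.

```python
import numpy as np

# The pencil held vertically: the normal vector to the sheet of paper.
pencil = np.array([0.0, 0.0, 1.0])

# Hypothetical vectors drawn on the paper: they have no vertical component.
paper_vectors = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [-3.0, 1.5, 0.0],
])

# One dot product per drawn vector; zero means perpendicular to the pencil.
dots = paper_vectors @ pencil
print(dots)
```

All three dot products are zero, matching the claim that anything drawn on the paper is perpendicular to the pencil.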

Let’s go back to two dimensions, where you are able to see visually whether a vector is on one side of the plane or the other.

Section 5- Which side of the plane

But how do you do this mathematically? Here are three sample vectors in blue, orange, and green. The normal vector to the plane is labeled P.

Let’s focus on vector one. If you take the dot product of P with vector one, you get three. I’ll explain in a bit why you’re doing this.

Now let’s look at vector two. If you take the dot product of P with vector two, you get zero.

Finally, let’s look at vector three. If you take the dot product of P with vector three, you get negative three.

So the dot products are three, zero, and negative three.

Do you notice something about the signs and how they’re related to each vector’s position relative to the plane?

When the dot product is positive, the vector is on one side of the plane.
If the dot product is negative, the vector is on the opposite side of the plane.
If the dot product is zero, the vector is on the plane.
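Here is a small worked version of the three dot products. The text does not give the coordinates from the figure, so `P`, `v1`, `v2`, and `v3` below are hypothetical values chosen only so that the results come out to three, zero, and negative three.

```python
import numpy as np

# Hypothetical coordinates, chosen only to reproduce the dot products 3, 0, -3.
P = np.array([1, 1])     # normal vector to the plane
v1 = np.array([1, 2])
v2 = np.array([-1, 1])
v3 = np.array([-2, -1])

dot_products = [int(np.dot(P, v)) for v in (v1, v2, v3)]
print(dot_products)  # [3, 0, -3]
```

With these values, `v1` falls on one side of the plane, `v3` on the opposite side, and `v2` lies on the plane itself.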

Section 6- What the dot product is doing

To visualize the dot product, imagine that one of the vectors, such as P, is the surface of the earth. Gravity pulls all objects straight down towards the surface of the earth. Next, pretend you’re standing at the end of the vector V1. You tie a string to a rock and let gravity pull the rock to the surface of vector P, so the string is perpendicular to vector P. Now, if you draw a vector in the same direction as P that ends at the rock, you have what’s called the projection of vector V1 onto vector P. The length of that projection is the magnitude of the dot product of V1 and P divided by the length of P, so when P has length one the two are equal.

Furthermore, if you had this other green vector and projected it onto vector P, the projected vector would be parallel to P but pointing in the opposite direction. The dot product would be a negative number.

This means that the sign of the dot product indicates the direction of the projection with respect to the purple normal vector. So whether the dot product is positive or negative tells you whether a vector such as V1 or V2 is on one side of the plane or the other.
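The projection described above can be sketched as follows. The coordinates and the helper names `scalar_projection` and `vector_projection` are assumptions made for illustration. The sketch uses the standard formulas: the signed length of the projection of v onto p is (v · p) / |p|, and the projected vector is ((v · p) / (p · p)) p.

```python
import numpy as np

def scalar_projection(v, p):
    """Signed length of the projection of v onto p."""
    return float(np.dot(v, p) / np.linalg.norm(p))

def vector_projection(v, p):
    """The projection of v onto p, as a vector lying along p."""
    return (np.dot(v, p) / np.dot(p, p)) * p

# Hypothetical coordinates for the normal vector and two sample vectors.
P = np.array([1.0, 1.0])
V1 = np.array([1.0, 2.0])
green = np.array([-2.0, -1.0])

proj_v1 = vector_projection(V1, P)         # points the same way as P
proj_green = vector_projection(green, P)   # points opposite to P

print(proj_v1, scalar_projection(V1, P))
print(proj_green, scalar_projection(green, P))
```

The signed length is positive for `V1` and negative for `green`, and it always has the same sign as the dot product because dividing by the length of P never changes the sign.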

Let’s use code to check which side of the plane a vector is on. The function `side_of_plane` takes in the normal vector P and a vector V. Use `np.dot` to take the dot product, then use `np.sign` to get +1 if the dot product is positive, -1 if it is negative, or 0 if it is zero. I’m also using `np.asscalar`. Read the name of that function as "as scalar": if an array can be represented as a single scalar, this function retrieves that scalar, and that’s it. Note that `np.asscalar` has been removed from recent NumPy releases; calling `.item()` on the array does the same job. Please try it out for yourself.

That was a lot of visualizations and a lot of projections. The main takeaway is that the sign of the dot product of a vector with the normal vector tells you which side of the plane the vector lies on, for example above it or below it. In the next tutorial, you will learn how to combine this concept with multiple planes to better approximate where a data point might be located.
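The `side_of_plane` function described above might look like the sketch below. This is a reconstruction from the description, not necessarily the author's exact code: it uses `.item()` in place of `np.asscalar` so that it runs on current NumPy, and the sample vectors are hypothetical.

```python
import numpy as np

def side_of_plane(P, v):
    """Return +1, -1, or 0 depending on which side of the plane v lies.

    P is the normal vector to the plane; v is the vector to test.
    """
    dot_product = np.dot(P, v.T)             # dot product of normal and vector
    sign_of_dot_product = np.sign(dot_product)
    # .item() extracts the single value as a plain Python scalar
    # (the role np.asscalar played in older NumPy versions).
    return sign_of_dot_product.item()

# Hypothetical row vectors; P is the normal to the plane.
P = np.array([[1, 1]])
v1 = np.array([[1, 2]])
v2 = np.array([[-1, 1]])
v3 = np.array([[-2, -1]])

print(side_of_plane(P, v1))  # 1
print(side_of_plane(P, v2))  # 0
print(side_of_plane(P, v3))  # -1
```

The transpose `v.T` is there so the function accepts row vectors of shape (1, n), and for plain 1-D arrays it has no effect.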

Please follow and 👏 clap for the story, and follow Coursesteach to see the latest updates on this story.

If you want to learn more about these topics: Python, Machine Learning, Data Science, Statistics for Machine Learning, Linear Algebra for Machine Learning, Computer Vision, and Research,

then log in and enroll in Coursesteach to get fantastic content in the data field.

Stay tuned for our upcoming articles where we will explore specific topics related to NLP in more detail!

Remember, learning is a continuous process. So keep learning and keep creating and sharing with others!💻✌️

Note: if you are an NLP expert and have good suggestions to improve this blog, please write comments and contribute.

If you need more updates about NLP and want to contribute, then follow and enroll in the following:

👉Course: Natural Language Processing (NLP)

👉📚GitHub Repository

👉 📝Notebook

Do you want to get into data science and AI and need help figuring out how? I can offer you research supervision and long-term career mentoring.
Skype: themushtaq48, email:mushtaqmsit@gmail.com

Contribution: We would love your help in making the Coursesteach community even better! If you want to contribute to some courses, or if you have any suggestions for improving any Coursesteach content, feel free to contact us and follow.

Together, let’s make this the best AI learning Community! 🚀
