Natural Language Processing (Part 35)-Hash tables and hash functions

Coursesteach
Mar 17, 2024

📚Chapter 4: Machine Translation and Document Search

Introduction

In this tutorial, you are going to learn about hash tables and hash functions. To get an intuition, imagine that you have a cupboard with several drawers, and you want to place similar objects in the same drawer: paper documents in one, keys in another, and books in a third. A hash table works in much the same way, with buckets playing the role of drawers. In this tutorial, we will dive deeper into these concepts.

Sections

Hash table
Hash Function
Create basic hash table
Locality-sensitive hashing

Section 1- Hash table

Natural Language Processing with Classification and Vector Spaces

Let’s say you have several data items and you want to group them into buckets by some kind of similarity. One bucket can hold more than one item, and each item is always assigned to the same bucket. For example, if the items are blue ovals, gray rectangles, and magenta triangles, the blue ovals end up in bucket 1, the gray rectangles in bucket 2, and the magenta triangles in bucket 3.

Section 2- Hash Function

Now, let’s think about how to do this with word vectors. First, assume that the word vectors have just one dimension instead of 300, so each word is represented by a single number, such as 100, 14, 17, 10, and 97. You need a way to give each vector a hash value, which is a key that tells you which bucket the vector is assigned to. A function that assigns a hash value is called a hash function. A hash table is a set of buckets; in this example, the hash table has 10 buckets. Notice how the word vectors 100 and 10 are assigned to bucket 0, the word vector 14 is assigned to bucket 4, and the word vectors 17 and 97 are assigned to bucket 7. Do you notice a pattern? The hash function being used here is: hash value = vector % number of buckets, where % is the modulo operator. With 10 buckets, the modulo operator takes the remainder after dividing by 10, and that remainder is the hash value that tells you where the word vector should be stored. For example, 14 divided by 10 has a remainder of 4, so it goes to bucket 4.
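
The bucket assignments above can be checked with a few lines of Python. This is only a minimal sketch: the function name `hash_value` and the parameter `n_buckets` are illustrative choices, not names taken from the lecture.

```python
def hash_value(vector, n_buckets=10):
    """Return the bucket index for a one-dimensional word vector (an integer)."""
    # The modulo operator keeps the remainder after dividing by n_buckets.
    return vector % n_buckets


word_vectors = [100, 14, 17, 10, 97]

# Map each word vector to the bucket it would be stored in.
hash_values = {v: hash_value(v) for v in word_vectors}
print(hash_values)
# {100: 0, 14: 4, 17: 7, 10: 0, 97: 7}
```

Both 100 and 10 leave a remainder of 0, which is why they share bucket 0 even though they are far apart as numbers.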

Section 3- Create basic hash table

Now, let’s build a basic hash table in code. Start by defining a function that takes in a list of values, where you can think of each value as a one-dimensional vector, along with the number of buckets. Inside it, define the hash function using the modulo operator. Then create the hash table with a dictionary comprehension: each key is an integer and each value is an empty list, which you’ll use as a bucket for storage. For each word vector, calculate its hash value, then append the vector to the corresponding list. The hash table that is returned can be seen in the notebook that goes with this lecture, and it matches the bucket assignments from the previous section.
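
The steps just described could look roughly like the following. Treat it as a sketch written from the description rather than a copy of the course notebook; the names `basic_hash_table`, `value_list`, and `hash_function` are assumptions, and the notebook version may differ in its details.

```python
def basic_hash_table(value_list, n_buckets):
    """Group one-dimensional vectors into buckets using a modulo hash."""

    def hash_function(value, n_buckets):
        # The remainder after division is the bucket index.
        return int(value) % n_buckets

    # Dictionary comprehension: one empty list (a bucket) per integer key.
    hash_table = {i: [] for i in range(n_buckets)}

    for value in value_list:
        # Compute the hash value, then store the vector in that bucket.
        hash_value = hash_function(value, n_buckets)
        hash_table[hash_value].append(value)

    return hash_table


hash_table = basic_hash_table([100, 10, 14, 17, 97], n_buckets=10)
print(hash_table)
# {0: [100, 10], 1: [], 2: [], 3: [], 4: [14], 5: [], 6: [], 7: [17, 97], 8: [], 9: []}
```

Creating every bucket up front, even the ones that stay empty, makes it easy to see all 10 buckets at once when you print the table.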

Section 4- Locality-sensitive hashing

Now let’s take another look at this basic hash table. Recall that your original goal was to put similar word vectors into the same bucket. But here, numbers that are close to each other do not end up in the same bucket. For instance, 10, 14, and 17 are in different buckets. Ideally, you want a hash function that puts similar word vectors in the same bucket, so that 10, 14, and 17 are stored together. To do this, you’ll need to use what’s called locality-sensitive hashing. Locality is another word for location. Sensitive is another word for caring. So locality-sensitive hashing is a hashing method that cares very deeply about assigning items based on where they’re located in vector space. You’ll learn about locality-sensitive hashing next.
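
To make the contrast concrete, here is a small one-dimensional illustration. It is not the locality-sensitive hashing method covered in the next tutorial; it swaps in plain range-based bucketing (floor division by a hypothetical `bucket_width`) just to show what "nearby values share a bucket" can look like next to the modulo hash. The helper names are made up for this sketch.

```python
def modulo_bucket(vector, n_buckets=10):
    # The basic hash from the previous section: remainder after division.
    return vector % n_buckets


def range_bucket(vector, bucket_width=10):
    # Floor division: values in the same interval of width bucket_width
    # (for example 10 to 19) get the same bucket index.
    return vector // bucket_width


word_vectors = [100, 14, 17, 10, 97]

modulo_buckets = {}
range_buckets = {}
for v in word_vectors:
    modulo_buckets.setdefault(modulo_bucket(v), []).append(v)
    range_buckets.setdefault(range_bucket(v), []).append(v)

print(modulo_buckets)  # {0: [100, 10], 4: [14], 7: [17, 97]}
print(range_buckets)   # {10: [100], 1: [14, 17, 10], 9: [97]}
```

With range-based bucketing, 10, 14, and 17 land together. Values that sit on either side of an interval boundary, such as 97 and 100, are still split, so even this simple scheme is only a rough picture of the goal.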

Summary

You learned a lot of new terms: hash values, hash functions, and buckets. You have also seen how to build a hash table in code, which is the equivalent of the cupboard from the introduction. In the next tutorial, we will look at locality-sensitive hashing.

Source

1- Natural Language Processing with Classification and Vector Spaces
