Data Structures for Dummies: Hash Tables

Breakdown of what hash tables are and how to code one

Ramy Zhang
7 min readJul 6, 2018

Usually when I think about hash tables, I see them as arrays on steroids. It’s a versatile data structure that can solve many kinds of problems, and it can also store a huge variety of data types and quantities super efficiently.

What is it really?

A hash table is basically an array that links a key to a specific data value. For example, say we want to create a database that stores the profile information of all users of a website; in that case, the keys would be the names of the users and the values would be their profile information, including things like their date of birth, e-mail address, nationality, etc.

But how do we actually go about placing this whole system on top of an array? The indexes of the slots in an array are whole numbers, not usernames, right?

We use hash functions to turn the keys into integers. They’re basically magic number machines that can take in information of any length and spit a number of fixed size back out. For example, let’s say my username on the website from the previous example was “ramyzhang”. We would input that string into a hash function, and the function will do a bunch of calculations to spit out a number of, say, 5 digits, or a number that only goes up to a maximum of 10 (depending on what kind of hash function it is). This number will always be associated to the string “ramyzhang”, no matter how many times you put “ramyzhang” through the function.

Then, since the resulting hash is an integer, you can just go and directly use that number as the index for the profile information of user “ramyzhang”.

The chaining method

Let’s just take a quick pause think about hash functions for a second— since we’re converting data of any length (x) into data of fixed size (y), dataset x will be way larger than dataset y. There are infinite entries in dataset x, but limited ones in dataset y because all the numbers have to be a certain size or length. In other words, there could possibly be two or more pieces of data in dataset x that will hash to the same integer in dataset y.

This is called the collision problem, and here’s what you can do to deal with it:

Source

When two or more pieces of data collide and hash to the same integer (index), we could simply link the data up into a linked list at that index of the array, as you can see above.

Of course, this is the kind of situation we’re going to want to avoid, because instead of using constant time O(1) to find a key and its corresponding information in the array, we end up needing linear time O(n) to actually walk through the linked lists to find our actual values.

The other kind of collision. Both are bad.

In general, if you have a good hash function, collisions should happen less often. Let’s look at other kinds of solutions to the collision problem.

The linear probing method

Since we really really want to avoid throwing collided data into a linked list, another way to deal with collisions is to kinda walk down the array from the collided index until you get to the next slot that’s free.

Let’s do this with a concrete example this time.

I put together the following mini dataset of book IDs that represent each book’s placement in a library:

10 random book IDs

Okay, so now we need a hash function to map all of these to an array. Since the dataset has a length of 10, we’re going to just make the hash function x mod 10 = y. (Make sure you research to use a better hash function if you actually want to build a good hash table, this is just for the sake of an example!)

Now I’m going to hash all of these keys so they give me integers that will act as indexes with a total length of 10.

Wow, that’s a lot of collisions. Before we try to fix that, let’s quickly reorganize the dataset so it’s in chronological order:

Re-organized dataset

OK, all good! Let’s try and apply the linear probing method now. What we want to do is take the data that has collided and walk the other one down the array until it finds the next available slot. So the key 245 will now have an index of 6 instead of 5, and the key 8 will have an index of 10 instead of 8. Let’s take a lot at our fresh new indexes:

Properly filled array

Looks pretty good! So that’s the linear probing method. Now if you want, you could subtract 1 from all of these guys (since arrays begin with an index of 0) or even just keep it as is and have an array with 11 slots.

Code it!

Now that we have a basic idea of both the chaining method and the linear probing method, let’s build a hash table with the linear probing method! We want to get a couple of things down: first, we’re going to define our raw data with the key-value pair, then we’re gonna detail the hash function. Afterwards, we’re going to create three basic methods to search the hash table, insert new entries, and delete them.

Let’s start by making a struct with the key-value pair. We’re going to make this following the library example from earlier. (This is written in C. Code referenced from TutorialsPoint). Let’s make the values associated with each book ID the serial code of the book.

struct BookInfo {
int ID
int SerialCode;
};

Alright! Let’s move onto the hash function itself.

int hashCode(int ID){
return ID % 10;
}

The number 10 would change depending on the size of your original dataset. Since we’re just going to be following along the previous example, we’re using key mod 10 as our hash function.

Now let’s implement a search method. We want to get the hash of the ID of the book we’re searching for so we can obtain its corresponding index in the array. Then we’re going to search the array for the corresponding index, and compare the values in that slot of the array. If it matches, bingo! If it doesn’t, then according to the linear probing method we’re just gonna keep plodding through the array until we find a match.

struct BookInfo *search(int ID) {
//here we're getting the hash of the ID for the book we want
int hashedIndex = hashCode(ID);

//walk through array until you find an empty slot
while(hashArray[hashedIndex] != NULL) {
//seeing if the value corresponds and returning value if yes
if(hashArray[hashedIndex]->ID == ID)
return hashArray[hashedIndex];

//move towards the next slot in the array if not
++hashIndex;

//wrap around the hash table
hashedIndex %= 10;
}

return NULL;
}

Next, we’re going to implement an insert method! This will allow us to add new books to our mini library. We’re going to start by computing the hashed index of the new book’s ID. Then, we’ll check whether there’s already data in that slot of the array; if it’s still empty, then we’ll just insert the new book in that slot. If it’s occupied, then we’ll continue our linear probing method through the array to find the next free slot.

void insert(int ID,int SerialCode) {
struct BookInfo *book = (struct BookInfo*) malloc(sizeof(struct BookInfo));
book->SerialCode = SerialCode;
book->ID = ID;

//getting the hash of the new book's ID
int hashedIndex = hashCode(ID);

//moving through array until we find an empty cell
while(hashArray[hashedIndex] != NULL && hashArray[hashedIndex]->ID != -1) {
//going to the next cell
++hashedIndex;

//wrapping around the hash table
hashedIndex %= 10;
}

hashArray[hashedIndex] = book;
}

Lastly, we’ll implement a delete method as well. In order to delete an item from our hash table, we actually want to compute the hashed index of the book ID that we want to remove, search for that through the array (and with the linear probing method if required), and then replace it with a dummy item in order to make sure our hash table will still function properly after the deletion.

struct BookInfo* delete(struct BookInfo* book) {
int ID = book->ID;

//getting the hash of the book we want to delete
int hashedIndex = hashCode(ID);

//moving through the array until we find an empty slot
while(hashArray[hashedIndex] != NULL) {

if(hashArray[hashedIndex]->ID == ID) {
struct BookInfo* temp = hashArray[hashedIndex];

//placing a dummy item in deleted slot
hashArray[hashedIndex] = dummyBook;
return temp;
}

//going to the next cell
++hashedIndex;

//wrapping around the hash table
hashIndex %= 10;
}

return NULL;
}

Now we’re pretty much complete!

Some advice I hear a lot is that for any problem you’re solving, especially in coding interviews, you’re going to want to keep hash tables in mind as a possible solution because that’s just how versatile they are. So, feel free to continue your research beyond just this simple tutorial to learn more about different types of solutions to collisions (for example, the quadratic probing method) and more complex hash functions!

Thank you for reading! If you found this helpful, here’s some next steps you can take:

  1. Send some claps my way!
  2. Follow me on Medium and connect with me on LinkedIn to be updated when next article of this series is out!
  3. Get some more practice by doing some problems with hash tables!
  4. Check out the previous article in this series on stacks and queues!

--

--