Hash Tables

Joe Lorenzo
Nov 15, 2019 · 8 min read
Image for post
Image for post
Photo by Katie Smith on Unsplash

There are many ways to store data in programming, one of them is known as a hash table. Javascript has its own variations of hashing, like the new Map functionality that was introduced with ECMAScript 2015 or ES6, which creates an associative array and remembers the insertion order. In a hash table, as opposed to a typical array, however, the keys are put through a hashing function, and then a numerical value is assigned to the data, and it’s placed in that location in the table, set inside what’s called a bucket. The bucket that it is inserted into can contain only one associative array at a time before there is what is called a collision. There are multiple ways of handling a collision.

One is called separate chaining, in which the array that one bucket is in is expanded to fit more arrays leading to something that resembles (or could actually be) a singly linked list, or an array of arrays depending on the implementation. This only occurs if the same indexed bucket is returned for multiple items after we pass our items through the hashing function.

Let’s suppose there is an associative array in bucket 007, if a second array is introduced because of the output of the hash function is the same as the first. It will become a multidimensional array that can house another pair, so now bucket 007 will have 2 associative arrays set within the same bucket. This method is known as open hashing. A closed hashing technique would be something like linear probing. With linear probing, if a bucket is full then the next one will be used to store the information. If the sequential one is in use it will continue running until it finds an open bucket and places it there.

Let’s talk about opened vs closed hashing. It’s called open hashing because with separate chaining there is a list that is introduced and the hash table becomes open to more than just buckets. A linked list or similar structure is placed in buckets so if a collision occurs it can just point to the next node or index. With closed hashing, the table is always intact and the only information introduced into the buckets is the associative array, and there is no need to introduce any other types of data. So what are the hash functions that determine which bucket is used? That’s completely up to the person writing the function. Hash functions can be simple or complex, but as the table expands you want to make sure that you have as few collisions as possible.

Let’s say you want to save your shopping list. You know you need eggs, but you always forget how many to buy, do you get one dozen or four? The easiest thing to do is associate eggs, and how many dozens, and put them together so that eggs points to the amount needed like so.

"eggs" => "1 dozen"

But for some odd reason, you want to do this from scratch instead of using one of the pre-existing methods for storing this. How would you accomplish this? For this we’ll use an array as the table we insert data into. We will pass the string “eggs” through a hashing function, and the output will be the index of where this key will be stored. The hash function when given key multiple times, should always point to the same place in the array. So if “eggs” is stored in an index of 1, the next time I type eggs into the hash function, I should get that same index. This is called “deterministic”. If you don’t get the same value back then something has gone wrong with the hashing function, because the output is determined(deterministic) by the input. So what does a simple hashing function look like? Again, hashing functions vary in complexity, and there are many different things you can incorporate into a hash function. Keep in mind, you want there to be some variation in the output so that different buckets are selected more frequently than not. Here is a very simple hash function.

class HashTable {
constructor(size=10) {
this.map = new Array(size);
}
_hash(key) {
let total = 0;
for (let i = 0; i < key.length; i++) {
const value = key[i].charCodeAt(0) - 96;
total += total + value;
}
return total % this.map.length;
}
}
const hashTable = new HashTable();
console.log(hashTable._hash("eggs")); // 1
console.log(hashTable._hash("eggs")); // 1
console.log(hashTable._hash("bacon")); // 6
console.log(hashTable._hash("bacon")); // 6

This function will be used to hash our keys and assign them a number. I’ve assigned it to a class called HashTable so that we can use it again and again. The underscore in front of the hash (key) denotes that it is a private function. As you can see, if “eggs” is the input, it gives the same output; the same for bacon. This is a good indication that the hash function is working well. So now that the hash function is working, it’s time to get started on setting the key-value pairs so that we can remember how many we need for our recipe. The method will be called set, and the parameters will be a key and a value. This will map the value to the bucket that is assigned to the key and it will be stored there for later retrieval. For example, when we put in eggs as a key, with the value of a dozen, it should lead us to a value of 1, which is the index of the array that the bucket is located in. So here goes the set method for the table and its results.

Image for post
Image for post

For the set method on the hashTable, we pass a key and a value. The function takes the key parameter and passes it through the hashing function that we created earlier. This will give an index of 1, and if there is not an array at the index of 1, then it will set one. Remember that this is one way of doing it. We’re using separate chaining, instead of linear probing, so if an array does already exist in that spot, we just create a new array and push it to create a nested array in that index. When hashTable is logged to the console after setting [“eggs”, “1 dozen”] it will look like this.

console.log(hashTable);
// [[],["eggs", "1 dozen"], [], [], [], [], [], [], [], []]

Now I know what you must be thinking. Why in the world would anybody want to make a table and fill it with arrays, when I can just store a key-value pair in a native JavaScript object? Let’s set a couple more objects, and we’ll see how this could be the most viable solution for data retrieval when dealing with a large number of pairs.

Image for post
Image for post
Outcome after inserting multiple items

Now that there are a few objects in here, we see that for the most part — each key-value pair has their own bucket; eggs are in the bucket indexed at 1, bacon is in the bucket indexed at 6, and we’ve encountered our first collision at index 7, where both cherry coke zero (fiend for the stuff) and hot sauce reside. In this particular article, we’re resolving this issue by pushing into the existing bucket (separate chaining). Now that we have all these items indexed, the only thing we’d have to do to retrieve the associative array is input the key, and we know exactly which bucket to look in because it will be the same as when we set it! So let’s get the value of an associative array based on the key.

Image for post
Image for post
get method for hashTable

This is the last method that we will need for the hash table class in this article. I know it’s been a lot but let’s walk through this quickly to understand how the get method works. When passed the string “eggs” as a parameter, what happens? Well, the index is set to the value given by our hash function, which we know to be 1 for eggs, then if the index is within our range, execute for loop. The for loop finds the array at our given index, so it would be

for (let i = 0; i < this.map[1][i].length; i++) {}

We are in the second dimension of the array, where the array we’re looping over is the determined index, for eggs, this only has one element nested away inside the bucket. The conditional states, if the first indexed item, which is our key, is the same as the passed in parameter to the get method, return the second index of that array which is its value.

if (this.map[1][0][0] === "eggs") {
return this.map[1][0][1] // which is "1 dozen"
}

This will work for all strings, if we want to retrieve how much bacon we need, all we need to do is pass that(bacon as the key) to the get method, and if the key exists within one of the buckets it will return its value indefinitely.

Well, it still hasn’t really been said why you would choose a hash table over any other native functionality. Well, there can be a few use cases, let’s say you wanted to save a ton of key-value pairs, but you don’t want to just push to any old array. You don’t want to because the lookup will be done linearly if you’re unsure of where it is; you’ll be searching the array until you find the key, and if you have a million key-value pairs, that could take a while. However, if you implement this, and you had 1000 buckets, now you know exactly which of those 1000 buckets you have to look in to find your value and you’ve just cut the time down considerably. Of course, there will be collisions within those 1000 buckets and probably all of them won’t be used or evenly distributed unless there is some magical perfect hash function, but it still works very well. The average times for lookup, insertion, and deletion are all O(1) or constant time, worst-case scenarios are O(n) or linear time, which would be the case if all the pairs you input were all dumped in the same bucket, but if you set up a good enough table and hash function you shouldn’t encounter the worst case. So there you have it, a new way to save your grocery list in your very own hash table.

As always I’d like to thank Colt Steele for teaching these algorithms. If you haven’t checked out his Udemy data structure course, definitely give it a look, some other great links for learning about hashes are
[1]: Hash Tables | Data Structures in JavaScript — https://www.youtube.com/watch?v=QuFPIZj55hU by Beiatrix on Youtube
[2]: JavaScript Algorithms and Data Structures Masterclass — https://www.udemy.com/course/js-algorithms-and-data-structures-masterclass — by Colt Steele.
[3]: VisualAlgo — https://visualgo.net/en/hashtable .

The Startup

Medium's largest active publication, followed by +733K people. Follow to join our community.

Joe Lorenzo

Written by

Car lover, hiker, full-stack engineer, who loves the thrill of climbing a mountain or writing well thought out code.

The Startup

Medium's largest active publication, followed by +733K people. Follow to join our community.

Joe Lorenzo

Written by

Car lover, hiker, full-stack engineer, who loves the thrill of climbing a mountain or writing well thought out code.

The Startup

Medium's largest active publication, followed by +733K people. Follow to join our community.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store