So you want to implement a hash table?

Richard Rosier
The Startup
Published in
4 min readMay 4, 2020

If you’ve found this page, you’re likely a beginner coder who is starting to learn the ropes of data structures. I’m going to try to keep things brief while still explaining key concepts, and I’ll be focusing on Javascript implementations and explanations. If you know what a hash table is already and are just looking for code, just skip to the bottom :)

What is a hash table?

Programming languages use hash tables to store keys and values. If you’re a Javascript user, you know a hash table already: objects {}. Other programming languages refer to them as maps or dictionaries. Hash tables use arrays, or lists, internally to organize data, and offer advantages over arrays in certain areas, like data retrieval and insertion.

Hash tables work by storing keys and values as an array inside buckets, which are arrays of arrays. Confused yet? Check this out:

We have three key value pairs in this table. Two of them are stored in bucket 1, one is stored in bucket 2, and bucket 3 is empty.

In code, this would look like this, but without the extra whitespace:

[ [ [key,value], [key,value] ], [ [key,value] ], [ ] ]

By separating the key/values into buckets, we can cut down on our retrieval time, which I will get into later. Right now, I want to explain about why they are called hash tables.

Consider how you access data from an array vs. from an object. Arrays use an index, objects use a key. But if objects are hash tables and hash tables are arrays, how can you access the information without an index? Enter the hash function! Hash functions have a wide range of complexities but the gist is that they take an input and convert that input to a number, and that input will always produce the same number, called a hash. So every time we input the string of ‘cat’ it will always return 12, for example. Here is a little bit more info on them. For hash tables, the hash function will typically take an input of the array’s length and perform a modulo (%) operation to prevent a hash that is longer than the array’s length.

Our given key is passed in through a hasher function, which will generate a number, or hash. That number is then used as the index for placing the key and value into a bucket. Depending on how advanced your hashing function is, you will likely encounter two keys that generate the same hash.

When you put a key/value into a bucket that already has something in it, it’s called a collision. Collisions can be avoided by using a better hashing function and increasing the number of buckets (the size of your hash table array). Well implemented hash tables will resize themselves to maintain an optimal amount of buckets. Too many buckets are a waste of space, while too few buckets create more collisions. Divide the number of keys by the number of buckets; hash tables work best when that number is between .25 to .75.

Why do we need hash tables?

The reason we have both hash tables and arrays is that by storing data in different ways, they perform operations at differing speeds. Consider the time it takes to add, delete, or find something in a hash table vs. an array.
(If you are unfamiliar with Big-O time complexity, O(1) is constant speed, O(n) is linear and grows as the size of the inputs, n, grows.)

By grouping key/values into buckets, we can search, add, and delete things from the object much faster than if we had put those values into an array.

Key takeaways of the differences between hash tables and arrays:

  • Search on an array is O(n) while on a hash table is O(1).
  • Arrays’ keys (index) are always a number 0 or greater. They can have any type of value, and values can be duplicated.
  • Hash table’s keys can be anything, as long as your hasher can handle them and each key is unique. Values can be any type and can have duplicates.

A hash table implementation

Here is a basic, mostly working has table. I say mostly working because it doesn’t automatically resize itself or overwrite keys. Can you figure out how to make it do that? If you can’t, I recommend you go visit Adrian Mejia’s guide on data structures. It’s very in depth, and you can see an optimized hash table.

Ultimately, hash tables are another tool that allows us to handle data in different ways. Using them solves some problems and creates others, and knowing when to use a hash table vs an array is part of the knowledge gained when learning to code. If you’re like me, you’ve used objects in javascript without even thinking about what’s going on under the hood, and I hope this has allowed you to understand how they work a little better.

--

--