Hash Tables: A Simple Javascript Example

A solid understanding of data structures is critical for building DRY scalable apps. This article is the first in a series that examines what’s going on under the hood in common data structures. To encourage granular comprehension examples are illustrated without native javascript methods.

Hash Tables

Hash tables optimize storage for key-value pairs. In best case scenarios hash table insertion, retrieval and deletion are constant time. Hash tables are used to store large amounts of quickly accessible information like passwords.

Anatomy of a Hash Table:

This is a basic Javascript hash table implementation. A hash table can be conceptualized as an array holding a series of tuples stored in sub-arrays inside of an object:

{ [[ [‘a’, 9], [‘b’, 88] ],[ [‘e’, 7], [‘q’, 8] ],[ [‘j’, 7], [‘l’, 8] ]] };

The outer array holds a number of buckets (sub-arrays) equal to the max length of the array. Inside the buckets, tuples or two element arrays hold key-value pairs.

When key-value pairs are inserted into the hash table, the key is hashed with a hashing function. The key and the maximum length of the array are passed to the hashing function which returns an index used to identify the bucket.

Given the same input, a hashing function will always provide the same output. For example if I were to hash the string ‘bee’ for an array with a max length of 5, I would get 2 every-time. A good hashing function will distribute input evenly across all buckets.

In best case scenarios insertion happens in constant time. The insertion function hashes the key to produce the bucket index, and then iterates through the bucket to make sure that the key doesn’t already exist. Keys must be unique in hash tables otherwise retrieval of all values is not possible. In a best case scenario, the bucket is empty and insertion is constant. In a worst case scenario if the hashing function has poor entropy and the keys are similar, inputs get clustered into a single bucket and insertion nears a linear time operation.

Increasing the size of a hash table decreases the chance of tuples being stored in the same bucket and improves lookup time, but uses more memory. A smaller hash table is more space efficient, but increases the chance of collisions where values are hashed to the same bucket. The optimum hash table is between 25% and 75% full.

Although our example is in Javascript it’s worth noting that Javascript arrays lengths are mutable and behave similarly to hash tables. In many languages when instantiating an array, the length must be declared and is immutable. If you create an array with a maximum length of 8, once it is instantiated, it is impossible to make the array bigger. The _storageLimit method is included in the code below so that our Javascript array will function like an array in a strongly typed language.