Hash Tables

Michael Verdi
4 min readJan 14, 2019

Hash tables utilize key-value pairs to store data. Because of their speed and utility almost every programming language ships with some type of hash table implementation. Hash tables in Javascript are called Objects, in Python its Dictionaries and in Java, Go and Scala its Maps. Hash tables came to exist because humans think in more than just numbers.

Arrays are great, but having to model all of your data based off indices is a nightmare. So, hash tables exist because being able to model your data like this passwords["gmail"] is way better than having to model it like this passwords[0].

But, have you ever wondered how this data structure is implemented under the hood? Computers don’t intuitively understand human readable strings as keys so how can they store values associated with keys?

Well, the short answer is that they don’t have to. We can create a hash function which acts as a translator. Humans understand words, computers understand indices. A hash function can take a word and converts it to an appropriate index. In this way, we can store data in an array and look it up via its index.

Conceptual Idea of Hash Function

So, hash tables make use of a hashing function. Creating a good hashing function is tall order that requires a bit of math. The important points of a hashing function are:

  1. It’s fast (Big O constant time)
  2. It doesn’t cluster outputs at specific indices, but spreads them uniformly
  3. It’s deterministic — same inputs yields the same outputs

Also, it turns out that prime numbers are very important to hash functions. By creating an array that is the length of a prime number and by including prime numbers in the hashing calculation you can greatly improve the uniform distribution of data and avoid many collisions. Why this is, i’m cannot entirely explain, but this guy attempts to: https://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/

Collisions

Now, regardless of how good the hashing function is, there are bound to be collisions. Collisions are when multiple inputs yield the same output index. There are two main strategies to deal with collisions, the first is called separate chaining and the second is called linear probing.

Separate chaining utilizes another array or linked list to store multiple values at the same index. Take a look at the image below. Both keys darkblue and salmon resulted in the same index being generated. Instead of overwriting the value at index 4, we store both in an array.

How Separate Chaining Works

In linear probing, when a collision occurs, we find the next available spot in the array to place the key-value pair.

How Linear Probing Works

Example Code Implementation

The below code is proof of concept and is very simplistic. The hashing function used only works with strings. It loops through each character, finds it UTF 8 encoding and then subtracts 96 which yields its alphabetical order. We then multiple the value by a prime number and add it to our subtotal. Once all the characters in the string are computed, we find the remainder of the total divided by our storage array length. In this way, we can keep our output from every yielding an index which is outside of our array.

class HashTable {
constructor(size=53){
this.keyMap = new Array(size)
}
_hash(key){
let total = 0
let WEIRD_PRIME = 31
for (let i=0; i < Math.min(key.length, 100); i++){
let char = key[i]
let value = char.charCodeAt(0) - 96
total = (total * WEIRD_PRIME + value) % this.keyMap.length
}
return total;
}
set(key, value){
let hashedKey = this._hash(key)
if (!this.keyMap[hashedKey]) this.keyMap[hashedKey] = []
this.keyMap[hashedKey].push([key, value])
return this;
}
get(key){
let hashedKey = this._hash(key)
if (this.keyMap[hashedKey]) {
let found = this.keyMap[hashedKey].map((value) => {
if (value[0] === key) {
return value[1]
}
})
return found[0]
} else {
return undefined
}
}
keys(){
let keys = []
for (let i=0; i < this.keyMap.length; i++){
if (this.keyMap[i]){
for (let j=0; j < this.keyMap[i].length; j++){
if (!keys.includes(this.keyMap[i][j][0])){
keys.push(this.keyMap[i][j][0])
}
}
}
}
return keys
}
values(){
let values = []
for (let i=0; i < this.keyMap.length; i++){
if (this.keyMap[i]){
for (let j=0; j < this.keyMap[i].length; j++){
if (!values.includes(this.keyMap[i][j][1])) {
values.push(this.keyMap[i][j][1])
}
}
}
}
return values
}
}

--

--