Hash Tables

Bilal Khan
4 min readJul 10, 2016

--

A hash table is a data structure used to implement an associative array, a structure that can map keys to values. This kind of data structure increases our efficiency to near O(1), or constant time. Hash tables combines the random access nature of arrays with the dynamic ability of a linked list.

Hash tables are very poor at sorting, so we only want to use it if sorting is irrelevant for that data. We will start with a simple illustration, and slowly build upon it step by step.

What is a Hash Table primarily composed of?

A hash table can be broken down into basically two things:
1. A hash function or a mapping function (returns a non-negative integer known as hash code, for simplicity we’ll call it index).
2. An array (the actual table where the data to be searched is stored)

A simplistic view of a Hash Table. A key is input to the Hash Function which outputs an ‘index’ or a hash code, then that ‘index’, key and value are given to the hash table to be stored in that location.

Likewise, to get a value of a key, the illustration would be:

Similarly, to get the value of a key, the simple illustration would be to get the index (or hash code) of the key from the hash function, then input that index and the key to the Hash Table which spits out its value

Let’s get started with the basic idea behind Hash Tables

The basic idea for a hash table is that we take a key, value pair; we run that key through the hash function; it processes the key, and yields a hash code (a non-negative integer). Then with that number (hash code) we store the data at that location in the array.

Hash Function

You don’t want to reinvent the wheel by writing your own hash function, there are plenty of efficient algorithms you can find on the internet. Here we will only study a basic hash function to simplify the concept and to fully understand the logic behind hash tables. A simple hash function would be a function which adds all the ASCII values of the input string, converts the generated sum to a range between 0 to size, and yields a number in that range.

A simple c language example would be:

unsigned int hash(const char *key, unsigned int size)
{
unsigned int i;
unsigned int hash = 0;
/*loop through all the chars*/
for ( i=0 ; key && key[i] ; i++ )
{
/*add all the ASCIIs*/
hash += key[i];
}
/*convert hash code to fit it inside the range*/
hash = hash % size;
return hash;
}

What makes up a good hash function?

  • Uses only the data being hashed
  • Uses all of the data being hashed
  • Is deterministic
  • Uniformly distributes data
  • Generates very different hash codes for similar data

Hash Tables Data Structure

Our Hash table structure and linked list structure would look something like this:

/*
* key: Points to the key string
* The key is unique in the HashTable
*
* value : The value corresponding to the key
* Value does not possess uniqueness.
*
* next : Points to the next node of that linked list
*/

struct List
{
char *key;
char *value;
struct List *next;
};

/*
* size : Total number of buckets
*
* buckets: Each bucket contains a linked list.
* Points to individual linked lists.
*/

struct HashTable
{
unsigned int size;
List **buckets;
};

Let’s select previously mentioned size of our Hash Table to be equal to 5. This is how our initialized hash table without any data would look like:

The illustration shows five unique buckets (pointers to different linked lists which will store data), the hash function will give us the index to the bucket, and then the data will be saved in that bucket in a linked list.

Now let’s take a few examples to store data in our hash table. First let’s assume a key, value pair of “key1” and “value1”.

  • key=“key1” and size=5 is passed to the hash function, and let’s assume the function returns a value of 3.
  • index=3, key=“key1” and size=5 are passed to the hash table’s save function, which will store the data at that index or in that bucket. So now our new updated hash table will look like:
Adding data in a hash table

Note that keys are unique, so the save function for the hash tables should first look in that bucket if the key already exists, if it does, it should replace the old value of that key with the new value; only if the key doesn’t mach any key existing in that bucket, it should create a new node in the beginning of the linked list. Now let’s populate the hash table to have a better picture of it:

Hash Table populated with data

If we were to use a simple array with the hash function, the resulting hash codes or indexes would have produced collisions, (different data can produce same hash code). The above method using linked lists, avoids collisions by a method called chaining. Other methods to avoid collisions include re-hashing, linear probing (results in clustering), quadratic probing, etc.

--

--