Implementing a Hash Table in C++14 for practice
I recently had a job interview in which I was asked how I would implement a hash table. I’ve used hash tables a lot in my code, and I have a understanding of how they work from my ADT module in University; however, I’ve never actually implemented one. This led me to only being able to explain generally how a hash table works in response to the interview question instead of giving solid implementation details.
To fill this gap in my knowledge, and for fun, I decided I would implement a hash table for once. In this post I’ll write about my first attempt at implementing a hash table, and then my second attempt in which I try to make it cache-friendly and fast.
First Attempt: Vector of Linked Lists
In my first attempt I only cared about how could get the hash table working quickly, and less about how I should implement it. In my second attempt I actually try to implement the hash table well. Skip to that section if you want. At the time of the first attempt, the only implementation of a hash map that I could think of was using linked lists as buckets to handle hashing collision; so first I implemented a linked list. First up is the templated ListEntry class which is will be the objects stored in the list.
It’s a simple class which just stores the value (our key-value pair object shown later), and a pointer to the next ListEntry in the list. Next is the templated LinkedList implementation
A pretty standard implementation of a linked list. We track the end node so that we can do quick insertions just by inserting at the end and then moving the pointer. On to the actual hash map implementation.
I created a HashEntry class to store our key-value pair. I also override the == operator to make the LinkedList::find method work when comparing HashEntry’s as seen in this line `if (current->getValue() == value)`.
Here we can see our table is a std::vector of LinkedLists which store HashEntrys as their values in the ListEntry class. To initialize this table, I reserve some memory for the vector and then push back HashEntrys with NULL key-value pairs. For our hashing function, I used std::hash modulo the vector’s size. In this implementation I only wrote a get() and put() method, and there is no rehashing/resizing methods. This means as you continue to use the hash table, the linked lists will just get larger and larger as new values get put into the same buckets, increasing the get() time considerably.
Second Attempt: Vector of Key-Value Pairs
After a different interview the same week in which I talked about cache-friendly code, and having watched a bunch of Chandler Carruth’s great videos again on C++ and code performance, I decided to try and implement a hash table again. This time I had just a single class — the hash table. Instead of buckets of linked lists, I went for a std::vector of std::pairs which stored the key-value pair. To handle collisions, I used open addressing with linear probing. This hopefully meant that with the contiguous memory of a vector, and the locality of values using linear probing, the data would often be found in the cpu’s cache.
In the resizeTable() function I always double the table’s size to create an exponential increase. This reduces the amount of expensive resizes that we will do in the long run. To further reduce the amount of resizes the user can pass in their own initial starting size too.
The slightly more complete code can be found on my Github. In the future, when I get some more time, I want to use Google’s benchmark library to compare my first and second implementations. If I do, it will be added to this post.
