Performance of Hash Implementations on Various Workloads
Overview
This blog compares four different ways of handling hash collisions in hash tables. The implementations and benchmarking were done in Java, with the graphs and analysis done through Pandas and Seaborn+Matplotlib in Python. The implementations are linear probing, quadratic probing, a linked-list-based hash, and finally a Cuckoo hash. There are four tests I examine: the average time taken for a contains call that finds its target, an add, an evict (which removes an element from the hash), and a contains call that fails to find its target. However, because I didn't implement the quadratic probing evict method (it was too complex for me at the time I wrote this code), the avgEvict and avgContainsMiss data points are missing for quadratic probing. Before we look at the actual analysis comparing the performance of these implementations, let's understand what a hash is and what each of these implementations does.
The next section will feature a brief explanation of hash tables and collision handling, so feel free to skip to the Analysis section if you already understand these ideas.
Note: This blog post is written very much like a research paper, but don't be fooled: the research done for this was not nearly rigorous enough to merit the format, I just felt like switching things up.
What Are Hash Tables?
Hash tables, also called hash maps, are data structures that map keys to values. They use a hash function to turn a key into an integer index, which determines where the corresponding value is stored. A common implementation is an array to hold the values plus a function mapping each key to an index of that array. With a good hash function and an appropriate size, a hash table has near constant insertion, deletion, and query time, but this is not always achievable. Hash functions typically have overlap: two distinct keys can map to the same index, producing a collision. The number of collisions increases as the load factor, the number of entries occupying the table divided by the total size of the table, increases. More collisions make lookups slower, often losing the desired constant time. These effects are mitigated through various methods, of which we will examine four. Let's see how those methods work and then we can look at their performance. (You can find more information in this great article.)
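To make the idea concrete, here is a minimal Java sketch of how a key might be mapped to an array index. The masking-and-modulo trick is a common convention, not necessarily the exact function used in the implementations benchmarked below.

// Map a key to a bucket index in a fixed-size table.
// Masking off the sign bit keeps the index non-negative
// even when hashCode() returns a negative value.
static int indexFor(Object key, int capacity) {
    return (key.hashCode() & 0x7fffffff) % capacity;
}

For example, with a capacity of 16, the keys 7 and 23 both map to index 7, which is exactly the kind of collision the methods below resolve.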
Open Addressing
In open addressing, collisions are handled by placing the colliding value in the next available slot, some fixed interval away from its original position in the table.
For example, in this hash table diagram, you can see how collisions are handled with an interval of 1. The key John Smith hashes to index 001 but collides with Lisa Smith, so John Smith is placed in the next available slot, which happens to be 152, as positions 002 through 151 are occupied by other values. Now that John Smith has been placed in the hash table, Sandra Dee hashes to John Smith's new position and collides with him, so Sandra is placed in the next available position, which happens to be 153. Then Ted Baker tries to be placed in spot 153, but that is occupied, so he is moved to position 154.
This specific example used an interval of 1, meaning that colliding values are placed in the immediate next empty position (1 space away); this is called Linear Probing. There is another method called Quadratic Probing, in which the probe offsets from the original index grow quadratically (1, 4, 9, and so on) each time a given key collides.
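As a rough illustration, here is what a linear probing table might look like in Java. This is a minimal sketch, not the benchmarked implementation: it assumes the table is grown before it fills up (otherwise the probe loops would never terminate), and it stores bare keys rather than key-value pairs.

class LinearProbingSet {
    private final Object[] slots;

    LinearProbingSet(int capacity) {
        slots = new Object[capacity];
    }

    void add(Object key) {
        int i = (key.hashCode() & 0x7fffffff) % slots.length;
        // Probe forward one slot at a time, wrapping around the array,
        // until we find an empty slot or the key itself.
        while (slots[i] != null && !slots[i].equals(key)) {
            i = (i + 1) % slots.length;
        }
        slots[i] = key;
    }

    boolean contains(Object key) {
        int i = (key.hashCode() & 0x7fffffff) % slots.length;
        // A miss is only detected by running into an empty slot.
        while (slots[i] != null) {
            if (slots[i].equals(key)) return true;
            i = (i + 1) % slots.length;
        }
        return false;
    }
}

Note how a contains miss has to walk an entire cluster before hitting an empty slot; this is exactly why clusters hurt performance at high load factors.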
Separate Chaining
Another method of resolving collisions is separate chaining with linked lists. The entire hash table is filled with empty linked lists, and as keys are added, each is appended to the linked list at its hash index. Keys with the same hash value simply end up in the same list.
As seen in this photo, collisions are easily dealt with, but it is important to note that as the number of collisions increases, the constant time to reach the right linked list is dominated by the O(n) time to traverse it.
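A separate chaining table can be sketched in just a few lines of Java. Again, this is illustrative rather than the exact benchmarked code; it leans on java.util.LinkedList to match the spirit of the LL implementation.

import java.util.LinkedList;

class ChainedHashSet {
    private final LinkedList<Object>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    private LinkedList<Object> bucketFor(Object key) {
        return buckets[(key.hashCode() & 0x7fffffff) % buckets.length];
    }

    void add(Object key) {
        LinkedList<Object> bucket = bucketFor(key);
        if (!bucket.contains(key)) {   // O(n) scan of the chain
            bucket.add(key);           // O(1) append
        }
    }

    boolean contains(Object key) {
        // O(1) to find the bucket, O(n) to walk it.
        return bucketFor(key).contains(key);
    }
}

The split inside add is the whole story of this implementation: finding the right bucket is constant time, but everything after that is linear in the chain length.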
Cuckoo Hashing
Finally, there is a hashing technique named after the cuckoo bird, whose hatchlings push the other eggs out of the nest. This algorithm uses several hash tables (2 in my implementation), a primary, secondary, tertiary, etc., each with its own hash function. The primary table functions just like a normal hash table, but when there is a collision, the new key takes the position of the old one occupying its space. The old key is then pushed into the secondary table, where its new position is found according to that table's hash function (hence the name). If it collides there, the process repeats, with the displaced key being pushed into the next table. If this continues to the last table, the displaced key wraps around to the primary table, and if it collides there too, the process repeats again. If a loop is detected, all the tables are doubled in size, which forces new hash values in an attempt to resolve the loop. More detailed explanation here.
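Here is a rough Java sketch of a two-table Cuckoo hash like the one described above. The second hash function and the kick limit (a cheap stand-in for explicit loop detection) are assumptions of this sketch, not details taken from the benchmarked code.

class CuckooHashSet {
    private static final int MAX_KICKS = 32; // treat longer chains as loops
    private Object[] t1, t2;

    CuckooHashSet(int capacity) {
        t1 = new Object[capacity];
        t2 = new Object[capacity];
    }

    private int h1(Object k) { return (k.hashCode() & 0x7fffffff) % t1.length; }
    // A crude second hash: remix the bits so the two tables disagree on placement.
    private int h2(Object k) { return ((k.hashCode() * 0x9e3779b1) >>> 1) % t2.length; }

    boolean contains(Object k) {
        // A key can only ever live in one of two slots.
        return k.equals(t1[h1(k)]) || k.equals(t2[h2(k)]);
    }

    void add(Object key) {
        if (contains(key)) return;
        for (int kicks = 0; kicks < MAX_KICKS; kicks++) {
            int i = h1(key);
            if (t1[i] == null) { t1[i] = key; return; }
            Object bumped = t1[i]; t1[i] = key; key = bumped; // kick occupant out
            int j = h2(key);
            if (t2[j] == null) { t2[j] = key; return; }
            bumped = t2[j]; t2[j] = key; key = bumped;        // kicked back toward table 1
        }
        grow();    // loop detected: double both tables and rehash everything
        add(key);  // then retry the key that triggered the growth
    }

    private void grow() {
        Object[] old1 = t1, old2 = t2;
        t1 = new Object[old1.length * 2];
        t2 = new Object[old2.length * 2];
        for (Object k : old1) if (k != null) add(k);
        for (Object k : old2) if (k != null) add(k);
    }
}

The payoff is in contains: a key can only be at t1[h1(k)] or t2[h2(k)], so lookups inspect at most two slots regardless of load factor.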
Analysis
Okay, so now you have a basic understanding of hash tables and a few ways to deal with collisions. Let's analyze the performance of each.
Abstract
When using hash tables, there are a number of implementations that differ in how they deal with collisions, each with its own benefits and drawbacks. Hash tables implementing linear probing, quadratic probing, a linked-list hash, and a Cuckoo hash were all tested to determine the benefits of each. By analyzing these differences, the best implementation can be chosen for a particular task, allowing the most efficient solution to any given problem requiring a hash.
Testing Methodology
The testing measured the time taken for four tests: contains where a hit is guaranteed, eviction, contains where a miss is guaranteed, and addition of new elements. Each test was performed on each implementation (with exceptions noted below) 10,000 times to acquire the most accurate data possible. The eviction and contains-miss tests were not run for quadratic probing, and the quadratic probing addition test always attempted to add already-present elements rather than new elements, as every other implementation's addition test does. This makes any conclusions drawn about quadratic probing difficult to justify due to the lack of data, with the contains hit as the only reliable measurement.
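For context, a single timing run might look like the following sketch. The method name, the warm-up-free loop, and the use of System.nanoTime are my reconstruction of how such a measurement could be taken, not the actual harness behind this post.

import java.util.Set;

static double avgContainsHitMicros(Set<Integer> table, int[] presentKeys) {
    final int trials = 10_000;
    long totalNanos = 0;
    for (int t = 0; t < trials; t++) {
        int key = presentKeys[t % presentKeys.length]; // a key known to be in the table
        long start = System.nanoTime();
        table.contains(key);
        totalNanos += System.nanoTime() - start;
    }
    return totalNanos / (double) trials / 1_000.0; // average, in microseconds
}

One caveat with any such micro-benchmark: JIT warm-up and garbage collection can skew individual timings, which is part of why averaging over 10,000 trials matters.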
The load factor for the Cuckoo hash is calculated manually, as a constant number of elements is added to it due to its tendency to grow to resolve hashing conflicts. This doesn't affect results but should be noted.
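Since the Cuckoo tables grow on their own, the manual calculation simply divides the element count by the total capacity, along these lines (a trivial sketch; the field names are assumptions):

// Effective load factor of a two-table Cuckoo hash:
// elements inserted divided by total capacity across both tables.
static double cuckooLoadFactor(int elementCount, Object[] t1, Object[] t2) {
    return elementCount / (double) (t1.length + t2.length);
}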
The testing data was split into two datasets, dataset B and dataset D. Both contain integers between 0 and 100,000, but their distributions differ. Dataset B is distributed relatively evenly across its range, while dataset D has a symmetric bimodal normal distribution. This can be seen in the following graph, which plots the distributions of the datasets, with dataset B in blue and dataset D in green. The effect of these differing distributions will be discussed in the results section.
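To make the two distributions concrete, here is one way such datasets could be generated in Java. The mixture means and spread for dataset D are illustrative guesses; the actual generation parameters used for this post are not recorded here.

import java.util.Random;

// Dataset B: roughly uniform over [0, 100,000).
static int[] uniformData(int n, Random rng) {
    int[] data = new int[n];
    for (int i = 0; i < n; i++) {
        data[i] = rng.nextInt(100_000);
    }
    return data;
}

// Dataset D: a symmetric bimodal mixture of two normals.
static int[] bimodalData(int n, Random rng) {
    int[] data = new int[n];
    for (int i = 0; i < n; i++) {
        double mean = rng.nextBoolean() ? 25_000 : 75_000; // pick a mode at random
        double x = mean + rng.nextGaussian() * 8_000;
        data[i] = (int) Math.min(99_999, Math.max(0, x)); // clamp into range
    }
    return data;
}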
Expected Results
One can expect certain performance degradations and improvements as load factor varies depending on the implementation and the method being tested.
For linear probing, as the load factor increases, performance tends towards O(n) due to the sequential clusters that form. This should degrade performance for the add, evict, and contains-miss tests, though not necessarily for the contains hit test. In the contains hit test a hit is guaranteed, so even at high load factors the probe is likely to find its target close to the original hash position, unlike in the contains miss test; performance should therefore be less affected by high load.
For quadratic probing, the time taken for contains hit should not be too heavily affected by increased load factor, as quadratic probing breaks up clusters, keeping performance from tending to O(n). Its add performance should be similar, and there isn't much to say for evict or contains miss as they weren't tested.
For the linked list implementation, one can expect contains to slow down as load factor increases, but add should stay fast. Evict would also slow down; however, the slowdowns in contains and evict should be barely noticeable compared to those of linear or quadratic probing, thanks to the heavy optimization of linked lists in Java.
Finally, the Cuckoo implementation should have close to constant times for contains and evict. Add should be the only area where Cuckoo may slow down: as load factor increases and the hash is forced to grow (due to natural loops), copying the data from the old arrays to the new ones takes more time as the arrays get larger.
Results
First, the effect of the implementation on performance will be considered, followed by the effect of the datasets. The following graphs show the average time, in microseconds, taken for each test, broken down by implementation and by dataset.
Average Contains Hit
In this first graph, we see the average time taken for a positive contains call on the hash. Linear probing, abbreviated LP, performs poorly as load factor increases in both datasets, though it has mediocre performance at lower load factors. This is as expected, due to the clusters that form with LP pushing complexity towards O(n).
Quadratic probing, or QP for short, has the worst performance as load factor grows, its curve, ironically enough, looking almost quadratic. Its performance at low load factors is second best, but it degrades rapidly, which is expected. However, it is surprising that QP degrades faster than LP, given the expectation that QP breaks up clusters. The likely explanation is that QP breaks up clusters at low load factors, resulting in better-than-O(n) performance that beats LP, but as the load factor increases past 0.5, QP requires more checks than linear probing as there are fewer, larger clusters, meaning a linear search is more efficient at that point. (Indeed, with a prime table size, quadratic probing is only guaranteed to find an empty slot at load factors below 0.5.)
The linked list implementation, LL, shows a nearly linear correlation between load factor and average contains time. This is exactly what true O(n) performance looks like, though it is somewhat surprising that Java's native optimizations do not beat out LP at any load factor above ~0.23.
Finally, the Cuckoo implementation, abbreviated C, has the best performance in all scenarios. It keeps contains checks at near constant time, as expected, thanks to its array-like nature. It is odd, though, that the average time taken is longer at load factors below 0.3 before settling at ~55 μs.
It is clear that if the time taken for a positive contains check matters, Cuckoo is the best implementation to choose at any load factor. It is more than twice as fast as the other implementations at certain load factors and isn't very complex to implement.
As for variation by dataset, there is very little. However, QP performed better on dataset D at every load factor but 0.1. This can likely be explained by dataset D's natural clustering, due to its distribution, making the QP implementation more effective. LP performed similarly on both datasets except for its considerably higher time on dataset B at a load factor of 0.9. Both LL and C had nearly identical performance on both datasets, with the differences chalked up to measurement error.
As seen in this distribution plot, the C implementation is most consistently the lowest, while QP shows high variation and both LL and LP show mediocre consistency. Therefore, the Cuckoo hash is the best hash for contains hit, with consistently lower times.
Average Add
LP has respectable performance between load factors of 0.1 and 0.7, but after that its performance becomes terrible. A spike this large is unexpected, though one of smaller magnitude would not have been unreasonable. There is a stark difference between datasets here, with dataset B performing almost three times slower than dataset D. This may be because the more closely grouped numbers in D tend to hash to distinct positions, while the larger spread in B causes more overlap due to the wraparound nature of the hash function.
The performance of QP is among the best at load factors below 0.3, but after that it is average at best. It should be noted, though, that this specific test for QP only ever attempts to add elements that are already present, which could explain the mediocre performance.
The LL performance is close to the worst, though, unlike LP, it is predictable. Again, an expected outcome, due to the necessity of adding elements to a linked list.
The Cuckoo hash once more has incredible performance at higher load factors, but it falls behind at a 0.1 load factor. This penalty is only incurred on dataset B, which may be because of the extra 'loops' formed in that dataset, which must be broken by growing the hash, a costly process. This becomes a non-issue at higher load factors due to the reduced likelihood of collisions that comes with a net larger hash table.
The best implementation for workloads that mostly add would be Cuckoo in all cases, unless the hash will never grow past a 0.2 load factor. It is incredibly consistent, and any workload that adds will realistically also need to query, which Cuckoo also excels at.
Finally, the distribution plot shows that the Cuckoo hash is just as consistent as LL, with QP's consistency not far behind. Given LP's terrible consistency and performance, C is the clear winner of this workload.
Average Evict
The performance of LP is clearly the worst of the three implementations tested here, growing along what appears to be an exponential curve. This is not surprising, given the complexity of evicting items at high load factors.
For LL, the performance is near constant, which is somewhat surprising, as removing an element from a linked list would seemingly take longer as the list grows. However, it appears Java's optimizations may help LL out in this case.
Finally, C again has near constant performance, with slightly higher times at a load factor of 0.1. This is thanks to the near instantaneous access of arrays.
There is little variation between datasets, with dataset D giving better performance for LP. Overall, the best performer is Cuckoo, though LL is not far behind.
In the distribution plot for evict, the C hash is so consistent that we need to zoom in to even see the competition. C is so consistent because eviction merely changes the value of a few array elements, and that's it. LL is not terrible either, as it is well optimized and only needs to move some pointers. LP, however, is very inconsistent due to its complex eviction process. Once more, the Cuckoo hash wins for its consistency and great performance.
Average Contains Miss
The LP implementation lets us down once more in this test. It has terrible performance as load factor increases past 0.7 and still has the worst performance at every other load factor. This is just as expected, since a contains miss requires an almost full search of the hash. There is a notably higher time on dataset B, which can be explained by the same reasons as in past LP tests.
LL has the best performance at load factors below ~0.45, after which it still performs respectably. Most likely thanks to Java optimizations, a contains check on a short linked list is faster than two checks on two separate arrays. However, this breaks down as the lists get longer. The difference between datasets is small enough to fall within the margin of error.
The C hash has great performance overall, as expected. The best choice for this workload would be C if the load factor is above 0.45 and LL if it is below.
Again, we need to zoom in because C is very consistent, for the reasons discussed earlier. This distribution plot paints the same picture as the one for average evict times: Cuckoo is very consistent, LL is not far behind, and LP is all over the place. Cuckoo wins here too, though it was a much closer fight.
Conclusions
The Cuckoo hash had the best performance and consistency across many different workloads, datasets, and load factors. Regardless of the distribution of your dataset, Cuckoo performs at near constant times which are only occasionally rivaled, and even then, Cuckoo beats the competition by a wide enough margin in enough tests that it doesn't matter. If only a particular workload is needed, refer to that section for the best implementation, though Cuckoo is the best overall. Its implementation is not overly complex either, unlike QP's.
Future Work
It was quite interesting that the Cuckoo hash dominated all other implementations in almost every aspect. Its concept is quite simple yet very clever, which comes with the bonus of easy implementation, unlike QP. Both QP and LP were much too complex for their mediocre performance, though it was surprising that LP performed as poorly as it did; the concept of LP is quite simple (even if the implementation of the evict function wasn't). It goes to show that complexity isn't a guarantee of good performance, but neither is simplicity. It would be very interesting to explore the LL implementation further, as it had some quite impressive performance gains, and a deeper dive into Java's optimization of linked lists may provide some answers. It would also be interesting to explore which implementations Java and other languages use, and whether their implementations vary depending on the given conditions.

