What is Cuckoo Hashing?

Humberto Villalta
5 min readApr 19, 2023

--

1. Overview

Cuckoo hashing is used as a solution for hash collisions, and its worst-case lookup time is constant O(1). The reason why is it called “cuckoo” is because the cuckoo bird lays the eggs in another nest. When the chicks hatched, they push the younger eggs out of the nest. Like the given example, cuckoo hashing has a similar structure for preventing collisions along a table. Cuckoo hashing has a total of two hash functions and two hash tables to index the keys. The value is first indexed in the first hash table. However, if the next value collision with the position of the just inserted value, the old value will be indexed in the second hash table and the new value at the first hash table, and so on.

2. Cuckoo Hash Indexing Structure

Cuckoo Hashing is composed of:

Two Hash Functions:

  • The first hash function is going to index the key in the first hash table.
  • The second hash function is used as collateral to index the key in case another key collides with the new coming key.
  • These hash functions work together. By this, I mean that indexing is going to be recursive because values are going to be pushing each other until all values belong to a cell or “bucket”

Two Hash Tables:

  • These two tables will be used to index the keys dynamically when collisions occur in the first or second table to solve collisions as shown below.

3. Cuckoo Insertion Deep Sight

Let’s say we have the next input data:

{"Apples", "Bananas", "Oranges", "Grapes", "Avocados", 
"Tomatoes", "Melons", "Tangerines", "Raisins", "Watermelons"}

# Total of 10 values

We will have to hash every word of the set and get the next results table:

If we go step by step, we first insert “Apples” into Table 1.

“Bananas” are also inserted in Table 1 because the slot is empty.

“Oranges” and “Apples” have the same hash index number. Therefore, “Oranges” as the new value is inserted in Table 1 index 9, and “Apples” are relocated to Table 2 index 1.

“Grapes” and “Oranges” have the same hash index number. Therefore, “Grapes” as the new value is inserted in Table 1 Index 9, and “Oranges” are relocated to Table 2 index 4.

“Avocados” are inserted in Table 1 index 1.

“Tomatoes” are inserted in Table 1 index 1 but “Avocados” are relocated to Table 2 index 9

In this step, there are a lot of relocations going on. “Melons” are inserted first in Table 1 index 6. However, “Bananas” were located in that position. Therefore, we send “Bananas” to Table 2 index 4 but “Oranges” are located in that position of the table. This is chained into another relocation because “Oranges” now have to be sent to Table 1 index 9 where “Grapes” are located. Now, “Grapes” have to be sent to Table 2 index 6. Fortunately, there is no Key located in that slot so the process finishes here.

“Tangerines” is inserted in Table 3 index 3 where the slot is empty.

“Raisins” is inserted in Table 1 index 3 and “Tangerines” are relocated in Table 2 index 0.

Finally, “Watermelons” are inserted in Table 1 index 6. However, as we can see in the table below, “Watermelons” is also pushed away from its original position because it collided with the “Bananas” after a long relocation procedure.

3.1 Final Results

We get as a result the next Cuckoo Table according to its hash function values:

4. Conclusion

Cuckoo hashing indeed has an outstanding method to prevent collisions. However, the more loaded the table gets, the more high insertion latency will suffer. Cuckoo indexing can cause endless loops when inserting data because it keeps reallocating the values along the two tables as shown in the above example. This is the main reason why cuckoo hashing has a degraded performance for inserting data in some situations.

Understanding Cuckoo hashing is important because as the hash index, Cuckoo hashing is applied also to probabilistic data structures such as Cuckoo Filters. This probabilistic data structure provides a way to support deletions thanks to the mentioned structure above but also provides equal or less space complexity than the famous bloom filters supported by hash indexing.

References

  1. https://www.geeksforgeeks.org/cuckoo-hashing/
  2. https://en.wikipedia.org/wiki/Cuckoo_hashing
  3. https://programming.guide/cuckoo-hashing.html

--

--

Humberto Villalta

I am a BigData Engineer who has a lot of interest in the research and development of data structures for the optimization of NoSQL Databases.