Understanding HashMap Data Structure(Ruby)

Yair Fernando
Software Development with Ease
7 min readDec 12, 2020

In this article I will be talking about another data structure HashMap, so let’s start by understanding what a data structure is.

A Data structure is a collection of data that is organized in a way that gives you the ideal run time complexity for some operations, a data structure handles the data in a way that might be beneficial for specific cases and operations.

So we use data structures because it gives us a better performance under certain conditions, it is worth mentioning that a data structure does not cover all possible operations, usually, a data structure is really good at doing one thing but also has some drawbacks and you have to be aware of those, either it requires a lot of configuration, adds more complexity, some operations will be slower, more space in memory, etc.

Now we will be implementing a HashMap using a LinkedList, so if you haven’t checked out this other article about LinkedList, I encourage you to first read that article and come back to this one.

What is HashMap?

It is a collection of key-value pairs that it is used to store relationships between data, and it gives you constant time complexity on search operations. It is used to solve so many different types of problems and algorithms.

You might be thinking why not use an array or linked list instead of a HashMap, well it turns out that when using a HashMap to store values you get Big O(1) when adding and getting elements because you don’t have to traverse the elements until you get the element you are looking for as you would have to if you are using an array or linked list. With a linked list you would have to traverse until you find the right element which gives you Big O(N).

HashMaps are a good choice if you are not going to iterate the data that it is going to be stored inside the HashMap.

So far we know the purpose of a HashMap which is to store values by providing a key, then you can retrieve those values using the same key.

Now let’s talk about hash functions.

Hash Function

We can also implement a hash function which will take that key and map it to an index of a fix size array that the HashMap defines internally. The value that the hash function returns is called has key or has code and that value is used to determine the index in which to store the data inside the array.

Things to consider about a hash function are that it must be as fast as possible, it must return the minimum number of collisions and it has to handle all possible cases for the particular key.

But what is a collision? Well sometimes your hash function will return the same index for a different key, then we have a collision and we can handle collisions by using a really common technique called chaining which means that we use a LinkedList for each index in the array so that each value in the LinkedList points to the next one in case we get a collision.

Now that we cover the technical stuff, let’s start implementing a HashMap.

Since we will use a LinkedList to store data, let’s go over the implementation of that one first.

Node: This class will be the building blocks of the LinkedList.

class Node
attr_accessor :key, :value, :next, :prev
def initialize(key, value, following=nil, prev=nil)
@key = key
@value = value
@next = following
@prev = prev
end
end

The Node class takes four parameters, the key and the value that we want to store inside the HashMap, the following pointer and the prev pointer, these two will point to the next and previous node in the list, and in case there is no previous or next node they will point to nil.

Now let’s go over the LinkedList implementation.

class LinkedList
include Enumerable # Include the enumerable class to implement an each method for the list.
attr_accessor :head # Represents the start of the list.
# Initialize methods, defines the @head of the list to nil
def initialize
@head = nil
end
# Return whether or not the list is empty
def is_empty?
@head.nil?
end
# Return whether or not a specific key is present inside the list
def include?(key)
return false if is_empty?
return true if @head.key == key current = @head
until current.next.nil?
current = current.next
return true if current.key == key
end
false
end
# Insert a new node into the list by passing a key and a value
#
# @example
# key = "john"
# value = {"name" => "john", "age" => 33}
# list = LinkedList.new
# list.insert(key, value)
def insert(key, val)
return @head = Node.new(key, val) if is_empty?

current = last_node
current.next = Node.new(key, val, nil, current)
end
# Insert a new node at the begining of the list by passing a key and a value
def insert_first(key, val)
new_node = Node.new(key, val, @head)
new_node.next.prev = new_node
@head = new_node
end
# Delete a node from the list by passing the key
def delete(key)
return if is_empty? || !include?(key)
remove_head if @head.key == key last_node do |current|
delete_key(current) if current.key == key
end
end
# Delete the first node from the list
def delete_first
remove_head
end
# Delete the last node from the list
def delete_last
last = last_node
delete_key(last)
end
# Return the value of a node by giving the key
def get(key)
return nil if is_empty? || !include?(key)
return @head.val if @head.key == key
current = @head until current.next.nil?
current = current.next
return current.val if current.key == key
end
end
# Return the value of the last node
def last
last_node.val
end
# Return the value of the first node
def first
is_empty? ? nil : @head.val
end
# Return the size of the list
def size
return 0 if is_empty?
current = @head
count = 0
until current.next.nil?
current = current.next
count += 1
end
count
end
# Each method for a LinkedList
def each
return if is_empty?
yield(head) if defined?(yield) current = @head
until current.next.nil?
current = current.next
yield(current) if defined?(yield)
end
end
private # Removes the head of the list
def remove_head(key)
val = @head.val
@head = @head.next
@head.prev = nil if @head
val
end
# Returns the last node in the list and executes a block if given
def last_node
return if is_empty?
current = @head until current.next.nil?
current = current.next
yield(current) if defined?(yield)
end
current
end
# Deletes a specific node from the list
def delete_key(current)
prev = current.prev
prev.next = current.next
current.next&.prev = prev
current.prev, current.next = nil, nil
end
end

That is the entire implementation of the LinkedList, now we can go ahead and start building the HashMap class.

class HashMap
attr_accessor :capacity, :buckets, :size, :collitions
include Enumerable # Include the enumerable class to implement each method.
# Initialize the HashMap with a fix capacity, fill in the buckets and start with size and collitions being cero.
def initialize
@capacity = 5
@buckets = Array.new(@capacity) { LinkedList.new }
@size, @collitions = 0, 0
end
# Return whether or not the HashMap is empty
def is_empty?
@size == 0
end
# Insert a new element into the HashMap, if the size and capacity are the same it first double the capacity and try to insert again.
def insert(key, val)
if @size == @capacity
rehash
insert(key, val)
else
index = get_bucket_index(key)
@collitions += 1 if !@buckets[index].is_empty?
@buckets[index].insert(key, val)
@size += 1
end
end
# Return the value of a specific key
def get(key)
return nil if is_empty?
@buckets[get_bucket_index(key)].get(key)
end
# Remove an element form the HashMap by giving the key
def remove(key)
return nil if is_empty?
@buckets[get_bucket_index(key)].delete(key)
@collitions -= 1 if @buckets[get_bucket_index(key)].size <= 1
@size -= 1
end
# Return whether or not the HashMap include a value by providing a key
def include?(key)
return false if is_empty?
@buckets[get_bucket_index(key)].include?(key)
end
# Each method for a HashMap
def each
return if is_empty?
@buckets.each do |bucket|
current = bucket.head
if current
bucket.each do |node|
yield(node) if defined?(yield)
end
end
end
end
private # Return the index for a specific key
def get_bucket_index(key)
hash(key) % capacity
end
# Calculate the has_code for a given key
def hash(key)
key = "#{key}#{key.class}"
hash_code = 0

for i in (0..key.size-1)
hash_code += key[i].ord
end
hash_code += ((key.bytes.inject(:+) + key.bytesize))
end
# Recalculate the capacity of the HashMap and reinsert the current elements into their corresponding new index inside the array
def rehash
@capacity *= 2
@collitions = 0
rehash_buckets = Array.new(@collitions) { LinkedList.new }
rehash_insert = -> (index, current) { rehash_buckets[index].insert(current.key, current.val)
@buckets.each do |bucket|
current = bucket.head
if current
index = get_bucket_index(current.key)
collitions += 1 if !rehash_buckets[get_bucket_index(current.key)].is_empty?
rehash_insert.call(get_bucket_index(current.key), current)
until current.next.nil?
current = current.next
collitions += 1 if !rehash_buckets[get_bucket_index(current.key)].is_empty?
rehash_insert.call(get_bucket_index(current.key), current)
end
end
end
@buckets = rehash_buckets
end
end

Great, now we are done building our HashMap implementation, we can now test it.

h = HashMap.new
h.insert("carl", {name: "carl", id: 1, age: 24})
h.insert("john", {name: "john", id: 2, age: 32})
h.insert("patrick", {name: "patrick", id: 3, age: 19})
h.insert("michelle", {name: "michelle", id: 4, age: 22})
h.insert("miranda", {name: "miranda", id: 5, age: 29})
h.insert("laura", {name: "laura", id: 6, age: 21})
h.insert("angel", {name: "angel", id: 7, age: 44})
h.insert("aly", {name: "aly", id: 8, age: 29})

We can see the elements inside by using the each or get method.

h.each do |bucket|
p "key: #{bucket.key}, value: #{bucket.val}"
end
HashMap Elements.

With this implementation we have access to see how many collitions we have, the size of the HashMap, the capacity and the actual buckets which are an array of LinkedList that store the key-value pairs of data.

Conclusion

HashMaps are great when we want to have constant run time complexity BigO(1) in insert, delete and get operations, we can also extend this HashMap and create more useful methods or improve the hash function. It all depends on your needs and how fast and robust you want your HashMap to be and perform.

If you have any suggestions or thoughts please leave a comment below, thank you for reading :).

--

--