Internal working of HashSet (how it ensures not to store duplicate objects)

While going through the collections framework in java recently, a question came into my mind; what if I don’t want to store the duplicate values in the collection but I want the collection API to treat two objects duplicate if one of their properties are same.

So lets say I have below model class

Person model class

and I am willing to store the objects of this class into a HashSet, but the condition is if two persons are having same name the HashSet API should treat them as one and only one should be stored in the collection(HashSet doesn’t store the duplicate values).

To do this we will have to first identify how the values are being stored, and in which data structure, when we call .add method on the Set type. Here is the code that is being called when we call .add method in the Set type

definition of the add method in HashSet class

We can see here the object that we are passing to add into the HashSet is being put into the map variable, and this map has below declaration in the same HashSet class

put method on this map is being called when we call add on HashSet

So whatever the value we are passing to add into the HashSet is eventually being added into a HashMap as a key, now the important point here is, we know that HashMap will not allow the duplicate keys to be added into it, implementation of HashSet results into not allowing the duplicate values to be added into it.

If we want to change how HashSet adds value into it, we will have to look how the keys are being stored into a HashMap to know this lets see the definition of the .put method in HashMap class.

definition of the put method in HashMap class

By seeing the above definition of the put method we will try to figure out how HashMap ensures that no duplicate keys will be stored in it.

If we see the underlined code in above image we can see the condition checks, if the object’s (which is passed to map.put or set.add) hash (derived from hashcode) is equal to already stored element in the HashMap and it is equal to already stored element or equals method if called on two returns true, the object will not be stored into the HashSet or the key will not be stored into the HashMap.

Now that we know equals and hashCode methods play very important role deciding whether the passed object is duplicate or not and we have not defined any such method in our Model class, super class’ (Object class’)equals and hashCode will be called at this point. But if we want to change the case we will have to define (override) our own equals and hashCode methods.

Let’s say we don’t want to store two Person objects if their names are same we should change equals method in such a way that if name of two Person’s objects are same this method should return true.

checking just the equality of names of the Persons

this implementation of the equals is doing exactly what we are expecting, returning true if the name of the object’s are same.

Now we have to override the hashCode method also (we should always override hashCode if overriding equals), so that the condition that is written in the definition of the put method will be true for the objects having the same name.

this is not a good implementation of the hashCode though but we did this just to return the same hashCode for the objects having same name

and after overriding these two methods in the Person class if we try to add two Person objects with the same name the object will not be added and below code

will result into

We can clearly see that the other object with name “Mayank” is not added into the Set, even though the mayank and saurabh are entirely different references.