Equality Of Objects in Java-Part 3

This piece adds up to my previous post where I tried and explain about what hash codes are, how they are useful and where should we be better off using them. This post will try to address the question, “What is a good hash code for my object”.

reference : https://blog.varonis.com/wp-content/uploads/2012/08/eagle.jpg

What defines a good hash function?

A good hash function tends to produce unequal hash codes for unequal objects. Ideally, a hash function should distribute any reasonable collection of unequal instances uniformly across all possible hash values.

We must exclude any fields that are not used in equals comparisons, or we risk losing the consistency of our hashCode. Java hashCode is an integer.

A good hash function(Effective Java 8th Ed.)

  1. Create a int result and assign a non-zero value.
  2. For every field f tested in the equals() method, calculate a hash code c by:
  • If the field f is a boolean: calculate (f ? 0 : 1);
  • If the field f is a byte, char, short or int: calculate (int)f;
  • If the field f is a long: calculate (int)(f ^ (f >>> 32));
  • If the field f is a float: calculate Float.floatToIntBits(f);
  • If the field f is a double: calculate Double.doubleToLongBits(f) and handle the return value like every long value;
  • If the field f is an object: Use the result of the hashCode() method or 0 if f == null;
  • If the field f is an array: see every field as separate element and calculate the hash value in a recursive fashion and combine the values as described next.

Combine the hash value c with result:

result = 31 * result + c

Let us decode the algorithm a little bit.

Why hashCode is an integer and not a long?
You must be wondering why? The larger the number, the lesser are the chances of collisions. True, but since the prime use of hashCode() is to determine which slot to insert an object into in the backing array of a HashMap/HashTable, a hashCode > Integer.MAX_VALUE would not be able to be stored in the array.

Why chose 31?
31 is a prime. A hashTable works by taking the modulus of the hash over the number of buckets. It’s important in a hashTable not to produce collisions for likely cases, since collisions reduce the efficiency of the hashTable. if the constant used in the hash, and the number of buckets, are co-prime, then collisions are minimised in some common cases. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i.

What is it going on with long?
>>> is the shift and XOR operator.
The top 32 bits of the long will be zero and the bottom 32 bits of long will now be the top 32 bits. Xor’ing will effectively leave top 32 bits alone but that doesn’t matter because of the next step. This effectively help us find a good mix of the current and previous state.

So, here I wrap up my series of object equality in Java. Feel free to comment over any issues you come across or there are indiscrepancies in the content.

{ Happy Coding }