
# Introduction

We are going to review the subtleties and complications of comparing objects for equality in Java: where the problem originates, why it matters, Kotlin's approach to it, and some recommendations on the topic.

Determining whether two entities are the same is a fundamental operation in mathematics. In programming we implement it through the weaker notion of equivalence: we are content to consider two entities equal when they agree on a specific subset of their properties.

As a reminder, an equivalence relation ~ on T has the following properties:

i) a ~ a for all a in T (reflexive)

ii) if a ~ b then b ~ a (symmetric)

iii) if a ~ b and b ~ c then a ~ c (transitive)

Discussing the topic this way might appear pedantic, but it has subtle and important implications. As developers we are primarily interested in the correctness of our code, and that can only be achieved through careful code analysis and reasoning about invariants.

To drive the idea home: if the input is a sorted array, we can improve search from O(N) to O(log N), because after each comparison we can safely ignore half of the array, relying on the transitivity of the ordering.
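A minimal binary search over a sorted `int` array shows where that reasoning is used. After one comparison with the middle element, the sketch discards an entire half. This is only safe because the ordering is transitive: if `key < a[mid]` and `a[mid] <= a[j]` for every `j > mid`, then `key < a[j]`. The class and method names are illustrative.

```java
// Illustrative binary search; relies on the array being sorted in ascending order.
final class SortedSearch {
    private SortedSearch() {}

    /** Returns an index of key in the sorted array a, or -1 if it is absent. */
    static int indexOf(int[] a, int key) {
        int lo = 0;
        int hi = a.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            if (a[mid] < key) {
                // a[lo..mid] <= a[mid] < key, so by transitivity none of them can match
                lo = mid + 1;
            } else if (a[mid] > key) {
                // key < a[mid] <= a[mid..hi], so the upper half can be skipped
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {2, 3, 5, 7, 11, 13, 17};
        System.out.println(indexOf(sorted, 11)); // prints 4
        System.out.println(indexOf(sorted, 4));  // prints -1
    }
}
```

In practice `java.util.Arrays.binarySearch` does the same job. When the key is absent, it returns a negative value that encodes the insertion point instead of `-1`.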

In general, logical reasoning lets us optimise our code, but only if certain laws hold. We try to keep these laws valid by defining contracts. If a contract is not kept, the behaviour we observe is unspecified.

For the rest of the text, the terms equal and equivalent are used interchangeably. hashCode is assumed to be part of the contract but is omitted from the examples for clarity.

# Java’s approach to equality

Java's original design decision was to rely on an equals method to determine object equality, rather than on operator overloading as in C++. This is built into the language: by default, every object inherits an equals method that compares object references. The method can be overridden to compare distinct instances, and once equals is overridden, hashCode must be overridden too.
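The difference between the inherited and the overridden equals can be seen in a small sketch. `IdentityPoint` and `ValuePoint` are hypothetical classes. The first keeps Object's reference-based equals. The second overrides equals and, as the contract requires, hashCode as well, so that hash-based collections can find equal instances.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class: equals and hashCode are inherited from Object (reference based).
final class IdentityPoint {
    final int x;
    final int y;

    IdentityPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

// Hypothetical class: value-based equals, with a matching hashCode.
final class ValuePoint {
    private final int x;
    private final int y;

    ValuePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ValuePoint)) return false;
        ValuePoint that = (ValuePoint) o;
        return x == that.x && y == that.y;
    }

    // Equal objects must have equal hash codes, otherwise HashSet/HashMap
    // look for them in the wrong bucket.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class EqualsDefaults {
    public static void main(String[] args) {
        System.out.println(new IdentityPoint(1, 2).equals(new IdentityPoint(1, 2))); // false
        System.out.println(new ValuePoint(1, 2).equals(new ValuePoint(1, 2)));       // true

        Set<ValuePoint> set = new HashSet<>();
        set.add(new ValuePoint(1, 2));
        System.out.println(set.contains(new ValuePoint(1, 2)));                      // true
    }
}
```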

This already makes a difference in terms of performance. The default equals is O(1), since it is merely a reference comparison. Once equals has been overridden, however, we cannot know its cost without inspecting the code.

Take for instance the common case of String being used in Java collections such as Map or List. These collections expect the stored objects to provide a meaningful implementation of equals, and String does: it compares the two strings character by character. Its worst-case running time is therefore O(N). This happens, for example, when the compared strings have the same length and content, or have the same length but differ only near the end, so the whole string has to be scanned to find the difference.
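To make the cost concrete, here is a simplified sketch of a content comparison in the spirit of String.equals. It is not the JDK's implementation, which works on the string's internal array, but the shape is similar: cheap O(1) checks first, then a scan that only stops at the first difference. `ContentEquality.sameCharacters` is a hypothetical name.

```java
// Simplified sketch in the spirit of String.equals; not the JDK implementation.
final class ContentEquality {
    private ContentEquality() {}

    static boolean sameCharacters(String a, String b) {
        if (a == b) return true;                    // same reference: O(1)
        if (a == null || b == null) return false;
        if (a.length() != b.length()) return false; // different lengths: O(1)
        for (int i = 0; i < a.length(); i++) {
            // Equal strings, or strings differing only at the end,
            // are scanned completely: O(N).
            if (a.charAt(i) != b.charAt(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameCharacters("Heathrow", "Heathrow")); // true after a full scan
        System.out.println(sameCharacters("Heathrow", "Heathrox")); // false at the last character
        System.out.println(sameCharacters("Heathrow", "Gatwick"));  // false, lengths differ
    }
}
```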

String.equals does use some optimisations, such as returning early when the lengths differ. In general, though, our own implementation of equals can have an unexpected impact on performance if we are not careful.

In any case, the core problem is that overriding equals properly is not as trivial as it appears. It has to satisfy the equivalence relation requirements mentioned earlier, and symmetry and transitivity are the properties that need the most attention.
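Because these properties are easy to break by accident, it can help to check them in a unit test. Below is a minimal sketch of a hypothetical helper, `EquivalenceCheck.violations`. It checks reflexivity, symmetry, transitivity and the "never equal to null" rule over every pair and triple in a small sample. A brute-force check like this can only find counterexamples within the sample; it cannot prove that the contract holds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test helper: brute-force check of the equals contract over a sample.
final class EquivalenceCheck {
    private EquivalenceCheck() {}

    static List<String> violations(List<?> sample) {
        List<String> problems = new ArrayList<>();
        for (Object a : sample) {
            if (!a.equals(a)) problems.add("not reflexive: " + a);
            if (a.equals(null)) problems.add("equal to null: " + a);
            for (Object b : sample) {
                if (a.equals(b) != b.equals(a)) {
                    problems.add("not symmetric: " + a + " / " + b);
                }
                for (Object c : sample) {
                    if (a.equals(b) && b.equals(c) && !a.equals(c)) {
                        problems.add("not transitive: " + a + " / " + b + " / " + c);
                    }
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // String uses content equality, so no violations are expected here.
        System.out.println(violations(List.of("a", new String("a"), "b"))); // prints []
    }
}
```

Run against objects such as the Pin/LatLng pair discussed below, a helper like this would report the asymmetry.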

When symmetry is broken, two objects do not agree on whether they are equal. One way to end up with this issue is to define equality between implementation classes that do not strictly know about each other.

The following Pin class will serve as the base class for the examples that follow. A small note here: we usually compare double/float values using a precision that makes sense for the domain, so their appearance in a discussion of equals might be confusing. Here the example is based on com.google.android.gms.maps.model.LatLng, which compares the numbers by their IEEE 754 bit representation. In any case, this is only tangential to the point of the example.

```java
import com.google.android.gms.maps.model.LatLng;
import java.util.Objects;

public class Pin {
    private final LatLng coordinates;
    private final String id;

    public Pin(double lat, double lon, String id) {
        Objects.requireNonNull(id);
        this.coordinates = new LatLng(lat, lon);
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o instanceof Pin) {
            return coordinates.equals(((Pin) o).coordinates)
                    && this.id.equals(((Pin) o).id);
        }
        // Pin claims to be equal to a bare LatLng, but LatLng knows nothing about Pin
        if (o instanceof LatLng)
            return this.coordinates.equals((LatLng) o);
        return false;
    }

    // hashCode omitted for clarity
}
```

With this approach we can have:

```java
LatLng point = new LatLng(51.471547, -0.460052);
Pin pin = new Pin(51.471547, -0.460052, "12ax12345");
pin.equals(point); // true
point.equals(pin); // false
```

The IDE usually warns that we are comparing inconvertible types, and we should pay attention to such warnings. Unless we try to be “too smart” about interoperability between unrelated hierarchies, this is not a problem we can stumble upon by accident.

The real problem is achieving transitivity. It starts when we extend a non-abstract class and add fields that should be significant for equals/hashCode.

Example:

```java
public class PointOfInterest extends Pin {
    private final String name;
    private final String description;

    public PointOfInterest(double lat,
                           double lon,
                           String id,
                           String name,
                           String description) {
        super(lat, lon, id);
        // requireNonNull(obj, message) treats its second argument as the
        // exception message, so each field is checked separately.
        this.name = Objects.requireNonNull(name);
        this.description = Objects.requireNonNull(description);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PointOfInterest)) return false;
        PointOfInterest that = (PointOfInterest) o;
        return super.equals(that)
                && name.equals(that.name)
                && description.equals(that.description);
    }

    // hashCode omitted for clarity
}
```

Here PointOfInterest extends Pin, adds some fields and overrides equals to take them into account. Although this approach appears reasonable, symmetry is broken:

```java
Pin pin = new Pin(51.471547, -0.460052, "1234");
PointOfInterest poi = new PointOfInterest(51.471547, -0.460052, "1234", "London", "Heathrow Airport");
pin.equals(poi); // true
poi.equals(pin); // false
```

This seems quite unexpected since both objects actually refer to the same location (based on the base class’s notion of location).

One approach to tackle this might be the following:

```java
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Pin)) return false;
    if (!(o instanceof PointOfInterest)) return o.equals(this);
    PointOfInterest that = (PointOfInterest) o;
    return super.equals(that)
            && name.equals(that.name)
            && description.equals(that.description);
}
```

This way we preserve symmetry, but we have introduced two new problems.

The first problem is that we break transitivity. Example:

```java
Pin pin = new Pin(51.471547, -0.460052, "1234");
PointOfInterest poi1 = new PointOfInterest(51.471547, -0.460052, "1234", "London", "London Airport");
PointOfInterest poi2 = new PointOfInterest(51.471547, -0.460052, "1234", "London", "Heathrow Airport");
poi1.equals(pin);  // true
pin.equals(poi1);  // true
pin.equals(poi2);  // true
poi1.equals(poi2); // false
```

The second problem appears if we extend the Pin hierarchy with another derived class and follow the same equals pattern as PointOfInterest. The following code then recurses infinitely and ends with a StackOverflowError:

```java
public class TransportationMarker extends Pin {
    private final String name;
    private final String type;

    public TransportationMarker(double lat, double lon,
                                String id, String name, String type) {
        super(lat, lon, id);
        this.name = Objects.requireNonNull(name);
        this.type = Objects.requireNonNull(type);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Pin)) return false;
        if (!(o instanceof TransportationMarker)) return o.equals(this);
        if (!super.equals(o)) return false;
        TransportationMarker that = (TransportationMarker) o;
        return name.equals(that.name) && type.equals(that.type);
    }

    // hashCode omitted for clarity
}

PointOfInterest poi1 = new PointOfInterest(51.471547, -0.460052, "1234", "London", "London Airport");
Pin poi2 = new TransportationMarker(51.471547, -0.460052, "1234", "London", "Airport");
if (poi1.equals(poi2)) {
    // never reached: the two equals methods keep calling each other
}
```

The culprits are these lines in the equals implementations of the two derived classes:

```java
// in PointOfInterest.equals
if (!(o instanceof PointOfInterest)) return o.equals(this);
// in TransportationMarker.equals
if (!(o instanceof TransportationMarker)) return o.equals(this);
```

To solve this, we would need to adapt the implementations every time we add a new derived class, or incorporate a check for the actual implementation class. Replacing instanceof with getClass does exactly the latter:

```java
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || o.getClass() != getClass()) return false;
    PointOfInterest that = (PointOfInterest) o;
    return super.equals(that)
            && name.equals(that.name)
            && description.equals(that.description);
}
```

This bases the comparison on the implementation class and as a result:

```java
poi1.equals(pin);  // false
pin.equals(poi1);  // false
pin.equals(poi2);  // false
poi1.equals(poi2); // false
```

With this approach transitivity is no longer broken, but whether one is happy with pin not being equal to poi1 is a matter of perspective.

However, we now have a new problem: we can no longer substitute a derived class for its base class, for instance in a method call. This holds even if the derived class has the same notion of equality as the base class.
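A small sketch makes the substitution problem concrete. `GeoPin` is a hypothetical, simplified stand-in for Pin that uses the getClass-based equals. `LabelledPin` is a hypothetical subclass that adds behaviour but no state. Even though nothing about equality has changed, a LabelledPin is never equal to a GeoPin with the same data, so a set of base-class instances does not recognise it.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical simplified Pin using getClass-based equality.
class GeoPin {
    private final String id;
    private final double lat;
    private final double lon;

    GeoPin(String id, double lat, double lon) {
        this.id = Objects.requireNonNull(id);
        this.lat = lat;
        this.lon = lon;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || o.getClass() != getClass()) return false;
        GeoPin that = (GeoPin) o;
        return id.equals(that.id)
                && Double.compare(lat, that.lat) == 0
                && Double.compare(lon, that.lon) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, lat, lon);
    }
}

// Adds behaviour only; its notion of equality is meant to be the same as GeoPin's.
class LabelledPin extends GeoPin {
    LabelledPin(String id, double lat, double lon) {
        super(id, lat, lon);
    }

    String label() {
        return "Pin";
    }
}

class SubstitutionDemo {
    public static void main(String[] args) {
        GeoPin pin = new GeoPin("1234", 51.471547, -0.460052);
        GeoPin labelled = new LabelledPin("1234", 51.471547, -0.460052);

        Set<GeoPin> pins = new HashSet<>();
        pins.add(pin);
        System.out.println(pin.equals(labelled));    // false: the runtime classes differ
        System.out.println(pins.contains(labelled)); // false
    }
}
```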

The lesson here is that you cannot extend a non-abstract class, add a field that matters when differentiating between instances, and still preserve the equals contract.

This also shows another weakness of inheritance. It is another case where favouring composition over inheritance avoids all of this complexity.

For example, instead of extending Pin we could just as well have defined PointOfInterest as follows:

```java
public class PointOfInterest {
    private final Pin coordinates;
    private final String name;
    private final String description;

    public PointOfInterest(double lat, double lon,
                           String id, String name, String description) {
        this.coordinates = new Pin(lat, lon, id);
        this.name = Objects.requireNonNull(name);
        this.description = Objects.requireNonNull(description);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PointOfInterest)) return false;
        PointOfInterest that = (PointOfInterest) o;
        return coordinates.equals(that.coordinates)
                && name.equals(that.name)
                && description.equals(that.description);
    }

    // hashCode omitted for clarity
}
```

This way we avoid all the surprises and issues arising from how we should handle comparison along the hierarchy.

A small final note: a correct equals implementation should never report that an object is equal to null. It should also keep returning the same result when called repeatedly, as long as the objects involved are not modified.
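These rules can be captured in a template like the following sketch. `Airport` and its fields are hypothetical. The instanceof test also rejects null, because `null instanceof T` is always false. Because the class is final and the comparison uses only final fields, repeated calls on the same objects give the same answer.

```java
import java.util.Objects;

// Hypothetical immutable value class illustrating the usual equals checklist.
final class Airport {
    private final String code;
    private final String city;

    Airport(String code, String city) {
        this.code = Objects.requireNonNull(code);
        this.city = Objects.requireNonNull(city);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;                // reflexive, and a cheap shortcut
        if (!(o instanceof Airport)) return false; // also returns false for null
        Airport that = (Airport) o;
        // Only final fields are compared, so the result is consistent over time.
        return code.equals(that.code) && city.equals(that.city);
    }

    @Override
    public int hashCode() {
        return Objects.hash(code, city); // uses the same fields as equals
    }
}
```

For example, `new Airport("LHR", "London").equals(null)` returns false instead of throwing.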

# Kotlin approach

First of all, Kotlin opted for a more integrated approach to equality. A common bug in Java is comparing objects with the ‘==’ operator. For objects, this operator checks whether both sides are the same object, not whether they contain the same value. In Kotlin, ‘==’ compares values by calling the equals method and also handles null safely. For identity comparison, the equivalent of Java's ‘==’, Kotlin introduces the ‘===’ operator.
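The Java pitfall that Kotlin's ‘==’ removes is easy to reproduce. In this sketch, `new String` deliberately creates a distinct object, so the reference comparison fails even though the contents match. `java.util.Objects.equals` is the closest null-safe Java counterpart to Kotlin's ‘==’.

```java
import java.util.Objects;

class ReferenceVsValue {
    public static void main(String[] args) {
        String literal = "Heathrow";
        String copy = new String("Heathrow"); // a new, distinct String object

        System.out.println(literal == copy);            // false: different references (Kotlin ===)
        System.out.println(literal.equals(copy));       // true: same characters (Kotlin ==)
        System.out.println(Objects.equals(null, copy)); // false, and no NullPointerException
        System.out.println(Objects.equals(null, null)); // true
    }
}
```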

Taking exactly these problems into account, Kotlin simply prohibits inheriting from data classes: a data class cannot be declared open or abstract.

You can derive a data class from an interface or an abstract class.

```kotlin
abstract class Pin {
    abstract val id: String
    abstract val lat: Double
    abstract val lon: Double
}

data class PointOfInterest(
    override val id: String,
    override val lat: Double,
    override val lon: Double,
    val name: String,
    val description: String
) : Pin()
```

This is the only “approved” way of using inheritance with data classes.

A data class can also extend a non-abstract (open) class, but this might not work as one would expect:

```kotlin
open class Pin(
    open val id: String,
    open val lat: Double,
    open val lon: Double
)

data class PointOfInterest(
    override val id: String,
    override val lat: Double,
    override val lon: Double,
    val name: String,
    val description: String
) : Pin(id, lat, lon)

val pin = Pin("123", 51.471547, -0.460052)
val poi = PointOfInterest("123", 51.471547, -0.460052, "London", "London Airport")
println(if (pin == poi) "Equal" else "Not equal")
```

This prints “Not equal”. Pin uses the equals inherited from Any (Object on the JVM), which checks reference equality. To change that, we would have to override equals in Pin, which essentially takes us back to the Java problems described earlier.

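In essence, a Kotlin data class is the equivalent of a Java POJO used as a value class. Its equals is generated automatically and uses an instanceof-style type check (is). Only the properties declared in the primary constructor take part in the comparison. Because a data class cannot be declared open, that type check is effectively as strict as a getClass comparison.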
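For comparison with the Java examples above, the equals that a data class such as PointOfInterest(id, lat, lon, name, description) provides behaves roughly like the hand-written final Java class below. This is an illustrative approximation, not the Kotlin compiler's output. The name `PointOfInterestData` is hypothetical, and the other generated members (toString, copy, componentN) are left out.

```java
import java.util.Objects;

// Rough, hand-written Java approximation of a Kotlin data class; not compiler output.
final class PointOfInterestData {
    private final String id;
    private final double lat;
    private final double lon;
    private final String name;
    private final String description;

    PointOfInterestData(String id, double lat, double lon, String name, String description) {
        this.id = Objects.requireNonNull(id);
        this.lat = lat;
        this.lon = lon;
        this.name = Objects.requireNonNull(name);
        this.description = Objects.requireNonNull(description);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // instanceof check; the class is final, so no subclass can be involved
        if (!(o instanceof PointOfInterestData)) return false;
        PointOfInterestData that = (PointOfInterestData) o;
        // every "primary constructor" property takes part, and nothing else
        return id.equals(that.id)
                && Double.compare(lat, that.lat) == 0
                && Double.compare(lon, that.lon) == 0
                && name.equals(that.name)
                && description.equals(that.description);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, lat, lon, name, description);
    }
}
```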

# Can we go wrong with Kotlin?

## Mutable keys

```kotlin
data class Pin(
    val id: String,
    var lat: Double,
    var lon: Double
)

val pin = Pin("123", 51.471547, -0.460052)
val collection = hashSetOf(pin)

// at some point later in the code path
pin.lat += moveByOffset()

println(if (collection.contains(pin)) "pin in collection" else "pin not in collection")
collection.forEach {
    if (it == pin) {
        println("pin found in collection by forEach")
    }
}
```

Even with Kotlin, we still need to be careful with the properties that take part in equals/hashCode. In this example the containment check fails because we modified a property after the object was added to the set. contains looks the object up by its new hash code and fails to find it, even though the item is still in the collection, as the forEach shows.
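The same trap exists in Java. In this sketch, `MutablePin` is a hypothetical class whose hashCode depends on a mutable field. After the field changes, `HashSet.contains` searches with the new hash code and misses the element, while iterating over the set still finds it.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical mutable class: x takes part in equals/hashCode but can change.
final class MutablePin {
    private final String id;
    int x;

    MutablePin(String id, int x) {
        this.id = Objects.requireNonNull(id);
        this.x = x;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MutablePin)) return false;
        MutablePin that = (MutablePin) o;
        return id.equals(that.id) && x == that.x;
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, x);
    }
}

class MutableKeyDemo {
    /** Returns {found by contains, found by iteration} after mutating a stored element. */
    static boolean[] run() {
        MutablePin pin = new MutablePin("123", 10);
        Set<MutablePin> set = new HashSet<>();
        set.add(pin);

        pin.x += 1; // the hash code changes while the object is inside the set

        boolean foundByLookup = set.contains(pin);
        boolean foundByIteration = false;
        for (MutablePin p : set) {
            if (p.equals(pin)) foundByIteration = true;
        }
        return new boolean[] {foundByLookup, foundByIteration};
    }

    public static void main(String[] args) {
        boolean[] r = run();
        // prints "contains: false, found by iteration: true"
        System.out.println("contains: " + r[0] + ", found by iteration: " + r[1]);
    }
}
```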

## Arrays

Another gotcha on this topic in Kotlin concerns arrays. Java arrays are objects whose equals is inherited from Object, so it is a reference check and not a content check. That is how they were designed, and in Java we would use e.g. Arrays.equals to compare the contents. This does not change in Kotlin, where the content comparison is contentEquals. If we use an array as a property of a data class, the generated equals again relies on reference equality for it, not on the actual contents. So consider very carefully whether you really need an array as a property of a data class, since:

i) you would need to override equals (and hashCode) for a more meaningful check (the IDE should provide a warning such as “Array property in data class”, and we should pay attention to such warnings), and

ii) arrays are inherently mutable, so they are prone to errors such as the one described above (the hashCode changing after an object has been stored in a collection).
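In Java the difference between reference and content comparison of arrays looks like this. The sketch also shows one possible way, with a hypothetical `Route` class, to hold an array inside a value class. It makes defensive copies on the way in and out, and it uses `Arrays.equals`/`Arrays.hashCode` in the overrides. In Kotlin the corresponding calls would be contentEquals and contentHashCode.

```java
import java.util.Arrays;

// Hypothetical value class wrapping an array.
final class Route {
    private final double[] waypoints;

    Route(double[] waypoints) {
        // Defensive copy: later changes to the caller's array cannot alter this object.
        this.waypoints = waypoints.clone();
    }

    double[] waypoints() {
        return waypoints.clone(); // never expose the internal array
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Route)) return false;
        return Arrays.equals(waypoints, ((Route) o).waypoints); // compare contents
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(waypoints); // consistent with Arrays.equals
    }
}

class ArrayEqualityDemo {
    public static void main(String[] args) {
        double[] a = {51.47, -0.46};
        double[] b = {51.47, -0.46};
        System.out.println(a.equals(b));                       // false: reference check
        System.out.println(Arrays.equals(a, b));               // true: content check
        System.out.println(new Route(a).equals(new Route(b))); // true
    }
}
```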

# Conclusion and recommendations on the topic

As mentioned earlier, Java's original design decision was to add the equals method to the Object class. Whether this was a good idea is debatable, since many classes have no need for an equality test. In fact, if we think about it carefully, we realise that what we are actually interested in comparing is data, not objects. This is exactly the case for value classes, i.e. classes that represent a value.

On top of that, the notion of object equality can vary with context and use case, e.g. different business requirements in different areas of our applications. Data equality, by contrast, is understood to be the same regardless of context.

Some best practices to consider are the following (ordered in terms of preference):

1. Consider carefully whether you really need to override equals/hashCode. Unless it is strictly required, we can usually get away without it, and it is often best to avoid it. For instance, when using Collections, use a specific subset of preferred properties as the key and choose the appropriate equality method for that usage. The benefit is that the comparison can change with the usage context.

   Examples:

   ```java
   hashMap.put(hotel.getId(), hotel);
   new TreeSet<>(Comparator.comparingInt(Hotel::getHotelId));
   new TreeSet<>(Comparator.comparing(Hotel::getHotelName));
   ```

   For containers such as List in Kotlin, you can even use a suitable extension function, e.g. find, instead of relying on equals.

2. Always use Kotlin data classes for entities that need to be compared for equality. Kotlin enforces by default many good practices that in Java we have to apply carefully by hand.

3. If you must override equals/hashCode, adhere to logical equality and use all meaningful properties. Make the class, or at least the relevant properties, immutable, and declare the class final (the default in Kotlin). Do this only for value classes.

4. “Favour composition over inheritance” is a well-known best practice, and it solves this problem as well.

5. If you need to use inheritance, make the base class abstract, or declare equals (and hashCode) final in the base class so that subclasses cannot override it. This follows the approach of designing explicitly for inheritance.
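A minimal sketch of recommendation 5, using hypothetical names. The abstract base class defines equality once and declares equals and hashCode final. Subclasses can add behaviour, but they cannot redefine what makes two markers equal, so symmetry and transitivity cannot be broken further down the hierarchy. The trade-offs are that state added by subclasses never takes part in equality, and that instances of different subclasses with the same base data compare as equal.

```java
import java.util.Objects;

// Hypothetical base class designed for inheritance: equality is fixed here.
abstract class Marker {
    private final String id;
    private final double lat;
    private final double lon;

    protected Marker(String id, double lat, double lon) {
        this.id = Objects.requireNonNull(id);
        this.lat = lat;
        this.lon = lon;
    }

    // final: subclasses cannot override equals and break symmetry or transitivity
    @Override
    public final boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Marker)) return false;
        Marker that = (Marker) o;
        return id.equals(that.id)
                && Double.compare(lat, that.lat) == 0
                && Double.compare(lon, that.lon) == 0;
    }

    @Override
    public final int hashCode() {
        return Objects.hash(id, lat, lon);
    }
}

final class AirportMarker extends Marker {
    AirportMarker(String id, double lat, double lon) {
        super(id, lat, lon);
    }
}

final class StationMarker extends Marker {
    StationMarker(String id, double lat, double lon) {
        super(id, lat, lon);
    }
}
```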