Liskov, Equality, OO, and Math

Months ago, I saw a code sample of an equals(…) method implementation (in Java, if memory serves). It looked a bit like this:

public boolean equals(Object o) {
// self check
if (this == o)
return true;
// null check
if (o == null)
return false;
// type check and cast
if (getClass() != o.getClass())
return false;
K other = (K) o;
Object.equals(field1, other.field1)
&& Object.equals(field2, other.field2)
&& Object.equals(field3, other.field3)
return true;
return false;

There was some concern about checking that the two instances’ classes were equal; that is, verifying that both objects were instances of the exact same concrete class. The question that was raised from this was:

Is checking the concrete types for equality a violation of the Liskov Substitution Principle?

There are a few concepts that we need to understand if we can unpack this.

The Mathematics of Equality

In mathematics, two numbers are considered equal if they are… umm…

the same.

Profound, huh? For numbers, that makes a lot of sense. They’re equal if and only if they are the same dang number. When you consider programming languages like F#, Scala, or Clojure, where you can leverage types with structural equality, the concept of equality still holds true if the structure of both objects and the values in each are identical. However, these languages’ types that have structural equality semantics (for good reason) don’t support inheritance. The semantics of equality in these situations only care about two things:

  1. Are the structures equal?
  2. Are the values equal?

If the answer to both questions is “yes,” then the things are equal. Types really don’t matter much here. What matters is the pure mathematical are-these-things-representing-the-same-value equality semantics.

Equivalence Relation

Those equality semantics, by the way, are a special case of a mathematical thing called an equivalence relation. According to Wikipedia, an equivalence relation is:

[…] a binary relation that is at the same time a reflexive relation, a symmetric relation and a transitive relation. (ref)

So what does that mean? Well, a binary relation is a relation between two things, which is to say it says whether or not two things in a particular order are related to each other. The order is important for binary relations, because things like “is less than” and “is divisible by” are binary relations. 5 is less than 8, but 8 is not less than 5. Order matters. Furthermore, the idea of a binary relation is a relation between two members of a set (we’ll talk more about sets later).

A reflexive relation is a binary relation where the relation holds true when the same value is on either side of the relation. A relation like “is divisible by” is reflexive, because a number is always divisible by itself. A relation like “is less than” is not reflexive because a number is not less than itself.

A symmetric relation is one that continues to hold true when the values are reversed for any pair of values. The “is divisible by” relation is not symmetric, because you can find values where the reverse isn’t true, like 6 and 3. The number 6 is divisible by 3, but 3 is not divisible by 6. However, relations like “have the same value modulo 3” are symmetric.

And finally, a transitive relation is a relation such that if the relation holds true for (a, b) and (b, c), then it must also hold true for (a, c). Relations like this include “is less than”, “is greater than”, and “is divisible by”. However, a relation like “is perpendicular to” on lines isn’t transitive. For instance, consider the following diagram. Line AB is perpendicular to line BC, and line BC is perpendicular to CD, but line AB is not perpendicular to line CD. Therefore, “is perpendicular to” is not transitive.

AB is not perpendicular to CD

The notion of mathematical equality is a special case of an equivalence relation; that is, equality is even stricter than being reflexive, symmetric, and transitive.

Equivalence Relations and Liskov

So what does this mean for an equals(...) operator in our language of choice? When you consider object-oriented languages like C#, Java, or Python, then this notion of “equality” can be fiddled with. You can define your own rules for equality. You have the flexibility to break the rules and make the equals(...) method something other than strict mathematical equality. Now there are (arguably) good reasons to do so; for instance, maybe you are using an ORM where multiple copies of an object of the same type with the same primary key value could be around, and you want to see if these two different references to two different objects in memory are representing the same entity. In this case, “equals” means “has the same type and the same primary key”. Django does this, for instance. However, Django’s implementation still satisfies those three properties of an equivalence relation.

You’re free to define an equals(...) method that breaks one or more of those properties of an equivalence relation, and when you do, Bad Things™ can, of course, happen. Breaking those rules makes your objects no longer capable of behaving properly as hashtable keys, for instance.

But What About Subclasses?

As I said before, equivalence relations pertain to members of a set. A set, mathematically speaking, is a well-defined collection of distinct objects. When you consider an equals(...) implementation for any given class in an object-oriented language, it is impossible to ensure proper mathematical rigor within that equals(...) method and account for the potential for yet-to-be-defined subclasses of that object. Simply put, when you create a subclass, you’re altering the contents of the set that the base class’s equals(...) implementation is based on. The set is no longer well-defined.

This forces those implementations to check for type equality against the concrete types. Does this mean a violation of the Liskov substitution principle? I don’t think so. You can’t define a binary relation without a set, and a set is well-defined. By introducing the possibility of inheritance, you’re taking away the well-definedness of the set. By defining equals(...) in terms of a known set, it preserves its mathematical soundness. If the expectation is that “equals” is the mathematical equivalence relation, then there is no choice than to check the concrete types.

The trouble is that leaving the same equals(...) implementation with a subclass could potentially break the reflexive, symmetric, or transitive properties of that equals(...) method. This is why languages with structural equality semantics don’t allow subclassing those structural types. By allowing subclasses of those structural types, it harms the semantics of the equality relation.

Equality and OO

Simply put, the idea of equality in an object-oriented context can often be a real mess. Consider avoiding customizing the equals(...) behavior, and use a different way to compare objects for “equality”. The Liskov substitution principle misleads this conversation because for many OO languages, you’re forced to expose an equals(...) method on your objects by inheritance. Be intentional about whether or not your classes need to be compared for equality by any other than the default technique (usually, reference equality). My typical rules are:

  1. Avoid using the equals(...) method as a part of your classes’ semantics if possible; i.e. be willing to say “these classes aren’t meant to be compared for equality via the equals(...) method.”
  2. When forced to override equals(...), limit those cases, if possible, to value objects and not identity objects. Prevent subclasses (e.g. final or sealed) so that the equality semantics can be preserved: don’t let the set become ill-defined.
  3. Consider using an external comparison operation to test two instances of a class for their relationship. Often, “equals” is better expressed as “has the same ID” or “refers to the same absolute path.” Make those relationships explicit rather than hiding those subtleties behind the equals(...) method.
  4. Don’t try to be clever with equals(...).