Hidden Liskov Violations

Stephen Fiser
Aug 16, 2018 · 5 min read

I came across an interesting problem while working on a JavaScript library for the Paradigm Protocol.

Before we look at my specific example, I need to remind us about Liskov.

The Liskov Substitution Principle is mathematically defined as:

Let ϕ(x) be a property provable about objects x of type T. Then ϕ(y) should be true for objects y of type S where S is a subtype of T.

Let me translate that into something simpler: the same types of things should respond to the same types of messages with the same types of answers.

Liskov Violations can create incredibly fragile code. As Sandi Metz said, your code is fragile if, after every change, distant and apparently unrelated code breaks.

To start, let’s look at a hyper-simple example. Suppose we are building an app for a dog shelter. We might have some classes like this:

class Dog
  def speak
    "I am a dog"
  end
end

class Husky < Dog
  def speak
    "wOoooOOoooOooo..."
  end
end

class Labrador < Dog
  def speak
    "Ruff!"
  end
end

Now suppose we write a class that will interact with these dogs:

class Human
  def pet(dog)
    puts dog.speak
  end
end

niko = Husky.new
fred = Human.new
fred.pet(niko) # => "wOoooOOoooOooo..."

Like I said — ultra simple.

What happens when a new dog comes into the mix that has a different idea about speaking? #freeSpeech

class Chihuahua < Dog
  def speak
    TacoBellAd.new
  end
end

lola = Chihuahua.new
fred = Human.new
fred.pet(lola) # => #<TacoBellAd:0x007f8f2980c810>

This creates a really bizarre situation. Now our human needs to know how to process messages from different types of dogs.

class Human
  def pet(dog)
    if dog.is_a?(Chihuahua)
      puts dog.speak.advertising_message
    else
      puts dog.speak
    end
  end
end

This silly example highlights the importance of the concept. A human shouldn’t have to have prior knowledge in order to interact with a dog. It should just experience whatever happens whenever it pets one.

This happens because the Chihuahua doesn’t return the same type of data as the other dogs when the same method is called, and thus it forces whatever it is interacting with to be aware that there are differences. Otherwise the caller prints garbage, as above, or raises an error as soon as it treats the result like a string.

Quick but important note: people often state that the Liskov Principle only has to do with subclasses. Since this is Ruby, I’d typically not actually create a Dog base class, because it’s essentially useless in this context (Human only needs each dog to respond to speak), but the rule is still applicable. More on this later.

You can see that we’ve created a clear dependency within the Human class on the Chihuahua class because the class name is written directly in the pet method. But this code is also a perfect example of fragility because we also have to be worried that the implementation of the TacoBellAd class might change — not just Chihuahua.

What if advertising_message is changed to return an AdMessage instance which has a render_text method? Well, we have to change Human again to look like:

class Human
  def pet(dog)
    if dog.is_a?(Chihuahua)
      puts dog.speak.advertising_message.render_text
    else
      puts dog.speak
    end
  end
end

This specific example has more to do with following the Law of Demeter, but I used it here because we aren’t usually just passing around strings.

If you return objects that have different APIs from method calls on supposedly similar objects, you have created a Liskov Violation. You are forcing the caller to have awareness of internal differences between foreign objects.

In our (ridiculous) case above, in order to pet a dog, a human must know how Taco Bell advertisements work.

Implied Subtypes

We already saw that Liskov doesn’t exclusively apply to classes and subclasses. I think we can actually take it a step further.

Writing good code is an exercise in making other developers understand what you are trying to do, not the computer. The computer can understand binary, and it doesn’t care what you call things.

I would like to suggest that the Liskov Principle can be extended to what other developers understand your code to be, even if that’s not what it actually is. This breaks us completely out of the realm of classes and subclasses and into the realm of good naming and grouping.

Suppose I have something like this:

let signatures = [
  { v: 28, r: '0x6e8...', s: '0x42f...', messageHex: '...' },
  { v: 28, r: '0x73a...', s: '0x9ab...', messageHex: '...' },
  { v: 28, r: '0x213...', s: '0xab3...', messageHex: '...' }
];

This is just a simple array of signatures for Ethereum transactions. A few problematic things can happen here.

First, we could simply extend the data for some signatures, but not all.

let signatures = [
  { v: 28, r: '0x6e8...', s: '0x42f...', messageHex: '...' },
  { v: 28, r: '0x73a...', s: '0x9ab...', messageHex: '...' },
  {
    v: 28,
    r: '0x213...',
    s: '0xab3...',
    messageHex: '...',
    createdBy: '0x...'
  }
];

Now this code will carry hidden complexity around with it everywhere it goes. It’s a Liskov Violation because I expect all of these to behave the same, but if I iterate through and call signature.createdBy, some things will return a string and some will return undefined.
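To make the mismatch concrete, here is a minimal sketch that asks every element the same question and records the type of each answer. The shortened two-element array and the answerTypes name are only for illustration.

```javascript
// Two records that are supposedly the same kind of thing.
const signatures = [
  { v: 28, r: '0x6e8...', s: '0x42f...', messageHex: '...' },
  { v: 28, r: '0x213...', s: '0xab3...', messageHex: '...', createdBy: '0x...' },
];

// Same message, different types of answers.
const answerTypes = signatures.map((signature) => typeof signature.createdBy);

console.log(answerTypes); // [ 'undefined', 'string' ]
```

Nothing in the array itself tells the caller which elements will answer with a string, so every consumer has to guard against both cases.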

We’d probably see a lot of code snippets like if (signature.createdBy) {} in various parts of the codebase to handle that special case. And if someone changes the property name from createdBy to creator? You guessed it. We have to update every file that referenced createdBy.

Seeing something like this would suggest to me that we may need to create an actual Signature class that other objects interact with. Exactly what that looks like would depend on the context.
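As a rough sketch of one possible shape (not the actual Paradigm code), a small wrapper can give every signature the same interface. The Signature class, its creator() method, and the 'unknown' fallback are all assumptions made up for this example.

```javascript
// Hypothetical wrapper: the class name, creator(), and the 'unknown'
// fallback are illustrative assumptions, not an existing API.
class Signature {
  constructor({ v, r, s, messageHex, createdBy = null }) {
    this.v = v;
    this.r = r;
    this.s = s;
    this.messageHex = messageHex;
    this.createdBy = createdBy;
  }

  // Always answers with a string, so callers never branch on undefined.
  creator() {
    return this.createdBy === null ? 'unknown' : this.createdBy;
  }
}

const signatures = [
  new Signature({ v: 28, r: '0x6e8...', s: '0x42f...', messageHex: '...' }),
  new Signature({ v: 28, r: '0x213...', s: '0xab3...', messageHex: '...', createdBy: '0x...' }),
];

// Every element responds to the same message with the same type of answer.
const creators = signatures.map((signature) => signature.creator());

console.log(creators); // [ 'unknown', '0x...' ]
```

If the underlying property is later renamed, only the class has to change; callers keep sending creator(). Whether a fallback string is the right default depends on the context.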

Another thing that I’ve seen is putting entirely different types of things into a dataset for convenience. Something like:

let signatures = [
{ v: 28, r: '0x6e8...', s: '0x42f...', messageHex: '...' },
{ v: 28, r: '0x73a...', s: '0x9ab...', messageHex: '...' },
{ confirmedAt: [datetime], confirmedBy: [1, 2, 3, 4] }
];

This seems to happen when a dataset is generated in one location and operated on somewhere else. For example, you might include some extra data in an API response because it’s readily available when the API is processing the request.

This leads to the same type of problem we had with our Chihuahua. Everything that consumes that data will need to check what type of thing it is talking to and execute different behavior.

If this data is important to the application, that complexity will quickly take root all throughout the codebase. If this data is shared amongst different systems and services, you are in for a world of hurt.

At the end of the day, I would expect this to be a list of signatures based on the naming. The rule should be that you don’t group things together that are not the same.
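One way to follow that rule, sketched under assumptions: keep the list homogeneous and give the extra data its own name. The response object and its confirmation key are hypothetical, and the datetime stays a placeholder just as in the example above.

```javascript
// Hypothetical response shape: the `confirmation` key is an assumption.
const response = {
  signatures: [
    { v: 28, r: '0x6e8...', s: '0x42f...', messageHex: '...' },
    { v: 28, r: '0x73a...', s: '0x9ab...', messageHex: '...' },
  ],
  // The confirmation data travels alongside the list, not inside it.
  confirmation: { confirmedAt: '...', confirmedBy: [1, 2, 3, 4] },
};

// Every element has the same shape, so consumers need no type checks.
const recoveryIds = response.signatures.map((signature) => signature.v);

console.log(recoveryIds); // [ 28, 28 ]
```

Code that only cares about signatures can iterate without special cases, and code that needs the confirmation data asks for it by name.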

This doesn’t mean they always have to be the same data type. They just have to represent the same idea.

For example, consider this: args = [1, 'hello', Dog.new, 'Fred', 42]

None of these things are the same, but in the provided context, I understand each of them to have the same quality of being an argument passed into some function. So from that perspective, they are the same type of thing.

If I said users = [1, 'hello', Dog.new, 'Fred', 42], I’m opening up the door for major confusion.

Blue Bear Digital Inc.

Thoughts on Software Architecture, Business, and Blockchain

Written by Stephen Fiser, CEO at Blue Bear Digital Inc.
