Object-oriented Programming in Python — Lesson 1. Substitutability and Inheritance

Avner Ben
CodeX


This is the first in a series of articles discussing the practical needs to which object-oriented programming responds and the common facilities provided by object-oriented languages to support these needs, stressing the Python implementation and approach. In this first lesson, I present the object-oriented paradigm as a response to the frequent need for functional substitutability and explain why inheritance is often used to implement an infrastructure for it.

Sections in this lesson:

  1. Scanning Heterogeneous Collection
  2. The Message Paradigm
  3. Commonality and Variance
  4. Enter Inheritance!
  5. Abstract Class
  6. Object initialization

1. Scanning Heterogeneous Collection

Object-oriented programming is about functional substitutability. Substitutability, as the name implies, is the ability to substitute one thing for another without significant (or noticeable) change. What is it that we substitute? For what purpose? How does one tell whether the attempt to substitute has been successful? Who is it that should not notice the substitution, and how has it just “been fooled”? OK. We are talking about a capability (the ability to do something) that is substitutable. At some step, we require a capability that we can define well (in terms of what it should do for us), yet there may be many methods to achieve it, and we are in the lucky position to allow any of them. Precisely: we are talking about the use case where the range of methods to select from is open and the selection is automatic (as opposed to being confined to selection from an explicit and final list of alternatives, such as in if-then-else). Let the lucky method that fits the occasion, whatever it may be, take over from here!

There are various programmatic devices that implement functional substitutability, such as a function passed as an argument to another function, a delegate, or a state machine, to name a few. The object-oriented paradigm is concerned with substitutability determined by type, where each capability is associated with an object that is associated with a class that determines the method. This does not sound especially simple, I know, and on purpose! Because the programmatic interface is deceptively simple and intuitive (or so it seems), it is especially important to be precise about what is going on under the hood here, or you may lose your way!

The most frequent (and intuitive) case of substitutability of the object-oriented kind is one-to-many. For example, scanning a container (such as a list or dictionary) that stores objects that may be of diverse types, yet all feature the same (substitutable yet correct) capability we need. In object-oriented terminology, we say that all the objects retrieved respond to the same message, regardless of class! How we have come to rely on that (all objects in the container responding to the correct messages, and whether they indeed do) is immaterial. Python allows you to pass any message to any object. At worst (when the object’s class does not support the method), you get a runtime error (an AttributeError).

For example (assuming the appropriate classes Circle and Rectangle, not listed here):
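
A minimal sketch of such a scan follows. The method names getPosition and calcArea, the constructor arguments and the truncation of the circle's area to an integer are assumptions, chosen so that the loop reproduces the output shown. The two classes are condensed stand-ins, included only so that the snippet runs on its own.

```python
import math


class Circle:
    # Condensed stand-in (assumed layout): center point plus radius.
    def __init__(self, x, y, radius):
        self.x, self.y, self.radius = x, y, radius

    def getPosition(self):
        return self.x, self.y

    def calcArea(self):
        return int(math.pi * self.radius ** 2)


class Rectangle:
    # Condensed stand-in (assumed layout): position plus width and height.
    def __init__(self, x, y, width, height):
        self.x, self.y, self.width, self.height = x, y, width, height

    def getPosition(self):
        return self.x, self.y

    def calcArea(self):
        return self.width * self.height


# A heterogeneous collection: the loop never asks which class it is handling.
shapes = [Circle(55, 212, 50), Rectangle(20, 20, 100, 150)]

lines = []
for shape in shapes:
    x, y = shape.getPosition()
    lines.append(f"{type(shape).__name__} at {x}:{y}. Area: {shape.calcArea()}")

print("\n".join(lines))
```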

Output:

Circle at 55:212. Area: 7853 
Rectangle at 20:20. Area: 15000

This container stores two objects of diverse types, a Circle and a Rectangle, but both know how to calculate their area and to report their position (each, for all we know, by its own method).

As this simple example demonstrates, object-oriented substitutability applies to messages sent to objects, whose behavior is defined by classes. Each iteration in the list retrieves a new object, which may — or may not — be of another class, but this does not concern us, if the object responds to the message “compute area” (i.e. its class, whatever it is, implements a method for “compute area”).

As a side note, my preference for the less frequent (but accurate) term “substitutability” over the more popular term “polymorphism” is not accidental (besides the obvious reference to the Principle of Substitutability, to be discussed at the end of this course). Describing this programmatic practice as polymorphism (“having many forms”, i.e. the capability to replace form at will) is misleading! If being polymorphic means the ability of an object to change its form, i.e. to change its class, then this is not the case. Scanning a heterogeneous collection opens a window onto the collection, which may reflect, in each iteration, a different object, of a (possibly) different class. Consequently, it is correct to say that the class that supplies the methods is replaced in each iteration, no more and no less! Still, nobody changes form, and no object changes class!

The use of object-oriented substitutability requires infrastructure: object-oriented design. We must prepare in advance a number of classes that feature methods for the same message selector (i.e. define methods by that name, which take the correct number of arguments and return the correct type). We have just considered a simple case requiring two substitutable capabilities (from each object in the list), but often, substitutable object capabilities come in larger groups. Thus, many geometric applications expect all shapes not only to get position and compute area, but also to compute circumference, detect intersection with a point, etc. This functional requirement is also known as commonality of interface. (We are talking here about the general requirement for an interface, not to be confused with the programmatic Interface construct that some object-oriented programming languages provide in response to it; Python has no such keyword).

Here is the definition of the classes Circle and Rectangle (used above):
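
A sketch of what the two classes might look like, with comments keyed to the footnote numbers. The attribute names and the method names getPosition and calcArea are assumptions, as is the truncation of the circle's area to an integer; they are consistent with the output shown earlier. Note that the classes are unrelated: neither inherits from the other or from a common base.

```python
import math


class Circle:
    def __init__(self, x, y, radius):
        self.x = x
        self.y = y
        self.radius = radius

    def getPosition(self):            # (1) returns the position as a two-tuple
        return self.x, self.y

    def calcArea(self):               # (2) returns the area as an integer
        return int(math.pi * self.radius ** 2)


class Rectangle:
    def __init__(self, x, y, width, height):
        self.x = x
        self.y = y
        self.width = width
        self.height = height

    def getPosition(self):            # (3) same message, same meaning
        return self.x, self.y

    def calcArea(self):               # (4) same message, different method
        return self.width * self.height


circle = Circle(55, 212, 50)
rectangle = Rectangle(20, 20, 100, 150)
print(circle.getPosition(), circle.calcArea())
print(rectangle.getPosition(), rectangle.calcArea())
```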

Footnotes:

  1. Circle implements a method to get position, which returns a point (as a two-tuple).
  2. Circle also implements a method to calculate area, which returns an integer.
  3. Rectangle, like Circle, also implements a method to get position.
  4. Rectangle, like Circle, also implements a method to calculate area.

As stressed here, in Python, objects may be sent any message. Retrieving the appropriate method (from the object’s class) is deferred to runtime, acknowledging the possibility of such a method not being found. This is known, especially in interpreted languages, as dynamic typing (as opposed to static typing, typical of compiled languages, where type compatibility is verified at compile time). It is often loosely called “weak typing”, but that term properly refers to implicit conversion between types, which is a different matter. Python programmers are traditionally not fond of either of these terms, because the Pythonic philosophy associates the required functionality with what the object can deliver in practice, rather than with its declared type. Consequently, some Python developers prefer, with typical Pythonic humor, to call the Pythonic version “Duck Typing”. If the errand calls for a duck (precisely: duck functionality), then if the candidate looks like a duck, quacks like a duck, and walks like a duck (if that is all we require of it), then, for all we know, a duck it is! (Even if it was born a pig).
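
To make the point concrete, here is a small sketch. The classes Duck, Pig and Rock and the helper speak are hypothetical names introduced for illustration only. Any object whose class happens to supply a quack method is accepted, and the failure for the others surfaces only at run time, as an AttributeError.

```python
class Duck:
    def quack(self):
        return "Quack!"


class Pig:
    # Not a duck by birth, but it answers the same message.
    def quack(self):
        return "Oink-quack!"


class Rock:
    pass  # implements no quack method at all


def speak(candidate):
    # No type check: the method is looked up only when the call is made.
    try:
        return candidate.quack()
    except AttributeError as error:
        return f"error: {error}"


results = [speak(candidate) for candidate in (Duck(), Pig(), Rock())]
for result in results:
    print(result)
```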

2. The Message Paradigm

As mentioned at the start, the object-oriented version of substitutability is not the only programmatically available option. But it is — when handled properly — simple, intuitive, and very flexible. For example, compare the heterogeneous Shape collection with some home-made alternatives:

A frequent solution for functional substitutability in old-school procedural languages has been to expect a tagged record or tagged union, first inquire about the object’s type, and then select the explicit method accordingly.
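
A sketch of this style in Python, assuming each shape is kept as a plain dictionary tagged with a "type" key (the record layout and field names are hypothetical):

```python
import math

# Each shape is a passive record; the "type" tag says how to read it.
shapes = [
    {"type": "circle", "x": 55, "y": 212, "radius": 50},
    {"type": "rectangle", "x": 20, "y": 20, "width": 100, "height": 150},
]

report = []
for shape in shapes:
    # The client inspects the tag and performs the computation itself.
    if shape["type"] == "circle":
        area = int(math.pi * shape["radius"] ** 2)
    elif shape["type"] == "rectangle":
        area = shape["width"] * shape["height"]
    else:
        raise ValueError(f"unknown shape type: {shape['type']}")
    name = shape["type"].capitalize()
    report.append(f"{name} at {shape['x']}:{shape['y']}. Area: {area}")

print("\n".join(report))
```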

It would take a separate lecture to get down to all the problems manifest in this design. Here are some highlights:

  1. All candidate methods are revealed in advance, even if not taken!
  2. All candidate shape types are specified in advance, thus closing the possibility to add other types that may also support this functionality. This (the possibility to add Shape types) is crucial when the container is filled in one part of the program and the scan is made in another. There is no reason for the two functions to agree on the exact range of types (provided the common functionality is supported)! And then, when the (inevitable) need arises to add another type of Shape (say, Triangle), we must break this code by adding another explicit branch to the selection tree. We say that the code is broken until all test cases are run overnight and prove that it still works as expected.
  3. This solution does not respect the information hiding of the shapes that it displays. Unlike the object-oriented solution, which expects the client to request the information from the object (using the message paradigm), here the client implements all the various methods in line. It knows how Circles and Rectangles are made and how to compute their area. This wisdom does not belong here! (The only wisdom that belongs here is how to scan containers and how to display text on the screen, no more and no less). Encapsulating this logic with the data in a class is (in this case) a superior option.
  4. This programmatic style calls for much code redundancy. Suppose (very realistically) that we must repeat this logic elsewhere, and that we also need similar code to scan a collection of Shapes and display their circumference, test whether they intersect a point, etc. We have just created a maintenance nightmare! Each time a Shape type is added (or the content or logic of an existing shape is changed), we must propagate the correction to all occurrences of the duplicate code. And this, in the happy event that we know where these occurrences are in the first place. If, for example, we develop basic software for use by others, we now must reach each of them and inform them of the change (and hope they indeed find the time to fix their code to account for it, if they can find it in the first place, and so on).

Of course, code such as the above is not altogether useless and bad. (Had that been the case, there would be no elif keyword in the language). Where the problem domain ensures that there are only these two possibilities (Circle and Rectangle), and their internals and logic are simple and obvious and are not expected to change (so that exposing and repeating it is not a problem), the simple list of alternatives will do (and the object-oriented solution is overkill). However, this usually applies to simple basic-type-based lists of alternatives, such as enumerations and strings, rather than complex data structures with non-trivial logic.

Another procedural practice that implements functional substitutability is to convert each candidate object to a universal representation type (where such a type exists and is applicable, of course). Thus, as far as the method is concerned, there is no substitutability (the method was, and remains, written for a single type). However, the object of manipulation was converted (i.e. substituted) before arriving at it.

Thus, assuming all shapes to have a string representation (i.e. to be convertible to string), and that this (conversion to string) satisfies our functional requirement, then we could just convert the current Shape to string and then display it. The print function expects a string and gets one. It is not concerned with where the string has come from. (Actually, the built-in print will take care of the conversion by default, but this does not change the example).
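
A minimal sketch of this idea, assuming a hypothetical Circle class that defines __str__ and a hypothetical display function that accepts nothing but strings:

```python
class Circle:
    def __init__(self, x, y, radius):
        self.x, self.y, self.radius = x, y, radius

    def __str__(self):
        # The universal representation: plain text.
        return f"Circle at {self.x}:{self.y}, radius {self.radius}"


def display(text: str) -> None:
    # Knows only about strings, nothing about shapes.
    print(text)


circle = Circle(55, 212, 50)
text = str(circle)   # the substitution happens here, before the call
display(text)
```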

A more realistic use case is an arithmetic operation involving two numbers of diverse types (for example, adding an integer to a floating-point number). Most programming languages implement this by first converting the operand of the narrower type to the wider type and then performing the arithmetic operation on two objects of the same type (returning an instance of the wider type).

But such cases are infrequent, and our Shapes are not a candidate. Shapes are meant to be rendered graphically and used for geometric computations. Converting a shape to string has no applicative significance (besides perhaps, the technical — and user-invisible — uses of dump for debug and serialization).

This leaves us with the “object-oriented” solution. At the price of constructing infrastructure (classes with commonality of interface), it serves the purpose of functional substitutability well. It is open for extension (at least in the dimension of additional shape types, if the commonality of interface is maintained) while being closed for modification: the use cases (e.g., heterogeneous collection scanning) are left alone even when new classes are added, provided the interface is left intact.

The paradigm is “object”-oriented because the method (how to perform the action) is determined by the object, the receiver of the action (as in the syntactic definition of object). By contrast, in procedural programming (as in an explicit list of alternatives), the method is determined trivially by the verb. What you perform is exactly what you state you do, regardless of the object that suffers the result (or the subject that required it).

The “Message” paradigm: an operation over an object (regardless of whether it changes the object or just extracts information from it) is performed by (1) passing a “message” to the object, consisting of message selector and optionally arguments. (2) The receiver object “responds” to the message by applying a “method”, implemented by its class.

Substitutability: Whatever object is on the receiving side, its class must implement the method!

The illustration describes the programmatic mechanism of late binding (of message selector to method). Given message selector and receiver object, and retrieving the object’s class, we retrieve the appropriate method from the class keyed by message selector and apply it to the object (the “self” parameter).
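
The lookup just described can be imitated by hand. The following is only a sketch of the idea, not of the interpreter's actual machinery: the helper send and the class Square are hypothetical, and the sketch relies on the built-ins type and getattr to fetch the method from the receiver's class by name and then apply it to the receiver.

```python
import math


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def calcArea(self):
        return int(math.pi * self.radius ** 2)


class Square:
    def __init__(self, side):
        self.side = side

    def calcArea(self):
        return self.side ** 2


def send(receiver, selector, *args):
    cls = type(receiver)              # 1. retrieve the receiver's class
    method = getattr(cls, selector)   # 2. find the method, keyed by selector
    return method(receiver, *args)    # 3. apply it, with the receiver as self


areas = [send(shape, "calcArea") for shape in (Circle(50), Square(10))]
print(areas)

# The ordinary call syntax performs the same steps implicitly.
same = [Circle(50).calcArea(), Square(10).calcArea()]
```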

To put it simply, the object-oriented interpretation of “we know what must be done, but there are many ways to do it…” proceeds with “…and it (how to do it) depends upon the type of the object (the receiver of the action), so let the object, whatever it may be, do it for us!”. Precisely: the object-oriented paradigm replaces the challenge of activating a substitutable capability with the much simpler challenge of addressing a substitutable object. This so-called “Left-handed Polymorphism” (to be discussed later) is simple, self-explanatory and very efficient to implement programmatically (no embarrassing questions asked, such as explicit switch/case constructs), as long as it is used within its problem domain, of course. What? Are there use cases outside the object-oriented problem domain? Well, there are a few, and we will discuss some of them in Lesson Three.

Data dictionary:

  1. Required Interface. Message selector array.
  2. Class. (Among other things), method array (corresponding to one or more required interfaces).
  3. Message. A procedural construct involving a receiver object, a message selector (normally, a method name) and, optionally, arguments.
  4. Method. How objects of a class respond to a message selector.
  5. Binding (also called dispatch). Given object and message selector, find the method (in the object’s class).
  6. Substitutability (also called polymorphism). The capability to send a message to an object of an unknown class (but complying with a known interface).
  7. Late Binding (also called runtime dispatch, dynamic dispatch). A programmatic mechanism used (transparently) to implement substitutability.

3. Commonality and Variance

All Shape types share a common interface (i.e. the same method names), some common data and some common implementation, but vary in their extra data and in the method that calculates area.

Footnotes:

  1. Commonality of data: the center point (x and y).
  2. Variance of data: specific attributes — radius, width, height.
  3. Commonality of interface: to get position.
  4. Commonality of implementation: the method that gets the position.
  5. Commonality of interface: to compute area.
  6. Variance of implementation: the method that computes area.

4. Enter Inheritance!

Inheritance is a handy mechanism for factoring out common behavior (methods) and data from a family of classes that are meant to be functionally substitutable under some typical use cases requiring a common interface (subject to some limitations, to be discussed). But to stress the point, in Python, inheritance is nice to have (but not a must-have)! Python, unlike some other object-oriented languages, does not require inheritance as a prerequisite for substitutability. Remember duck typing!
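
The following sketch shows how such a hierarchy might be written, with comments keyed to the footnote numbers below. The names getPosition and calcArea, the constructor arguments and the use of NotImplementedError as the error raised by the lame Shape are assumptions, chosen to be consistent with the output shown.

```python
import math


class Shape:                                    # (1)
    def __init__(self, x, y):
        self.x = x                              # (2)
        self.y = y

    def getPosition(self):                      # (3)
        return self.x, self.y

    def calcArea(self):                         # (4)
        raise NotImplementedError('Abstract method "calcArea" not implemented!')


class Circle(Shape):                            # (5)
    def __init__(self, x, y, radius):
        Shape.__init__(self, x, y)              # (6)
        self.radius = radius                    # (7)

    def calcArea(self):                         # (8)
        return int(math.pi * self.radius ** 2)


class Rectangle(Shape):                         # (9)
    def __init__(self, x, y, width, height):
        Shape.__init__(self, x, y)
        self.width = width                      # (10)
        self.height = height

    def calcArea(self):                         # (11)
        return self.width * self.height


class Triangle(Shape):                          # (12)
    def __init__(self, x, y, a, b, c):
        Shape.__init__(self, x, y)
        self.sides = (a, b, c)


print(Circle(55, 212, 50).calcArea())
try:
    print(Triangle(0, 0, 3, 4, 5).calcArea())   # (13)
except NotImplementedError as error:
    print(error)
```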

Footnotes:

  1. Class Shape factors out the commonality of the useful Shape types (Circle and Rectangle), as well as (hopefully) the commonality expected out of additional shape types to be introduced later.
  2. Commonality of data (center point). The variance of data (radius, width, and height) was excluded.
  3. Commonality of implementation. The method that gets the position is implemented in Shape.
  4. Anticipating variance of implementation. “Shape” is an abstract class, not meant for creating real objects, for (at least) the practical reason that the method that calculates area is specific to each subclass and cannot be factored out. While we cannot prevent anyone from creating Shapes nevertheless, we make sure that asking the lame Shape to calculate area will result in an error.
  5. Class Circle is now a subclass of Shape (it is also said to derive from it). We name the superclass “Shape” in parentheses in Circle’s class statement. The result is that anything (data, interface, and methods) not defined or implemented here is “inherited” from the abstract Shape.
  6. Although Shape is not meant to exist on its own, it does exist, as part of Circle and must be initialized, as well. (Python seems to have an odd syntax for doing that - to be discussed).
  7. Variance of data. Circles also feature radius (in addition to the center point inherited from Shape).
  8. Variance of implementation. Circle “overrides” the capability to calculate area, complete with method.
  9. Rectangle is also a subclass of Shape.
  10. Variance of data. Rectangles also feature width and height (in addition to the center point inherited from Shape).
  11. Variance of implementation. Rectangle also “overrides” the capability to calculate area, complete with method.
  12. Triangle inherits everything — behavior, data, and implementation — from Shape. It only adds data — the three sides.
  13. The Triangle is requested to calculate area, a method which it does not feature. Or does it?

Output:

7853
Abstract method “calcArea” not implemented!

The class diagram notation of inheritance.

This diagram also demonstrates two common usage conventions. (1), to position the superclass above its subclasses, and (2), to unify the association arcs leading to the superclass.

Inheritance is both a blessing and a curse. For our purpose, an inheritance hierarchy, however extensive and heavily invested in, is just infrastructure for functional substitutability in the many use cases we have (and, hopefully, in the use cases to be accumulated during maintenance). Avoid the “semantic hierarchy” fallacy (which may be useful for other purposes but has little to do with the practical needs of software design)! For some reason, many people, including some who write textbooks, tend to explain programmatic inheritance by general and imaginary scenarios (involving, e.g., dogs and cats and the sounds they make), detached from any commercial use case of software. It is your privilege to associate inheritance with some philosophical or other traits, if that helps you memorize the terms involved. But if you do not wish to join those who are frustrated with the object-oriented paradigm (through their own fallacious interpretation), keep in mind that we are dealing with a programmatic mechanism, made of programmatic artifacts (classes, objects, messages), which, in the absence of a substantial challenge of functional substitutability, is of dubious use!

5. Abstract Class

Unlike many other object-oriented languages, Python, for some reason, does not have an abstract keyword. But since leaving the task of ensuring the abstractness of an abstract class (and forcing subclasses to override the abstract methods) to the user is not a particularly clever idea either, at some point a mechanism was added to the standard library for doing just that: the “abc” (Abstract Base Classes) module. Using ABC ensures (at runtime) that (1) an abstract class cannot be instantiated and (2) neither can a subclass that fails to override the abstract methods it inherits. Both checks are made when an object is created, which is an improvement over our user-defined solution (which allowed us to create the lame object and failed only when the missing method was called).
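
A sketch of the Shape family rebuilt on the abc module, with comments keyed to the footnote numbers below. As before, the method name calcArea and the constructor arguments are assumptions; the circle's radius is chosen to reproduce the area shown in the output.

```python
import math
from abc import ABC, abstractmethod             # (1)


class Shape(ABC):                               # (2)
    def __init__(self, x, y):
        self.x, self.y = x, y

    @abstractmethod                             # (3)
    def calcArea(self):
        ...


class Circle(Shape):
    def __init__(self, x, y, radius):
        Shape.__init__(self, x, y)
        self.radius = radius

    def calcArea(self):
        return int(math.pi * self.radius ** 2)


class Triangle(Shape):
    # Adds data but does not override calcArea.
    def __init__(self, x, y, a, b, c):
        Shape.__init__(self, x, y)
        self.sides = (a, b, c)


print(Circle(10, 10, 34).calcArea())

errors = []
for attempt in (lambda: Shape(0, 0),                 # (4)
                lambda: Triangle(0, 0, 3, 4, 5)):    # (5)
    try:
        attempt()
    except TypeError as error:
        errors.append(str(error))
        print(error)
```

The exact wording of the TypeError message varies between Python versions, so the text printed may differ slightly from the output quoted below.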

Footnotes:

  1. Importing from the “abc” module.
  2. Class Shape inherits from ABC, which makes it an abstract class — let us wait and see.
  3. The method to calculate area is decorated as abstract method, which takes care that objects of Shape and subclasses that do not override this method will not initialize successfully.
  4. The code attempts to initialize a Shape object. Will it succeed?
  5. The code attempts to initialize a Triangle object. Will it succeed?

Output:

3631
Can’t instantiate abstract class Shape with abstract method calcArea
Can’t instantiate abstract class Triangle with abstract method calcArea

The Circle is doing fine! But the attempts to initialize Shape and Triangle objects raised errors. We are also informed which method is missing!

Python’s implementation of object-oriented substitutability is simple and elegant and makes things like inheritance and overriding of functionality intuitive. Especially when compared to some other languages, where the behavior is counter-intuitive at times, due to limitations imposed by a complex implementation (due to other obligations).

To begin with, a message sent to a Python object is resolved at runtime by looking the selector up in dictionaries: first the object’s own attribute dictionary, then the dictionary of its class, then those of the superclasses in turn. So, we do not have to bother with such technicalities as tables, pointers, memory offsets and the unexpected complications that such implementations inevitably create (most notoriously, in C++). In Python, if you know what you are doing regarding substitutability and inheritance (which is straightforward), you are never going to sit puzzled in front of your own program’s behavior or resort to counter-intuitive tricks to force your code to do things your way. To make a long story short: given a message selector (method name, a string), Python uses this name as the key, retrieves the first method found (if any), and invokes it over the object (the “self” parameter). That’s all there is to it (in principle). The division of labor is simple: the object’s attribute dictionary holds its data, loaded by the sequence of initializers, while each class dictionary holds the methods defined in that class. This implementation has two notable implications: (1) overriding needs no special machinery, because a method defined in a subclass is simply found before the inherited method of the same name, and (2) the sender never has to know how deep the inheritance hierarchy is or which class supplies the method; the lookup takes care of it.
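
One way to see where names are actually stored is to inspect the dictionaries directly. The sketch below uses the built-in vars and the class attribute __mro__ (the order in which classes are searched); the helper find_owner is hypothetical and only approximates the real lookup, which has further rules not shown here.

```python
import math


class Shape:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def getPosition(self):
        return self.x, self.y


class Circle(Shape):
    def __init__(self, x, y, radius):
        Shape.__init__(self, x, y)
        self.radius = radius

    def calcArea(self):
        return int(math.pi * self.radius ** 2)


def find_owner(obj, name):
    # Walk the classes in search order; report the first that defines the name.
    for cls in type(obj).__mro__:
        if name in vars(cls):
            return cls.__name__
    return None


circle = Circle(55, 212, 50)

instance_data = vars(circle)                       # data loaded by the initializers
search_order = [cls.__name__ for cls in type(circle).__mro__]

print(instance_data)
print(search_order)
print(find_owner(circle, "calcArea"), find_owner(circle, "getPosition"))
```

Here the data end up in the object's own dictionary, calcArea is found in Circle, and getPosition is found one step further along the search order, in Shape.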

Incidentally, this architecture explains the form of initializer super-calls (initializing one’s superclass part). Looking up __init__ through the object always yields the most specific initializer (the subclass’s own, if overridden), bound to self, which is not what we want (we are already there). To obtain the superclass’s initializer, we address the superclass (an object in its own right, available globally by name), take the plain, unbound function from it and invoke it over self, as in MyBase.__init__(self). The built-in super() achieves the same by continuing the lookup past the current class, as in super().__init__(). (More on bound and unbound methods later).
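
A brief sketch of the two spellings of the super-call side by side. The classes are the illustrative Shape family used throughout (constructor arguments assumed); in this single-inheritance setting, naming the superclass explicitly and calling the built-in super() have the same effect.

```python
class Shape:
    def __init__(self, x, y):
        self.x, self.y = x, y


class Circle(Shape):
    def __init__(self, x, y, radius):
        Shape.__init__(self, x, y)    # explicit: name the superclass, pass self
        self.radius = radius


class Rectangle(Shape):
    def __init__(self, x, y, width, height):
        super().__init__(x, y)        # same effect here; self is supplied for us
        self.width, self.height = width, height


circle = Circle(15, 15, 10)
rectangle = Rectangle(20, 20, 100, 150)
print((circle.x, circle.y), (rectangle.x, rectangle.y))
```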

6. Object initialization

In the following example, both superclass and subclass feature initializers (constructors). Overriding the initializer in the subclass does just that: the subclass initializer is found first and replaces (“overrides”) the inherited one, which is no longer invoked unless we call it explicitly! Consequently, when such a subclass object is initialized, its superclass part will not be initialized, resulting in, for example, loss of the data that should have been inherited. This behavior is, admittedly, counter-intuitive design-wise, and does not resemble the behavior of compiled object-oriented languages, where member data are specified as part of the class, rather than as part of its initialization process. But once you make the paradigm shift, Python’s behavior becomes consistent and predictable (with intuition living up to it).
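
A sketch of the faulty version, with comments keyed to the footnote numbers below. The method name getBoundingRect, the layout of its result as ((left, top), (right, bottom)) and the order in which it reads the coordinates are assumptions.

```python
class Shape:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def getBoundingRect(self):                       # (1)
        pass


class Circle(Shape):
    def __init__(self, x, y, radius):                # (2)
        self.radius = radius

    def getBoundingRect(self):                       # (3)
        top, bottom = self.y - self.radius, self.y + self.radius
        left, right = self.x - self.radius, self.x + self.radius
        return (left, top), (right, bottom)


message = None
try:
    print(Circle(15, 15, 10).getBoundingRect())
except AttributeError as error:
    message = str(error)
    print(message)
```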

Footnotes:

  1. A sloppy way to indicate what should be an abstract method.
  2. Oops, Circle forgot to initialize its Shape part (and ignores the center point arguments)!
  3. The Circle method that gets the bounding rectangle takes for granted the center point that it should have inherited from Shape.

Output:

’Circle’ object has no attribute ‘y’

Moral: if you override the initializer, do not forget to initialize your superclass part (if needed). Indeed, in the following example, both superclass and subclass have initializers, which are used properly (as in the original inheritance example).
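
A sketch of the corrected version, under the same assumptions as before (the name getBoundingRect and the layout of its result are hypothetical); the center point and radius are chosen to reproduce the output shown.

```python
class Shape:
    def __init__(self, x, y):
        self.x, self.y = x, y


class Circle(Shape):
    def __init__(self, x, y, radius):
        Shape.__init__(self, x, y)                   # (1)
        self.radius = radius

    def getBoundingRect(self):
        top, bottom = self.y - self.radius, self.y + self.radius
        left, right = self.x - self.radius, self.x + self.radius
        return (left, top), (right, bottom)


rect = Circle(15, 15, 10).getBoundingRect()
print(rect)
```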

Footnotes:

1. Now the superclass part is initialized. The natural position of the initializer super-call is before setting the subclass state (because the latter naturally depends upon the former). But this is only a convention. Unlike some other object-oriented languages, Python does not restrict where to initialize the superclass part (or, as we have already seen, whether to initialize it at all).

Output:

((5, 5), (25, 25))

But when the subclass features no explicit initializer, the superclass initializer is used to fill the gap. (And, since the subclass features no additional data, there is nothing more to do). This follows from the attribute lookup described above: the subclass did not care to override the initializer, so the one that is found, and used, is its superclass’s.

For example, Point is a shape with no significant dimensions (but still inherits the center point from Shape), so there is nothing to initialize there.
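
A sketch of such a Point, with a comment keyed to the footnote below (the name getBoundingRect is, again, an assumption):

```python
class Shape:
    def __init__(self, x, y):
        self.x, self.y = x, y


class Point(Shape):
    # No initializer of its own: Shape's initializer is the one that runs.
    def getBoundingRect(self):
        return (self.x, self.y), (self.x, self.y)


message = None
try:
    point = Point()                                  # (1)
except TypeError as error:
    message = str(error)
    print(message)
```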

Footnotes:

1. The Point has no initializer that we can see, so it is initialized without arguments, which raises an error!

Output:

Shape.__init__() missing 2 required positional arguments: ‘x’ and ‘y’

Now, the Point is initialized with the correct arguments (to its Shape-part initializer).
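
The corrected call might look as follows (same hypothetical Point as in the previous sketch); the arguments are passed straight through to the initializer inherited from Shape.

```python
class Shape:
    def __init__(self, x, y):
        self.x, self.y = x, y


class Point(Shape):
    # Still no initializer: the arguments go to Shape's initializer.
    def getBoundingRect(self):
        return (self.x, self.y), (self.x, self.y)


rect = Point(10, 15).getBoundingRect()
print(rect)
```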

Output:

((10, 15), (10, 15))

What next?

In this first lesson, we have considered the object-oriented paradigm as a response to the need for functional substitutability, in a certain (but frequent) applicative context. Then, we saw why inheritance is often employed to construct infrastructure for it (and why it is indeed useful, but not essential, in Python). This furnishes the basis. However, it is not recommended to proceed from here on one’s own (at least, not to build non-trivial commercial applications)! Object-orientation (surprise, surprise!) is no panacea! Unfortunately, object-oriented programming has as many disappointed practitioners as happy ones (but it continues, and will surely continue, to be used intensively, one way or another, nevertheless)! Experience has shown that the benefit of object-oriented programming is tied to applying it according to common design idioms, patterns, and architectures. Thus, the rest of this course is dedicated to demonstrating some of the more significant parts of this common wisdom, starting, in the next lesson, with some heavy-weight design patterns that demonstrate the power of object-oriented programming in all its glory. (And in the lessons that follow, we will try to tackle the less glamorous side).

Lessons in this course:

  1. Substitutability and Inheritance (you are here!)
  2. The glory of OO Substitutability: the “Composite” Pattern
  3. The limits of OO substitutability: the “Visitor” Pattern
  4. Some boring design patterns
  5. The limits of inheritance

Avner Ben

Born 1951. Active since 1983 as programmer, instructor, mentor in object-oriented design/programming in C++, Python etc. Author of DL/0 design language