Image by flightsafety.org

Python: Of Composers and Dunders

And how they relate to Functors and Shell Pipelines

Klaus Cinibulk
13 min read · May 29, 2022


In this two-chapter article we look at function composition: how we can compose functions in Python, and how we can combine the concept of Functors with magic methods (a.k.a. dunder methods) to implement a tiny DSL that allows us to write shell-style pipelines.

Pre-Requisites

To follow along, you just need to have a basic understanding of

  • Python functions with type annotations
  • Python classes and magic methods (a.k.a. dunder methods). In fact, we will create a class consisting of 4 methods, each of which is a one-liner!
  • the functools package, actually limited to the functools.partial() function
  • a bit of functional programming terminology, like purity, higher-order functions, and partial function application
  • the Python REPL. All code snippets can be copied into your standard Python REPL; use ipython or bpython for better comfort.

Disclaimer

Although it is not a functional programming language, Python supports many concepts known from functional programming. There are functools, itertools, structural pattern matching (Python ≥ 3.10), and closures, and functions are treated as first-class citizens, which is the ultimate prerequisite for programming functionally.

Conversely, Python lacks typical FP features like strict type checking, type classes, or a composition operator. Nevertheless, it's often possible to implement such things, like a composition function, very easily in Python.

I strongly believe that knowing and being able to apply functional programming principles is beneficial for every developer, even in non-functional programming languages. That's the message I want to convey, even if it may not always seem in line with idiomatic Python at first sight.

What this article is about

Who in the audience has ever used Unix pipelines in bash/zsh/..sh? I dare say almost everybody…

$ cat /etc/passwd | grep -v '^#' | cut -d: -f1 | sort | tail -3

What if I told you that we can create a simple DSL (domain-specific language) in Python which lets us write expressions like this:

result = cat("/etc/passwd") | drop_comments | cut(d=":", f=1) | sort | tail(3)

We will only use the most basic Python constructs, like classes and functions; we don't even need to import a single external package.

By the end of the article we will have implemented

  • a single class consisting of 4 methods, each of which is a one-liner. This class enables us to write shell-like pipelines.
  • some worker functions to test our Pipe-DSL

Becoming curious? Read on :-)

Hint: I recommend using an IDE (VSCode, PyCharm) with a static type checker like mypy to validate type annotations.

Chapter 1 — Function composition

Setting the scene

Let's start by implementing cat(), cut(), sort() and tail(). We don't have to implement sort(), as we can re-use the builtin sorted() function. In order to pipe a value through a pipeline of functions, every function must obey one simple rule: it must be callable with a single value and it must return another value.

As we want to be type-safe as far as possible, we will make use of Python type annotations. And as we are mostly dealing with lists of strings, let’s create a type alias for that:

from typing import List

Lines = List[str]

First, we implement tail(); that's an easy one. tail() needs to know the number of lines it should return, and apparently it has to know the lines themselves, so we might be tempted to write:

def tail(n: int, lines: Lines) -> Lines:
    return lines[-n:]

Looking back at the example pipeline at the beginning of the page, we notice that we have to be able to call tail() with only the number of lines that should be returned, right? But where does the remaining input come from? Nested (or inner) functions to the rescue!

from typing import Callable

def tail(n: int) -> Callable[[Lines], Lines]:
    def inner(lines: Lines) -> Lines:
        return lines[-abs(n):]  # abs() eliminates negative input
    return inner

Now we have renamed our initial tail() function to inner(), taking only a single argument (a list of strings) and returning a list of strings. And we wrapped it inside our new tail(), which takes the number of lines and returns inner. We turned tail() into a higher-order function (a function returning a function and/or taking a function as a parameter). Furthermore, we can state that tail() is a pure function, as it produces output based solely on its input.

Now we can write:

>>> lines = ["1", "2", "3", "4", "5"]
>>> tail(3)(lines)
['3', '4', '5']

Side note: The surprising thing here is that the inner function has access to n (which is defined in the surrounding scope) even after the function tail has returned. We call such functions "closures", because they are able to capture variables from their surrounding scope: inner() "closes over" n.

Using this concept, we are now able to “partially apply” functions:

>>> tail3 = tail(3)
>>> type(tail3)
function
>>> tail3(lines)
['3', '4', '5']

Python's functools has a function partial() which lets you partially apply functions by binding values to arguments, for example:

from functools import partial

def tail_(n: int, lines: Lines) -> Lines:
    return lines[-n:]
>>> t3 = partial(tail_, 3)
>>> t3(lines)
['3', '4', '5']

Anyway, now that we know we can turn functions with multiple arguments into multiple nested functions, we can implement our remaining functions accordingly. Let's continue with cut():

def cut(d: str, f: int) -> Callable[[Lines], Lines]:
    def field(s: str) -> str:
        try:
            splitted = s.split(d)
            return splitted[f - 1] if len(splitted) >= f else s
        except Exception:
            return str(s)

    def inner(lines: Lines) -> Lines:
        return [field(s) for s in lines]

    return inner

Same concept: we separate the options of cut (d and f) from the data cut operates on. For the sake of simplicity, I implemented only rudimentary error handling.

Next, we need cat:

def cat(filename: str) -> Lines:
    try:
        with open(filename) as fd:
            return fd.readlines()
    except OSError:
        return []

cat() takes only a single argument, hence there's no need for nesting anything. The same goes for drop_comments (an alias for grep -v '^#'), really simple and pure:

def drop_comments(lines: Lines) -> Lines:
    return [line for line in lines if not line.startswith("#")]

To conclude, we finally create sort:

sort: Callable[[Lines], Lines] = sorted

Ok, so we have created five independent and reusable(!) functions. Furthermore, we can state that, with the exception of cat(), all functions are pure, as they produce their results based solely on the input they receive. Given the same input, they will produce the same output!

cat(), on the other hand, is "impure" as it accesses a shared resource, the file system. Depending on whether the file whose name is passed as an argument exists, cat() returns either an empty list or the contents of the file.

Now we can glue our functions together to produce a result:

>>> lines = cat("/etc/passwd")   # impure operation
>>> result = tail(3)(sort(cut(d=":", f=1)(drop_comments(lines))))
>>> print(result)
['daemon', 'nobody', 'root']

Whoa, it works, but that's quite unreadable, huh? We can do better by using partial function application:

>>> cut_f1 = cut(d=":", f=1)
>>> tail3 = tail(3)
>>> result = tail3(sort(cut_f1(drop_comments(lines))))
>>> print(result)
['daemon', 'nobody', 'root']

Now, this already looks like a neat pipeline, doesn't it? It's still a bit hard to interpret, though, as we have to mentally "parse" it from right to left: nested function calls are evaluated from the innermost call outwards.

Some facts about composition

function composition visualized

In a nutshell, we can compose functions if the output type of one function f(BLUE) = RED, is the same as the input type of another function g(RED) = GREEN. By doing so, we are creating a new function (g • f)(BLUE) = GREEN. Note how we get rid of the intermediate RED result. Naturally, composition isn’t limited to 2 functions.

In Python (contrary to FP languages) we do not have a builtin composition operator (like ‘.’ in Haskell), but we can easily create our own composition function:

from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

def compose(
    g: Callable[[B], C], f: Callable[[A], B]
) -> Callable[[A], C]:
    """Returns a callable composed from g after f"""
    return lambda x: g(f(x))

We are using “type variables” as placeholders for real types. IDEs like VSCode are able to infer the correct types, like so:

number_of_lines takes str and returns complex
>>> number_of_lines = compose(complex, compose(len, cat))
>>> number_of_lines("/etc/passwd")
(123+0j)

mypy is also able to detect type mismatches:

42 is int, not str

Likewise, mypy will display a (rather cryptic) error when you try to compose incompatible functions:

cat() returns Lines, but tail() expects int

Function composition is a super-powerful concept which allows us to create new functions by composing smaller functions. It's like plugging Lego blocks together. We just have to ensure that the output type of one function is the same as the input type of the next function in the row.

The result of a function composition is again a function, hence we can keep on composing functions as long as we want (as long as input and output types are compatible). Another consequence is that composing pure functions results in new pure functions:

>>> c1 = compose(tail3, sort)
>>> c2 = compose(cut_f1, drop_comments)
>>> last3users = compose(c1, c2)
>>> last3users(cat("/etc/passwd"))
['daemon', 'nobody', 'root']

We composed a new pure(!) function last3users() from 4 of our existing pure functions. Operations with side-effects have been isolated in the cat() function.

Let’s conclude the composition chapter with a few facts and hints:

  • Composition is nothing more than creating new functions by re-using existing functions.
  • Composition is a central concept of functional programming.
  • Apply the Unix philosophy “make each program do one thing well” to your functions! It makes them composable! Remember: it’s like shell-piping the output of one program to the next one.
  • Isolate impure code from pure compositions, otherwise the entire composition becomes impure.
  • Use type annotations in your function signatures! Programming in functional style without types is a no-go! Even though Python is dynamically typed, static type checkers like mypy are incredibly helpful, so use them!

Chapter 2 — Functors and Dunders

Now, back to our actual endeavor — creating a shell-like Pipe-DSL. Using function composition we were already able to set up a processing pipeline, now we want to bring this shell-ish flavor into the game.

First of all we need something that evaluates left to right. The most obvious operation would be method invocation, so we could think of creating a Pipe class with methods performing our operations.

x = Pipe("/etc/passwd").cat().drop_comments().cut(d=":", f=1)...

What could that look like?

class Pipe:
    def __init__(self, value):
        self._v = value

    def cat(self):
        return Pipe(cat(self._v))

    def drop_comments(self):
        return Pipe(drop_comments(self._v))
    ...

I mean, the solution could work, kind of, but it has some major drawbacks:

  • We don’t want to re-implement our functionality in an OO way
  • For every new command we would have to change the class
  • No chance to inject the ‘|’ operator

So, we need something better… introducing XPipe (named so as not to clash with multiprocessing.Pipe):

Instead of implementing each command as a method, we introduce a single higher-order method map() which delegates the real work to a function (like one of those we defined before!), that is passed as a parameter. Finally, it simply returns a new XPipe containing the new value.

We are using type variables again and let XPipe derive from Generic[A]. That's how we can mimic generics in Python. A is a placeholder for the real type of the value passed to the constructor. We also implemented __repr__(), which comes in handy when inspecting XPipe objects.
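Sketched in code, such an XPipe could look like this (a minimal reconstruction based on the description above; the fourth method, __or__, follows later in the article):

```python
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")

class XPipe(Generic[A]):
    """A tiny Functor-style wrapper: holds a value that functions can be mapped over."""

    def __init__(self, value: A) -> None:
        self._v = value

    def map(self, f: Callable[[A], B]) -> "XPipe[B]":
        # Delegate the real work to f and wrap the result in a new XPipe
        return XPipe(f(self._v))

    def __repr__(self) -> str:
        return f"XPipe[{type(self._v).__name__}]({self._v})"
```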

>>> XPipe("/etc/passwd")
XPipe[str](/etc/passwd)
>>> XPipe(complex(1, 2))
XPipe[complex]((1+2j))

Now we can make the next attempt at formulating a left-to-right pipeline:

>>> (XPipe("/etc/passwd")
...     .map(cat)
...     .map(drop_comments)
...     .map(cut(d=":", f=1))
...     .map(sort)
...     .map(tail(3))
... )
XPipe[list](['daemon', 'nobody', 'root'])

Seems like we have computed the correct result, but it is captured inside an XPipe object! Still, there's no trace of the '|' operator yet. So, we need some magic which lets us replace ").map(" with "|". Nothing easier than that!

And the Dunder rolls…

As an experienced Python user you are aware of Python’s magic methods which are mainly used to implement operator syntax. Magic methods are prefixed with double underscores and have double underscores at the end of their names.

For instance, if a class defines a __add__(self, other) method, objects of the class can be “added” like “object1 + object2” which is actually just syntactic sugar for “object1.__add__(object2)”.
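As a quick illustration (this Money class is made up purely for this example):

```python
class Money:
    def __init__(self, cents: int) -> None:
        self.cents = cents

    def __add__(self, other: "Money") -> "Money":
        # '+' on two Money objects is sugar for Money.__add__
        return Money(self.cents + other.cents)

total = Money(150) + Money(250)  # same as Money(150).__add__(Money(250))
print(total.cents)               # 400
```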

The full list of magic methods is available in the official Python documentation. Digging deeper we find an interesting one (https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types):

object.__or__(self, other)

The __or__ dunder method represents the binary (bitwise) OR operator, "|". For instance, it is used by <class 'int'>:

>>> 4 | 6
6

… which is essentially just syntactic sugar for:

>>> int(4).__or__(6)
6
>>> hasattr(int, '__or__')
True

This means that the int class has an implementation for the __or__ dunder method for doing bitwise OR operations.

Apart from its name, it's a method like any other. It just needs to take a single argument, but other than that, we can equip it with our very special "bitwise OR" logic. And that's exactly what we're going to do now.

XPipe with __or__() just calling map()
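That one extra method is itself a one-liner delegating to map(). A self-contained sketch (the whole class repeated so the snippet runs on its own):

```python
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")

class XPipe(Generic[A]):
    def __init__(self, value: A) -> None:
        self._v = value

    def map(self, f: Callable[[A], B]) -> "XPipe[B]":
        return XPipe(f(self._v))

    def __or__(self, f: Callable[[A], B]) -> "XPipe[B]":
        # '|' is just an alias for map()
        return self.map(f)

    def __repr__(self) -> str:
        return f"XPipe[{type(self._v).__name__}]({self._v})"
```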

Our custom __or__() implementation is supposed to do exactly the same thing as map(). With that we can replace the call to map() with ‘|’:

>>> (
...     XPipe("/etc/passwd")
...     | cat
...     | drop_comments
...     | cut(d=":", f=1)
...     | sort
...     | tail(3)
... )
XPipe[list](['daemon', 'nobody', 'root'])

Et voilà! Mission (almost) accomplished! There's just one little blemish left: we still have to wrap the output of cat() in an XPipe, but we can easily overcome that by returning an XPipe object directly from cat(); let's create pcat():

def pcat(filename: str) -> XPipe[Lines]:
    return XPipe(cat(filename))

Here’s the final version of our pipeline using our XPipe DSL:

>>> p = "/etc/passwd"
>>> pcat(p) | drop_comments | cut(d=":", f=1) | sort | tail(3)
XPipe[list](['daemon', 'nobody', 'root'])

Actually, we have reached the end of our story. Keep in mind that the resulting artifact that enables us to write shell-style pipelines is the XPipe class (<20 LoC), which can be considered a Functor: something that can be "mapped over", as it exposes a map() method.

Conclusion

Concepts sounding complex in theory often turn out to be surprisingly simple when we start applying them in programming. Functors are such an example of a simple yet powerful construct.

In functional programming languages like Haskell and others, Functors are indispensable elements and are integral parts of the standard libraries. Admittedly, we have just touched the bare minimum amount of Functor properties and there’s lots of additional things to explore.

And, as always: it matters in Python, too :-)

Thanks for reading, feel free to share, take care!

PS: You might also be interested in https://medium.com/@cini01/python-why-none-is-not-nothing-bb3de55dd471

Appendix: Functors

What follows is some “bonus content” about Functors in general.

A Functor is something that can be "mapped over". Think of Python's builtin map function, which takes a function and an Iterable, and produces a new Iterable by applying the function to every element of the input Iterable.

>>> res = map(lambda v: dict(value=v), [1, 2, 3, 4])
>>> list(res) # map returns a lazy iterator, hence the list()!
[{'value': 1}, {'value': 2}, {'value': 3}, {'value': 4}]

Quick detour: The Haskell Functor type class definition:

class Functor f where
    fmap :: (a -> b) -> f a -> f b

Although we don't have type classes in Python, we can state that Iterables in Python would conform to Haskell's Functor type class. Why? Because we can pass an Iterable[a] to map (instead of fmap) along with a Callable[[a], b] and will receive a new Iterable[b]. Admittedly, we are cheating a bit, because Python's map turns every Iterable input into a lazy iterator (a map object).
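This "cheating" is easy to observe: whatever kind of Iterable goes in, a lazy map object comes out:

```python
# map() always returns a lazy iterator (a 'map' object),
# regardless of the input container type
from_list = map(str.upper, ["a", "b"])
from_tuple = map(str.upper, ("a", "b"))

print(type(from_list).__name__)   # map
print(type(from_tuple).__name__)  # map
print(list(from_list))            # ['A', 'B']
```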

Now, I told you that our class XPipe is a Functor as well, right? Wouldn't that mean we could pass it to Python's map()? Apparently we cannot, mainly because XPipe does not implement Python's Iterable interface, so map() wouldn't know how to extract the value from our XPipe object.

Still, our class XPipe has a map() method which has the same signature and pursues the same purpose, and as we don't have type classes in Python, I think it's alright to call our XPipe class a Functor ;-)

Here are some Functor facts:

  • Functors are objects that can be mapped over, i.e. there's a map() or fmap() function that takes another function as an argument, which operates on the encapsulated value.
  • map() never modifies its encapsulated value directly. Instead, it returns a new Functor containing a new value (which can have a different type).
  • map() operations can be chained, i.e. composed.
  • Functions passed to map() should be pure and free of side effects.

There are many other Functor properties, but for the sake of this lesson we have touched everything we need to know.

Let's keep in mind that the primary use cases of Functors and dunder methods are certainly not related to creating Python DSLs :-) Nevertheless, it illustrates the possibility of composing processing pipelines just by re-using off-the-shelf functions and faithfully handing them over to map().

In fact, it's just another example of working with higher-order functions, for those who want to bring a little FP flavor into their code. And yes, we were somehow "customizing" the __or__() magic method.

Exercise

Don’t like the ‘|’ operator? Can you come up with an implementation that allows for writing C++ style pipelines, e.g.

cat("/etc/passwd") >> drop_comments >> ...

Try to create your own processing pipeline with different operations.

What’s left?

  • Actually, we haven't talked about the benefits and consequences of using Functors, even in Python. We will cover this in another article, when we look at representative examples like the Maybe and Either Functors, as well as additional properties such as the functor laws and how Functors compose.
  • How to deal with nested Functors and how to flatten them, e.g. what if the mapper function itself returns a Functor, resulting in a return type of F[F[type]]?
  • We have not dealt seriously with error handling yet. E.g. would it be possible to have impure functions which could fail in a pipeline? (Spoiler: yes, you can!) For now, as long as pure functions are used, you are on the safe side.
