# Category theory and Haskell

I recommend watching the MIT lectures “Programming with Categories”, by Brendan Fong, Bartosz Milewski and David Spivak.

https://www.youtube.com/playlist?list=PLhgq-BqyZ7i7MTGhUROZy3BOICnVixETS

## Motivation

A quote from Bartosz Milewski (emphases are mine):

Category theory is a treasure trove of extremely useful programming ideas. Haskell programmers have been tapping this resource for a long time, and the ideas are slowly percolating into other languages, but this process is too slow. We need to speed it up…

Composition is at the very root of category theory — it’s part of the definition of the category itself. And I will argue strongly that composition is the essence of programming. We’ve been composing things forever, long before some great engineer came up with the idea of a subroutine. Some time ago the principles of structured programming revolutionized programming because they made blocks of code composable. Then came object-oriented programming, which is all about composing objects. Functional programming is not only about composing functions and algebraic data structures — it makes concurrency composable — something that’s virtually impossible with other programming paradigms.

https://bartoszmilewski.com/2014/10/28/category-theory-for-programmers-the-preface/

You can also watch an extended version of why it is a good idea for a programmer to learn category theory here: https://www.youtube.com/watch?v=oScqdMBk8Q8

# Course description

We will assume no background knowledge on behalf of the student, starting from scratch on both the programming and mathematics. (Flyer)

http://brendanfong.com/programmingcats.html — official page of the course.

Here you can find the syllabus.

And here http://brendanfong.com/programmingcats_files/cats4progs-DRAFT.pdf you can see the course notes.

- This is a course in **applied** math and programming. For example, set theory is used very informally (see the link to the other course below).
- This course concentrates on **intuition**. For example, in the picture above you can see a description of the intuition behind a *natural transformation*.
- This course **doesn’t** give you **complicated math examples** (there are *no topological spaces*, for example); it concentrates on simple math examples, or the ones that are important for category theory itself (like **Set** or a poset).
- The main example is the **Haskell** programming language. You learn it alongside category theory, applying its ideas directly. This is the main example of *applying category theory* in the course.

## I think the following notes will be useful for you:

## About math:

- There is a lot of “non-precise”/“applied” usage of math (especially when dealing with set theory; in particular, Russell’s paradox is ignored and `Cat` is the category of small categories). You should be ready to ignore this (for more rigorous math you can watch https://www.youtube.com/watch?v=p54Hd7AmVFU and https://www.youtube.com/watch?v=O2lZkr-aAqk and https://www.youtube.com/watch?v=NcT7CGPICzo; they lack proofs, but the mathematical statements themselves are correct).
- Bear in mind that in category theory 2 “entities” are “the same” if they are *isomorphic*. In particular, if 2 “entities” are *equal*, they are considered “the same”; but if 2 “entities” are merely *isomorphic*, they are still considered “equal for all intents and purposes”. There is also a “weak isomorphism”, as defined by adjunctions, see below.

- The **dotted line in a diagram represents a morphism that exists and is unique** (it is actually never explained why a dotted line is used), which is what makes the constructed object unique *up to isomorphism*. For example, when we construct the *categorical product* (you can think of `pair` in a programming language, or of the Cartesian product in **Set**), any two candidate products are connected by such unique morphisms and are therefore *isomorphic*.
- **Preorder** and **Order** are *almost the same* from the categorical point of view. The difference between them is that `Order` has the additional antisymmetry property: if a ≤ b and b ≤ a, then a = b. But in a `Preorder`, if a ≤ b and b ≤ a, then *a is isomorphic to b* (because the two compositions are forced to be `Id_a` and `Id_b`). The “unintended” difference is that in the graph representation of an `Order` there are no “backwards” edges and the number of vertices is lower (2 vertices with back-and-forth edges in a `Preorder` are replaced with 1 vertex in an `Order`).
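A tiny Haskell sketch of this point (the length-based ordering on strings is my own hypothetical example, not from the course): a preorder can relate two distinct values in both directions, which makes them isomorphic as objects without being equal.

```haskell
-- A preorder on strings: compare by length only.
-- It is reflexive and transitive, but NOT antisymmetric,
-- so it is a Preorder and not an Order.
lessEq :: String -> String -> Bool
lessEq a b = length a <= length b

-- "ab" <= "cd" and "cd" <= "ab", yet "ab" /= "cd":
-- the two objects are isomorphic in the preorder category, but not equal.
isoButNotEqual :: Bool
isoButNotEqual = lessEq "ab" "cd" && lessEq "cd" "ab" && ("ab" /= "cd")
```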

- A **thin category** **is a category whose hom-sets each contain at most one morphism** (this definition is not provided in the course). It provides an example of a category where a morphism can be monic and epic, yet there is no “inverse” morphism for it (being monic and epic is “strictly weaker” than having an “inverse” morphism; this is not the case in `Set` with functions). See details in the appendix below.

## About programming:

- Knowledge of functional programming is never assumed, but you’d better have some idea about currying beforehand (it is used before it is explained in the course).
- The existence of the bottom value (⊥) is mostly ignored (it is only briefly mentioned). This makes a **Haskell type not strictly equivalent to a set in** `Set` with functions. For example, the set analogue of the `()` type (unit type) is a singleton set. It is said that in Haskell the unit type has a `single` value (denoted `()`). This should be the `terminal object` both in `Hask` with types and in `Set` with functions. But technically **⊥ is also a member of the `()` type**, so there are actually 2 values there. In fact, bottom is a member of every type in Haskell, and this is ignored. As a consequence, functions in `Hask` are actually `partial` (because of bottom). **We reason about functions in `Hask` as if they were total** (defined for every value in the domain). You can read a justification of why “Fast and Loose Reasoning is Morally Correct” here: http://www.cse.chalmers.se/~nad/publications/danielsson-et-al-popl2006.html
- There is implicit usage of parametric polymorphism. You can think of generics in Java or templates in C++. There is a very brief explanation in the lecture (it is, however, assumed that you know what “regular” polymorphism is).
- It is assumed that you know what pattern matching is (see the definition of the `length` function in Haskell, or see the Scala example below). For example, the latest Java 15 still has no pattern matching (there is only *pattern matching for instanceof* as a preview language feature in Java 14; see my post here for the details).
- I can’t emphasize this enough:

## In Haskell, the `self`/`this` is passed as the **last** argument, not as the first.

I don’t understand why this point is also ignored in Haskell tutorials; had I known this before, it would have made my life easier.

Consider a simplified version from the standard library:

Based on

http://hackage.haskell.org/package/base-4.5.1.0/docs/src/GHC-Base.html

http://hackage.haskell.org/package/base-4.5.1.0/docs/src/Data-Maybe.html

Let’s look at the `fmap` signature in the `Functor Maybe` instance; it is roughly

`fmap :: (a -> b) -> Maybe a -> Maybe b`

or (uncurrying, and writing `f` in place of `a -> b`)

`fmap :: (f, Maybe a) -> Maybe b`

**Note:** Here “`self`” is `Maybe a`.
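A runnable sketch of that signature (using primed names to avoid clashing with the Prelude): the `Maybe a` “receiver” is the last parameter.

```haskell
-- Simplified from GHC.Base / Data.Maybe: "self" (the Maybe value) comes last.
data Maybe' a = Nothing' | Just' a deriving (Eq, Show)

fmap' :: (a -> b) -> Maybe' a -> Maybe' b
fmap' _ Nothing'  = Nothing'     -- nothing to map over
fmap' f (Just' x) = Just' (f x)  -- apply f "inside" the container
```

For example, `fmap' (+1) (Just' 41)` gives `Just' 42`.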

Let’s look at a simplified implementation in Python:

Based on

https://github.com/dbrattli/OSlash/blob/f7283d92a379ca86eba2cf7109ef68a7bc27dc3c/oslash/maybe.py

Let’s look at the `fmap` signature in `Maybe(Functor)`; it is:

`def fmap(self, f)`

where `self` is the `Maybe`, and `f` is a function.

**Note:**

- I have erased the Python type hints for clarity. See the Scala example below to recover the details.
- `Just` and `Maybe` should really be “sealed classes”. See, for example, an alternative Scala implementation:

**Note:**

- The plus sign in the Maybe definition means that Maybe is covariant, i.e. if X <: A (X is a subtype of A, X “extends” A), then Maybe[X] <: Maybe[A]. You can read my separate post about variance. This means Maybe is a covariant functor in category-theory terms.
- Here `f` is passed implicitly.
- `Nothing` in Scala is the analogue of `Void` in Haskell (which is **not** `void`!). `None` is defined as a singleton object. It can be used as a value for every `Maybe[B]`.
- The case analysis (pattern matching) is done on `this`, which has type `Maybe[A]`. If it is the singleton `None` type, then the `None` value is returned. If it is the `Just[A]` type, this type is deconstructed using the primary constructor: `x` binds to `Just.this.a` and is used for computing `f(x)` later.

The JVM sees the signature of the fmap method roughly as follows:

`public static scala.Function1<Maybe, Maybe> fmap(`**Maybe this**`, scala.Function1<java.lang.Object, java.lang.Object> f)`

**Note:**

- The first `java.lang.Object` is the erased type of `A`, the second `java.lang.Object` is the erased type of `B`.
- The first `Maybe` occurrence is the erased type of `Maybe[A]`, the second `Maybe` occurrence is the erased type of `Maybe[B]`.

The point that I want to demonstrate here is that the **`this` reference is passed as the first parameter, not as the last as in Haskell**.

There are a couple of topics that I had to look up in other places and then go back and watch again in order to understand what’s going on.

6. Adjunctions

https://www.youtube.com/playlist?list=PL54B49729E5102248&pbj=1

There are 2 equivalent definitions of adjunctions, and it is actually better to start with the one using the unit η and counit ε, not the one used in the MIT lecture.

If 2 categories are connected by an adjunction, they are “very similar”, or “weakly isomorphic”.
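For reference (this is the standard unit/counit formulation, not spelled out here in the MIT lecture): an adjunction F ⊣ G, with F : C → D and G : D → C, consists of two natural transformations satisfying the triangle identities:

```latex
\eta : 1_{\mathcal{C}} \Rightarrow G \circ F, \qquad
\varepsilon : F \circ G \Rightarrow 1_{\mathcal{D}}, \qquad
(\varepsilon F) \circ (F \eta) = 1_{F}, \qquad
(G \varepsilon) \circ (\eta G) = 1_{G}
```

η is the unit and ε is the counit mentioned above.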

7. Recursion Schemes, F-algebra, catamorphism, Lambek’s Lemma, fix-point functor

- For every constant functor there exists a fixed point.

You can think of a regular constant function f(x) = 5 as an analogy: it is clear that f(5) = 5, so 5 is a fixed point. You can easily find it: start from any x_0, then f(x_0) = 5. Then use x_1 = 5 as the new starting point: f(x_1) = f(5) = 5. Note that what we actually did is compute f(f(x_0)) = 5, and further applications of f don’t change the result.
- Consider the function f(x) = x². It is clear that f(0) = 0, that is, 0 is a fixed point (actually, 1 is another fixed point). We can start from any point in (-1, 1). Let’s take x = 1/2.

f(1/2)=1/4.

f(1/4)=1/16

…

We will get the following sequence:

{1/2, 1/4, 1/16, …, 1/2^(2^n), … | n is the number of applications of f}

This process converges to 0 (the limit of the sequence above is 0).
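The iteration above can be sketched in Haskell (`fixIterate` is my own hypothetical helper, not from the lecture): keep applying f until the value stops changing, within a tolerance.

```haskell
-- Naive fixed-point search by iteration: stop when f x is (almost) x.
-- This only terminates when the iteration actually converges.
fixIterate :: (Double -> Double) -> Double -> Double
fixIterate f x
  | abs (f x - x) < 1e-9 = x              -- f x ≈ x: an (approximate) fixed point
  | otherwise            = fixIterate f (f x)
```

For example, `fixIterate (\x -> x * x) 0.5` converges to (approximately) 0, and `fixIterate (const 5) 0` reaches the fixed point 5 after a single application.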

(See Newton’s method; it is actually mentioned in the lecture.)
- At this point you may want to watch *Peeling the Banana: Recursion Schemes from First Principles — Zainab Ali*: https://www.youtube.com/watch?v=XZ9nPZbaYfE

It is very informal, but it has working code (in Scala) and the following great diagram that helped me understand what is going on:

A — the carrier object (`Int`, for example). We want to `evaluate` our data structure to an `Int`.

R — the recursive data structure, the “full-blown tree”.

FR — the data structure as a functor, “the growing tree” that “approximates” R.

FA — the data-structure-as-a-functor that holds the “lifted” answer of the “lifted” carrier-object type.

alg: a function from the data-structure-as-a-functor to the carrier type, aka the `evaluator` or `structure map`. The type is `alg :: F A -> A`. For example, `sum :: ExprF Int -> Int`.

in: the isomorphism from the data-structure-as-a-functor (“the growing tree”) to the recursive data structure (“full-blown tree”). It is implemented first manually, and then using the `Fix` constructor from Lambek’s Lemma (see details below; they are not provided in the video).

out: the isomorphism from the recursive data structure (“full-blown tree”) to the data-structure-as-a-functor (“the growing tree”). It is implemented first manually, and then using the `unFix` function from Lambek’s Lemma (see details below; they are not provided in the video).

cata: (short for catamorphism) a generalization of `List`’s `fold` function; a higher-order function that knows how to “walk over” a recursive data structure. It takes as a parameter “the actions” that we want to apply. These “actions” also have an `accumulator` as a placeholder for the previous result, which cata will supply.

`fmap cata` is the “lifted” cata function.

`cata` can be constructed as

`cata alg = alg ∘ fmap (cata alg) ∘ unFix`

i.e. `cata alg r = alg (fmap (cata alg) (unFix r))`

where `r` is a recursive data structure, a tree for example.

Let’s walk through the formula of `cata`.

r is the recursive data structure, the “full-blown tree”. First, by invoking `unFix`, we convert it to the data-structure-as-a-functor. Then we recursively (`cata`) apply `alg` to the “growing tree”: we “push down” (`fmap`) `alg` into our data structure. When it hits the “leaves” (a `ConstF` term, for example), the leaf doesn’t take any `alg` parameter, so this is the base case of the recursion. Applying `alg` to the leaves “changes their state” according to `alg`; then we “move up” (we exit the recursive calls) and “update” the data structure’s state. When we finish the `fmap cata`, we are in the `FA` state: we have our data-structure-as-a-functor holding the answer. Applying `alg` one more time extracts the answer from the data-structure-as-a-functor into an answer of type `A`.
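Putting the walkthrough together in runnable Haskell (a sketch in the spirit of the lecture’s code; the `ExprF` functor with constants and addition is my own minimal example):

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Lambek's Lemma in code: Fix/unFix witness the isomorphism f (Fix f) ≅ Fix f.
newtype Fix f = Fix { unFix :: f (Fix f) }

-- cata alg = alg ∘ fmap (cata alg) ∘ unFix
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . unFix

-- The data-structure-as-a-functor: ConstF leaves are the base case.
data ExprF r = ConstF Int | AddF r r deriving Functor

-- alg :: F A -> A, the evaluator ("structure map") with carrier Int.
evalAlg :: ExprF Int -> Int
evalAlg (ConstF n) = n
evalAlg (AddF a b) = a + b

-- (1 + 2) + 3 as a "full-blown tree":
expr :: Fix ExprF
expr = Fix (AddF (Fix (AddF (Fix (ConstF 1)) (Fix (ConstF 2))))
                 (Fix (ConstF 3)))
```

`cata evalAlg expr` evaluates to `6`.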

**Note:** Now I’m going back to the MIT lecture.

- The process above is similar to “growing the tree” using the `ExprF` functor, where at iteration `n` we can have trees **up to depth** `n`. We use a directed colimit in category theory, and not a regular limit (see Adámek’s theorem). Note that the lecture only provides the definition of *direct limits of algebraic objects*, and not the one for an *arbitrary category*. Using colimits helps us to identify the “same” trees from different iterations (for example, we have leaves at every iteration, we have trees of depth 1 at every iteration but the first, etc.).
- Adámek’s theorem says that under some conditions an initial F-algebra exists (and it can be constructed as the colimit of an ω-chain). Lambek’s Lemma says that *if a functor F (“the tree”) has an initial algebra α: F(X) → X, then X is isomorphic to F(X) via α*; in other words, for the initial algebra F(X) ≅ X, or, put differently, *the initial algebra of the functor F (“the tree”) is a fixed point of F* (there is a proof in the lecture that doesn’t rely on Adámek’s theorem).

This means that, practically, it is enough to rely on Lambek’s Lemma to find `Fix f`, the *fixed point of the functor F* (and to hope that there are no more fixed points…), to construct the initial algebra.

Haskell code is provided in the lecture. The isomorphism of Lambek’s Lemma is implemented by a `newtype` (any `newtype` in Haskell defines an isomorphism; the same effect can be achieved using `pattern matching` with a `case class`, see the Scala code from the video above).
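A minimal sketch of the “any `newtype` defines an isomorphism” remark (the `Age` type is my own hypothetical example): the constructor and the field accessor are the two directions of the isomorphism, and both round trips are identities.

```haskell
-- Age ≅ Int: Age wraps, unAge unwraps;
-- unAge . Age = id on Int, and Age . unAge = id on Age.
newtype Age = Age { unAge :: Int }

roundTrip :: Int -> Int
roundTrip = unAge . Age
```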

# Alternative course

You can also see a more “pure math” and **extended** version of this lecture here (Seattle, Summer 2016):

**Note: This course requires some knowledge of naïve set theory** (there is heavy usage of many properties without any explanation of why they hold). At the bare minimum: the *Cartesian product*, the *discriminated union of sets*, *relations (reflexivity, antisymmetry, transitivity)*, all the properties of *functions* between sets, the properties of function composition (including why this operation is associative and how `id` behaves with it), and the properties of *injective* and *surjective* functions and their connection with function composition (“cancellation” rules).

The **Cantor–Bernstein theorem is not required**, though. :-)

https://www.youtube.com/watch?v=I8LbkfSSR58&list=PLbgaMIhjbmEnaH_LTkxLI7FMa2HsnawM_

https://www.youtube.com/watch?v=3XTQSx1A3x8&list=PLbgaMIhjbmElia1eCEZNvsVscFef9m0dm

https://www.youtube.com/watch?v=F5uEpKwHqdk&list=PLbgaMIhjbmEn64WVX4B08B4h2rOtueWIL

You can also find *Bartosz Milewski’s ‘Category Theory for Programmers’ unofficial PDF and LaTeX source* here: https://github.com/hmemcpy/milewski-ctfp-pdf/

**Note:**

The existence of the bottom type (⊥) is recognized and discussed in some detail (without providing any proofs). You can read a justification of why “Fast and Loose Reasoning is Morally Correct” here: http://www.cse.chalmers.se/~nad/publications/danielsson-et-al-popl2006.html

*Appendix: Thin category*

*The correspondence between a **thin category** and a **Preorder** is given by the observation that for all vertices x, y: hom(x, y) = {(x, y)} if x ≤ y (and the empty set otherwise), with composition (y, z) ∘ (x, y) = (x, z).* Also note that transitivity of ≤ in the `Preorder` makes composition in the thin category well-defined, and associativity of composition holds automatically (and vice versa, by this construction: when a morphism exists, we have exactly one morphism, which corresponds to ≤). So another way to look at this is to say that a **Preorder** is enough to create a category, and **there is no additional structure in it that is not imposed by being a category**. In this sense `Preorder` is “the essence” of ordering (compare with *partial order* and *total order*: both have additional requirements).

**Note:** In a thin category every morphism is epic and every morphism is monic (we have at most 1 morphism between any 2 objects a, b). If we have a *partial order*, then every morphism `f: a -> b` between distinct objects is monic and epic, but there is no “inverse” morphism `g: b -> a` (that is, there is no g such that `f ∘ g = Id_b` and `g ∘ f = Id_a`), because there are no “backwards” morphisms at all.

**Note:** In the **Set** category a function is *injective* and *surjective* iff an “inverse” function exists, but this doesn’t hold in an arbitrary category.
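A brute-force Haskell illustration of the **Set**-side fact on a finite domain (my own sketch, not from the course): for `Bool -> Bool` we can check injectivity, surjectivity, and the existence of an inverse by plain enumeration.

```haskell
import Data.List (nub)

domain :: [Bool]
domain = [False, True]

-- All four functions Bool -> Bool, used as candidate inverses.
allFns :: [Bool -> Bool]
allFns = [const False, const True, id, not]

isInjective, isSurjective, hasInverse :: (Bool -> Bool) -> Bool
isInjective f  = length (nub (map f domain)) == length domain
isSurjective f = all (\y -> any (\x -> f x == y) domain) domain
hasInverse f   =
  any (\g -> all (\x -> g (f x) == x && f (g x) == x) domain) allFns
```

For example, `not` is injective, surjective, and has an inverse (itself), while `const True` is none of these.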