Monoids, everywhere

Sarma Tangirala
Published in notallmodels
Sep 19, 2016

Monoids are one of those things that started blowing my mind once I understood what they meant and how they could be wielded. My first encounter with them was in a college course on discrete mathematics that covered a bunch of algebraic structures. The course was a pithy exposition of their properties and nothing else; I distinctly remember trying to understand why on Earth one would need a ring or a semigroup, especially in software engineering. Having re-encountered them now through [1], I find that they are a very interesting idea and that one can find them everywhere if one looks hard enough (that’s probably not true :). This post is mostly a reminder to myself to try and find common patterns and structures that can be extracted from a problem domain, and to work more with functional programming abstractions. Ideally this would mean better-managed, well-tested, DRY-ed code.

I point to Wikipedia [2] for the formal definition of a monoid,

Suppose that S is a set and • is some binary operation S × S → S, then S with • is a monoid if it satisfies the following two axioms:

Associativity,

For all a, b and c in S, the equation (a • b) • c = a • (b • c) holds.

Identity element,

There exists an element e in S such that for every element a in S, the equations e • a = a • e = a hold.

Here’s a quick implementation of a monoid in python,

A monoid
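
A minimal sketch of such a class might look like the following; the exact layout is illustrative, built around a ‘zero’, a ‘foldOp’ and a ‘store’ dictionary:

# A minimal, illustrative Monoid: 'zero' is the identity element and
# 'foldOp' is an associative binary operation. The 'store' dictionary
# holds keyed running totals and is explained below.
class Monoid(object):
    def __init__(self, zero, foldOp):
        self.zero = zero
        self.foldOp = foldOp
        self.store = {}

    def fold(self, values):
        # Reduce an iterable to a single value, starting from the identity.
        result = self.zero
        for value in values:
            result = self.foldOp(result, value)
        return result

    def add(self, key, value):
        # Fold 'value' into the running total kept for 'key'.
        self.store[key] = self.foldOp(self.store.get(key, self.zero), value)


# A trivial instance: integers under addition, with 0 as the identity.
sumAggregator = Monoid(0, lambda a, b: a + b)
print(sumAggregator.fold([1, 2, 3, 4]))  # 10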

I’ll explain in a moment why this implementation has a ‘store’ dictionary. Disregarding that, all one needs to understand is the ‘zero’ and the ‘foldOp’.

‘zero’ is the identity element while ‘foldOp’ is the binary operator from the formal definition. This implementation assumes foldOp is associative. foldOp is short for the fold operation borrowed from functional programming.

The sumAggregator, defined as a Monoid, is a trivial example of how one would use one. The motivation for studying this trivial application is very straightforward and has been very illuminating for me.

Consider a very large dataset of samples drawn from a set of integers, and suppose you needed to plot a histogram of the frequencies of those integers. If the dataset were multiple terabytes in size, a natural solution would be to use Spark or Hadoop.

The problem is that if the histogram computation is not expressed as a monoid, it’s very easy to run into scaling issues in the form of a last-reducer problem. The following bit of code serves as an example,

A possibly thorny implementation of map-reduce
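
A sketch of such a naive version (the function names here are illustrative) maps every record to a (value, 1) pair and ships every one of those pairs to the reducers, so each reducer folds a list whose length grows with the raw input:

from collections import defaultdict


def map_phase(records):
    # Emit one (key, 1) pair per input record.
    return [(value, 1) for value in records]


def shuffle(pairs):
    # Group pairs by key, as the framework would between map and reduce.
    grouped = defaultdict(list)
    for key, count in pairs:
        grouped[key].append(count)
    return grouped


def reduce_phase(grouped):
    # Each reducer sums a list as long as the number of occurrences of its key.
    return {key: sum(counts) for key, counts in grouped.items()}


histogram = reduce_phase(shuffle(map_phase([3, 1, 3, 2, 3, 1])))
# {3: 3, 1: 2, 2: 1}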

Much like the example discussed in [1], this implementation of map and reduce is not amenable to map-side combiners, which apply the reduce operation at the mapper as soon as its map step completes. Again following [1], the following code tries to address this problem,

Combiners ftw!
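
A sketch of that fix, reusing the Monoid class from above (the partitioning and function names are again illustrative), folds each mapper’s partition into its own ‘store’, so the reducers only merge one partial count per key per mapper:

def map_with_combiner(partition):
    # Fold the mapper's own partition into a small keyed store.
    combiner = Monoid(0, lambda a, b: a + b)
    for value in partition:
        combiner.add(value, 1)
    return list(combiner.store.items())  # (key, partial_count) pairs


def reduce_partials(all_partials):
    # Merge the per-mapper partial counts; associativity makes this safe.
    merged = Monoid(0, lambda a, b: a + b)
    for partials in all_partials:
        for key, partial_count in partials:
            merged.add(key, partial_count)
    return merged.store


partitions = [[3, 1, 3], [2, 3, 1]]
histogram = reduce_partials(map_with_combiner(p) for p in partitions)
# {3: 3, 1: 2, 2: 1}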

Notice how the ‘store’ used in these two examples is similar to the ‘store’ used in the initial monoid implementation, and is employed in about the same fashion.

The takeaway is that the first version does a lot of wasteful work when the problem domain actually lends itself to significant gains in time complexity: its reduce step scales linearly in the size of the input, while the second version’s reduce step scales with the number of distinct integer keys. This is because computing a histogram amounts to defining the combining operation, in this case addition per bin, as a monoid over the integers, and noting that this operation is both associative and commutative.

In the context of big data, therefore, identifying monoids is very beneficial when scaling over large datasets.

The deeper advantage, though, is that identifying a monoid helps organize my code. The easiest example I can think of from the problems I work on at my job is scaling up machine learning models. The simplest of these involve word-embedding vectors, used to capture some notion of behaviour, which can be treated as a monoid because element-wise addition of real-valued vectors is associative and commutative, with the zero vector as the identity. Having identified that, I can share the code that computes, say, differences, products or time-decays of such vectors instead of repeating it.
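
As a rough sketch (the embedding dimension and key below are made up), element-wise addition of embedding vectors drops straight into the same Monoid class:

DIM = 3  # illustrative embedding dimension

zero_vector = [0.0] * DIM


def add_vectors(u, v):
    # Element-wise addition: associative, commutative, zero vector as identity.
    return [a + b for a, b in zip(u, v)]


embeddingAggregator = Monoid(zero_vector, add_vectors)
embeddingAggregator.add("user_42", [0.1, 0.2, 0.3])
embeddingAggregator.add("user_42", [0.4, 0.0, 0.1])
print(embeddingAggregator.store["user_42"])  # [0.5, 0.2, 0.4]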

So in conclusion, monoids are awesome and I can’t stop fanboying about them :D

[1] https://arxiv.org/abs/1304.7544

[2] https://en.wikipedia.org/wiki/Monoid#Definition
