Day 5: Set — Harder Than You Might Expect

Ben Clifford
Twelve Monads of Christmas
5 min readDec 12, 2016

Sets are a bit like lists, and so you can probably imagine that there should be a Monad instance for them. The first part of this post will be why that does not work in the “obvious” way, and the second part will describe a different way that does work.

Why we can’t have a Set monad

How might we expect a Set monad to work? Something like this:

> s = do
| x <- fromList [1,2]
| y <- fromList [0,1]
| return (x+y)
|
> s
fromList [1,2,3]

Something similar in [] would give [1,2,2,3] but sets don’t have duplicate elements so one of the 2 values has disappeared here.

How would that extra 2 be eliminated? The monad implementation would need some way to compare values: for example the Eq or Ord typeclasses, giving == or compare. And luckily in the above case, Int has those type classes.

But what about this:

s :: Set (Int -> Int)
s = do
x <- fromList [1,2,3]
return (const 7 :: Int -> Int)

If we did this with lists, we’d get out a 3 element list of functions: [const 7, const 7, const 7] :: [Int -> Int]

Those three elements are “obviously” the same. But, why is “obviously” in quotes? Because it’s “obvious” to you in this case, but not to Haskell in general. There’s no way to compare functions in Haskell — it’s not even clear what it means for them to be equal. (Try typing nub ([const 7, const 7, const 7]) :: [Int -> Int] into ghci to see what happens)

Maybe we could say that you can’t return a function (or rather, can only return values with the appropriate Eq/Ord type classes). But the rules of Haskell monads say I’m allowed to return a value of any type I want — monad instances don’t get to pick a new type signature for return:

return :: Monad m => a -> m a

So this doesn’t look like it’s going to work.

Oh dear.

How we can have a Set monad

All is not lost, it turns out. The set-monad package provides a Set monad despite the above.

The technique is to make a new data structure (confusingly also called Set in the library but I’m going to call SetI for Set Instruction) that describes how to build a Set and an interpreter that can run that description to actually build a set. The data structure is an instance of Monad but we can put whatever constraints we want on the interpreter. Although we can write a description of how to build a set of functions, we can’t actually evaluate it into an actual Haskell Set.

Here’s a (mangled) subset of the data structure used by set-monad:

import qualified Data.Set as S
data SetI a where
Return :: a -> SetI a
Bind :: SetI a -> (a -> SetI b) -> SetI b
Prim :: (Ord a) => S.Set a -> SetI a

We’ve got two constructors that describe doing regular monad things: Return describes how to build a Set from a single unconstrained value and Bind describes how to take an existing set of instructions and bind on another step, also with no constraints. >>= and return are translated directly into these constructors.

The third constructor, Prim, is specific to the set monad: it describes how to build a set from an existing Set, but only if the contained value type behaves right. We’re allowed to put whatever constraints we want on the Prim instruction, because it isn’t part of the Monad interface — so we can insist on Ord here.

If we write a monadic expression, we end up with a SetI structure, which pretty much is just the original expression but encoded as into a data structure. Nothing set-ful has happened yet.

Next we need an interpreter that knows how to run a SetI program and return a concrete Set. As this is also not part of the monad interface, we can have whatever type signature we want, and so we can make it clear that we can only evaluate programs that will return sets of Orderable elements.

run :: (Ord a) => SetI a -> Set a

run is implemented on a case-by-case basis:

If the instruction is “here is a set”, then that’s the set we evaluate to. The Ord constraint is already present on Prim so there is no problem here.

run (Prim s)                        = s

If the instruction is “return this single value as a set”, then we turn that single value into a set of one element:

run (Return a)                      = S.singleton a

Interpreting bind is a little bit more complicated. There’s a different case for each of the left hand parameters. These different cases make up for the fact that we can’t construct sets of arbitrary values.

If we’re binding a primitive set to the next action, then we can do the “obvious” bind, quite like list: run the rest of the program f for every element of the left-hand set, and merge the results with union.

union requires an Ord constraint, but we have that because run can only be invoked on programs that have that constraint in their return type.

run (Bind (Prim s) f) = S.foldl' S.union S.empty (S.map (run . f) s)

If we’re binding a single returned value, there’s no mapping/union to happen: we simply run the rest of the program on that singleton value. We don’t need to create an intermediate single valued set to map over.

run (Bind (Return a) f)  =  run (f a)

and finally, if we’re binding a bind, we re-arrange the brackets a bit and recursively evaluate the rearranged form.

run (Bind (Bind ma f) g)  =  run (Bind ma (\a -> Bind (f a) g))

Note that in these above cases we’ve not actually used the fact that Prim has an Ord instance! It looks like we could have merged the Prim and Return constructors, and just relied on the Ord constraint on run.

Where that Ord instances does get used is in some cases I haven’t mentioned. set-monad is also an instance of MonadPlus which gives something like ++ for lists. This is implemented as additional constructors Plus and Zero and a bunch more cases in run. The set-monad source code is pretty straightforward to read to see all those cases.

This style of taking Haskell monad syntax into an intermediate data structure and then interpreting that data structure turns out to be more generally useful, and I’ll talk more about that in a few days.

See Also

set-monad on hackage

Data.Set in the containers package

--

--