Day 5: Set — Harder Than You Might Expect
Sets are a bit like lists, and so you can probably imagine that there should be a Monad
instance for them. The first part of this post will be why that does not work in the “obvious” way, and the second part will describe a different way that does work.
Why we can’t have a Set monad
How might we expect a Set monad to work? Something like this:
> s = do
| x <- fromList [1,2]
| y <- fromList [0,1]
| return (x+y)
|
> s
fromList [1,2,3]
Something similar in []
would give [1,2,2,3]
but sets don’t have duplicate elements so one of the 2
values has disappeared here.
How would that extra 2
be eliminated? The monad implementation would need some way to compare values: for example the Eq
or Ord
typeclasses, giving ==
or compare
. And luckily in the above case, Int
has those type classes.
But what about this:
s :: Set (Int -> Int)
s = do
x <- fromList [1,2,3]
return (const 7 :: Int -> Int)
If we did this with lists, we’d get out a 3 element list of functions: [const 7, const 7, const 7] :: [Int -> Int]
Those three elements are “obviously” the same. But, why is “obviously” in quotes? Because it’s “obvious” to you in this case, but not to Haskell in general. There’s no way to compare functions in Haskell — it’s not even clear what it means for them to be equal. (Try typing nub ([const 7, const 7, const 7]) :: [Int -> Int]
into ghci to see what happens)
Maybe we could say that you can’t return
a function (or rather, can only return values with the appropriate Eq/Ord
type classes). But the rules of Haskell monads say I’m allowed to return
a value of any type I want — monad instances don’t get to pick a new type signature for return
:
return :: Monad m => a -> m a
So this doesn’t look like it’s going to work.
Oh dear.
How we can have a Set monad
All is not lost, it turns out. The set-monad
package provides a Set monad despite the above.
The technique is to make a new data structure (confusingly also called Set
in the library but I’m going to call SetI
for Set Instruction) that describes how to build a Set
and an interpreter that can run that description to actually build a set. The data structure is an instance of Monad
but we can put whatever constraints we want on the interpreter. Although we can write a description of how to build a set of functions, we can’t actually evaluate it into an actual Haskell Set
.
Here’s a (mangled) subset of the data structure used by set-monad
:
import qualified Data.Set as S
data SetI a where
Return :: a -> SetI a
Bind :: SetI a -> (a -> SetI b) -> SetI b
Prim :: (Ord a) => S.Set a -> SetI a
We’ve got two constructors that describe doing regular monad things: Return
describes how to build a Set
from a single unconstrained value and Bind
describes how to take an existing set of instructions and bind on another step, also with no constraints. >>=
and return
are translated directly into these constructors.
The third constructor, Prim
, is specific to the set monad: it describes how to build a set from an existing Set
, but only if the contained value type behaves right. We’re allowed to put whatever constraints we want on the Prim
instruction, because it isn’t part of the Monad
interface — so we can insist on Ord
here.
If we write a monadic expression, we end up with a SetI
structure, which pretty much is just the original expression but encoded as into a data structure. Nothing set-ful has happened yet.
Next we need an interpreter that knows how to run a SetI
program and return a concrete Set
. As this is also not part of the monad interface, we can have whatever type signature we want, and so we can make it clear that we can only evaluate programs that will return sets of Ord
erable elements.
run :: (Ord a) => SetI a -> Set a
run
is implemented on a case-by-case basis:
If the instruction is “here is a set”, then that’s the set we evaluate to. The Ord
constraint is already present on Prim
so there is no problem here.
run (Prim s) = s
If the instruction is “return
this single value as a set”, then we turn that single value into a set of one element:
run (Return a) = S.singleton a
Interpreting bind is a little bit more complicated. There’s a different case for each of the left hand parameters. These different cases make up for the fact that we can’t construct sets of arbitrary values.
If we’re binding a primitive set to the next action, then we can do the “obvious” bind, quite like list: run the rest of the program f
for every element of the left-hand set, and merge the results with union
.
union
requires an Ord
constraint, but we have that because run
can only be invoked on programs that have that constraint in their return type.
run (Bind (Prim s) f) = S.foldl' S.union S.empty (S.map (run . f) s)
If we’re binding a single return
ed value, there’s no mapping/union to happen: we simply run the rest of the program on that singleton value. We don’t need to create an intermediate single valued set to map over.
run (Bind (Return a) f) = run (f a)
and finally, if we’re binding a bind, we re-arrange the brackets a bit and recursively evaluate the rearranged form.
run (Bind (Bind ma f) g) = run (Bind ma (\a -> Bind (f a) g))
Note that in these above cases we’ve not actually used the fact that Prim
has an Ord
instance! It looks like we could have merged the Prim
and Return
constructors, and just relied on the Ord
constraint on run
.
Where that Ord
instances does get used is in some cases I haven’t mentioned. set-monad
is also an instance of MonadPlus
which gives something like ++
for lists. This is implemented as additional constructors Plus
and Zero
and a bunch more cases in run
. The set-monad
source code is pretty straightforward to read to see all those cases.
This style of taking Haskell monad syntax into an intermediate data structure and then interpreting that data structure turns out to be more generally useful, and I’ll talk more about that in a few days.
See Also
set-monad on hackage
Data.Set in the containers package