Typesafe Union-type folding in Scala
Escaping Union type contexts safely
Goal and ideal syntax
In a previous blog post I introduced a way to get unboxed arbitrary arity union types in scala. We can write
foo
allows an Int
or a String
to be passed to it, and nothing else.
But escaping the union context, ie getting the Int
or the String
out again from our generic T
, wasn’t safe. We had to do this:
The above is not checked by the compiler, either for exhaustivity (if we’d missed Int
it wouldn’t tell us) nor precision (if we included an extra type we didn’t need it wouldn’t tell us).
But it is certainly the easiest way to naively extract our original value again.
The goal of this article is to be able to "match
” a union type in a typesafe manner. We’ll start from an ideal syntax, something equivalent to a normal pattern match, and work outwards from there. I warn you there are some immoral steps along the way — this is more of an experiment in scala than something to be used in production.
Requirements:
- compile-time exhaustivity matching
- order of types should not matter
Extra credit:
- precision matching (making sure that if we supply an unneeded case the compiler tells us)
Syntax
At the moment our syntax is
This is our ideal look — pretty minimal. We’re going to try and build something minimal taking this as inspiration, and which is also safe.
Our general union type is arbitrarily wide, so any construct we write to generically escape a union context must also be arbitrarily wide.
This means each case must be collapsed iteratively into an object we are constructing as we go. You can’t have a pattern match of arbitrary length after all. It’s going to look a bit like the builder pattern.
This also means that we need to call an additional run
method on this object to actually execute the instruction — we need a separation between constructing our instruction and executing it. This is similar to how one builds and executes free monads, or the builder pattern.
So the minimal syntax we can hope for looks a little bit like this:
It looks like a match
turned into a builder. .match
to initialise, .at
to add a case, and .run
to signal to the compiler that we’re done building it and return our value.
This is now looking a lot like a fold.
VirtualPolyFunction
This thing we’re building with the .at
method is like a polymorphic function, a function from many types to one type. Shapeless has them, and uses them to fold over their HList
s and Coproduct
s. So we’re probably on the right track.
Ours is different and slightly more difficult to deal with (not that shapeless Poly
is easy) because we are unboxed — we have to handle a completely free
type parameter T
, constrained only by convention based on what type classes are in scope. I’m going to call it VirtualPolyFunction
since it will act on a ‘virtual’ Union
. This is my own nomenclature, apologies if ‘virtual’ means something else to you.
At each .at
step, we add a new type to the VirtualPolyFunction
we’re building which we have to keep track of. This means the structure of the class must either:
- be parameterised by a
Union
orHList
- be arbitrarily nested, like a free monad
I chose to parameterise it by a Union
, since that’s what we’re dealing with. Free
might give an avenue for avoiding some of the immorality inside our eventual class, but that’s something I have not yet investigated.
Further requirements for the structure of VirtualPolyFunction
are:
- it must at some point reference our
union
value which we’re folding over, so must be parameterised by some completely free generic typeT
. - It must be parameterised by the return type.
- It must also record within it the actual function we’re iteratively building up, from our free domain
T
to our return type. - It must have two methods: One to add a new case to the arbitrary-type-arity virtual polymorphic function, and one to execute the function when given a value.
The minimal possible structure so far is thus:
A quick run-down:
U
is theUnion
ourVirtualPolyFunction
(VPF
) is understood to act on — it needn’t yet be connected toT
in any way. This will come laterT
is the type of the value our VPF will act on, the union value we want to escapeReturn
is the eventual return typewithCase
adds a new case to the VPF — note the newUnion
type inwithCase
s return type — we’ve added our new type to the head.apply
is trivial, it just runs the function on a valuefunc
is some function from a genericT
toReturn
. This is the thing we’re trying to make safe (ie, our original naive pattern match).
withCase
is the only thing we need to actually implement.
withCase
withCase
is unfortunately not nice to implement. Our current way of escaping union types
relies on run-time matching. The compiler’s interpretation in this context of the type of union: T
is plain old java Object
. It takes each of these matches, sees if union: Object
satisfies any of the cases and if not throws a MatchError
.
Even though the compiler has lost information about T
due to erasure, it still knows that not all Object
s are Int
s, so it encodes instructions to fall through the case match to the subsequent cases.
Unfortunately inside withCase[X]
we will be matching against a generic type X
, the erased type of which is also Object
. Thus when we write
the compiler reads
To the compiler this is an exhaustive match and we can not add more cases to this. At run-time, if T
is not X
, we get a ClassCastException
— erasure has removed the information we needed.
Getting around this is the immoral bit. The implementation of withCase
is
In words,
- Take a value of type T
- See if it is of type X
via unsafe runtime casting
- If it is (ie no exception), apply our new function that withCase
has just supplied
- If it is not (ie ClassCastException
thrown) apply our previous function
In this way our final function right at the end recursively tries to type-cast our t
to each type in U
, catching an exception each time and falling back to the next cast if needed.
Short of using reflection, I think this is the closest analogy to an arbitrary-length pattern match on a generic there is.
UnionFolder
VirtualPolyFunction
isn’t quite enough. The first problem is that it takes a value, where in our ideal syntax we chain methods on a value. The second is that, though each step of withCase
is in theory perfectly safe given at least one previous step is safe, there’s no guarantee yet that we have at least one safe step.
In other words, we could construct a value with type VirtualPolyFunction[Int :|: UNil, String, Any]
, which
obviously could not work. We need String
to be in the union type but it isn’t. So we need a wrapper layer to make it safe.
This wrapper layer has the same two method requirements as VPF
. Therefore also has the same structural requirements as VPF
, and must carry the union value around inside it to let us chain methods together nicely. The minimal structure is therefore:
U
,T
andReturn
are still herevalue: T
is the union value we’re escapingat
just callswithCase
run
is where the safety gets injected; before we can execute thisUnionFolder
, we need evidence thatT
belongs toU
This last point is key. We can build up any VPF / UnionFolder
we like, but we can’t actually run it on a T
until we have evidence that the union type upon which the VPF acts actually contains T
.
So the following would be unable to be run:
but the following would be fine:
The implicit that run
takes puts back into our construct what the compiler took with erasure: knowledge that our pattern match is safe.
foldUnion
The very last thing to do is construct the initial syntax. We have a way of building up and executing this object on a value, but we don’t have a way yet of elegantly attaching it to a value. I chose the name foldUnion
instead of match
or fold
to reduce confusion and name collision.
Because our union types are completely free, and to the compiler they are just type T
, we must define this method on RichT
:
Note the signature of foldUnion
: X
and Y
are set by the function we give it, but U
is free, the compiler chooses what U
is based on what union evidences we find in scope. The one requirement we have on this method is
tUnion
: that the value we’re calling this on must belong to some union (this condition is trivially satisfied by any type, egInt
andInt :|: UNil
); but if the compiler finds one passed by a parent method, it prefers that thanks to implicit resolution order
Thus tUnion
is trivial, if the compiler finds no union evidence it will construct a satisfactory one itself.
Final syntax
That’s it! The final product looks like this:
Looks pretty good to me.
Note:
- If we missed a case (ie commented out the
Char
line) it would not compile - The types we chain together need not be in the same order as the union type in the method
We can add in unnecessary cases (ie we could add a Boolean
case in the above example) — we could make this throw a compilation error but it would take longer to compile.
We would do this by allowing a free union type parameterisation in our UnionFolder.run
method, and insisting that whichever one the compiler picks from our T
’s implicit OneOf
is the same as VPF’s U
parameter — this approach would satisfy the precision matching ‘extra credit’ mentioned above.
As I mentioned above, this was more of a thought experiment to see how far I could push my union type idea. The exception catching in the middle of VPF obviously has overheads you may wish to consider before using this approach.
Quirks
First: These are total functions not partial functions. If we supplied a case match with a guard it would fail instead of gracefully falling to the subsequent case if the guard failed.
Second: It is available on any type at all thanks to RichTea
. The following compiles.
It produces "7"
. Technically, Int
is isomorphic to Int :|: UNil
so you could argue it’s doing the right thing by allowing this, if you wanted to. But the following would not compile:
because we haven’t provided an Int
case.