Typesafe Union-type folding in Scala
Escaping Union type contexts safely
Goal and ideal syntax
In a previous blog post I introduced a way to get unboxed arbitrary arity union types in scala. We can write
foo allows an
Int or a
String to be passed to it, and nothing else.
But escaping the union context, ie getting the
Int or the
String out again from our generic
T, wasn’t safe. We had to do this:
The above is not checked by the compiler, either for exhaustivity (if we’d missed
Int it wouldn’t tell us) nor precision (if we included an extra type we didn’t need it wouldn’t tell us).
But it is certainly the easiest way to naively extract our original value again.
The goal of this article is to be able to "
match” a union type in a typesafe manner. We’ll start from an ideal syntax, something equivalent to a normal pattern match, and work outwards from there. I warn you there are some immoral steps along the way — this is more of an experiment in scala than something to be used in production.
- compile-time exhaustivity matching
- order of types should not matter
- precision matching (making sure that if we supply an unneeded case the compiler tells us)
At the moment our syntax is
This is our ideal look — pretty minimal. We’re going to try and build something minimal taking this as inspiration, and which is also safe.
Our general union type is arbitrarily wide, so any construct we write to generically escape a union context must also be arbitrarily wide.
This means each case must be collapsed iteratively into an object we are constructing as we go. You can’t have a pattern match of arbitrary length after all. It’s going to look a bit like the builder pattern.
This also means that we need to call an additional
run method on this object to actually execute the instruction — we need a separation between constructing our instruction and executing it. This is similar to how one builds and executes free monads, or the builder pattern.
So the minimal syntax we can hope for looks a little bit like this:
It looks like a
match turned into a builder.
.match to initialise,
.at to add a case, and
.run to signal to the compiler that we’re done building it and return our value.
This is now looking a lot like a fold.
This thing we’re building with the
.at method is like a polymorphic function, a function from many types to one type. Shapeless has them, and uses them to fold over their
Coproducts. So we’re probably on the right track.
Ours is different and slightly more difficult to deal with (not that shapeless
Poly is easy) because we are unboxed — we have to handle a completely free
T, constrained only by convention based on what type classes are in scope. I’m going to call it
VirtualPolyFunction since it will act on a ‘virtual’
Union. This is my own nomenclature, apologies if ‘virtual’ means something else to you.
.at step, we add a new type to the
VirtualPolyFunction we’re building which we have to keep track of. This means the structure of the class must either:
- be parameterised by a
- be arbitrarily nested, like a free monad
I chose to parameterise it by a
Union, since that’s what we’re dealing with.
Free might give an avenue for avoiding some of the immorality inside our eventual class, but that’s something I have not yet investigated.
Further requirements for the structure of
- it must at some point reference our
unionvalue which we’re folding over, so must be parameterised by some completely free generic type
- It must be parameterised by the return type.
- It must also record within it the actual function we’re iteratively building up, from our free domain
Tto our return type.
- It must have two methods: One to add a new case to the arbitrary-type-arity virtual polymorphic function, and one to execute the function when given a value.
The minimal possible structure so far is thus:
A quick run-down:
VPF) is understood to act on — it needn’t yet be connected to
Tin any way. This will come later
Tis the type of the value our VPF will act on, the union value we want to escape
Returnis the eventual return type
withCaseadds a new case to the VPF — note the new
withCases return type — we’ve added our new type to the head.
applyis trivial, it just runs the function on a value
funcis some function from a generic
Return. This is the thing we’re trying to make safe (ie, our original naive pattern match).
withCase is the only thing we need to actually implement.
withCase is unfortunately not nice to implement. Our current way of escaping union types
relies on run-time matching. The compiler’s interpretation in this context of the type of
union: T is plain old java
Object. It takes each of these matches, sees if
union: Object satisfies any of the cases and if not throws a
Even though the compiler has lost information about
T due to erasure, it still knows that not all
Ints, so it encodes instructions to fall through the case match to the subsequent cases.
withCase[X] we will be matching against a generic type
X, the erased type of which is also
Object. Thus when we write
the compiler reads
To the compiler this is an exhaustive match and we can not add more cases to this. At run-time, if
T is not
X, we get a
ClassCastException — erasure has removed the information we needed.
Getting around this is the immoral bit. The implementation of
- Take a value of type
- See if it is of type
X via unsafe runtime casting
- If it is (ie no exception), apply our new function that
withCase has just supplied
- If it is not (ie
ClassCastException thrown) apply our previous function
In this way our final function right at the end recursively tries to type-cast our
t to each type in
U, catching an exception each time and falling back to the next cast if needed.
Short of using reflection, I think this is the closest analogy to an arbitrary-length pattern match on a generic there is.
VirtualPolyFunction isn’t quite enough. The first problem is that it takes a value, where in our ideal syntax we chain methods on a value. The second is that, though each step of
withCase is in theory perfectly safe given at least one previous step is safe, there’s no guarantee yet that we have at least one safe step.
In other words, we could construct a value with type
VirtualPolyFunction[Int :|: UNil, String, Any], which
obviously could not work. We need
String to be in the union type but it isn’t. So we need a wrapper layer to make it safe.
This wrapper layer has the same two method requirements as
VPF. Therefore also has the same structural requirements as
VPF, and must carry the union value around inside it to let us chain methods together nicely. The minimal structure is therefore:
Returnare still here
value: Tis the union value we’re escaping
runis where the safety gets injected; before we can execute this
UnionFolder, we need evidence that
This last point is key. We can build up any VPF /
UnionFolder we like, but we can’t actually run it on a
T until we have evidence that the union type upon which the VPF acts actually contains
So the following would be unable to be run:
but the following would be fine:
The implicit that
run takes puts back into our construct what the compiler took with erasure: knowledge that our pattern match is safe.
The very last thing to do is construct the initial syntax. We have a way of building up and executing this object on a value, but we don’t have a way yet of elegantly attaching it to a value. I chose the name
foldUnion instead of
fold to reduce confusion and name collision.
Because our union types are completely free, and to the compiler they are just type
T, we must define this method on
Note the signature of
Y are set by the function we give it, but
U is free, the compiler chooses what
U is based on what union evidences we find in scope. The one requirement we have on this method is
tUnion: that the value we’re calling this on must belong to some union (this condition is trivially satisfied by any type, eg
Int :|: UNil); but if the compiler finds one passed by a parent method, it prefers that thanks to implicit resolution order
tUnion is trivial, if the compiler finds no union evidence it will construct a satisfactory one itself.
That’s it! The final product looks like this:
Looks pretty good to me.
- If we missed a case (ie commented out the
Charline) it would not compile
- The types we chain together need not be in the same order as the union type in the method
We can add in unnecessary cases (ie we could add a
Boolean case in the above example) — we could make this throw a compilation error but it would take longer to compile.
We would do this by allowing a free union type parameterisation in our
UnionFolder.run method, and insisting that whichever one the compiler picks from our
OneOf is the same as VPF’s
U parameter — this approach would satisfy the precision matching ‘extra credit’ mentioned above.
As I mentioned above, this was more of a thought experiment to see how far I could push my union type idea. The exception catching in the middle of VPF obviously has overheads you may wish to consider before using this approach.
First: These are total functions not partial functions. If we supplied a case match with a guard it would fail instead of gracefully falling to the subsequent case if the guard failed.
Second: It is available on any type at all thanks to
RichTea. The following compiles.
Int is isomorphic to
Int :|: UNil so you could argue it’s doing the right thing by allowing this, if you wanted to. But the following would not compile:
because we haven’t provided an