Visitors - a tale of generalization

Marek Żarnowski
Sep 13 · 6 min read

A visitor is often considered an ugly cousin of pattern matching. It’s looked at with disdain and considered highly boilerplate (and rightly so!). Why would anyone stoop to using it? In the end, everything you can visit I can pattern match on, right?

And yet that is exactly what we used when working on a PoC of the Tasty Serializer for both Scalac and Dotty ASTs. Not only did it save us a lot of wasted time and duplicated effort, but also exposed us to a design which allows to operate on various complex structures in a uniform way.

Since both Scalac and Tasty ASTs are quite complex, let’s focus on a much smaller problem using exactly the same steps which have proven successful for the PoC.

Our task is to serialize two different tree structures using the same format. Additionally:

1. Both tree structures are representing a well-known (and, hence, simple) arithmetic expression
2. The output format is somewhat LISPy: (+ 3 2 (- 1 5))
3. One of these tree representations will be very similar to the output format. This is both incidental, since it will probably not happen very often in real life, and done on purpose to keep the example simple.

Input domains

trait BinaryTree
object BinaryTree{
case class Node(
left: BinaryTree,
operator: Operator,
right:BinaryTree)
extends BinaryTree
case class Leaf(value: Int) extends BinaryTree
}
trait FlatTree
object FlatTree{
case class Node(
operator: Operator,
operands: Seq[FlatTree])
extends FlatTree
case class Leaf(value: Int) extends FlatTree
}

First attempt

Let’s start by writing a straightforward implementation of serializers.

class BinaryTreeSerializer(output: Writer) {
def serialize(tree: BinaryTree): Unit = tree match {
case BinaryTree.Leaf(value) =>
output.write(value)
case BinaryTree.Node(left, op, right) =>
output.write("(")
output.write(op)
output.write(" ")
serialize(left)
output.write(" ")
serialize(right)
output.write(")")
}
}
class FlatTreeSerializer(output: Writer) {
def serialize(tree: FlatTree): Unit = tree match {
case FlatTree.Leaf(value) =>
output.write(value)
case FlatTree.Node(op, operands) =>
output.write("(")
output.write(op)
operands.foreach { operand =>
output.write(" ")
serialize(operand)
}
output.write(")")
}
}

We can immediately notice that the code is not good enough — we have a lot of unnecessary repetitions for such a simple case. A visitor, in the traditional meaning, is not a solution. Here, we are not duplicating how we traverse the structure, but rather what we are doing to produce the desired output.

We can try and abstract over the common parts.

A common interface

class LISPySerializer(output: Writer) {
type Operand = ???
def serialize(operand: Operand) = ??? def serializeValue(value: Int): Unit =
writer.write(value)
def serializeExpression(
operator: Operator,
operands: Seq[Operand]): Unit = {
output.write("(")
output.write(operator)
operands.foreach(operand => {
output.write(" ")
serialize(operand)
})
output.write(")")
}
}

Now we have captured the essence of what this serializer is capable of, but we are still missing one important piece of the puzzle: what is the Operand?

Those of you who immediately thought “you could have just created another tree structure — a LISPyTree” are quite right. It would be a correct and simple solution:

trait LISPyTree
object LISPyTree{
case class Node(
operator: Operator,
operands: Seq[LISPyTree])
extends LISPyTree

case class Leaf(value: Int) extends LISPyTree
}
class LISPyTreeSerializer(output: Writer) {
type Operand = LISPyTree
def serialize(operand: Operand): Unit = operand match {
case LISPyTree.Leaf(value) =>
serializeValue(value)
case LISPyTree.Node(op, operands) =>
serializeExpression(op, operands)
}

// serializeValue and serializeExpression remain the same
}

Now, all that is left to do is to convert both BinaryTree and FlatTree to the LISPyTree and then we are done.

A hidden cost

As it turns out, our client is quite concerned about the execution time and memory usage. We cannot afford to convert one tree to another just to serialize it. If we were going to reuse the LISPyTree structure later, then maybe this would be a sensible solution. But as things are, this is not good enough. We are processing a lot of tree nodes while our users are waiting and this approach makes a noticeable difference.

Since now we cannot use an intermediate structure, we need to find another solution. Because the LISPySerializer fully captures the serialization of LISP-like trees it makes sense to stick to it.

The missing piece in doing that is the Operand type. Since we cannot create new structures, let’s reuse the types we already have by parameterizing the serializer.

abstract class LISPySerializer[Operand](output: Writer) {
def serialize(operand: Operand): Unit

final def serializeValue(value: Int): Unit =
writer.write(value)

final def serializeExpression(
operator: Operator,
operands: Seq[Operand]): Unit = {
output.write("(")
output.write(operator)
operands.foreach { operand =>
output.write(" ")
serialize(operand)
}
output.write(")")
}
}

Notice, how the parameters of methods correspond to the `LISPyTree` hierarchy we almost introduced. That is what I call a virtual representation of a domain. It contrasts a concrete representation which is just a type hierarchy.

Since the LISPySerializer doesn’t know anything about the Operand, let’s delegate the serialize methods to the implementations:

final class BinarySerializer(output: Writer)
extends LISPySerializer[BinaryTree](output){

def serialize(operand: BinaryTree): Unit = operand match {
case BinaryTree.Leaf(value) =>
serializeValue(value)
case BinaryTree.Node(left, op, right) =>
val operands = Seq(left, right)
serializeExpression(op, operands)
}
}
final class FlatSerializer(output: Writer)
extends LISPySerializer[FlatTree](output){
def serialize(operand: FlatTree): Unit = operand match {
case FlatTree.Leaf(value) =>
serializeValue(value)
case FlatTree.Node(op, operands) =>
serializeExpression(op, operands)
}
}

Concrete serializers have only one task: to implement dispatching of the operand to the methods provided by the `LISPySerializer`. Now we have much simpler implementations than the ones we started with. There is no duplicated code and the implementations can focus on providing the best mapping possible. For example, the BinarySerializer can be enhanced to produce more compact trees : (+ 1 2 3) instead of (+ 1 (+ 2 3)).

Another input domain

Just for fun, let’s consider another small task: we also want to factorize and serialize a number. This means that the input domain is an integer while the output domain remains a LISPyTree. Our actions are guided by the method signatures in the LISPySerializer, which makes the solution straightforward:

final class FactorizedNumberSerializer(output: Writer)
extends LISPySerializer[Int](output){
def serialize(operand: Int): Unit = {
val operands: Seq[Int] = factorize(operand)
operands match {
case Seq(value) =>
serializeValue(value)
case _ =>
serializeExpression(Operands.Multiply, operands)
}
}

def factorize(number: Int) = ??? // let’s leave it as an exercise for the reader
}

It is clear by now that the input domain is not really that important — as long as we can somehow fit it to the methods exposed by the LISPySerializer.

Generalized Visitor

Typical visitor is used to traverse a type hierarchy A — dispatching subtypes to the accept methods matching their signature.

class OptionVisitor[A, B]{
def dispatch(value: Option[A]): B =
value match {
case Some(v) => acceptSome(v)
case None => acceptNone()
}
def acceptSome(value: A): B
def acceptNone(): B
}

We can generalize it by extracting only the accept methods:

trait OptionVisitor[A, B]{
def acceptSome(value: A): B
def acceptNone(): B
}

Now the whole new world opens up to us! By providing a simple mapping to the virtual domain of the Option type, we can handle various new domains.

trait FirstElementVisitor[A, B] extends OptionVisitor[A, B]{
def dispatch(seq: Seq[A]): B =
seq match{
case Seq() =>
acceptNone()
case _ =>
acceptSome(seq.head)
}
}
trait LeftmostNodeVisitor[A, B] extends OptionVisitor[A, B]{
def dispatch(tree: BinaryTree[A]): B =
tree match {
case Node(Leaf, _, value) => acceptSome(value)
case Node(left, _, _) => dispatch(left)
case Leaf => acceptNone()
}
}
trait OptionToTry[A] extends OptionVisitor[A, Try[A]]{
def acceptNone(): Try[A] =
Failure(new NoSuchElementException)
def acceptSome(value: A) =
Success(value)
}

All those three can be mixed, to obtain mappings:

  • FirstElementToTry
  • LeftNodeToTry

Why should you bother, since those can be replaced by a pattern matching?
You probably shouldn’t. At least, if the structures are trivial or seldom used. But when the complexity or pervasiveness grows, a more systematic approach may be helpful to contain the entropy.

What can be gained in the end?

  • Performance — there is no need for creating intermediate objects
  • Coherence — new implementations have only one task: either map the input to the virtual domain or map the virtual domain to the output domain

Tasty context

When working on the Tasty serializer, all we knew at the beginning was that:
- we have two very complex type hierarchies as inputs;
- we have a semi-complex hierarchical domain of Tasty;
- we do not want to convert those ASTs to Tasty just for it to be serialized — the compiler is slow enough as it is.

Because we wanted to neither duplicate the work across the Scalac and Dotty serializers nor couple them with the specifics of the Tasty format, which is not yet fully stable, we had to somehow capture Tasty as our `virtual domain`. Using the technique presented here, we were able to contain the complexity and instability of the Tasty in one place while working on converting the trees in another place.

VirtusLab

Virtus Lab company blog

Thanks to Michał Pociecha and Hubert Słojewski

Marek Żarnowski

Written by

VirtusLab

VirtusLab

Virtus Lab company blog

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade