How to `traverse’ sequentially
Using free monads to control the execution of asynchronous tasks in Scala
This blog post will assume terminology from the functional scala library Cats. It is easily applicable to Scalaz, too.
Libraries like Cats and Scalaz give you a verb ‘traverse’, for easier handling of combinations of wrapper types.
It allows you to turn something clunky like this:
That is, it does the
map and the
sequence for you. It applies the function to the collection, and pulls the wrapper type through to the outside.
Of course it’s not limited to
Seq. The ‘collection’ type can be anything that is “traversable” (lists, options, matrices, tuples, trys — it’s a supertype of foldable). The inner type can be any “applicative” (future/task/IO, matrices, validated/either/disjunction, etc).
A problem arises however when your ‘collection’ is large, and your function is resource intensive. For example connecting to a database, performing a query, and disconnecting. If your collection had 100,000 elements in it, this would hammer the DB into submission and almost certainly throw an exception somewhere as every single element would try to connect at the same time (up to various pooling limitations).
The problem boils down to management of side-effects. If we know the side-effect is resource intensive, and we know the collection might be large, we can predict this problem. Ideally we’d like to be able to swap out
traverse with a very similar method that guarantees the elements will be executed sequentially rather than in parallel. This would allow our fragile external services a much easier time of it.
Via batching, it would also allow us to manage concurrent access and keep it reasonably fast.
We could call this method
sequentialTraverse. It would have (nearly) the same signature as
traverse, because all it’s doing is executing side effects in a different way. It would do exactly the same thing to the types at compile time.
I say it would have “nearly” the same signature as
traverse, because what we want to do — ensure sequentiality — is inherently a property of a
Monad. Our ‘inner’ type will have this stricter condition of sequentiality and so must be a
Monad rather than an
Applicative. If there is no way to enforce the sequentiality of a series of statements (ie an asynchronous
Applicative with no
Monad instance) then there certainly is no way to sequentially traverse a structure with it.
Well, we can jump right in and have something that works but is pretty naive:
We’ve dived right in and just implemented a traverse that must be sequential, because we use
flatMap inside the sequence’s
This works, but obviously is pretty weak since it works only on
Future. It wouldn’t work on
IO, or even a different collection like
Matrix. Ideally, we would want to be able to use
sequentialTraverse in the case where we were working off an arbitrary monad rather than a fixed
Future, just in case it might be an asynchronous monad.
Slightly less naive approach
I mentioned above that
traverse works on any collection which has a
Traverse instance and any wrapper type that has an
Applicative instance. But we apparently can’t use the
traverse method, since there’s no guarantee it’s sequential. So how would we generalise the above naive attempt?
We fold the sequence: this means we need a
Foldable. We flatMap the future: this means we need a
Monad. And lastly we add an element to the sequence: this means we need a
This is almost verbatim just a generalisation of the naive approach.
And it works! The
flatMap ensures sequentiality.
However, there’s a problem.
When we break down a collection using
Foldable, there’s no guarantee we do it in the same way that the
Monoid builds it back up again.
For example, the normal
Seq.foldLeft, and the normal
:+ (append). This would work, since they map onto our original attempt exactly.
But if we had a custom
Monoid[Seq], which built a sequence using prepend rather than append, then our resulting traversed sequence would be in reverse order!
So this approach is no good either, there’s no guarantee it will work in all cases. There’s no connection at compile time between the way we fold a structure and the way we build the same structure back up.
A different approach
We need something that guarantees the breaking-down and building-up are inverses of each other.
Unfortunately, this is the
Traverse typeclass itself. We seem to have come full circle; in order to
traverse predictably in all situations we need to use
Traverse. This appeared to be a no-go because it would greedily execute our asynchronous monads. But all is not lost yet — let’s have a look at the components we have:
We know from our above explorations that
traverse has to stay. So what else has changed? The only thing different from the basic
traverse use-case is our
M is a monad, not an applicative.
Doesn’t a monad encode sequentiality? Why can’t we just do
s.traverse(f)? Won’t this be sequential because
M is a monad?
Unfortunately not, because its underlying
Applicative instance (all monads are applicatives) could still be parallel; there’s no guarantee it uses
flatMap and therefore be sequential.
You could by all means override the underlying
Applicative methods to force them to be sequential, but that is not a good user experience. Picking a
Monad and hoping it’s correct or risk having your code explode. We would rather force the code to work the way we need it to with no choice to be made.
So the only thing we can do is ensure that when we
s, we ‘trick’ it into not actually executing anything — ie make
traverse lazy -
and then to somehow execute the result ourselves in a way that ensures sequentiality. That way, the default behaviour of traverse is fine and does not interfere with any potentially external resources.
s gets broken down by
traverse, our new lazy construct, whatever it is, would get accumulated — and then we would take our construct apart in a sequential way which we manage ourselves.
This new object would be a series of lazy instructions built by traverse which
we execute afterwards.
This sounds like a free monad, which is from a certain point of view a way of encoding pure programming instructions free from any execution context. Once you have a free monad, you can execute it with any context you like.
This sounds like exactly what we need.
What is a free monad? Loosely, it’s a way of turning a
Functor into a
Monad, by ignoring nesting during the
flatMap operation. The specifics aren’t too important: What we need is that we can get a
Monad from a
Functor (for free!)
So let’s quickly recap what we’re working with:
One thing immediately jumps out as being a candidate for a free monad:
M. It’s already a
Monad, but that means it’s also a
Functor. But there’s actually another option:
f, our function we wish to traverse with.
Function1[X, Y] is a
Functor in the second argument. Could we use this? Nearly. Remember we want this to be lazy once we apply it, and a function is anything but. As soon as it’s applied, it’s… well, applied.
Let’s invent something quickly:
This is a functor in
X — it just delegates to
Function1s instance and keeps
a fixed. This type allows us to combine a function with its argument and not execute immediately. Sounds very promising.
Notice that I’ve included an
M in the definition. This is purely because we are always working with a function that produces a monad in this blog post.
There’s nothing special about the
M here at the moment. The class will be defined inside a context where we have
M so it needn’t be on the class definition.
Now let’s create a free monad from
FreeLazyFunction is an alias to our
Free type, to make the type argument lists a little nicer.
lift elevates a function to our
And… that’s all we need.
FreeLazyFunction is a
Monad already. So let’s use it:
and we have it! We have built our series of instructions using
traverse, and we haven’t yet executed anything.
We just need a way to collapse our free monad now. This is done via a mapping between our inner
Free type and a
in this case
M. This mapping is called a natural transformation and looks like this:
It’s just finally applying the
LazyFunction, and executing the underlying function/argument pairs. Now we can run our free monad:
foldMap guarantees that each step of our free monad is only executed (using
LazyFunction.apply) once the previous step is finished, using
This is precisely what we wanted, forced sequential execution inside traverse with no extra overhead for the end developer!
The only thing left is the syntax:
You can have a quick test that this works as expected:
sequentialTraverse is minimal — it requires only a
Traverse and a
Monad instance, and is usable without any extra cognitive overhead or effort from future developers. It ensures sequentiality in the case of an asynchronous monad, and behaves the same as normal
traverse in the case where we have a synchronous monad, making it safe for a developer to accidentally misuse, too.