It’s a common scenario to use plain Scala Futures with IO monads, such as Cats Effect’s IO, mainly as part of integrating “legacy” libraries with pure functional programming libraries. This article demonstrates a common trap you might fall into during the integration process.
Why would you use Cats IO instead of Scala Futures?
Main benefits of using IO:
- IO is pure, immutable and referentially transparent.
- It makes it easy to control the execution of the effects.
- It’s currently the most common solution to reflect the pure and impure parts of the program.
- Fits in the Cats functional programming ecosystem.
- A value of type IO is a computation that, when evaluated, can perform effects before returning a value.
I’d recommend the following in-depth discussion about the benefits of IO involving Martin Odersky on Reddit (link).
It’s easy to believe that if a method returns an IO, it is pure and has no side effects; at least that’s what we would expect from it.
Let’s start with a method that returns a Scala Future, called getCustomerByIdFuture.
This method has a side effect: it prints to the standard output and retrieves the current time.
It could just as well perform a DB operation or a REST call; the point is that it returns a side effect wrapped in a Future, which is what older, legacy database or REST libraries would return.
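As a rough sketch, such a method could look like the following. The name getCustomerByIdFuture is the one used throughout this article; the enclosing object LegacyCustomerApi, the String result and the Customer_<id> value are assumptions made for illustration, modelled on the output shown further down.

```scala
import java.time.LocalTime
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical stand-in for a legacy client that hands back plain Futures.
object LegacyCustomerApi {

  // Side effects: prints to standard output and reads the current time.
  // The body is scheduled on the ExecutionContext as soon as this method is called.
  def getCustomerByIdFuture(id: Int)(implicit ec: ExecutionContext): Future[String] =
    Future {
      println(s"Future function for $id evaluated at ${LocalTime.now()}")
      s"Customer_$id"
    }
}
```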
Next we wrap it into an IO and, to highlight the problem, add a 2-second sleep.
The method returns an IO, so we should be done. We can use it and it seemingly works fine:
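Here is a minimal sketch of what such a wrapper and its usage might look like. It assumes Cats Effect 3 (IOApp.Simple, IO.fromFuture without extra implicits) and the global ExecutionContext; getCustomerById and SingleCustomerApp are hypothetical names, and the Future method is repeated only to keep the example self-contained.

```scala
import java.time.LocalTime
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import cats.effect.{IO, IOApp}

object SingleCustomerApp extends IOApp.Simple {

  // Legacy-style method: starts its work as soon as it is called.
  def getCustomerByIdFuture(id: Int): Future[String] = Future {
    println(s"Future function for $id evaluated at ${LocalTime.now()}")
    s"Customer_$id"
  }

  // Problematic wrapper (hypothetical name): the Future is created here,
  // when the method is called, not when the returned IO is executed.
  def getCustomerById(id: Int): IO[String] = {
    val future = getCustomerByIdFuture(id)
    for {
      customer <- IO.fromFuture(IO(future))
      _        <- IO.sleep(2.seconds)
    } yield customer
  }

  def run: IO[Unit] =
    getCustomerById(1).flatMap { customer =>
      IO(println(s"Function returned $customer, at ${LocalTime.now()}"))
    }
}
```

With a single id the flaw is invisible, because the IO is executed right after it is built.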
It also seems to run as we’d expect:
Future function for 1 evaluated at 16:17:11.975
Function returned Customer_1, at 16:17:14.001
Now let’s use our new function for a set of customer ids sequentially:
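A sketch of that step, under the same assumptions as before (Cats Effect 3, the global ExecutionContext, hypothetical names getCustomerById, getCustomers and BuggySequenceApp). It maps over the ids and then uses sequence from Cats, which runs the resulting IOs one after another:

```scala
import java.time.LocalTime
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import cats.effect.{IO, IOApp}
import cats.syntax.all._

object BuggySequenceApp extends IOApp.Simple {

  def getCustomerByIdFuture(id: Int): Future[String] = Future {
    println(s"Future function for $id evaluated at ${LocalTime.now()}")
    s"Customer_$id"
  }

  // Same problematic wrapper as before: the Future starts at call time.
  def getCustomerById(id: Int): IO[String] = {
    val future = getCustomerByIdFuture(id)
    for {
      customer <- IO.fromFuture(IO(future))
      _        <- IO.sleep(2.seconds)
    } yield customer
  }

  // map calls getCustomerById for every id up front;
  // sequence then combines the IOs so that they run one by one.
  def getCustomers(ids: List[Int]): IO[List[String]] =
    ids.map(getCustomerById).sequence

  def run: IO[Unit] =
    getCustomers(List(1, 2, 3, 4, 5)).flatMap { customers =>
      IO(println(s"Function returned $customers, at ${LocalTime.now()}"))
    }
}
```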
Even this seems to run fine and returns what we’d expect:
Function returned List(Customer_1, Customer_2, Customer_3, Customer_4, Customer_5), at 16:20:22.882
However, if we check the timings of the actual getCustomerByIdFuture calls, we can see they were executed at the same time, in parallel, not sequentially as we expected:
Future function for 1 evaluated at 16:20:12.789
Future function for 2 evaluated at 16:20:12.807
Future function for 3 evaluated at 16:20:12.807
Future function for 4 evaluated at 16:20:12.807
Future function for 5 evaluated at 16:20:12.807
It can run in production without anybody noticing, maybe even forever. But it can also result in hammering the database or some REST service, as we don’t control the side effects.
What is happening?
The problem comes from the eager nature of the Future.apply method: it starts an asynchronous computation immediately and returns a Future instance that will hold the result of that computation.
When the wrapper method is called, the getCustomerByIdFuture function gets evaluated, which creates a Future instance and starts the asynchronous computation at that moment. By the time we create the IO, the computation is already running in the background; it is not contained in the IO, waiting for execution.
So when we map through the list of ids in the run function, we create and start a new Future for all of them at the same time.
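The eagerness can be observed with the standard library alone. The sketch below is only an illustration: EagerFutureDemo, startedBeforeUse and the 500 ms pause are made up for this example. It records whether the body of a Future has run before anybody asked for its result.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object EagerFutureDemo {

  // Returns true if the Future body ran before anybody asked for its result.
  def startedBeforeUse(): Boolean = {
    val started = new AtomicBoolean(false)

    // Future.apply schedules the body on the ExecutionContext right away.
    val future: Future[Int] = Future {
      started.set(true)
      42
    }

    // Give the background thread a moment; `future` has not been used yet.
    Thread.sleep(500)
    val alreadyStarted = started.get()

    // Only now do we wait for the value.
    Await.result(future, 5.seconds)
    alreadyStarted
  }

  def main(args: Array[String]): Unit =
    println(s"Body ran before the result was requested: ${startedBeforeUse()}")
}
```

Holding a Future in a val is therefore already "running" it, which is exactly what happens inside the wrapper before the IO is built.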
The fix is very easy: we have to delay the evaluation of the function that creates the Future by calling it inside IO.apply. Because IO.apply evaluates its argument lazily, the Future won’t get created until the IO is executed. This results in the execution we expected:
Future function for 1 evaluated at 17:51:18.495
Future function for 2 evaluated at 17:51:20.515
Future function for 3 evaluated at 17:51:22.520
Future function for 4 evaluated at 17:51:24.525
Future function for 5 evaluated at 17:51:26.529
Function returned List(Customer_1, Customer_2, Customer_3, Customer_4, Customer_5), at 17:51:28.535
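A sketch of the corrected wrapper that would produce timings like the ones above, again assuming Cats Effect 3 and the global ExecutionContext, with hypothetical names (getCustomerById, getCustomers, FixedSequenceApp). The only real change is that the call to getCustomerByIdFuture now sits inside IO(...):

```scala
import java.time.LocalTime
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import cats.effect.{IO, IOApp}
import cats.syntax.all._

object FixedSequenceApp extends IOApp.Simple {

  def getCustomerByIdFuture(id: Int): Future[String] = Future {
    println(s"Future function for $id evaluated at ${LocalTime.now()}")
    s"Customer_$id"
  }

  // Fixed wrapper: IO(...) suspends the call, so the Future is only
  // created when this IO is actually executed.
  def getCustomerById(id: Int): IO[String] =
    for {
      customer <- IO.fromFuture(IO(getCustomerByIdFuture(id)))
      _        <- IO.sleep(2.seconds)
    } yield customer

  // Building the list of IOs no longer starts any work;
  // sequence then runs them one after another.
  def getCustomers(ids: List[Int]): IO[List[String]] =
    ids.map(getCustomerById).sequence

  def run: IO[Unit] =
    getCustomers(List(1, 2, 3, 4, 5)).flatMap { customers =>
      IO(println(s"Function returned $customers, at ${LocalTime.now()}"))
    }
}
```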
This mistake can cause nasty issues that take ages to investigate. One of the benefits of using IO is control over the effects; with the bad implementation we lose that ability.
It’s essential to make sure the conversion between Future and IO is done well. Unit testing the timing of the effects in a Future is not something application developers usually do. If you need to work on such code, at least perform some manual testing to make sure it works as expected.
It’s beneficial to use libraries that are built in the cats/cats-effect ecosystem, as that would eliminate or reduce the need for using Future.
My advice is to limit the usage of IO to a small set of layers where it’s necessary, and use plain pure functions everywhere else.
Thanks for reading, your comments are appreciated.