Hack: How to Use Scala Futures with Cats IO

Tamas Polgar
Jan 30, 2020


It’s a common scenario to use plain Scala Futures with IO monads, such as Cats Effect’s IO, mainly as part of integrating “legacy” libraries with pure functional programming libraries. This article demonstrates a common trap you might fall into during the integration process.

At School, France in the Year 2000 retro-futuristic postcard series from 1899. (source: Jean Marc Cote on Wikimedia Commons)

Why would you use Cats IO instead of Scala Futures?

Main benefits of using IO:

  • IO is pure, immutable and referentially transparent.
  • It makes it easy to control the execution of the effects.
  • It’s currently the most common solution to reflect the pure and impure parts of the program.
  • It fits into the Cats functional programming ecosystem.
  • A value of type IO is a computation that, when evaluated, can perform effects before returning a value.

I’d recommend the following in-depth discussion about the benefits of IO involving Martin Odersky on Reddit (link).

This article focuses on Cats Effect’s IO, but the same applies to other IO libraries, such as Monix.

Bad example

It’s easy to believe that a method returning an IO is pure and free of side effects; at least, that’s what we would expect from it.

Let’s use the following method returning a Scala Future:
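A minimal sketch of such a method, assuming the caller supplies an ExecutionContext. The object name LegacyCustomerApi is made up for illustration; the method name and the printed message follow the output shown later in this article.

```scala
import java.time.LocalTime
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical stand-in for a legacy, Future-based client.
object LegacyCustomerApi {
  def getCustomerByIdFuture(id: Int)(implicit ec: ExecutionContext): Future[String] =
    Future {
      // Side effects: reads the current time and prints to standard output.
      println(s"Future function for $id evaluated at ${LocalTime.now()}")
      s"Customer_$id"
    }
}
```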

This method has a side effect; it prints to the standard output and retrieves the current time.

This method could just as well perform a DB operation or a REST call; the point is that it returns a side effect wrapped in a Future, which is what older (legacy) database or REST libraries would typically return.

Now we wrap it in an IO and, to highlight the problem, add a 2-second sleep:
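One way this mistake can look, sketched under a few assumptions: Cats Effect 3 is on the classpath (on Cats Effect 2, IO.fromFuture and IO.sleep additionally need an implicit ContextShift[IO] and Timer[IO]), the sleep comes after the Future completes, and the object name BadWrapper is hypothetical.

```scala
import java.time.LocalTime
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration._
import cats.effect.IO

object BadWrapper {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // The legacy-style method described in the text: eager and side-effecting.
  def getCustomerByIdFuture(id: Int): Future[String] = Future {
    println(s"Future function for $id evaluated at ${LocalTime.now()}")
    s"Customer_$id"
  }

  def badGetCustomerByIdIo(id: Int): IO[String] = {
    // Problem: this line runs when the method is called, not when the IO is run,
    // so the Future is already executing before the IO value even exists.
    val future = getCustomerByIdFuture(id)
    for {
      customer <- IO.fromFuture(IO.pure(future))
      _        <- IO.sleep(2.seconds)
    } yield customer
  }
}
```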

The method returns an IO, so we should be done. We can use it, and it seemingly works fine:
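A possible shape for that usage, assuming Cats Effect 3. The wrapper is passed in as a parameter (getCustomerById, a hypothetical name) so the sketch stays runnable on its own; in the article's scenario it would be badGetCustomerByIdIo.

```scala
import java.time.LocalTime
import cats.effect.IO
import cats.effect.unsafe.implicits.global

object SingleCall {
  // `getCustomerById` stands for the IO-returning wrapper from the text.
  def run(getCustomerById: Int => IO[String]): IO[String] =
    for {
      customer <- getCustomerById(1)
      _        <- IO(println(s"Function returned $customer, at ${LocalTime.now()}"))
    } yield customer

  def main(args: Array[String]): Unit = {
    // A stub keeps this example self-contained; swap in the real wrapper.
    run(id => IO.pure(s"Customer_$id")).unsafeRunSync()
  }
}
```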

It also seems to run as we’d expect:

Future function for 1 evaluated at 16:17:11.975
Function returned Customer_1, at 16:17:14.001

Now let’s use our new function for a set of customer ids sequentially:
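A sketch of that sequential run, assuming Cats Effect 3 and Cats syntax imports. As before, the wrapper is a parameter with a hypothetical name. Note that `map` calls the wrapper for every id up front, and only `sequence` chains the resulting IO values one after another.

```scala
import java.time.LocalTime
import cats.effect.IO
import cats.implicits._
import cats.effect.unsafe.implicits.global

object SequentialCalls {
  def run(getCustomerById: Int => IO[String]): IO[List[String]] = {
    val ids = List(1, 2, 3, 4, 5)
    for {
      // `map` builds all five IO values immediately; `sequence` runs them in order.
      customers <- ids.map(getCustomerById).sequence
      _         <- IO(println(s"Function returned $customers, at ${LocalTime.now()}"))
    } yield customers
  }

  def main(args: Array[String]): Unit = {
    // A stub keeps this example self-contained; swap in the real wrapper.
    run(id => IO.pure(s"Customer_$id")).unsafeRunSync()
  }
}
```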

Even this seems to run fine and returns what we’d expect:

Function returned List(Customer_1, Customer_2, Customer_3, Customer_4, Customer_5), at 16:20:22.882

However, if we check the timings of the actual getCustomerByIdFuture calls, we can see that they all started at the same time, in parallel, rather than sequentially as we expected:

Future function for 1 evaluated at 16:20:12.789
Future function for 2 evaluated at 16:20:12.807
Future function for 3 evaluated at 16:20:12.807
Future function for 4 evaluated at 16:20:12.807
Future function for 5 evaluated at 16:20:12.807

This can run in production without anybody noticing, maybe even forever. But it can also end up hammering the database or a REST service, because we no longer control the side effects.

What is happening?

The problem comes from the eager evaluation of the Future.apply method: it starts the asynchronous computation immediately and returns a Future instance that will eventually hold the result of that computation.
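The eager behaviour can be observed with the standard library alone. In the sketch below (all names are illustrative), the Future is created and then discarded, yet its body still runs and releases the latch:

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import scala.concurrent.{ExecutionContext, Future}

object EagerFutureDemo {
  // Returns true if the Future body ran although nobody awaited or mapped the Future.
  def bodyRunsWithoutBeingUsed(): Boolean = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    val started = new CountDownLatch(1)
    // Future.apply schedules the body right away; the returned Future is discarded.
    Future { started.countDown() }
    // Wait up to 5 seconds for the body to signal that it has run.
    started.await(5, TimeUnit.SECONDS)
  }

  def main(args: Array[String]): Unit =
    println(s"Body ran without being used: ${bodyRunsWithoutBeingUsed()}")
}
```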

In badGetCustomerByIdIo, the call to getCustomerByIdFuture is evaluated immediately: it creates a Future instance, and the asynchronous computation starts at that moment. By the time we create the IO, the computation is already running in the background; it is not contained in the IO, waiting to be executed.

So when we map over the list of ids in the run function, we create and start a new Future for every id at the same time.

Good example

The fix is very easy: we have to delay the evaluation of the function that creates the Future:
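A sketch of the corrected wrapper, assuming Cats Effect 3 (on Cats Effect 2 an implicit ContextShift[IO] and Timer[IO] are also required). The names GoodWrapper and goodGetCustomerByIdIo are hypothetical. The only real change from the eager version is that the Future is now created inside IO(...).

```scala
import java.time.LocalTime
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration._
import cats.effect.IO

object GoodWrapper {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // The legacy-style method described in the text: eager and side-effecting.
  def getCustomerByIdFuture(id: Int): Future[String] = Future {
    println(s"Future function for $id evaluated at ${LocalTime.now()}")
    s"Customer_$id"
  }

  def goodGetCustomerByIdIo(id: Int): IO[String] =
    for {
      // IO(...) suspends its argument, so the Future is only created
      // (and therefore only started) when this IO is actually run.
      customer <- IO.fromFuture(IO(getCustomerByIdFuture(id)))
      _        <- IO.sleep(2.seconds)
    } yield customer
}
```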

Because IO.apply evaluates its argument lazily, the Future won’t be created until the IO is executed. The calls now run one after another, as we expected:

Future function for 1 evaluated at 17:51:18.495
Future function for 2 evaluated at 17:51:20.515
Future function for 3 evaluated at 17:51:22.520
Future function for 4 evaluated at 17:51:24.525
Future function for 5 evaluated at 17:51:26.529
Function returned List(Customer_1, Customer_2, Customer_3, Customer_4, Customer_5), at 17:51:28.535

Summary

This mistake can cause nasty issues that take ages to investigate. One of the benefits of using IO is control over the effects; with the bad implementation, we lose that ability.

It’s essential to make sure the conversion between Future and IO is done well. Unit testing the timing of the effects in IO or Future is not something application developers usually do. If you need to work on such code, at least perform some manual testing to make sure it behaves as expected.
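Timing is awkward to assert on, but whether the legacy call happens before the IO is run can be checked deterministically by counting calls. A rough sketch, assuming Cats Effect 3; LazinessCheck and callsBeforeAndAfterRun are hypothetical names, and the counter is incremented synchronously when the Future is created.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{ExecutionContext, Future}
import cats.effect.IO
import cats.effect.unsafe.implicits.global

object LazinessCheck {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Builds an IO via `wrap`, then reports how many legacy calls had happened
  // before the IO was run and how many after it was run.
  def callsBeforeAndAfterRun(wrap: (Int => Future[String]) => IO[String]): (Int, Int) = {
    val calls = new AtomicInteger(0)
    val legacy: Int => Future[String] = id => {
      calls.incrementAndGet() // counted at Future creation time
      Future(s"Customer_$id")
    }
    val io = wrap(legacy)
    val before = calls.get
    io.unsafeRunSync()
    (before, calls.get)
  }

  def main(args: Array[String]): Unit = {
    // Lazy wrapper: no call until the IO runs.
    assert(callsBeforeAndAfterRun(f => IO.fromFuture(IO(f(1)))) == (0, 1))
    // Eager wrapper: the call has already happened when the IO is built.
    assert(callsBeforeAndAfterRun { f => val fut = f(1); IO.fromFuture(IO.pure(fut)) } == (1, 1))
  }
}
```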

It’s beneficial to use libraries built for the cats/cats-effect ecosystem; that eliminates, or at least reduces, the need to wrap Futures in IO.

My advice is to limit the usage of IO to a small set of layers where it’s necessary, and use plain pure functions everywhere else.

Thanks for reading; your comments are appreciated.
