The dangers of returning void — A look at information loss

“low light photography of stacked luggage” by Belinda Fewings on Unsplash

If you’ve programmed in virtually any mainstream language you might not have considered that there was anything wrong with returning void. I know I didn’t see what the issue was for a long time. In this post I want to show why returning void is almost always the wrong choice, and how we can structure our programs to take advantage of the information we keep when we return a value instead.

Some motivation

A tip came across my Twitter feed suggesting that if you don’t need to return anything while iterating over an array, you could use forEach instead of map.

My first impression was that I loved that the forEach prototype method was used rather than the for looping structure (also I’m one of the poster’s top fans). But then I started thinking about what it meant to have a function that “didn’t need to return anything”…

What’s in a function that returns void?

Think of every function/method/procedure you’ve written that returns void. What do they all have in common? They could have mutated state, inserted a row into a database, written a file to disk, or a thousand other things, but all of them performed some sort of side effect. How do I know this about code of yours that I’ve never seen? Because there’s no purpose in calling a function that returns nothing unless it performs some side effect.

If a “function” that returns void must perform a side effect, then it isn’t really a function to begin with, at least not in the mathematical sense: a function maps each input to an output and does nothing else.

Void doesn’t compose

In the above image from my LambdaConf 2018 presentation on Abstract Algebra, we can use puzzle pieces as a metaphor and say that A, W, X, Y, and Z all return void. Once the A is placed, we can’t add anything else to the puzzle on that branch. None of the Lego bricks on the right suffer from the void issue and so the ways in which they can be combined are limitless.
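To make the metaphor concrete, here is a minimal sketch. The helpers compose, trim, charCount, isEven, and log are hypothetical names invented for this illustration; they are not from the presentation.

```typescript
// Hypothetical helper: feed the output of f into g.
const compose =
  <A, B, C>(f: (a: A) => B, g: (b: B) => C) =>
  (a: A): C =>
    g(f(a));

const trim = (s: string): string => s.trim();
const charCount = (s: string): number => s.length;
const isEven = (n: number): boolean => n % 2 === 0;

// Value-returning functions snap together like bricks.
const trimmedLength = compose(trim, charCount);
const hasEvenLength = compose(trimmedLength, isEven);

// A void-returning function ends the chain: nothing useful comes out of it.
const log = (s: string): void => {
  console.log(s);
};
const logged = compose(trim, log); // (a: string) => void

// compose(logged, charCount) should be rejected by the type checker,
// because void is not a string. The branch is closed.
```

Every brick can still be extended after it is placed; once log is attached, that branch has nothing left to pass along.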

Information Loss

Let’s look at our motivating example again and really look at what we’re doing.
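The original example was a screenshot and isn’t reproduced here. As a stand-in, here is a minimal sketch of the kind of code in question, assuming a hypothetical list of files written to the system temp directory. The file names and contents are made up for illustration.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical sample data; the names are illustrative only.
const files = [
  { name: "void-post-a.txt", contents: "first" },
  { name: "void-post-b.txt", contents: "second" },
];

// forEach returns undefined, and so does fs.writeFile.
// The only place an outcome ever shows up is inside the callback.
const result: void = files.forEach((file) => {
  fs.writeFile(path.join(os.tmpdir(), file.name), file.contents, (err) => {
    if (err) console.error(err.message);
  });
});
```

By the time forEach returns, nothing in result tells us whether any write has started, finished, or failed.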

What happens to all of that data from fs.writeFile? What if one of the writes fails? What if we wanted to wait for all of those files to be written before we do something else? Now how about if we wanted to write all of those files in parallel, or in sequence? None of those options exist when we drop the information returned from fs.writeFile; we’re at the mercy of the interpreter to decide what happens.

Typing that data

The first step to keeping the information is to create a type that will contain all of it. With fs.writeFile we have a couple of different effects happening.

  1. We have asynchronous processing because the last argument is a callback. We can easily model that with a Promise<A>.
  2. We have the concept of success and failure because the callback has both Error and Success arguments. Either<L, R> is a common type to handle potential failure cases. In this definition, the L is the type that we want to use to signify an Error and the R is the type we’d like to use to signify Success.

By combining these two types into Promise<Either<L, R>> we’re able to keep all of the information from fs.writeFile in a single, generic type. In fact, almost all of the Node callback-style functions can be modeled with this type, which we’ll see in just a bit.

Also, while it might be tempting to use the catch from Promise<A> to handle the failure, we wouldn’t have type safety because Promise<A> doesn’t allow us to define a type for the error; it only allows us to type the “success” case. Nesting an Either<L, R> allows us to provide 2 types to maintain type safety. In this way Promise is similar to a functor (it can generate 1 distinct type) while Either is both a functor and a bifunctor (it can generate 2 distinct types).

Pinpointing the void

So when we look at writeFile and think it returns void, let’s pinpoint where in our new type that void exists. If we think about it, it’s really the type that is returned after the asynchronous effect if the effect was successful, which makes it the R in Promise<Either<L, R>>. Filling in our generics with our actual types (doesn’t this look just like passing values to a function?) we get our concrete type of Promise<Either<NodeJS.ErrnoException, void>>.
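As a rough sketch of that concrete type in use, here is a hand-rolled Either and a wrapper around fs.writeFile. The names Either, left, right, WriteResult, and writeFileE are assumptions made for this illustration, not the fp-ts definitions.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// A hand-rolled Either, for illustration only.
type Either<L, R> =
  | { readonly tag: "Left"; readonly value: L }
  | { readonly tag: "Right"; readonly value: R };

const left = <L, R>(value: L): Either<L, R> => ({ tag: "Left", value });
const right = <L, R>(value: R): Either<L, R> => ({ tag: "Right", value });

// The void we pinpointed sits in the R position.
type WriteResult = Either<NodeJS.ErrnoException, void>;

// writeFileE is a hypothetical name. The promise always resolves:
// a failed write becomes a Left value instead of a rejection.
const writeFileE = (file: string, data: string): Promise<WriteResult> =>
  new Promise<WriteResult>((resolve) => {
    fs.writeFile(file, data, (err) => {
      resolve(
        err
          ? left<NodeJS.ErrnoException, void>(err)
          : right<NodeJS.ErrnoException, void>(undefined)
      );
    });
  });

// Usage: one write that should succeed, one into a directory that
// presumably does not exist.
const good = writeFileE(path.join(os.tmpdir(), "void-post-ok.txt"), "hi");
const bad = writeFileE(path.join(os.tmpdir(), "void-post-missing-dir", "f.txt"), "hi");
```

Both outcomes are now ordinary values with types, so the caller has to look at the tag before it can use the result.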

Leveraging fp-ts

It is entirely possible to write our own implementation of Either but that’s for another post. Also the amazing fp-ts library already has this exact type created for us as a single type rather than a nested one! Let’s see how we can use it to begin preserving information.

Thanks to the taskify function from fp-ts we can convert a void-returning function with a void-returning callback into a TaskEither<NodeJS.ErrnoException, void>. No data loss there!
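To show roughly what a helper like taskify has to do, here is a dependency-free sketch. It is not the fp-ts implementation: taskify2, NodeStyle, and the in-memory writeFile stand-in are hypothetical names, and the real library handles more argument counts than this.

```typescript
type Either<L, R> =
  | { readonly tag: "Left"; readonly value: L }
  | { readonly tag: "Right"; readonly value: R };

// A lazy asynchronous computation that never rejects: failure is a Left.
type TaskEither<L, R> = () => Promise<Either<L, R>>;

// Shape of a Node-style function: two arguments, then an error-first callback.
type NodeStyle<A, B, L, R> = (
  a: A,
  b: B,
  callback: (err: L | null, value?: R) => void
) => void;

// taskify2 is a hand-rolled helper for two-argument functions only.
const taskify2 =
  <A, B, L, R>(f: NodeStyle<A, B, L, R>) =>
  (a: A, b: B): TaskEither<L, R> =>
  () =>
    new Promise<Either<L, R>>((resolve) => {
      f(a, b, (err, value) => {
        if (err !== null) resolve({ tag: "Left", value: err });
        else resolve({ tag: "Right", value: value as R });
      });
    });

// In-memory stand-in with the same callback shape as fs.writeFile,
// so the sketch runs without touching the real disk.
const disk = new Map<string, string>();
const writeFile: NodeStyle<string, string, Error, void> = (name, contents, callback) => {
  if (name === "") {
    callback(new Error("empty file name"));
  } else {
    disk.set(name, contents);
    callback(null);
  }
};

const writeFileTE = taskify2(writeFile);
const task = writeFileTE("a.txt", "hello"); // nothing has been written yet
```

Because the task is a function, building it performs no effect. The write only happens when task() is called, which is what lets us decide later how and when to run it.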

A journey through type simplification

Now let’s see how we can preserve our information by using some well-known functions to adjust our types to solve the initial need of writing multiple files to disk without losing information. This will take several refactoring steps, but hang in there because the result is far less code than it may seem as you read through.

First we define an interface and a list of sample files, then we map over the files, turning each into a TaskEither. That leaves us with an array of asynchronous processes, though, and we’d probably prefer a single asynchronous process that yields an array (TaskEither<L, Array<A>> rather than Array<TaskEither<L, A>>). Once again, the fp-ts library has us covered.

Changing the sequence of our types

The sequence function swaps the order of our types for us which is exactly what we wanted to do. Now we have a computation that will run all of the tasks in parallel and wrap them all inside a single TaskEither. If you’ve used Promise.all then you’ve already seen a very specific version of sequence that only works for Promise values in an Array.
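The map-then-sequence step can be sketched without any dependencies. Everything below is an assumption for illustration: FileToWrite, the sample files, the in-memory writeFileTE stand-in, and this array-only sequence are hand-rolled and far less general than the fp-ts versions.

```typescript
type Either<L, R> =
  | { readonly tag: "Left"; readonly value: L }
  | { readonly tag: "Right"; readonly value: R };
type TaskEither<L, R> = () => Promise<Either<L, R>>;

// Hypothetical interface and sample data.
interface FileToWrite {
  fileName: string;
  contents: string;
}
const files: FileToWrite[] = [
  { fileName: "a.txt", contents: "first" },
  { fileName: "b.txt", contents: "second" },
];

// In-memory stand-in for a taskified fs.writeFile.
const disk = new Map<string, string>();
const writeFileTE =
  (file: FileToWrite): TaskEither<Error, void> =>
  async () => {
    if (file.fileName === "") {
      return { tag: "Left", value: new Error("empty file name") };
    }
    disk.set(file.fileName, file.contents);
    return { tag: "Right", value: undefined };
  };

// Array<TaskEither<L, R>> becomes TaskEither<L, Array<R>>.
const sequence =
  <L, R>(tasks: ReadonlyArray<TaskEither<L, R>>): TaskEither<L, R[]> =>
  async () => {
    // Start every task before awaiting any of them, so they run in parallel.
    const results = await Promise.all(tasks.map((task) => task()));
    const values: R[] = [];
    for (const result of results) {
      if (result.tag === "Left") return result; // the first failure wins
      values.push(result.value);
    }
    return { tag: "Right", value: values };
  };

const tasks = files.map(writeFileTE); // Array<TaskEither<Error, void>>
const allWrites = sequence(tasks); // TaskEither<Error, void[]>
```

Nothing runs until allWrites() is called, and the single result it produces is either the first error or the array of every success.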

Map + Sequence = Traverse

It turns out that the combination of calling map and then sequence is very common and is called traverse. Switching to traverse will allow us to simplify our 2 function calls into 1 to produce the same output.

And there you have it. We have now maintained all of our information until the very last step of our processing, where the result is finally handled. Also, look at how we are now forced to deal with the exception case: there’s no way to get at the resulting value without first providing the failure function.
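Here is a dependency-free sketch of that final shape: one traverse call, then a fold that demands a failure handler before the success handler. As before, traverse, fold, FileToWrite, and the in-memory writeFileTE are hypothetical, hand-rolled stand-ins rather than the fp-ts API.

```typescript
type Either<L, R> =
  | { readonly tag: "Left"; readonly value: L }
  | { readonly tag: "Right"; readonly value: R };
type TaskEither<L, R> = () => Promise<Either<L, R>>;

interface FileToWrite {
  fileName: string;
  contents: string;
}
const files: FileToWrite[] = [
  { fileName: "a.txt", contents: "first" },
  { fileName: "b.txt", contents: "second" },
];

// In-memory stand-in for a taskified fs.writeFile.
const disk = new Map<string, string>();
const writeFileTE =
  (file: FileToWrite): TaskEither<Error, void> =>
  async () => {
    disk.set(file.fileName, file.contents);
    return { tag: "Right", value: undefined };
  };

// traverse = map, then sequence, in a single pass.
const traverse =
  <A, L, R>(items: ReadonlyArray<A>, f: (a: A) => TaskEither<L, R>): TaskEither<L, R[]> =>
  async () => {
    const results = await Promise.all(items.map((item) => f(item)()));
    const values: R[] = [];
    for (const result of results) {
      if (result.tag === "Left") return result;
      values.push(result.value);
    }
    return { tag: "Right", value: values };
  };

// fold takes the failure handler first; there is no way to skip it.
const fold = <L, R, B>(
  result: Either<L, R>,
  onLeft: (l: L) => B,
  onRight: (r: R) => B
): B => (result.tag === "Left" ? onLeft(result.value) : onRight(result.value));

const program = traverse(files, writeFileTE);

const summary: Promise<string> = program().then((result) =>
  fold(
    result,
    (err) => `failed: ${err.message}`,
    (written) => `wrote ${written.length} files`
  )
);
```

The error and the success both stay inside the type until fold, which is the only point where the information is finally collapsed into a plain string.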

Wrapping up

Because we went through several refactoring steps to get to this point, you may think that this approach has created a lot more code. Here’s the code in its entirety, though, to show that we stayed very lean, even though it includes a fair amount of supporting code that wasn’t in the initial screenshot.

Our basic steps were to define a type that retained all of the information and then reorganize our nested types until they were in the order we wanted to process them.

Additional reading