Do we need effects to get abstraction?
Sandy Maguire gave an excellent talk on his latest library: polysemy. I highly encourage you to watch it because Sandy also presents other libraries, freer and fused-effects, and how they differ from polysemy in a quest for expressivity, performance and boilerplate removal.
While I am very impressed by the recent improvements in effects libraries I am still thinking “do we need any of this?”. If I scroll back to Sandy’s original motivations for using effects I hear that he was working on a system which was a “Big ball o' IO spaghetti”, “impossible to test”. His conclusion is that we need to write programs at a high level (he says in a domain-specific language, a DSL) and run a series of transformations to lower-level DSLs. Thus: effects. I say: maybe not!
Levels of abstraction
I fully endorse the intention though. If we can describe our applications / services at different levels of abstraction we get fantastic abilities to understand them, evolve them, test them. And Sandy gives a great example in his presentation, which is more fully developed in one of his blog posts on freer monads:
write a program that fetches a CSV file from FTP, decrypts it, streams its contents to an external pipeline, and tracks its stats in Redis.
And the program should look like this:
This is indeed very high-level, we really get the “heart” of the application which is to repeatedly read records and process them. Then with a series of interpreters we can derive the full program which will communicate with FTP, decrypt, and so on:
Pretty cool but I am still unconvinced that we need effect libraries to do any of this.
Interface / Implementation
What we want is some way to:
- separate an abstract interface from its implementation(s)
- connect the two
This is not a very new idea because it is at the heart of many modularity efforts. Abstract Data Types are an example of that. They were pioneered by Barbara Liskov (yes, from the “Liskov substitution principle”) and Stephen Zilles in 1974 and fully adopted in the Ada programming language where they enable the programmer to hide details like memory layout and management. An Abstract Data Type provides the representation of some data as a set of operations (its “interface”) while keeping an internal representation (its “implementation”) hidden. This is not very far from saying that it provides a “DSL” and an “interpreter”.
So I thought: “can we create the same application without effects?”. In particular can I use simple “records-of-functions” in Haskell + my registry library to get to the same level of abstraction and ease of testing?
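As a reminder of what a “record of functions” looks like, here is a minimal sketch in plain Haskell. It is only an illustration: the Stats component and the names recordProcessed, newInMemoryStats, newNoStats and processBatches are hypothetical and are not taken from Sandy’s example or from the registry library.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- The interface: a record whose fields are the operations.
-- Client code only sees this type, never how it is implemented.
newtype Stats = Stats
  { recordProcessed :: Int -> IO () -- report a number of processed records
  }

-- One implementation: accumulate the total in a mutable reference.
-- (A production version might talk to Redis; that is left out here.)
newInMemoryStats :: IORef Int -> Stats
newInMemoryStats ref = Stats { recordProcessed = \n -> modifyIORef' ref (+ n) }

-- Another implementation: ignore everything.
newNoStats :: Stats
newNoStats = Stats { recordProcessed = \_ -> pure () }

-- Client code is written against the interface only.
processBatches :: Stats -> [[String]] -> IO ()
processBatches stats = mapM_ (recordProcessed stats . length)

-- Usage: run the client code with the in-memory implementation.
demo :: IO Int
demo = do
  ref <- newIORef 0
  processBatches (newInMemoryStats ref) [["a", "b"], ["c"]]
  readIORef ref -- 3
```

Choosing an implementation is then just a matter of passing a different value to processBatches.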
With records of functions
First surprise: I can’t do it! Indeed the Input effect is quite peculiar. It provides a Maybe i out of an input source. If you want to declare and implement it as a separate module you either have to:
- create a stateful module where you track the current flow of inputs. This requires some IO and the FP crowd might frown upon that (I don’t)
- “outsource” the state management to the rest of the program. This is what is done with the effects approach where the meaning of the whole program is made (functionally) stateful when using the csvInput interpreter
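The first option, a stateful component, could be sketched as follows. This is an assumption about what such a module might look like, not code from the original example: Input, nextInput, newListInput and drain are hypothetical names, and an in-memory list stands in for a real input source.

```haskell
import Data.IORef (atomicModifyIORef', newIORef)

-- Hypothetical interface: each call yields the next input,
-- or Nothing once the source is exhausted.
newtype Input i = Input { nextInput :: IO (Maybe i) }

-- Constructor keeping the remaining inputs in an IORef:
-- this is where the "some IO" comes from.
newListInput :: [i] -> IO (Input i)
newListInput items = do
  ref <- newIORef items
  pure Input
    { nextInput = atomicModifyIORef' ref $ \remaining ->
        case remaining of
          []       -> ([], Nothing)
          (x : xs) -> (xs, Just x)
    }

-- A consumer has to recurse explicitly until it gets Nothing.
drain :: Input i -> IO [i]
drain input = do
  next <- nextInput input
  case next of
    Nothing -> pure []
    Just x  -> (x :) <$> drain input
```

The state is local to the component: two Input values built with newListInput do not share anything, and the rest of the program stays unaware of the IORef.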
This is an important realization. With effects you interpret full programs, giving them specific meanings, whereas with simple “records of functions” you “inject” a specific behaviour with only a “local” meaning. That’s why approaches like effects or finally tagless are still valuable for some situations like non-deterministic effects, which are almost equivalent to a full program rewrite.
Is this a blocker? I don’t think so. This capacity to “yield elements on demand” is at the core of any streaming library and Sandy’s application is fundamentally a streaming service. So having a component which, as its interface, returns a Stream a is almost as good as, if not better than, returning Maybe a. You can indeed argue that it is even better because you don’t even have to recurse in the ingest function. You can have the following main:
The top-level application is fairly similar to Sandy’s example and is structured as:
- a stream producer: readInput
- a stream transformer: saveOutputs
- a stream consumer: saveStats
It keeps all the good properties of the original example:
- it is modular in the sense that the way we output elements is fully decoupled from the way we read them for example
- the details of how we perform the operations are fully encapsulated in each component: input, output and stats
But Sandy has more challenges for us! First challenge: provide records from a FileProvider which could either be a local file or a file coming from an FTP server. Second challenge: decrypt the file without having to modify the FileProvider component. Third challenge: batch the outputs to reduce the number of API calls.
Abstracting over file provenance
This one is not hard. If we have a component describing the reading of inputs with the following interface:
We can provide a “constructor” which will create this component with the help of a FileProvider:
Here we don’t know where the file comes from but we do the job of parsing it as a CSV file and returning a stream. The FileProvider itself can be implemented as:
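The three pieces just described (the input interface, its constructor and the file provider) might look roughly like the following sketch. It is an illustration under simplifying assumptions, not the code of the accompanying repository: a plain list stands in for the stream, the CSV parsing is naive, and Ftp, download, getFile and splitOn are hypothetical names.

```haskell
-- Hypothetical low-level capability; a real one would wrap an FTP client.
newtype Ftp = Ftp { download :: FilePath -> IO String }

-- Interface: something that can give us the contents of a file.
newtype FileProvider = FileProvider { getFile :: FilePath -> IO String }

-- Implementation 1: fetch the file through the Ftp capability.
newFtpFileProvider :: Ftp -> FileProvider
newFtpFileProvider ftp = FileProvider { getFile = download ftp }

-- Implementation 2: read the file from the local disk.
newLocalFileProvider :: FileProvider
newLocalFileProvider = FileProvider { getFile = readFile }

-- Interface for reading records; a list stands in for a stream here.
newtype Input = Input { readInput :: FilePath -> IO [[String]] }

-- Constructor: parses comma-separated lines, wherever the file comes from.
newInput :: FileProvider -> Input
newInput provider = Input
  { readInput = \path -> map (splitOn ',') . lines <$> getFile provider path }

-- Split on a separator (no quoting or escaping, this is only a sketch).
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn sep rest
```

The newInput constructor depends only on the FileProvider interface, so either implementation can be passed to it.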
As you can see we have 2 implementations here, one for FTP, using some Ftp capability and another one which would just read files from disk. This almost solves the first challenge: have different layers of abstraction and possibly different implementations at each level. What is missing is a way to select a specific set of implementations like Sandy does with all the runXXX interpreters in his example. This is provided by the registry library:
Here we have the application, App, containing the top-level components and a registry describing all the exact constructors needed to build the application (I will not explain here the role of the various operators like funTo and the type applications; please head to the registry documentation if you want to know more).
How can we run the application with a LocalFileProvider instead? Easy, you “override” the registry with the newLocalFileProvider constructor:
The program doesn’t change, we are just running it with a different set of implementations.
Aside on testing
This fully solves the “testing challenge” because we have an easy way to “tweak” the behaviour of the application under test, just by changing one line. On the other hand with effects and interpreters you have to rewrite the full runM . runRedis . runHttp ... function spanning 10 lines to just change one thing. You could even do more crazy things like providing a working Http component for posting outputs and a failing one for posting statistics (2 different instances for the same type).
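The idea of overriding a single component can also be shown without any library, by wiring the application by hand. The sketch below is a simplified stand-in for what the registry automates, not its API: App, runApp, productionApp and testRun are hypothetical names, and the “production” output simply prints to the console.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

newtype FileProvider = FileProvider { getFile :: FilePath -> IO String }
newtype Output = Output { saveOutput :: String -> IO () }

-- The application only knows about interfaces.
data App = App
  { fileProvider :: FileProvider
  , output       :: Output
  }

-- Read a file and send each line to the output component.
runApp :: App -> FilePath -> IO ()
runApp app path = do
  contents <- getFile (fileProvider app) path
  mapM_ (saveOutput (output app)) (lines contents)

-- "Production" wiring: read from disk, print to the console.
productionApp :: App
productionApp = App
  { fileProvider = FileProvider readFile
  , output       = Output putStrLn
  }

-- Test wiring: start from the production App and override components,
-- one field per component, without touching runApp.
testRun :: IO [String]
testRun = do
  seen <- newIORef []
  let testApp = productionApp
        { fileProvider = FileProvider (\_ -> pure "a,1\nb,2\n")
        , output       = Output (\line -> modifyIORef' seen (line :))
        }
  runApp testApp "unused.csv"
  reverse <$> readIORef seen
```

Manual wiring like this stays manageable for two components; following the types to assemble a larger graph is the part that a library such as registry takes over.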
Decrypting files
This is seemingly one of the big advantages of effects, being able to “intercept” a given effect to give it a slightly different meaning:
With components and constructors this can be done by having a “decorator”, a function taking an existing component and adding some functionality on top of it:
With an existing FileProvider, which we tag as clear, we can make another FileProvider, this time providing decrypted files thanks to a new dependency, the Encryption component. And the registry library takes care of wiring everything up:
When you want to build an App using the registry, the library will follow the types and see that:
- a FileProvider is required
- it can be built with the newDecryptedFileProvider
- this requires a tagged clear FileProvider which we get with newFtpFileProvider, but also the Ftp and Encryption components which we can build with their respective constructor functions
In summary it is possible to take existing components and “decorate” them or “intercept their interpretation” to create enhanced versions of those components.
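A decorator of this kind can be sketched in a few lines of plain Haskell. The code below is an assumption about its general shape rather than the original listing: the Clear newtype is a hand-rolled stand-in for the registry’s tagging mechanism, getFile and decrypt are hypothetical field names, and the “cipher” is a toy that shifts every character back by one.

```haskell
newtype FileProvider = FileProvider { getFile :: FilePath -> IO String }

-- Hypothetical decryption capability.
newtype Encryption = Encryption { decrypt :: String -> String }

-- Marks a component as the undecorated ("clear") one, so that its type
-- differs from the type of the decorated component.
newtype Clear a = Clear a

-- The decorator: takes an existing FileProvider and returns a new one
-- which decrypts whatever the original provides.
newDecryptedFileProvider :: Encryption -> Clear FileProvider -> FileProvider
newDecryptedFileProvider encryption (Clear clear) = FileProvider
  { getFile = \path -> decrypt encryption <$> getFile clear path }

-- Toy cipher for illustration only: shift every character back by one.
toyEncryption :: Encryption
toyEncryption = Encryption { decrypt = map pred }
```

The original FileProvider is not modified, and callers that ask for a FileProvider cannot tell whether they received the decorated or the undecorated one.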
Batching outputs
This is also a fantastic example of the power of effects. Someone gives you an additional technical constraint and you should be able to implement it without disrupting the majority of your program. With effects we create a function with the following signature:
batch :: Int -> Eff (Output [i] ': r) a -> Eff (Output [i] ': r) a
Once again we “intercept” an existing effect and regroup the elements so that we send 500 at a time and not one by one. Note that the implementation Sandy provides is probably not totally suited for production since we wait until a batch is complete before posting it. In practice we would also post records after some delay, even if the batch is not complete. This would make the signature of the batch function slightly more complex (requiring an additional Time effect in the stack).
Back to our components we have the following interface for the Output component:
The Output component is essentially a “stream transformer” and the newHttpOutput constructor uses an Http component to post the records. Here again we can apply the “decorator” pattern and “decorate” that component to create a new one where elements are being batched before being fed to the original one:
batchesOf uses some combinators from the fantastic streaming library to batch records and we also “unbatch” elements after use because we still need to return a Stream (Of a) m () (don’t worry this is all fused at runtime).
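The shape of this batching decorator can be illustrated with plain lists instead of streams. This is a simplified sketch, not the streaming-based implementation described above: lists are not processed incrementally, there is no time-based flush, and newBatchOutput, newCountingOutput and this list version of batchesOf are hypothetical.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- A "transformer" component: saves the records and passes them on.
-- A list stands in for the Stream of the real component.
newtype Output a = Output { saveOutputs :: [a] -> IO [a] }

-- Group elements in batches of at most n (n is clamped to at least 1).
batchesOf :: Int -> [a] -> [[a]]
batchesOf _ [] = []
batchesOf n xs = batch : batchesOf n rest
  where (batch, rest) = splitAt (max 1 n) xs

-- The decorator: batch the records, feed the batches to the original
-- component, then "unbatch" the results to keep the same interface.
newBatchOutput :: Int -> Output [a] -> Output a
newBatchOutput n inner = Output
  { saveOutputs = \records -> concat <$> saveOutputs inner (batchesOf n records) }

-- Stand-in for an HTTP-backed component: counts one "call" per batch.
newCountingOutput :: IORef Int -> Output [a]
newCountingOutput calls = Output
  { saveOutputs = \batches -> do
      mapM_ (\_ -> modifyIORef' calls (+ 1)) batches
      pure batches
  }

-- Usage: 5 records in batches of 2 result in 3 calls.
demo :: IO (Int, [Int])
demo = do
  calls <- newIORef 0
  saved <- saveOutputs (newBatchOutput 2 (newCountingOutput calls)) [1 .. 5]
  n <- readIORef calls
  pure (n, saved) -- (3, [1,2,3,4,5])
```

Code using an Output of single records keeps working unchanged once the decorated component is substituted.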
Conclusion
I am still amazed by all the hard work and progress made on effect libraries but I worry that the excitement of novelty hides what is really important when building applications:
- a proper distinction between interface and implementation
- an easy way to wire and replace components
There are many ways to get there with different trade-offs: effects, records of functions (or the Handle pattern), typeclasses and monad transformers.
I hope that this blog post shows you that we don’t necessarily need fancy type-level techniques to build modular, testable applications with the right levels of abstraction.
Update: the code for this post is available at https://github.com/etorreborre/ingestion if you want to play with it