Tagless with Discipline — Testing Scala Code The Right Way

Published in

Iterators

8 min readDec 18, 2018

With the recent boom in the adoption of so-called final tagless encoding in Scala land, which in turn seems to be addressing the shortcomings of the Free monad approach, the testability of programs is better than ever. The general consensus is that one of the main benefits of the Free / tagless style is that it allows for easy unit testing programs without the tedious process of setting up dependencies etc…

class TaglessService[M[_]: Monad](taglessRepository: TaglessRepository[M]) {
  def getList(limit: Int): M[Seq[Item]] = {
    taglessRepository.getList(limit)
  }
}

You simply bind to the ID monad (or swap interpreters if you’re a Free fan) and you’re good to test all the pure logic. Obviously, these techniques also help with integration testing by virtue of being able to easily transform components between various monadic contexts.

For example, you can produce a DBIO instance out of your computation and interpret it as a Future in an automatically rolled back transaction. No more setting up fixtures and maintaining the ever-elusive db state in tests. Integration tests can be fully parallel and much less flaky. That being said, it is still a bit of a burden to actually write these tests, mainly because of the informality of a component's specification that results in a lot of repetitive case-by-case testing.

What do I mean?

Let’s see what the typical approach might look like.

Problem Domain

Let’s say that we have a system storing users along with their preferences and e-mails. GDPR aside, we want to identify these users by their e-mail, set some properties, etc… So, in the tagless manner, we have created two repositories with abstracted monadic context:

In short — you have a bunch of emails and you can attach them to the user. Nothing extraordinary here. Additionally, you can do some lookups and updates. Supposedly, you’d like these structures to be kept in a relational database, so you implement these repositories for DBIO (if you use slick) or something similar (if you don't).

The details of the implementation do not really matter. It’s sufficient to say that it maps abstract operations to the “real ones.” What matters is that it needs to be tested at some point to ensure correctness of mapping, constraint violation handling, etc…

Typical Integration Testing

Normally, you would write a test case for each expected behavior. First, you would need some fixtures and transactions to prepare the db for a test scenario:

And then, using all the test infrastructure you wrote, you can write integration tests:

Unfortunately, writing these tests is very tedious and it’s tempting to cut some corners by skipping some important cases. For instance, would you test what happened when you saved a string with all the whitespace? Or various combinations when parts of a UserProfile are missing? Or all the VARCHAR constraints?

Moreover, you need to test a lot of implicit interactions between various methods. What should double save do? What should find do after a successful create ? If find returns something, then what is its relation to getEmails ? All these facts are tested by checking the behavior of the method with respect to some implicit database state. This is achieved by preparing a vast array of fixtures meticulously recreating the desired state before the test.

All this has one detrimental effect when it comes to generality — people work with tagless to abstract away effects. Yet, if you were to exercise this benefit, you’d have to rewrite all the tests with all the specifics of the new effect, making it prohibitively expensive.

Bring on Some Discipline

So, we seek to obtain the following:

Write Less
Test More
Be Explicit about How Methods of Algebra Should Behave
Be Generic

These things seem a bit contradictory. With the typical approach, the only way to test more is to write more tests and fixtures! And how can you be more explicit while being more generic if being explicit means writing fixtures that are everything but generic? It turns out that these properties cannot be obtained through a typical approach, so the only way to proceed is to change the approach.

Instead of writing tests, let’s formulate laws that the implementation of algebra should respect. Then, let’s use the automated law-checking library, Discipline, which will generate a large number of random test cases with ScalaCheck. This will allow us to test with sufficient confidence that any implementation is following our laws.

Thus, we get the following benefits:

We do not have to write tests — only laws and some infrastructure code (data generators, equality definitions).
Tests can exercise cases that are hard to come by when writing them manually (e.g., very large strings, empty values).
Tests work regardless of implementation.
Laws serve as an explicit documentation of behavior.

Let’s see the details!

Writing Laws

When you take a look at the Emails algebra, the following laws come to mind:

For every saved email e, find(e) returns e.
For every saved email e, known(e) returns true.
find is consistent with known i.e., find(e) is defined IFF known(e) is true.
Saving the same email twice always returns EmailAlreadyExists error.

By translating these laws into operations using this algebra, you get (in pseudo-code):

save(e) >> findEmail(e) <-> pure(Some(e))
save(e) >> known(e) <-> pure(true)
findEmail(e).fmap(_.isDefined) <-> known(e)
save(e) *> save(e) <-> pure(Left(EmailAlreadyExists))

In the example above, we use the standard cats syntax where: a >> b means a flatMap (_ => b); a *> b means product(a, b).map(_._2). We also use the <-> symbol to express the equivalent to relation.

UPDATE: Oleg Pyzhcov commented on Reddit:

You need to be careful to also capture the effects in laws, not just the result. A good litmus test is to see if the law specifies a possible refactoring that doesn’t break anything.
For example, I would not be able to blindly substitute by this law:
save(e) >> known(e) <-> pure(true)
because it completely removes the effect of saving stuff. The correct law would be
save(e) >> known(e) <-> save(e) >> pure(true)

I agree with his insight. For one thing, it makes reasoning about longer expressions correct. Thus, save(e) *> save(e) <-> pure(Left(EmailAlreadyExists)) should similarly be rewritten as: save(e) *> save(e) <-> save(e) >> pure(Left(EmailAlreadyExists))
You should keep this advice in mind when working on your laws.
Thanks, Oleg!

Similarly, you can devise a set of laws for Users algebra:

For every created user u, identifyUser(primaryEmail(u)) returns u.
For every created user u, identifyUser(e) returns u IFF e has been attached to the user u.
For every user u with profile p, creating the user and then updating their profile is equivalent to creating the user with the profile already updated i.e., createUser(e, p) >>= (u => updateUserProfile(uid(u), f(p))) <-> createUser(e, f(p)).
Attaching n emails via n calls to attachEmail is equivalent to calling attachEmails once with collection of all n-emails.

To be complete, we should have written laws governing the behavior of the remaining methods — find, getEmails, etc... I took the liberty of skipping it to be concise.

Let’s see how we implement these laws in Discipline.

Implementing Laws

The implementation of law checking needs to be tailored to ScalaCheck to achieve automated testing. That is, a law must be a valid ScalaCheck property. We’ll be using cats-kernel-laws provided IsEq type for that. The purpose of this type is twofold. First, IsEq(lhs, rhs) states that the left-hand side of the IsEq expression is equivalent to its right-hand side. Second, it is convertible to ScalaCheck Prop by Discipline. We form IsEq instances by using a handy <-> operator.

So, the implementation of the first law for the Emails algebra might look like this:

The way we read the method is:

For any email: Email, the expression algebra.save(email) >> algebra.findEmail(email) must be equivalent to M.pure(Some(email)). And that's what we want every implementation of the Email algebra to respect.

We’ll also prepare a generic test suite (called Laws in Discipline's terminology):

Now, we have to tell ScalaCheck how to generate emails. I recommend reading a ScalaCheck tutorial first, but it’s very simple in essence. There has to be an Arbitrary[Email] instance in the implicit scope of tests:

Finally, we’ll have to specify how to check the equivalence for a given monad M. For the DBIO context, we'll have to run both sides of the action that we check against the test db in a rolled-back transaction (to give each test a clean db state) and then compare the outputs. (This is an adaptation of a snippet found in slick-cats.)

Given all these pieces, we can finally test our implementation:

We now have a test suite that checks if a bunch of random emails can be saved to db and, subsequently, looked-up. Not only is the logic of these operations tested, but you can also catch errors stemming from incorrect mapping of the db schema. Let’s see how the whole suite looks:

You can see that you usually need to write additional generators as you add tests. But since they are composable, writing them is quite easy and bound by the size of your domain. (You only have to write new generators for domain-specific things that ScalaCheck does not know how to mock.) Another idea that may be worth exploring is the automatic derivation of Arbitrary instances for regular product/sum types by Magnolia.

You might be wondering why we are interested in generating a function? Sometimes you can form compact laws by stating that: the law holds under any transformation f. Just as we did in the findKnownConsistency test, def findKnownConsistency(email: Email, f: Email => Email).

It can be read as: given any saved email e and arbitrary transformation f, the result of find is defined for f(e) if the known(f(e)) is true. This is a stronger statement than saying that the law holds for any saved email e, letting us skip separately testing the case when find returns None.

To see why, let’s consider that f is a function that appends xyz to the mailbox part of the email address. The equivalence should hold for the choice of f, and, indeed, save(email) >> find(Email(s"xyz$email")).map(_.isDefined) is None, and save(email) >> known(Email(s"xyz$email")) is false. Alternatively, when f is the identity function, we expect find to return Some and known to return true. Thus, we have conflated two cases into one law.

Conclusion

I hope that my article has convinced you that testing in tagless should be based on abstract law rather than ad-hoc test cases. Can you think of any cases where this approach would be inferior to writing tests manually? Certainly, writing laws is harder than coming up with a bunch of test cases, and it requires some getting used to.

Conceptually, all type classes come with laws. These laws constrain implementations for a given type and can be exploited and used to reason about generic code. typelevel.org