Scrap your typeclasses, take 2

May 12 · 8 min read

A good solution,… for some problems

Type classes have originally been invented to solve the “operator overloading” problem in Haskell. We needed to find a convenient way to use the same operator + or * with different types, Int or Double for example. Since then typeclasses have been used for many other things, to overload the “bind/flatMap” operator >>= (with the Monad typeclass), to define JSON encoders/decoders, or to generate arbitrary values with Arbitrary.

However in the last case, data generation, type classes might not be a good idea. In particular they impose an annoying restriction: you can only have one instance of a given class for a given type. For example

This is particularly annoying when writing tests because you want to generate data satisfying lots of different constraints, sometimes short string, sometimes long, sometimes in lower-case only, and so on.

The usual response to this problem is “create a newtype”. Unfortunately this doesn’t work in all situations, most notably when you have nested data types:

If you have Arbitrary instances for all data types, you don’t have an easy way to use a newtype to specify that department names should have upper-case short names without modifying both Company and Department.

This is why a library like hedgehog decided to get rid of type classes and only keep generators. That is not a particularly new idea either, it has been part of the Haskell folkore forever, way before Gabriel Gonzales wrote about it in a famous blog post “Scrap your typeclasses” (2012).

Abandoning typeclasses gives us the flexibility we sometimes need, but at the expense of having to manually “wire” many different generators for our various testing scenarios. You can find realistic examples of such generators in Oskar Wickstrom’s Komposition library.

Some challenges for data generation

Today I want to show that there is a way to solve a set of challenges for data generation without having to use typeclasses:

  1. create generators for nested data types, without writing “wiring” code, just by following the types
  2. create generators for each constructor of an ADT, like EmployeeStatus
  3. override a generator for a specific type, for example only generate companies withPermanent employees
  4. override a “relation” for a specific type, for example how many Departments there should be for a given Company
  5. override the generator for a specific type in a specific context, for example generate only short, upper-case department names but arbitrary employee names
  6. compose generator modifications to allow them to be specified independently
  7. create stateful generators, for example making sure that 2 departments cannot have the same name in a company
  8. for ADT, make sure that every constructor is being generated deterministically instead of letting the random generator pick one randomly

This all relies on a small library registry-hedgehog.

Create nested generators

If we consider the Company data type described above we typically need to declare the following

The first 3 generator declarations are fairly similar and correspond to function application of a constructor in the “Applicative context” of Gen. Then there are 2 declarations to make generators for lists.

When you want to create a generator for a Company, likegenMyCompany, you make nested function calls. Given the all the available functions and a type to generate, like Company, it is not hard to see which functions should be called:

  • genCompany is the only function returning a Company
  • it requires a Gen [Department] we have one other function returning that type
  • that other function needs a Gen Department, do we have a function returning that type? Yes
  • and so on…

This algorithm for “type-directed function application” is so simple that it has been implemented in the registry library. The registry-hedgehog library is just an extension with some combinators specialized for Hedgehog. With registry-hedgehog you can write the following:

You create a “Registry” where you “register” a bunch of functions to use for function generation, like Company, Department, Employee. You also provide some “base generators” like genInt, or genText. They are just “values” because they don’t depend on any other generators. You also need to specify how lists of elements for specific types need to be generated. This can be done with listOf, one of the library combinators (same thing for maybeOf to generate Maybe of a given type).

All of this takes as many lines as creating Arbitrary instances for each type.

Once you have your registry, you can call make and get a nicely assembled generator for Company without writing any more code. In that sense we get as much power as typeclasses in terms of boilerplate reduction but we can solve all the other challenges!

(note: you might have noticed GenIO instead of Gen, we’ll come back to that)

Create generators for an ADT

This part is a bit more involved for “type-directed function application”. Each constructor of a data type returns the same type. When we want to create an EmployeeStatus which constructor should we take, Permanent or Temporary?

The solution is to “tag” values generated with each constructor like so:

Then we have a function genEmployeeStatus taking generators built with both constructors and choosing randomly (with Gen.choice) between them. Again this can be completely automated and there is a piece of Template Haskell producing the same expression:

$(makeGenerators ‘’EmployeeStatus)

Important limitation though! This will not work with recursive data types.

Typically a data type for lambda expressions is

A generator for the App constructor would have the signature:

genApp :: Gen Exp -> Gen Exp -> Gen (Tag "app" Expr)

But the selector generator is

genExpr :: 
Gen (Tag "var" Expr)
-> Gen (Tag "lam" Expr)
-> Gen (Tag "app" Expr)
-> Gen Expr

So that genApp and genExpr are mutually recursive. This is, for now, prohibited by the registry library, I think this can be safely done though but this needs more work.

Override a generator

This is where getting away from typeclasses starts to pay off. We want to generate only Permanent employees for a company. To do this we “override the registry with a fixed generator for EmployeeStatus

We add “on top” of the registry a generator for EmployeeStatus always returning the same value. When the registry library will try to create a generator for Company it will take the first available generator for EmployeeStatus.

Override a relation

That’s an easy one given the previous section. If you want more departments in your company change the function which is producing a Gen [Department]

The function being overridden is simply the Gen.list function which takes another generator as a parameter. You can do the same for other types of “relations”: Maybe, Set, Map, NonEmpty,… This is super-useful in real-life testing because you can really fine-tune your generation to a specific test scenario.

Override a generator in a specific context

We want to generate only short and upper-cased names for Departments, what do we do?

There’s nothing more to do. Instead of just adding the new generator for Text on top of the registry (it would be used everywhere in that case) we declare that we only want to use it when creating Departments.

Compose generation constraints

If you think of setting a new generator as a “Registry modification” and write it as such:

setShortDepartmentNames :: Registry _ _ -> Registry _ _ 
setShortDepartmentNames =
specializeGen @Department genDepartmentName

Then it is easy to see that all those modifications can be composed using the simple function composition operator .

setupCompany = setShortDepartmentNames . setManyDepartments

That’s a feature which comes for free, thank you Haskell! We can also see this as “stateful transformations” in a State monad

Maybe you noticed that we used specializeGenS this time. Indeed there are MonadState xxxS combinators in the library for setting generators or specializing them. We can still compose constraints with the >> operator

setupCompanyS = setShortDepartmentNamesS >> setManyDepartmentsS

This also means that we can write “stateful” Hedgehog properties because the property type in Hedgehog is a transformer, PropertyT

At the beginning of the property declaration we pass the registry containing all our generators, runS generators, and inside the property we modify the registry statefully with setSmallCompany or setDepartmentName.

Create effectful generators

This is another challenge which is quite hard to solve with regular Arbitrary typeclasses or normal Hedgehog generators but it is so essential in practice.

When we generate data types like names, identifiers, nonces, keys,… we very often have a uniqueness constraint on them: “all the departments must have a unique name”. This means that the generation of department names must be effectful. You must remember the previously generated departments in order to generate new distinct ones.

This is supported by the registry-hedgehog library with the setDistinct function

setDistinct :: (Eq a) => Registry ins out -> IO (Registry ins out)

As you can see, we are now doing an effectful transformation of the registry in order to keep track of generated values. But this is not really an issue since the property type used in Hedgehog is PropertyT IO () which gives us the possibility to use IO.

This makes a “stateful” declaration very straightforward because the signature of setDistinctS is:

setDistinctS :: 
(Eq a, MonadState (Registry ins out) m, MonadIO m) => m ()

which allows us to blend such a constraint with others in the PropertyT (StateT (Registry _ _) IO) monad.

This also explains why we create generators as:

type GenIO = GenT IO

That’s because they need to generate some effect, in that case, to add a value to the list of created values once a value has been generated.

I think this feature is a “killer feature” for the registry-hedgehog library. Generating distinct values has always been a bit troublesome and now it can be done in one line of code.

Cycle constructors

Now that we have more control over effects for generating values we can apply it to the final challenge. When you have a large ADT, with many constructors you might want to make sure that over 100 tests, which is the default number of tests for a given property execution, you will use as many as those constructors as possible.

However the default method to select constructors in the $(makeGenerators ''EmployeeStatus) Template Haskell code is using Gen.choice. It would be nice to use a function like cycle

cycle :: [Gen a] -> IO (Gen a)

which goes through all the generators, one by one, and comes back to the beginning of the list. This is an effectful function because we need to remember which generator was selected last.

Fortunately we can do all of this because makeGenerators is not hard-coding Gen.choice as the selection strategy. It is requiring a parameter, GenIO Chooser, containing the selection strategy. The default Chooser value in the registry is using Gen.choice internally but nothing forbids us to pass a new one! This is exactly what the function setCycleChooser is doing:

The property above generates EmployeeStatus values and will repeatedly generate Permanent, Temporary 1, Permanent, Temporary 1, Permanent, Temporary 1,... cycling through the 2 constructors of the EmployeeStatus datatype.

Conclusion

I hope I have demonstrated today that:

  • automating function application is useful
  • typeclasses are not the only tool at our disposal to do it
  • getting away from typeclasses gives us unprecedented expressivity to declare how data generation should be done

The registry-hedgehog is available on Github and Hackage, please give it a go and tell me what you think!

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade