Writing functional DSLs for business domains

Published in bigdatarepublic, Jun 8, 2020

Orthogonal composable components (by Jasmin)

In functional programming, a domain-specific language (DSL) is a set of functions that can be composed to solve a specific problem. DSLs are often found in libraries, but you can also write your own DSL that is specific to your business domain. This can be beneficial for several reasons:

Testable — Each independent component is small and isolated;
Understandable — Composed solutions are easy to read;
Expressive — Solve an entire class of problems with a small set of primitives.

In this post we’ll build a DSL for filtering emails in Scala. When we’re done, we can compose any email filter using simple, orthogonal building blocks.

An example of a DSL for the domain of email filtering

Anatomy of a functional DSL

A functional DSL consists of three components:

Types that describe solutions;
Constructors for those types, that give the simplest possible solutions;
Operators that compose or transform solutions.

With these components, we can construct a solution for a business domain. Solutions are encoded as pure data, which can be evaluated later to get a result.

Types

There should be a single type that describes a solution to a domain problem. In our case that’s the EmailFilter trait, which describes solutions to the domain problem of filtering emails. Use sealed traits, case classes and case objects for this.

Constructors

The constructors for these types create the simplest possible solutions. For example, bodyContains or recipientIn. Constructors that use only one case class are called primitive constructors. Derived constructors, like senderIsNot, use a combination of constructors and operators.

Operators

Operators can combine and transform the data structures in order to Lego together more complex solutions. Like constructors, operators can be either primitive, like negate, or derived, like ||.
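The three components above could be sketched roughly as follows. This is a minimal sketch, not the post's full implementation: EmailFilter, bodyContains, recipientIn, senderIsNot, negate and || are named in the text, while the EmailDsl wrapper, the Address and Email model, the case class names and the && operator are assumptions made for illustration.

```scala
object EmailDsl {
  // Hypothetical, simplified email model (field names are assumptions).
  final case class Address(value: String)
  final case class Email(sender: Address, to: List[Address], subject: String, body: String)

  // Type: a single sealed trait describes every filter as pure data.
  sealed trait EmailFilter { self =>
    // Primitive operators: each one wraps a single case class.
    def &&(that: EmailFilter): EmailFilter = EmailFilter.And(self, that)
    def negate: EmailFilter = EmailFilter.Not(self)

    // Derived operator: expressed with && and negate (De Morgan's law).
    def ||(that: EmailFilter): EmailFilter = (self.negate && that.negate).negate
  }

  object EmailFilter {
    // Case classes are the nodes of the data structure (names are assumptions).
    final case class SenderEquals(target: Address) extends EmailFilter
    final case class RecipientIn(targets: Set[Address]) extends EmailFilter
    final case class BodyContains(phrase: String) extends EmailFilter
    final case class And(left: EmailFilter, right: EmailFilter) extends EmailFilter
    final case class Not(filter: EmailFilter) extends EmailFilter

    // Primitive constructors: each uses exactly one case class.
    def senderIs(sender: Address): EmailFilter = SenderEquals(sender)
    def recipientIn(recipients: Set[Address]): EmailFilter = RecipientIn(recipients)
    def bodyContains(phrase: String): EmailFilter = BodyContains(phrase)

    // Derived constructor: a primitive constructor combined with an operator.
    def senderIsNot(sender: Address): EmailFilter = senderIs(sender).negate
  }
}
```

With these definitions, an expression such as senderIsNot(Address("boss@example.com")) && bodyContains("unsubscribe") only builds a tree of case classes. Nothing is evaluated yet.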

Design principles

There are many ways to factor a DSL, but some are better than others. These guiding principles help you arrive at a good design. Our components should be:

Composable — to build complex solutions using simple components;
Orthogonal — such that there’s no overlap in capabilities between primitives (i.e. MECE or the single-responsibility principle);
Minimal — in terms of the number of primitives.

As always, it takes iteration and refinement to converge to a clean DSL. For example, consider List's flatMap, flatten, and map functions. We could implement flatMap and derive the other two.
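As a rough illustration of the first option, here is a sketch that treats flatMap as the only primitive and derives map and flatten from it. The functions are written as standalone helpers over the standard List, and the object name FlatMapPrimitive is made up for this example.

```scala
object FlatMapPrimitive {
  // Primitive: the only function that traverses the list itself.
  def flatMap[A, B](xs: List[A])(f: A => List[B]): List[B] =
    xs.foldRight(List.empty[B])((x, acc) => f(x) ++ acc)

  // Derived: wrap each result in a single-element list, then flatMap.
  def map[A, B](xs: List[A])(f: A => B): List[B] =
    flatMap(xs)(x => List(f(x)))

  // Derived: flatMap with a function that returns each inner list unchanged.
  def flatten[A](xss: List[List[A]]): List[A] =
    flatMap(xss)(inner => inner)
}
```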

Or we could implement flatten and map, and derive flatMap.
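The second option could look something like this: map and flatten are the primitives, and flatMap is derived by composing them. Again, the object name FlattenMapPrimitives is hypothetical and the helpers work on the standard List.

```scala
object FlattenMapPrimitives {
  // Primitive: transform every element, keeping the structure.
  def map[A, B](xs: List[A])(f: A => B): List[B] =
    xs.foldRight(List.empty[B])((x, acc) => f(x) :: acc)

  // Primitive: remove one level of nesting.
  def flatten[A](xss: List[List[A]]): List[A] =
    xss.foldRight(List.empty[A])((inner, acc) => inner ++ acc)

  // Derived: map first, then flatten the nested result.
  def flatMap[A, B](xs: List[A])(f: A => List[B]): List[B] =
    flatten(map(xs)(f))
}
```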

So which one is better? They are equally composable, because they result in the same operations. The first approach is more minimal, because it has only one primitive. And the latter is more orthogonal, because flatten and map can’t be split up into smaller operations, but flatMap can. When deciding between minimalism and orthogonality, go with orthogonality, because that gives the simplest design.

Evaluating the DSL

So far we’ve only defined a way to build a data structure. There are two approaches for evaluating it: final and initial encoding.

Final encoding

This approach embeds the evaluation code in the data structure itself as it’s constructed. You can think of final encoding as describing a process of steps that should be executed. The resulting data structure will be executable.

Simplified for readability. There’s a link to the full implementation below.

Final encoding can be more straightforward to implement, and it lets you wrap existing code, such as libraries, that you might not be able to change.
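A final encoding of the email filter might look roughly like the sketch below, where a filter is nothing more than a wrapped function from an email to a Boolean. The FinalEncoding wrapper, the Address and Email model and the && operator are assumptions for illustration. The linked full implementation may differ.

```scala
object FinalEncoding {
  // Hypothetical, simplified email model (field names are assumptions).
  final case class Address(value: String)
  final case class Email(sender: Address, to: List[Address], subject: String, body: String)

  // Final encoding: the filter carries its own evaluation function.
  final case class EmailFilter(run: Email => Boolean) { self =>
    // Primitive operators build a new function out of existing ones.
    def &&(that: EmailFilter): EmailFilter =
      EmailFilter(email => self.run(email) && that.run(email))
    def negate: EmailFilter = EmailFilter(email => !self.run(email))

    // Derived operator, expressed with && and negate.
    def ||(that: EmailFilter): EmailFilter = (self.negate && that.negate).negate
  }

  object EmailFilter {
    // Primitive constructors: each one wraps a single predicate.
    def senderIs(sender: Address): EmailFilter = EmailFilter(_.sender == sender)
    def recipientIn(recipients: Set[Address]): EmailFilter =
      EmailFilter(_.to.exists(recipients.contains))
    def bodyContains(phrase: String): EmailFilter = EmailFilter(_.body.contains(phrase))

    // Derived constructor.
    def senderIsNot(sender: Address): EmailFilter = senderIs(sender).negate
  }
}
```

Evaluating a filter is then just a function call, for example senderIsNot(boss).run(email). The trade-off is that the composed filter is a closure, so there is no way to look inside it afterwards.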

Initial encoding

Initial encoding completely separates the evaluation from the data. One or more interpreters traverse the data structure. The run function in this example evaluates whether a given email matches the filter. We could also define interpreters that generate a human-readable string for the filter, persist it in a database, or simplify and optimize it. This can’t be done with final encoding, because functions are opaque: they can’t be inspected.

Simplified for readability. There’s a link to the full implementation below.

Because evaluation is separated from the data, initial encoding is simpler to reason about and gives more flexibility than final encoding. So even though there’s more boilerplate code involved, initial encoding is usually preferred in greenfield scenarios.
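To make the contrast concrete, here is a minimal sketch of an initial encoding with three interpreters over the same data structure. Only EmailFilter and run are named in the text. The InitialEncoding wrapper, the case class names, and the describe and simplify interpreters are assumptions added for illustration.

```scala
object InitialEncoding {
  // Hypothetical, simplified email model (field names are assumptions).
  final case class Address(value: String)
  final case class Email(sender: Address, to: List[Address], subject: String, body: String)

  // The filter is pure data: no evaluation logic lives in these classes.
  sealed trait EmailFilter
  final case class SenderEquals(target: Address) extends EmailFilter
  final case class BodyContains(phrase: String) extends EmailFilter
  final case class And(left: EmailFilter, right: EmailFilter) extends EmailFilter
  final case class Not(filter: EmailFilter) extends EmailFilter

  // Interpreter 1: does the email match the filter?
  def run(filter: EmailFilter, email: Email): Boolean = filter match {
    case SenderEquals(target) => email.sender == target
    case BodyContains(phrase) => email.body.contains(phrase)
    case And(left, right)     => run(left, email) && run(right, email)
    case Not(inner)           => !run(inner, email)
  }

  // Interpreter 2: render the filter as a human-readable string.
  def describe(filter: EmailFilter): String = filter match {
    case SenderEquals(target) => s"sender is ${target.value}"
    case BodyContains(phrase) => s"body contains '$phrase'"
    case And(left, right)     => s"(${describe(left)} and ${describe(right)})"
    case Not(inner)           => s"not ${describe(inner)}"
  }

  // Interpreter 3: simplify the filter by removing double negations.
  def simplify(filter: EmailFilter): EmailFilter = filter match {
    case Not(Not(inner))  => simplify(inner)
    case Not(inner)       => Not(simplify(inner))
    case And(left, right) => And(simplify(left), simplify(right))
    case leaf             => leaf
  }
}
```

Adding another interpreter means writing one more pattern match. The case classes themselves stay untouched.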

Putting it all together

And that’s all we need in order to write a DSL for a specific business domain: constructors for types that describe solutions, operators to compose solutions, and a way to evaluate the solutions.

Construct and evaluate an email filter

You can read through the full email filtering example for both initial encoding and final encoding.

Thanks to John de Goes whose functional design workshop was the inspiration for this post, and to Diederik Jan Lemkes, Bertjan Broeksema and Annieke Hoekman for proofreading.

Ruurtjan Pul (Twitter: @ruurtjan) is a data engineer at BigData Republic, a machine learning consultancy company in the Netherlands. We hire the best of the best in data science and data engineering. If you are interested in applying functional programming on machine learning use cases, feel free to contact me at ruurtjan.pul@bigdatarepublic.nl.
