Abusing Haskell dependent types to make Redis queues safer

I don’t know about you, but quite frankly I’m tired of seeing the same tired vector example being used to demonstrate why dependent types are useful. Recently, I had the opportunity to use current Haskell’s dependently-typed facilities to solve a somewhat contrived, yet real-world problem. Below I will go through the steps I took in moving from an untyped free-for-all to having GHC reject more bad programs for me.

This blog post is aimed at the perpetual-intermediate-Haskell-programmer, who knows her way around GHC, &c, but sometimes doesn’t know why certain extensions exist. This post by Christian Marie is what finally made it all click for me.

Say you’re writing some piece of software, and you have decided to use a microservices’ architecture. Never mind that this is allegedly a bad idea. For communication, you decide to (ab)use Redis lists as queues, and build a rather simple API. For simplicity, the software is designed such that all queuing operations will be asynchronous and blocking:

This will spoil your day

Now there is a bit that needs to be unpacked in this snippet.

  • We’re using the protolude library so that we don’t tear our hair out. Among other things, it gives us $!, ByteStrings, and async functions.
  • We are using hedis to connect to our local Redis. In order to simulate queues we use a common Redis idiom: lpush to send a message, and brpop (with a zero timeout meaning “block indefinitely”) to wait for a message.
  • The hedis library expects all values to be ByteStrings, so we (de)serialise our data using store. In truth, we could have used any serialisation library (or even aeson).
  • We are doing zero error checking here. Connecting will succeeded, the Redis operations will not blowup, and messages will deserialise successfully. Thus we use fromJust and fromRight. YOLO.
  • We would like to at least “mark” our messages, by wrapping them in QMessage. In practice, we might use Redis for more than just message passing, and we might like to include metadata (such as timestamps) with our payloads.
  • The extensions we have to use are:
  1. ScopedTypeVariables so that we can do inline type annotations (msg :: (QMessage Int)).
  2. FlexibleContexts because the type signatures are apparently too complicated for plain Haskell.
  3. StandaloneDeriving and GeneralisedNewtypeDeriving so that we can tell the compiler to derive Store instances for messages.
  4. NoImplicitPrelude and OverloadedStrings should always be used.

In the main function we simulate a message getting passed between two services, spawned as threads here. Everything goes swimmingly, of course. We are assuming, by the way, that each queue is associated with a single type (there’s no dynamic typing happening at runtime).

For completeness, we should also formalise queue names. We can do so like this (this will become much more important in a bit):

Bringing some sanity

Now, if you’re anything like me you will eventually screw up and push the wrong message type to a queue. In that case, the receiving application will crash. You might also pass a datatype, but one day add a new constructor, compile the sender only, and then have the receiver blow up since deserialisation failed on the type. Changing the line msg :: (QMessage Int) <- receiveMessage "theQ" to msg :: (QMessage Char) <- receiveMessage "theQ" gives us the cryptic runtime error (from store) simple: PeekException {peekExBytesFromEnd = 4, peekExMessage = "Didn't consume all input."}

There’s obviously a lot that can go wrong if you’re converting values to ByteStrings and then hopping them between applications.

Since I’m stupid, I would like the language and compiler to help as much as possible. That is literally the reason why I picked up Haskell — because I’m dumb. Right now, the only ID a queue has is its name, but how do I make queues more strongly typed, to prevent weird stuff happening at run time?

A poor man’s solution might be to “tag” queue names with the type of their payloads. Using Data.Typeable we can add a type description and a Fingerprint to the queue name. If the sender is sending the wrong type, or the wrong version of a type, the decorated queue name will be different from the queue name expected by the receiver, and so the receiver won’t crash (it will fail to receive the bad message).

But this is supposed to be an article about dependent types, so let’s get to it!

We wish to associate an AppId/QueueType pair not only with a queue name, but also with a type. Ultimately, we would like it if GHC would reject code that tries to put the wrong type on the wrong queue.

That is, we wish to create a mapping from the two data types to a message payload type. In Haskell, to write such a mapping we should use TypeFamilies.

Now we get to the point of this article: we need to somehow write a function from data constructors (values) to types. In vanilla Haskell we cannot do that; data and types live in totally disjoint universe and you can’t have them interact.

In modern GHC, there happens to be a way to have your cake and eat it, too. We can promote data constructors from the value level to the type level by using the DataKinds extension. When we do this, the data type can be used as a new kind, and its constructors can be used as type constructors.

Manufacturing types out of thin air!

We create two new universes of type s— kinds — using the same syntax we use to declare a data type, and then declare a type family which maps pairs of those types to a concrete Type. Note the overloading of the :: syntax to create kind signatures (an ability implicitly enabled by the DataKinds extension) instead of the usual type signatures.

Type (also called *, although the poor GHC parser sometimes gets confused by this symbol when we start doing funky stuff) is a special kind, since type constructors of this kind can be inhabited: the type Char is of kind Type, and can be occupied by values such as 'a', '1', &c. We’re creating our own two new kinds — AppId and QueueType — which will have types, but those types will not have any values. Sounds completely useless, but it is immensely powerful.

We can now associate types to combinations of AppIds and QueueTypes:

FizzBuzz takes numbers and produces text!

In practice, these instances may be declared in disparate modules, so we use an open type family (there is no where in the declaration of the family).

Note the use of prime marks (') on the type names. These are strictly not necessary, but they help us visually distinguish between using, for example, Doer as a value and as a type. It just so happens that we won’t ever use the regular data types, anyway.

Now, let’s wire up these types such that sendMessage and receiveMessage functions are better typed! We will leave the implementations blank for now:

o my!

This is how we make our API type-driven. We essentially write a type-level program, by telling GHC that we have an AppId and a QueueType. We then tell GHC to find a type c such that c is equal to AssociatedQueueType a b, and GHC will eventually find the relevant AssociatedQueueType instance and figure out the result type. We are directing the GHC type constrained solver, by declaring the properties of the types we want, and — at compile time — GHC will tell us if it couldn’t make all the constraints work together: if our code is badly typed!

We need to enable ExplicitForAll, to allow us to write the kind signatures. We also need to enable the scary-sounding AllowAmbiguousTypes. Without the latter extension enabled, we will get strange errors that look like this:

GHC helpfully tells us what to do here…

As a heuristic, the ambiguous types’ error pops us when we declare type variables but do not associated them with parameters or the result (to the left of the final =>). Haskell’s type resolution is “two-sided”: one side “pushes” a certain type, the other “pulls” another type, and our program typechecks if both sides can be made to agree (unified). I’m furiously waving my hands here because I actually haven’t the foggiest inkling how type resolution works in GHC.

In this case, GHC is rightly worried that, since there are no parameters or results of type a and b in our function definition, there won’t be a way for us to specify — in vanilla Haskell98 — at the call site what that side of the equation is, by providing explicit parameters of concrete types, or using type annotations to “pull” the types we want.

By enabling AllowAmbiguousTypes we tell GHC that we know what we’re doing. But if there are any issues they will be found by the compiler, so we’re not making our program as a whole less typesafe.

At call sites, we can have our cake and eat it. We resolve the ambiguity, and make our message passing type-driven, by utilising the TypeApplications extension. Since the a and b type variables do not exist after the =>, they are invisible to the caller; we make them visible with TypeAllications — a “visibility override.” Behold:

I have to admit: that is some bad-ass syntax right there

Now GHC knows what the a and b type variables should be instantiated to, and type-checking can continue since the ambiguity has been rectified. Note also that we don’t need explicit type annotations for the messages anymore — the calls are fully type-driven!

Now, to get back to the implementations. The hedis library is unsophisticated: it merely wants a queue name and will happily push and pop ByteStrings as much as we want. But we need to provide it with a queue name.

We have an issue here: obviously we want each app/queue type combination to have its own queue name, so we ultimately need to somehow map a pair of types (say, @'Doer and @'InputQueue) to a string (the queue name).

Recall that our old getQueueName simply used the Show instances of the data types. With our API, we don’t ever want to use the data constructors, since we would be duplicating data (once for the data constructor, and once for the type application; to jump directly from one to the other we would need singletons, and even the guy who wrote the library admits that they’re ugly and terrible).

Modern GHC has a bit of a hack we can use, though. We cannot automatically convert a data-constructor-cum-promoted-type-constructor to a string, but we can associate them with Symbols (type-level strings), and then have GHC convert the Symbols into regular Strings (which we will very quickly convert to ByteStrings).

This is some dark juju…

If you blink, you might almost miss it. Symbol is a kind, whose residents are unoccupied types. The residents of Symbol look an awful lot like regular Haskell Strings. Each one of these “strings” is actually an entire type, which GHC magically produces for us. Unlike our data kinds, the Symbol kind does not have a finite, closed universe of types. By the way, somebody (who knows how to use TH) should really write a library to automatically create a type-level show function for data kinds…

That’s not where the magic ends. GHC also allows us to convert Symbols into actual, value-level Strings using the symbolVal function. So now, we can write a “dependently-typed” version of getQueueName:

Seriously: this is magic

Note that I’m using toS from Protolude to automatically convert from String to ByteString. Let me just say: I think the type for symbolVal is ugly — Proxy types serve to make types visible, just like TypeApplications — but I guess the writers did not want to force users to enable TypeApplications to use this function, so they added what is essentially a dummy parameter. Also, the KnownSymbol constraints feel icky.

Now we can write our two API functions properly (with those icky KnownSymbol constraints infecting everything):


Now if we accidentally write msg :: (QMessage Char) <- receiveMessage @'Doer @'InputQueue in the listener or sendMessage @’Doer @’InputQueue $ QMessage 'x'in the sender, then at compile-time GHC will complain about expecting an Int and getting a Char instead, just like we wanted all along. All of that just because I confused my letters with my letters…

For completeness, here’s the full dependently-typed code: