Existential crisis

Concert
11 min readFeb 12, 2018

--

Paul plays around storing and operating on collections of mixed-type values in Haskell. If you’re comfortable with Data.Dynamic there’s probably nothing new here, but playing around is still fun!

All mixed up together! Rubiks cube by Sonny Abesamis

One of the components in the audio collaboration system we’re building needs to handle data whose type is determined by other components at run time. Our existing implementation used a simple sum type (a tagged union) to do this, but that approach has some deficiencies. So, over Christmas, I decided to experiment with different ways of storing values with mixed types in the same data structure.

This post aims to explain some of the things I found, and should hopefully compliment the good material that others have already published.

The Haskell wiki page on heterogeneous collections suggests a number of approaches for storing values of mixed type in a data structure:

  • The simple algebraic sum type that we were already using
  • A “universal type” — embodied by Data.Dynamic
  • “Existential” types
  • The Hlist strongly typed collections library.

Whilst I’ve recently been playing with Idris and have read quite a bit about dependant typing, the description of using Hlist as “very advanced and a little restrictive” made me quite nervous! I did some reading about it and it looked amazing and terrifying in equal measure, so I decided to move on.

Of the other two suggested approaches from the wiki, Dynamic seemed too much like black magic at first glance. So I finally settled on trying my hand with existentials again.

Why?

Before I get into exactly what I was playing around with, I should explain briefly why we want to be able to store a collection of values of different type.

The architecture for the audio collaboration tools that we are building has a publication/subscription API middleware component that deals with forwarding messages to interested parties. This component also does a lot of data validation so that the providers of APIs can focus more on the business of executing the work, rather than checking values and providing good error messages.

For this architecture to work generically, the API providers need to be able to publish a definition of the types of values they accept for any given piece of data. (That is, they need to be able to describe types.) Also, we need to be able to cache a copy of all of their data so that when a new client subscribes we can publish the current state of the data to them. (That is, we need to be able to store arbitrarily typed data.)

You can see all the components working together in this demo video.

Problems with using an algebraic data type

Our first implementation used a simple algebraic sum type to describe possible values:

-- | WireValues represent data that our API receives "off the wire"data WireValue = WvInt Int | WvString String deriving (Show, Eq)

This behaves just how you’d expect. You can wrap up values of different types into a value of the sum type and put them into a container:

λ: [WvInt 3, WvString "hello"]
[WvInt 3,WvString "hello"]

However, everywhere you want to get access to the content of a WireValue you have to pattern match all the possibilities:

wireValueAppend :: WireValue -> WireValue -> Maybe WireValue
wireValueAppend (WvString s1) (WvString s2) =
Just $ WvString $ s1 ++ s2
-- Bad types:
wireValueAppend _ _ = Nothing
λ: wireValueAppend (WvString "hello") (WvString "world")
Just (WvString "helloworld")
λ: wireValueAppend (WvString "hello") (WvInt 3)
Nothing

This works just fine, but starts to get cumbersome when you add more types to WireValue because every function has to cope with new unpacking possibilities. The -Wincomplete-patterns GHC option is really useful for spotting mistakes here, but the approach still leads to lots of cases to keep track of.

Another problem we had was that of expressing container types. If we extend WireValue to express lists recursively

data WireValue = WvInt Int | WvString String | WvList [WireValue] deriving (Show, Eq)

we run into the problem that our type is too permissive — it’s possible to store a heterogeneous list of values inside a WvList itself, so we’d have to validate the type of each and every value of the list at runtime.

Do existentials have the answer?

From an extremely dubious and hand-wavy point of view, I want Haskell just to “know” what type I stored in a value but hide it from my code until I need to unpack it.

We can do something like this with existential quantification (or “existential values”). There are some good tutorials on this already out there. The headline for me is that existential type declarations have a type variable on the RHS that is not present on the LHS:

{-# LANGUAGE ExistentialQuantification #-}data WireValue = forall a. Show a => WireValue ainstance Show WireValue where
show (WireValue a) = show a

The explicit forall a. on the RHS is the actual existential quantification. Notice the Show class constraint, which we’ll come to in a moment.

This definition is a good start. We can put values into a WireValue and GHCi can show us what they were, regardless of what the type of the contained value is:

λ: WireValue 3
3
λ: WireValue "Hello"
"Hello"

The trick is that we had to tell Haskell that values that can be held in WireValue are Showable, and, more annoyingly, the only thing we can do with a value wrapped up in WireValue is show it. Hmm, perhaps this isn’t going to be so useful!

As an aside, you can use StandaloneDeriving to generate an instance of Show for WireValue automatically:

{-# LANGUAGE StandaloneDeriving #-}deriving instance Show WireValue

Adding more type classes: Equality

Perhaps we can make our WireValue type more useful by adding more constraints as to the kind of thing a can be? Let’s try to define equality.

It’s easy to extend the definition of WireValue with the Eq constraint:

data WireValue = forall a (Show a, Eq a) => WireValue a

But trying to define an instance of Eq for WireValue itself gives us a compiler error

instance Eq WireValue where
(WireValue x) == (WireValue y) = x == y
-- Couldn't match expected type ‘a’ with actual type ‘a1’...

because we can’t guarantee that x and y are of the same type. (Fun fact: standalone deriving of Eq runs into this exact problem too.)

What we need is a way of safely casting values to a given type.

After a bit of searching, I came across the cast function from Data.Typeable that does exactly what we need!

λ: :type Data.Typeable.cast
cast :: (Typeable b, Typeable a) => a -> Maybe b

With that we can define an Eq instance that does what we want:

import Data.Typeable (Typeable, cast)data WireValue = forall a. (Typeable a, Show a, Eq a) => WireValue ainstance Eq WireValue where
WireValue x == WireValue y = maybe False (x ==) $ cast y

And it works beautifully:

λ: WireValue 3 == WireValue 3
True
λ: WireValue 3 == WireValue 4
False
λ: WireValue 3 == WireValue "apple"
False

Wait — what is Data.Typeable?

Chris Done has written a fantastic post on Data.Typeable and the related Data.Data package, with a view to programming generically in Haskell. The documentation for Data.Typeable mentions not only generic programming but the Data.Dynamic package, which I had so carefully avoided because I didn’t understand the magic. Perhaps now was the time to dig a little deeper?

Data.Typeable provides the typeOf function as a way to extract a representation of a Haskell type from a value:

λ: :type Data.Typeable.typeOf
Data.Typeable.typeOf :: Typeable a => a -> TypeRep

The (type-safe) cast function we used earlier simply guards an unsafeCoerce with a check that the types you’re coercing between have the same representation.

Looking at the definition of Dynamic, we can see something familiar — it contains an existential!

data Dynamic = forall a. TypeRep a => Dynamic a

(I’ve translated the definition from the original GADT style)

The type of Dynamic says that as long as you can represent the type of the value a, you can hide it in the Dynamic wrapper type. This looks similar to how we’ve defined WireValue! It seems as if trying to understand existentials before Data.Dynamic was a Good Thing™.

So, is there a practical use for being able to use types as values, other than cast?

Orderability

If we try to define an Ord instance for our WireValue type in the same way we did for Eq we run into a problem: We don’t know what the right answer is when the types don’t match.

data WireValue = forall a. (Typeable a, Show a, Eq a, Ord a) => WireValue ainstance Ord WireValue where
compare (WireValue x) (WireValue y) = maybe _onMismatchedTypes (compare x) $ cast y

When we had mismatched types with == we could just return the default value of False. In the above snippet for compare, _onMismatchedTypes can’t just magically produce an Ordering value like LT because you would get inconsistent answers depending on which order you compared the two values.

But values of TypeRep are comparable, so, if the cast fails, we can simply compare the type representations of the values instead:

instance Ord WireValue where
compare (WireValue x) (WireValue y) =
maybe
(compare (typeOf x) (typeOf y))
(compare x) $ cast y

This now works nicely:

λ: WireValue 3 < WireValue 4
True
λ: WireValue "hello" < WireValue 1
False
λ: WireValue 1 < WireValue "hello"
True

Types as values

What I’ve presented so far is a nice straightforward path through some of my holiday discoveries playing around with existentials and Data.Typeable. However, in practice, getting my head around some of this stuff wasn’t easy and I took a few interesting diversions along the way.

One of those diversions was looking at how I might convert a type to a value when I did not know about the magic of Data.Typeable. This section may be instructive if you, like me, found this all too magical!

We can define some data types that together represent types our WireValues can contain:

data WireConcreteType = WctInt | WctString deriving (Show, Eq, Ord)
data WireContainerType = WctList | WctMaybe deriving (Show, Eq, Ord)|
data WireType
= WtConc WireConcreteType
| WtCont WireContainerType WireType
deriving (Show, Eq, Ord)

These constructors, even without particularly many options, can be used to represent wide variety of types:

-- Int:
t1 = WtConc WctInt
-- [String]:
t2 = WtCont WctList $ WtConc WctString
-- [Maybe Int]
t3 = WtCont WctList $ WtCont WctMaybe $ WtConc WctInt
...

To extract a type as a value at runtime, we can define the getType function as follows. Note that we use a proxy a value, rather than simply an a value, so that you don’t actually need to build a value to extract its type. This is useful in the case of lists, as it allows us to define getType recursively, avoiding having to work out what to do when given an empty list.

{-# LANGUAGE ScopedTypeVariables -#}import Data.Proxyclass Wireable a where
getType :: proxy a -> WireType
instance Wireable Int where
getType _ = WtConc WctInt
instance Wireable a => Wireable [a] where
getType _ = WtCont WctList $ getType (Proxy :: Proxy a)

We need the ScopedTypeVariables extension because we need to refer to the specific a that getType was called with when building the Proxy in the Wireable instance for [a].

Now we can get WireType values from Haskell types:

λ: getType (Proxy :: Proxy Int)
WtConc WctInt
λ: getType (Proxy :: Proxy [[Int]])
WtCont WctList (WtCont WctList (WtConc WctInt))

I was dead chuffed when I first got this to work — I think it’s pretty cool.

getType does an equivalent thing to the typeRep function from Data.Typeable. Our WireType data type is fulfilling the same role as the TypeRep data type from Data.Typeable, although it’s much more restricted — TypeRep can represent algebraic data types with multiple arguments to constructors and comes with a bunch of helper functions to unpack and inspect it.

As an aside, when doing this work I also discovered the TypeApplications extension, which provides slightly less clumsy syntax for specifying the types of polymorphic functions:

λ: :set -XTypeApplications
λ: getType $ Proxy @Int
WtConc WctInt

So much for the existentials, where’s the crisis?

We’ve seen that, thanks to existential quantification, we can easily store values of mixed type in the same collection. We’ve also seen that, thanks to Data.Typeable.cast, we can safely extract values back out to perform further processing.

So, our lives are fairly easy if our code doesn’t care at all about the type contained in a WireValue, or if we know exactly which type we need to extract from it. Some difficulties start to appear when we want to act differently based on what is stored in the existential type.

As we looked at earlier, the existential quantification lets us hide a type variable on RHS of a type signature so that it is not visible on the LHS. This unlocked all the fun stuff we see above. However, the cost is that, if given a WireValue, Haskell cannot make any guarantees about the type it has. That severely limits what you can do with it!

Early on in my experimentation I kept wanting to do things like

extract (WireValue x) = x

so that I could get at the value inside the WireValue. But that gives us an error:

Couldn't match expected type ‘p’ with actual type ‘a’
because type variable ‘a’ would escape its scope
This (rigid, skolem) type variable is bound by
a pattern with constructor:
WireValue :: forall a. (Typeable a, Show a) => a -> WireValue,
in an equation for ‘extract’

According to this Stack Overflow thread, the word skolem refers to a type that is unknown but fixed — we can’t know at compile time what x will be, but it will be something! This does make sense, but it really hit me sideways when I was trying to reason about my code, because as a human I could often see what type something would be. (I have a lot of experience with Python, where playing with this kind of interrelation between types and values is much more fluid, but completely unsafe! Hence, sometimes my intuitions in Haskell are too permissive.)

cast got away with being polymorphic in its return type because it was the caller of the function that determined what they wanted cast to return. cast could simply decide not to return anything if the caller asked for the wrong type. We need a different mechanism — one where the callee gets to choose the type and the caller has to provide the polymorphism.

{-# LANGUAGE Rank2Types #-}overWv :: WireValue -> (forall a. Typeable a => a -> b) -> b
overWv (WireValue x) f = f x

The Rank2Types extension lets you write an explicit forall to bind the variable a at a tighter scope than variable b. The type signature of overWv effectively says “give me a WireValue, and a function that can take a value of any Typeable a that I choose to a value of some type b that you choose, and I’ll give you a value of that b type.”

Needless to say, having to construct a generic function that can handle any a conforming to a constraint can feel a little cumbersome. It’s been tempting to use the Wireable class I briefly sketched out earlier as a hook where I can keep adding specific (mostly validation) functionality so that I can write things like:

class Wireable a where
myFoo :: a -> Bool
data WireValue = forall a. (Wireable a, Typeable a) => WireValue aoverWv :: WireValue -> (forall a. Wireable a => a -> b) -> b
overWv (WireValue x) f = f x
doFoo :: WireValue -> Bool
doFoo wv = overWv wv myFoo

But this couples together Wireable, WireValue and the code that does the validation horribly, given that the code can be split across multiple modules but all has to work together.

Hopefully the above is illustrative of the kind of issues I’ve faced trying to make this stuff work in practice. It’s been great levelling up my understanding of more complex type signatures, and I think our production code is more precise about what we mean because of having used the existential stuff. However, some of the stranger error messages and the dancing around Haskell’s (correctly enforced) constraints about how you can use the existentials has been really challenging! I keep wanting some of the advanced dependant typing features that make Idris so exciting, but I expect they bring their own set of challenges.

If this material has been useful, do let me know. It will help me target what to write more about in the future. If you’re hungry for more interesting Haskell stuff, especially about the progress of dependant types in Haskell, I recently came across this really good HaskellCast interview with Richard Eisenberg.

--

--