Typeable — A long journey to type-safe dynamic type representation (Part 2)

Toan Nguyen
9 min readJan 5, 2019

--

In part 1, we knew about early design of Typeable, how it changed from mono-kinded to poly-kinded type class, and the early stage of type equality. In this part, we discuss about type-indexed type representations with Kind equality.

With the motivation of distribution programming, mainly focusing on Cloud Haskell, new challenges occurred: Typeable wasn’t safe enough.

Open world’s challenge

Cloud Haskell is an Erlang-style library for programming a distributed system in Haskell. It is based on the message-passing model of Erlang, but with additional advantages that stem from Haskell’s purity, types, and monads. If you are from Scala land, Cloud Haskell has similar features with Akka, which use Actor model for communicating across processes. However, Cloud Haskell also supports thread in same machine. You can conveniently do things the old way.

Because the information that is transmitted between machines by network must be in bit, the data must be Serializable. This ensures two properties: The data can be encoded and decoded to binary fore and back, and can produce a TypeRep that captures the item’s type.

To guarantee that the code is then applied to appropriately typed values, the receiving node must perform a dynamic type test. That way, even if the code pointer was corrupted in transit, by accident or malice, the receiving node will be type-sound. You can think of it like this: a decoder is simply a parser for the bits in the ByteString, so a decoder for Int can fail to parse a full Int (returning Nothing), but it can't return a non-Int. A simple way to do this is to serialize a code pointer as a key into a static pointer table containing Dynamic values (RemoteTable). When receiving code pointer with key k, the recipient can lookup up entry k in the table, find a Dynamic, and check that it has the expected type.

To do that, Typeable needed to evolve further.

TypeRep was not indexed, so there was no connection between the TypeRep stored in a Dynamic and the corresponding value. Indeed, accessing the typeRep required a proxy argument to specify the type that should be represented.

Because there is no connection between types and their representations, this implementation of Dynamic requires unsafeCoerce. For example, here is the old fromDynamic:

Client code is un-trusted. The current design (GHC 7.8) couldn’t hide all such uses of unsafeCoerce from the user. If they could write a Typeable instance, they must use unsafeCoerce, and defeat type safety. So only GHC is allowed write Typeable instances.

Kind-indexed GADTs

The key to our approach is our type-indexed type representation TypeRep. But what is a type-indexed type representation? That is, the index in a type-indexed type representation is itself the represented type. For example:

  • The representation of Int is the value of type TypeRep Int.
  • The representation of Bool is the value of type TypeRep Bool.
  • And so on.

The idea of kind-indexed is originally from GADTs. For example, we consider designing a GADT for closed type representations:

GADTs differ from ordinary algebraic data types in that they allow each data constructor to constrain the type parameters to the datatype. For example, the TyInt constructor requires that the single parameter to TyRep be Int.

We can use type representations for type-indexed programming a simple example liked computing a default element for each type.

This code pattern matches the type representation to determine what value to return. Because of the nonuniform type index, pattern matching recovers the identity of the type variable a.

  • In the first case, because the data constructor is TyInt, this parameter must be Int, so 0 can be returned.
  • In the second case, the parameter a must be equal to Bool, so returning False is well-typed.

However, the GADT above can only be used to represent types of kind *. To represent type constructors with kind * -> *, such as Maybe or [], we could create a separate datatype, perhaps called TyRep1, TyRep2,... Kind polymorphism which allows data types to be parameterized by kind variables as well as type variables, could be the solution. However, it is not enough to unify the representations for TyRep. the type representation (shown below) should constrain its kind parameter.

This TyRep type takes two parameters, a kind k and a type of that kind (not named in the kind annotation). The data constructors constrain k to a concrete kind. For the example to be well-formed, TyInt must constrain the kind parameter to *. Similarly, TyMaybe requires the kind parameter to be * -> *. We call this example a kind-indexed GADT because the datatype is indexed by both kind and type information.

Pattern matching with this datatype refines kinds as well as types — determining whether a type is of the form TyApp makes new kind and type equalities available. For example, consider the zero function extended with a default value of the Maybe type.

In the last case, the TyApp pattern introduces the kind variable k, the type variables b :: k -> * and c :: k, and the type equality a ∼ b c. The TyMaybe pattern adds the kind equality k ~ * and type equality b ∼ Maybe. Combining the equality, we can show that Maybe c, the type of Nothing, is well-kinded and equal to a.

With this design, we also enable type decomposition feature. Finally, new TypeRep will be:

The current implementation of TypeRep allows constant-time comparison based on fingerprints. To support this in the new scheme we would want to add a fingerprint to every TypeRep node. But we would not want clients to see those fingerprints.

The TyCon type, which is a runtime representation of the “identity” of a type constructor, is now silently generates a binding for a suitable instance for every datatype declaration by GHC. For example, for Maybe GHC will generate:

The name $tcMaybe is not directly available to the programmer. Instead (this is the second piece of built-in support), GHC’s type-constraint solver has special behavior for Typeable constraints, as follows.

To solve Typeable(t1 t2), GHC simply solves Typeable t1 and Typeable t2, and combines the results with TrApp. To solve Typeable T where T is a type constructor, the solver uses TrTyCon. The first argument of TrTyCon is straight-forward: it is the (runtime representation of the) type constructor itself, e.g $tcMaybe.

But TrTyCon also stores the representation of the kind of this very constructor, of type TypeRep k. Recording the kind representations is important, otherwise we would not be able to distinguish, say, Proxy :: * -> * from Proxy :: (* -> *) -> *, where Proxy has a polymorphic kind (Proxy :: forall k. k -> *). We do not support direct representations of kind-polymorphic constructors like Proxy, rather TrTyCon encodes the instantiation of a kind-polymorphic constructor (such as Proxy).

Notice that TrTyCon is fundamentally insecure: you could use it to build a TypeRep t for any t whatsoever. That is why we do not expose the representation of TypeRep to the programmer. Instead the part of GHC’s Typeable solver that builds TrTyCon applications is part of GHC’s trusted code base.

TypeRep is abstract, and thus we don't use proxy to determine the TypeRep value anymore:

Kind equalities

We also need to recompute representation equality function eqT. It is easy, just need to compare equality between 2 TypeRep fingerprint:

It is critical that this function returns (:~~:), not (:~:). GHC can't compile. This is because TyCons exist at many different kinds. For example, Int is at kind *, and Maybe is at kind * -> *. Thus, when comparing two TyCon representations for equality, we want to learn whether the types and the kinds are equal. If we used type equalities (:~:) here, the eqTypeRep function could be used only when we know, from some other source, that the kinds are equal.

Richard A. Eisenberg proposed and implemented kind heterogeneous equalities (2013–2015). It enable new, useful features such as kind-indexed GADTs, promoted GADTs and kind families. This extension was experiment in GHC 8.0.1, then was provided in Data.Type.Equality module.

The restriction above exists because GHC reasons about only type equality, never kind equality. The solution to all of these problems is simple to state: merge the concepts of type and kind. If types and kinds are the same, then we surely have kind equalities. In order to overcome those challenges, it has been necessary to augment GHC’s internal language, System FC. This is beyond the scope of this post. If you need to dig into detail, let's read this paper.

fromDynamic turns out like this:

We use eqT to compare the two TypeReps, and pattern-match on HRefl, so that in the second case alternative we know that a and d are equal, so we can return Just x where a value of type Maybe d is needed. More generally, eqT allows to implement type-safe cast, a useful operation in its own right.

Decomposing polykinds representations

So far, we have discussed type representations for only types of kind *. The only operation we have provided over TypeRep is eqT, which compares two type representations for equality. Does (,) which has kind (* -> *) too have a TypeRep? For example, how can we decompose the type representation, to check that it indeed represents a pair, and extract its first component?

Of course it must. Since types in Haskell are built via a sequence of type applications (much like how an expression applying a function to multiple arguments is built with several nested term applications), the natural dual is to provide a way to decompose type applications. Let’s take a look at TrApp definition:

TrApp allows us to observe the structure of types and expose the type equalities it has discovered to the type checker. Now we can implement dynFst:

We check that the TypeRep of x is of form (,) a b by decomposing it twice. Then we must check that rp, the TypeRep of the function part of this application, is indeed the pair type constructor (,); we can do that using eqT. These three GADT pattern matches combine to tell the type checker that the type of x, which began life in the (Dyn rpab x) pattern match as an existentially-quantified type variable, is indeed a pair type (a, b). So we can safely apply fst to x, to get a result whose type representation ra we have in hand.

The code is simple enough, but the type checker has to work remarkably hard behind the scenes to prove that it is sound. Let us take a closer look with kind signatures added:

Note that k1, the kind of b is existentially bound in this data structure, meaning that it does not appear in the kind of the result type (a b). We know the result kind of the type application but there is no way to know the kinds of the sub-components.

With kind polymorphism in mind, let’s add some type annotations to see what existential variables are introduced in dynFst:

Focus on the arguments to the call to eqT in the third line. We know that:

  • rp :: TypeRep p and p :: k1 -> k2 -> *
  • typeRep :: TypeRep (,) and (,) :: * -> * -> *

So eqT must compare the TypeReps for two types of different kinds; if the runtime test succeeds, we know not only that p ~ (,), but also that (k1 ~ *) and (k2 ~ *). That is, the pattern match on Refl GADT constructor brings local kind equalities into scope, as well as type equalities.

Better Pattern matching with PatternSynonyms

The code above is ugly. Moreover, we don’t want to expose TypeRep constructor to users. Earliest solution is returning another GADT:

However, we can make it better with PatternSynonyms extension which was provided since GHC 7.8.

Pattern synonyms enable giving names to parametrized pattern schemes. They can also be thought of as abstract constructors that don’t have a bearing on data representation.

GHC User’s Guide

With this extension, you can not only hide the representation of the datatype, but also use it in pattern matching. Our code is much cleaner.

Summary

So far, the definition of Typeable is:

We also need to implement dynApply, which applies a function Dynamic to an argument Dynamic. It is necessary in real-world application, e.g, send an object with a function type, say Bool -> Int, over the network.

In theory, We can use TrApp and App pattern to construct and decompose function type (->) too. However, there is some limit that need to be overcome, that we will discuss in next part.

Originally published at hgiasac.github.io.

GHC User’s Guide

With this extension, you can not only hide the representation of the datatype, but also use it in pattern matching. Our code is much cleaner.

--

--

Toan Nguyen

A simple developer who still explores Functional Programming