Typeable — A long journey to type-safe dynamic type representation (Part 2)
In part 1, we knew about early design of Typeable, how it changed from mono-kinded to poly-kinded type class, and the early stage of type equality. In this part, we discuss about type-indexed type representations with Kind equality.
With the motivation of distribution programming, mainly focusing on Cloud Haskell, new challenges occurred: Typeable wasn’t safe enough.
Open world’s challenge
Cloud Haskell is an Erlang-style library for programming a distributed system in Haskell. It is based on the message-passing model of Erlang, but with additional advantages that stem from Haskell’s purity, types, and monads. If you are from Scala land, Cloud Haskell has similar features with Akka, which use Actor model for communicating across processes. However, Cloud Haskell also supports thread in same machine. You can conveniently do things the old way.
Because the information that is transmitted between machines by network must be in bit, the data must be Serializable
. This ensures two properties: The data can be encoded and decoded to binary fore and back, and can produce a TypeRep
that captures the item’s type.
To guarantee that the code is then applied to appropriately typed values, the receiving node must perform a dynamic type test. That way, even if the code pointer was corrupted in transit, by accident or malice, the receiving node will be type-sound. You can think of it like this: a decoder is simply a parser for the bits in the ByteString
, so a decoder for Int
can fail to parse a full Int
(returning Nothing
), but it can't return a non-Int. A simple way to do this is to serialize a code pointer as a key into a static pointer table containing Dynamic values (RemoteTable). When receiving code pointer with key k
, the recipient can lookup up entry k
in the table, find a Dynamic
, and check that it has the expected type.
To do that, Typeable
needed to evolve further.
TypeRep
was not indexed, so there was no connection between the TypeRep
stored in a Dynamic
and the corresponding value. Indeed, accessing the typeRep
required a proxy argument to specify the type that should be represented.
Because there is no connection between types and their representations, this implementation of Dynamic
requires unsafeCoerce
. For example, here is the old fromDynamic
:
Client code is un-trusted. The current design (GHC 7.8) couldn’t hide all such uses of unsafeCoerce
from the user. If they could write a Typeable
instance, they must use unsafeCoerce
, and defeat type safety. So only GHC is allowed write Typeable
instances.
Kind-indexed GADTs
The key to our approach is our type-indexed type representation TypeRep
. But what is a type-indexed type representation? That is, the index in a type-indexed type representation is itself the represented type. For example:
- The representation of
Int
is the value of typeTypeRep Int
. - The representation of
Bool
is the value of typeTypeRep Bool
. - And so on.
The idea of kind-indexed is originally from GADTs. For example, we consider designing a GADT for closed type representations:
GADTs differ from ordinary algebraic data types in that they allow each data constructor to constrain the type parameters to the datatype. For example, the TyInt
constructor requires that the single parameter to TyRep
be Int
.
We can use type representations for type-indexed programming a simple example liked computing a default element for each type.
This code pattern matches the type representation to determine what value to return. Because of the nonuniform type index, pattern matching recovers the identity of the type variable a
.
- In the first case, because the data constructor is
TyInt
, this parameter must beInt
, so0
can be returned. - In the second case, the parameter
a
must be equal toBool
, so returningFalse
is well-typed.
However, the GADT above can only be used to represent types of kind *
. To represent type constructors with kind * -> *
, such as Maybe
or []
, we could create a separate datatype, perhaps called TyRep1
, TyRep2
,... Kind polymorphism which allows data types to be parameterized by kind variables as well as type variables, could be the solution. However, it is not enough to unify the representations for TyRep
. the type representation (shown below) should constrain its kind parameter.
This TyRep
type takes two parameters, a kind k
and a type of that kind (not named in the kind annotation). The data constructors constrain k
to a concrete kind. For the example to be well-formed, TyInt
must constrain the kind parameter to *. Similarly, TyMaybe
requires the kind parameter to be * -> *
. We call this example a kind-indexed GADT because the datatype is indexed by both kind and type information.
Pattern matching with this datatype refines kinds as well as types — determining whether a type is of the form TyApp
makes new kind and type equalities available. For example, consider the zero function extended with a default value of the Maybe
type.
In the last case, the TyApp
pattern introduces the kind variable k
, the type variables b :: k -> *
and c :: k
, and the type equality a ∼ b c
. The TyMaybe
pattern adds the kind equality k ~ *
and type equality b ∼ Maybe
. Combining the equality, we can show that Maybe c
, the type of Nothing
, is well-kinded and equal to a
.
With this design, we also enable type decomposition feature. Finally, new TypeRep
will be:
The current implementation of TypeRep
allows constant-time comparison based on fingerprints. To support this in the new scheme we would want to add a fingerprint to every TypeRep node. But we would not want clients to see those fingerprints.
The TyCon
type, which is a runtime representation of the “identity” of a type constructor, is now silently generates a binding for a suitable instance for every datatype declaration by GHC. For example, for Maybe
GHC will generate:
The name $tcMaybe
is not directly available to the programmer. Instead (this is the second piece of built-in support), GHC’s type-constraint solver has special behavior for Typeable
constraints, as follows.
To solve Typeable(t1 t2)
, GHC simply solves Typeable t1
and Typeable t2
, and combines the results with TrApp
. To solve Typeable T
where T
is a type constructor, the solver uses TrTyCon
. The first argument of TrTyCon
is straight-forward: it is the (runtime representation of the) type constructor itself, e.g $tcMaybe
.
But TrTyCon
also stores the representation of the kind of this very constructor, of type TypeRep k
. Recording the kind representations is important, otherwise we would not be able to distinguish, say, Proxy :: * -> *
from Proxy :: (* -> *) -> *
, where Proxy has a polymorphic kind (Proxy :: forall k. k -> *)
. We do not support direct representations of kind-polymorphic constructors like Proxy
, rather TrTyCon
encodes the instantiation of a kind-polymorphic constructor (such as Proxy).
Notice that TrTyCon
is fundamentally insecure: you could use it to build a TypeRep t
for any t
whatsoever. That is why we do not expose the representation of TypeRep
to the programmer. Instead the part of GHC’s Typeable solver that builds TrTyCon
applications is part of GHC’s trusted code base.
TypeRep
is abstract, and thus we don't use proxy to determine the TypeRep
value anymore:
Kind equalities
We also need to recompute representation equality function eqT
. It is easy, just need to compare equality between 2 TypeRep
fingerprint:
It is critical that this function returns (:~~:)
, not (:~:)
. GHC can't compile. This is because TyCon
s exist at many different kinds. For example, Int
is at kind *
, and Maybe
is at kind * -> *
. Thus, when comparing two TyCon
representations for equality, we want to learn whether the types and the kinds are equal. If we used type equalities (:~:)
here, the eqTypeRep
function could be used only when we know, from some other source, that the kinds are equal.
Richard A. Eisenberg proposed and implemented kind heterogeneous equalities (2013–2015). It enable new, useful features such as kind-indexed GADTs, promoted GADTs and kind families. This extension was experiment in GHC 8.0.1, then was provided in Data.Type.Equality
module.
The restriction above exists because GHC reasons about only type equality, never kind equality. The solution to all of these problems is simple to state: merge the concepts of type and kind. If types and kinds are the same, then we surely have kind equalities. In order to overcome those challenges, it has been necessary to augment GHC’s internal language, System FC
. This is beyond the scope of this post. If you need to dig into detail, let's read this paper.
fromDynamic
turns out like this:
We use eqT
to compare the two TypeRep
s, and pattern-match on HRefl
, so that in the second case alternative we know that a
and d
are equal, so we can return Just x
where a value of type Maybe d
is needed. More generally, eqT
allows to implement type-safe cast, a useful operation in its own right.
Decomposing polykinds representations
So far, we have discussed type representations for only types of kind *
. The only operation we have provided over TypeRep
is eqT
, which compares two type representations for equality. Does (,)
which has kind (* -> *)
too have a TypeRep
? For example, how can we decompose the type representation, to check that it indeed represents a pair, and extract its first component?
Of course it must. Since types in Haskell are built via a sequence of type applications (much like how an expression applying a function to multiple arguments is built with several nested term applications), the natural dual is to provide a way to decompose type applications. Let’s take a look at TrApp
definition:
TrApp
allows us to observe the structure of types and expose the type equalities it has discovered to the type checker. Now we can implement dynFst
:
We check that the TypeRep
of x is of form (,) a b
by decomposing it twice. Then we must check that rp
, the TypeRep
of the function part of this application, is indeed the pair type constructor (,)
; we can do that using eqT
. These three GADT pattern matches combine to tell the type checker that the type of x
, which began life in the (Dyn rpab x)
pattern match as an existentially-quantified type variable, is indeed a pair type (a, b)
. So we can safely apply fst
to x
, to get a result whose type representation ra
we have in hand.
The code is simple enough, but the type checker has to work remarkably hard behind the scenes to prove that it is sound. Let us take a closer look with kind signatures added:
Note that k1
, the kind of b
is existentially bound in this data structure, meaning that it does not appear in the kind of the result type (a b)
. We know the result kind of the type application but there is no way to know the kinds of the sub-components.
With kind polymorphism in mind, let’s add some type annotations to see what existential variables are introduced in dynFst
:
Focus on the arguments to the call to eqT
in the third line. We know that:
rp :: TypeRep p
andp :: k1 -> k2 -> *
typeRep :: TypeRep (,)
and(,) :: * -> * -> *
So eqT
must compare the TypeRep
s for two types of different kinds; if the runtime test succeeds, we know not only that p ~ (,)
, but also that (k1 ~ *)
and (k2 ~ *)
. That is, the pattern match on Refl
GADT constructor brings local kind equalities into scope, as well as type equalities.
Better Pattern matching with PatternSynonyms
The code above is ugly. Moreover, we don’t want to expose TypeRep
constructor to users. Earliest solution is returning another GADT:
However, we can make it better with PatternSynonyms
extension which was provided since GHC 7.8.
Pattern synonyms enable giving names to parametrized pattern schemes. They can also be thought of as abstract constructors that don’t have a bearing on data representation.
With this extension, you can not only hide the representation of the datatype, but also use it in pattern matching. Our code is much cleaner.
Summary
So far, the definition of Typeable is:
We also need to implement dynApply, which applies a function Dynamic to an argument Dynamic. It is necessary in real-world application, e.g, send an object with a function type, say Bool -> Int, over the network.
In theory, We can use TrApp and App pattern to construct and decompose function type (->) too. However, there is some limit that need to be overcome, that we will discuss in next part.
Originally published at hgiasac.github.io.
With this extension, you can not only hide the representation of the datatype, but also use it in pattern matching. Our code is much cleaner.