Typeable — A long journey to type-safe dynamic type representation (Part 1)
Haskell has grown and changed amazingly in recent years, with great contribution from community and industrial adoption. There are many new ideas and changes in new GHC versions, as well as breaking changes. The most influent library is
base package, which contains the Standard Haskell Prelude, and its support libraries, and a large collection of useful libraries ranging from data structures to parsing combinators and debugging utilities. (hackage)
Typeable is a module in
base package. Maybe you already know, the Typeable class is used to create runtime type information for arbitrary types. Explicitly, Typeable is a unique Fingerprint (or hash) of type coupled with a coercion. This module has big influence in
Data.Data module as well as generic programming libraries. This post gives you a short story about the innovation of Typeable along with GHC extensions.
This article is divided into 3 parts to keep it easier to follow:
- Remove boilerplate with PolyKinds and Type Equality (GHC 7.8)
- Safer and more expressive type representations
- Decompose function type, TypeInType and Levity Polymorphism
A long time ago…
Typeable was originally a part of Dynamic, which is the solution for Dynamic typing in Statically typed language. In real world, there are programming situations we need to deal with data whose type cannot be determined at compile time, for example, fetch data from API, parse JSON string, query from database...
Since mid-1960s, early programming languages (Algol-68, Pascal, Simula-67) used similar technique of
Dynamic. Until 1989,
typecase term was introduced in "Dynamic Typing in a Statically Typed Language" paper, with lambda calculus notation.
In Haskell, from what I saw in git.haskell.org, the first commit was in June 28, 2001 by Simon Marlow. However, I’m not sure this proof is correct. Maybe the module was implemented before that. Unfortunately I can’t find another reliable source. In this version,
Dynamic were in same module. In July 24, 2003,
Typeable was moved to
In the early stage of Haskell,
TypeRep was constructed of
TyCon and child
Key as an unique
HashTable, and a
String to determine type name at user level.
FFI C macros can help generating instances, with some exceptions, such as
undefined was used as an alternative for Proxy.
unsafeCoerce was used anywhere for coercing types.
From Key to Fingerprint
TyCon was stored in unsafe IORef cache, which internally kept a
Ints. So that each
TyCon could be given a unique Int for fast comparison, the
String has to be unique across all types in the program. However, derived instances of typeable used the qualified original name (e.g.
GHC.Types.Int) which is not necessarily unique, is non-portable, and exposes implementation details.
In July 2011, Simon Marlow replaced
Key with Fingerprint - , and moved
TyCon as well as
TypeRep internally. The fields of TyCon are not exposed via the public API. Together the three fields
tyConName uniquely identify a
TyCon, and the Fingerprint is a hash of the concatenation of these three Strings (so no more internal cache to map strings to unique IDs). This implementation is also easier for GHC to generate derived instances
Merge all the things
With the rise of Type-level programming, GHC compiler is more and more better. Many boilerplate codes were refactored and optimized in safe and well-performed.
Typeable also took advantage of new GHC features:
Before Poly-kinds, there are many duplicated code to support variant for N-ary type constructors. To represent type constructors with kind
* -> *, such as
, we could create a separate type class, called
Typeable1. However, this approach is ugly and inflexible. What about tuples? Do we need a
Typeable3... for them?
With kind polymorphism we can write:
We have generalized in two ways here.
forall a. a -> Constraint, so that it can be used for types of any kind.
- Second, we need some way to generalize the argument of
typeRep, which we have done via a poly-kinded data type Proxy:
The idea is that, say
typeRep (Proxy :: Proxy Int) will return the type representation for
typeRep (Proxy :: Proxy Maybe) will do the same for
Maybe. The proxy argument carries no information—the type has only one, nullary constructor and is only present so that the programmer can express the type at which to invoke
typeRep. Because there are no constraints on the kind of
a, it is safe to assign
Proxy the polymorphic kind
forall a. a -> *.
Pattern matching on this generalized algebraic datatype (GADT) allows GHC to discover the equality between two types. If
a :~: b is inhabited by some terminating value, then the type
a is the same as the type
b. To use this equality in practice, pattern-match on the
a :~: b to get out the
We pattern-match on
Refl. This exposes the fact that
b must be the same. Then, GHC happily uses
x of type
a in a context expecting something of type
With this, we can remove unsafe hack when convert Proxy:
This release also added is the
AutoDeriveTypeable language extension, which will automatically derive
Typeable for all types and classes declared in that module.
The update could caused breaking changes, although backwards compatibility interfaces were kept. José Pedro Magalhães also backed up old codes into
Data.OldTypeable module, which was removed later in GHC 7.10.1 - base-184.108.40.206
Originally published at hgiasac.github.io.