Typeable — A long journey to type-safe dynamic type representation (Part 1)
Haskell has grown and changed amazingly in recent years, with great contribution from community and industrial adoption. There are many new ideas and changes in new GHC versions, as well as breaking changes. The most influent library is base
package, which contains the Standard Haskell Prelude, and its support libraries, and a large collection of useful libraries ranging from data structures to parsing combinators and debugging utilities. (hackage)
Typeable is a module in base
package. Maybe you already know, the Typeable class is used to create runtime type information for arbitrary types. Explicitly, Typeable is a unique Fingerprint (or hash) of type coupled with a coercion. This module has big influence in Data.Dynamic
, Data.Data
module as well as generic programming libraries. This post gives you a short story about the innovation of Typeable along with GHC extensions.
This article is divided into 3 parts to keep it easier to follow:
- Remove boilerplate with PolyKinds and Type Equality (GHC 7.8)
- Safer and more expressive type representations
- Decompose function type, TypeInType and Levity Polymorphism
A long time ago…
Typeable was originally a part of Dynamic, which is the solution for Dynamic typing in Statically typed language. In real world, there are programming situations we need to deal with data whose type cannot be determined at compile time, for example, fetch data from API, parse JSON string, query from database...
Since mid-1960s, early programming languages (Algol-68, Pascal, Simula-67) used similar technique of Dynamic
. Until 1989, Dynamic
and typecase
term was introduced in "Dynamic Typing in a Statically Typed Language" paper, with lambda calculus notation.
In Haskell, from what I saw in git.haskell.org, the first commit was in June 28, 2001 by Simon Marlow. However, I’m not sure this proof is correct. Maybe the module was implemented before that. Unfortunately I can’t find another reliable source. In this version, Typeable
and Dynamic
were in same module. In July 24, 2003, Typeable
was moved to Data.Typeable
package.
In the early stage of Haskell, TypeRep
was constructed of TyCon
and child TypeRep
s. TyCon
hold Key
as an unique HashTable
, and a String
to determine type name at user level.
FFI C macros can help generating instances, with some exceptions, such as Tuple
. undefined
was used as an alternative for Proxy. unsafeCoerce
was used anywhere for coercing types.
From Key to Fingerprint
TyCon
was stored in unsafe IORef cache, which internally kept a HashTable
mapping String
s to Int
s. So that each TyCon
could be given a unique Int for fast comparison, the String
has to be unique across all types in the program. However, derived instances of typeable used the qualified original name (e.g. GHC.Types.Int
) which is not necessarily unique, is non-portable, and exposes implementation details.
In July 2011, Simon Marlow replaced Key
with Fingerprint - , and moved TyCon
as well as TypeRep
internally. The fields of TyCon are not exposed via the public API. Together the three fields tyConPackage
, tyConModule
and tyConName
uniquely identify a TyCon
, and the Fingerprint is a hash of the concatenation of these three Strings (so no more internal cache to map strings to unique IDs). This implementation is also easier for GHC to generate derived instances
Merge all the things
With the rise of Type-level programming, GHC compiler is more and more better. Many boilerplate codes were refactored and optimized in safe and well-performed. Typeable
also took advantage of new GHC features:
PolyKinds
Before Poly-kinds, there are many duplicated code to support variant for N-ary type constructors. To represent type constructors with kind * -> *
, such as Maybe
or []
, we could create a separate type class, called Typeable1
. However, this approach is ugly and inflexible. What about tuples? Do we need a Typeable2
, Typeable3
... for them?
With kind polymorphism we can write:
We have generalized in two ways here.
- First,
Typeable
getsa
polymorphic kind:forall a. a -> Constraint
, so that it can be used for types of any kind. - Second, we need some way to generalize the argument of
typeRep
, which we have done via a poly-kinded data type Proxy:
The idea is that, say typeRep (Proxy :: Proxy Int)
will return the type representation for Int
, while typeRep (Proxy :: Proxy Maybe)
will do the same for Maybe
. The proxy argument carries no information—the type has only one, nullary constructor and is only present so that the programmer can express the type at which to invoke typeRep
. Because there are no constraints on the kind of a
, it is safe to assign Proxy
the polymorphic kind forall a. a -> *
.
TypeLevelReasoning
Richard A. Eisenberg also proposed and implemented TypeLevelReasoning. New module Data.Type.Equality
was born. The idea is defines the type of equality witnesses for two types of any kind k
:
Pattern matching on this generalized algebraic datatype (GADT) allows GHC to discover the equality between two types. If a :~: b
is inhabited by some terminating value, then the type a
is the same as the type b
. To use this equality in practice, pattern-match on the a :~: b
to get out the Refl
constructor:
We pattern-match on Refl
. This exposes the fact that a
and b
must be the same. Then, GHC happily uses x
of type a
in a context expecting something of type b
.
With this, we can remove unsafe hack when convert Proxy:
And more…
This release also added is the AutoDeriveTypeable
language extension, which will automatically derive Typeable
for all types and classes declared in that module.
The update could caused breaking changes, although backwards compatibility interfaces were kept. José Pedro Magalhães also backed up old codes into Data.OldTypeable
module, which was removed later in GHC 7.10.1 - base-4.8.0.0
However, with the motivation of distribution programming, mainly focusing on Cloud Haskell, new challenges occurred: Typeable wasn’t safe enough… In part 2, we discuss about open world problem, and evolve Typeble into Safer and more expressive type representations.
Originally published at hgiasac.github.io.