Typeable — A long journey to type-safe dynamic type representation (Part 1)

4 min readJan 4, 2019

Haskell has grown and changed amazingly in recent years, with great contribution from community and industrial adoption. There are many new ideas and changes in new GHC versions, as well as breaking changes. The most influent library is base package, which contains the Standard Haskell Prelude, and its support libraries, and a large collection of useful libraries ranging from data structures to parsing combinators and debugging utilities. (hackage)

Typeable is a module in base package. Maybe you already know, the Typeable class is used to create runtime type information for arbitrary types. Explicitly, Typeable is a unique Fingerprint (or hash) of type coupled with a coercion. This module has big influence in Data.Dynamic, Data.Data module as well as generic programming libraries. This post gives you a short story about the innovation of Typeable along with GHC extensions.

This article is divided into 3 parts to keep it easier to follow:

A long time ago…

Typeable was originally a part of Dynamic, which is the solution for Dynamic typing in Statically typed language. In real world, there are programming situations we need to deal with data whose type cannot be determined at compile time, for example, fetch data from API, parse JSON string, query from database...

Since mid-1960s, early programming languages (Algol-68, Pascal, Simula-67) used similar technique of Dynamic. Until 1989, Dynamic and typecase term was introduced in "Dynamic Typing in a Statically Typed Language" paper, with lambda calculus notation.

In Haskell, from what I saw in git.haskell.org, the first commit was in June 28, 2001 by Simon Marlow. However, I’m not sure this proof is correct. Maybe the module was implemented before that. Unfortunately I can’t find another reliable source. In this version, Typeable and Dynamic were in same module. In July 24, 2003, Typeable was moved to Data.Typeable package.

In the early stage of Haskell, TypeRep was constructed of TyCon and child TypeReps. TyCon hold Key as an unique HashTable, and a String to determine type name at user level.

FFI C macros can help generating instances, with some exceptions, such as Tuple. undefined was used as an alternative for Proxy. unsafeCoerce was used anywhere for coercing types.

From Key to Fingerprint

TyCon was stored in unsafe IORef cache, which internally kept a HashTable mapping Strings to Ints. So that each TyCon could be given a unique Int for fast comparison, the String has to be unique across all types in the program. However, derived instances of typeable used the qualified original name (e.g. GHC.Types.Int) which is not necessarily unique, is non-portable, and exposes implementation details.

In July 2011, Simon Marlow replaced Key with Fingerprint - , and moved TyCon as well as TypeRep internally. The fields of TyCon are not exposed via the public API. Together the three fields tyConPackage, tyConModule and tyConName uniquely identify a TyCon, and the Fingerprint is a hash of the concatenation of these three Strings (so no more internal cache to map strings to unique IDs). This implementation is also easier for GHC to generate derived instances

Merge all the things

With the rise of Type-level programming, GHC compiler is more and more better. Many boilerplate codes were refactored and optimized in safe and well-performed. Typeable also took advantage of new GHC features:

PolyKinds

Before Poly-kinds, there are many duplicated code to support variant for N-ary type constructors. To represent type constructors with kind * -> *, such as Maybe or [], we could create a separate type class, called Typeable1. However, this approach is ugly and inflexible. What about tuples? Do we need a Typeable2, Typeable3... for them?

With kind polymorphism we can write:

We have generalized in two ways here.

First, Typeable gets a polymorphic kind: forall a. a -> Constraint, so that it can be used for types of any kind.
Second, we need some way to generalize the argument of typeRep , which we have done via a poly-kinded data type Proxy:

The idea is that, say typeRep (Proxy :: Proxy Int) will return the type representation for Int, while typeRep (Proxy :: Proxy Maybe) will do the same for Maybe. The proxy argument carries no information—the type has only one, nullary constructor and is only present so that the programmer can express the type at which to invoke typeRep. Because there are no constraints on the kind of a, it is safe to assign Proxy the polymorphic kind forall a. a -> *.

TypeLevelReasoning

Richard A. Eisenberg also proposed and implemented TypeLevelReasoning. New module Data.Type.Equality was born. The idea is defines the type of equality witnesses for two types of any kind k:

Pattern matching on this generalized algebraic datatype (GADT) allows GHC to discover the equality between two types. If a :~: b is inhabited by some terminating value, then the type a is the same as the type b. To use this equality in practice, pattern-match on the a :~: b to get out the Refl constructor:

We pattern-match on Refl. This exposes the fact that a and b must be the same. Then, GHC happily uses x of type a in a context expecting something of type b.

With this, we can remove unsafe hack when convert Proxy:

And more…

This release also added is the AutoDeriveTypeable language extension, which will automatically derive Typeable for all types and classes declared in that module.

The update could caused breaking changes, although backwards compatibility interfaces were kept. José Pedro Magalhães also backed up old codes into Data.OldTypeable module, which was removed later in GHC 7.10.1 - base-4.8.0.0

However, with the motivation of distribution programming, mainly focusing on Cloud Haskell, new challenges occurred: Typeable wasn’t safe enough… In part 2, we discuss about open world problem, and evolve Typeble into Safer and more expressive type representations.

Originally published at hgiasac.github.io.