Haskell Book: Chapter 11 — Algebraic Data Types

Big Ideas

Custom Data Types — A Haskell best practice is to create custom datatypes to model a domain. We want to use datatypes to describe the structure of the data that our program is going to be processing. Modeling our domain through datatypes is very important, because it allows the compiler to help us later on. Modeling our domain through datatypes is like modeling our persistence layer in databases — get this part right, and the rest of the program will be easier and more obvious.

Cardinality, Sum Types, Product Types — Why do we care about sum types, product types and cardinality? (Note: Definition of all of these terms are provided in the section below). The reason these things are important is because it tells us how many different implementations there are for a given type.

For example, let’s take the following type signature myFunc :: Bool. If we take a look at the datatype Bool, there are only two possible values for Bool — True or False. This means that there are only two possible implementations of myVal: myVal = True OR myVal = False. There is no other possible way to implement myVal, given the type signature.

Let’s try something with a product type. Let’s say we have the following type: data Product = Product Int8 Int8. The data constructor Product must be applied to two values, so we know that the Product datatype is a product type. What is the cardinality of the datatype Product? Well, Int8 is a type that represents the values from -127 to 128. It’s cardinality is 256, so the cardinality of Product is 256 * 256 = 65,536. This means that if I have the type signature myVal :: Product, there are 65,536 ways that I can satisfy that type signature.

At the end of the day, the lesson is that it is much easier to reason about a type whose cardinality is smaller. This becomes very important for functions, because the cardinality for a function is exponential. We take the cardinality of the second argument, and raise it to the cardinality of the first.

For example, let’s say I have the following type signature:

myFunc :: Bool —> Bool

The cardinality of Bool is 2. I can calculate the cardinality of the type of myFunc (remember that -> is a type constructor) as 2 ^ 2, or 4. However, because things are exponential, even a small increase in the cardinality of the type arguments to -> leads to a large increase in the cardinality of the entire type.

Kind — Like cardinality, kind refers to a property of datatypes. Specifically, kind is the type of a type. For example:

data Person = Person String

The datatype Person has the kind *. * indicates a concrete data type. In other words, the type constructor Person is already a concrete data type / type constant, and it therefore does not have to be applied to a type argument.

However, let’s take another data declaration:

data Person' a = Person' String a

The datatype Person' has the kind * -> *. What this indicates is that Person' type constructor must be applied to a concrete type in order to yield a concrete type. Notice that it is extremely helpful to keep the fact that -> is a type constructor at the forefront of one’s mind. The kind for the -> type constructor is -> :: * -> * -> *. In other words, -> must be applied to two concrete types in order to yield a concrete type.

Newtype — The vast majority of the benefits of Haskell comes from its type system. It follows, then, that knowing how to use the type system effectively is important. We won’t get all of Haskell’s purported benefits without knowing how to wield the type system. I bring this up now, under the section on newtype, because I think that newtype is a perfect example of how using the type system effectively is necessary to increase our code’s safety — i.e., the number of errors that the type system can catch.

For example, let’s say we have the following function:

tooManyGoats = Int -> Bool

Because we have used Int, we haven’t really modeled our domain. That value could have come from anywhere, and can refer to anything. Maybe it refers to the number of cows we have, yet our function name indicates it should be checking the number of goats we have.

We can increase the safety of our code — we can wield the type system more effectively — by using a newtype declaration in this case:

newtype Goat -> Goat Int

(We could have done the same with a data declaration instead of a newtype declaration, but newtype is more efficient. (The newtype data constructor is also strict, while a data constructor from a data declaration is lazy, but whatever — not important to me).

So, newtype can help us wield Haskell’s type system more effectively, and thereby increase the safety of our code. However, there is one other reason for using newtype declarations, and that’s that we can create an instance of a typeclass that differs from the instance of that typeclass implemented by value contained.

Record Syntax — I don’t really want to talk about record syntax. I’m not a fan. I like the idea of passing in arguments by field rather than position, but there are more than a few edge cases where I think records might be more of a hassle than they are worth. For example, can’t have two different records with the same field, since accessor functions are automatically constructed. Then there’s the section in the book about accidental bottoms, etc.

Terminology / Concepts

Nullary Data Constructor — A nullary data constructor is a data constructor that takes no arguments. For example:

data Bool = True | False

The data constructors True and False are both nullary. Nullary data constructors are also referred to as constants.

Unary Data Constructor — A unary data constructor is a data constructor that must be applied to one argument before a value is created / yielded. For example:

data Person = Person String

The Person data constructor must be applied to a value of type String before a value of type Person is created.

Sum Types — A sum type is a type that has more than one constructor inhabiting it. The sum type is denoted by the |, and it means that we have disjunction. For example:

data Bool = True | False

Again, the | clues us in on the fact that we have a sum type. What the sum type is telling us is that a value of type Bool will be constructed from the data constructor True OR the data constructor False. It is one or the other — not both.

Product Types — A product type is a type whose data constructor takes more than one argument. In other words, in order to create a value of that type, we must apply the data constructor to a value of the first type AND a value of the second type. For example:

data Person = Person String Integer

The Person data constructor must be applied to a value of type String AND value of type Integer in order for a value of type Person to be created.

Cardinality — Cardinality refers to a property of datatypes. Specifically, it refers to the number of possible values that can inhabit that datatype. If we think of datatypes as a set of related values, the cardinality of a datatype is the number of values in that set. In the section above, we spoke about how the cardinality of a type can be deduced solely from its type signature.

Extending Code in Haskell — One of the reasons why I want to learn Haskell is because Haskell’s type system allows us to refactor without fear, and also extend our program without fear. (Of course, this all assumes we are using the type system to our advantage — we need to be able to model our domain in the type system).

One perfect example of this benefit of Haskell is problem #5 of the Vehicles exercise. In it, we’ve decided that we are going to change our model — specifically, we’re going to extend the Vehicle datatype such that the Plane data constructor now needs to be applied to not only a value of type Airline, but also a value of type Size. If we make this change in the type, and then try to compile our program, Haskell’s type system will tell us everywhere else that we need to account for this change.

In other words, we change our domain model to incorporate a new requirement — something that happens all the time in software development — and Haskell will point us to where that change breaks our code, so that we can fix it.

Misc

Type Level — Remember that type constructors are used at the type level, and data constructors are used at the term level.

Data Constants — We already mentioned this, but nullary data constructors are also known as constants.

Type Constant — Like data constructors / constants, we also have type constructors and type constants. A type with the kind * is a concrete type / type constant.

Type Constructors — Because types can also be parameterized — meaning that they must be applied to type argument — they can also be described using terms such as nullary, unary, etc.

Type Variables — Types that have a type variable in their data declaration are called parameterized types. It is very important to remember that type variable is a placeholder — eventually, we have to replace that type variable with a concrete type. Only after we’ve applied a type constructor to a concrete type do we get another concrete type.

Data Constructor Classifications — Data constructors are classified according to the number of arguments that they must be applied to:

  • 0 arguments — nullary data constructor; more properly, a data constant.
  • 1 argument — unary data constructor
  • 2 arguments — binary data constructor; also a product.

Language Pragma — A language pragma is an instruction to the compiler to process input in ways beyond what the standard provides for.

type declarations — Type declarations create type constructors, but not data constructors. For example:

type Name = String

We now have a type constructor Name, but there is no data constructor String.

dependent types — We don’t talk about dependent types in Haskell, but one thought that struck me during the discussion of cardinality is that dependent types is all about making the cardinality smaller.

The typechecker helps those who help themselves.

Infix Operators — None-alphanumeric operators are infix by default. All infix data constructors must start with a colon — :. The single colon data constructor is already taken up by cons operation.