Why you should care about dependently typed programming

When people hear of “dependently typed programming”, they may think of languages like Agda and Coq, highly theoretically influenced proof assistants. They may find out that some dependently typed languages are not Turing complete (this is not as big of a downside as it may seem thanks to a very complicated topic involving ‘codata’, but that is a discussion for another time). They may think of mathematicians sitting in ivory towers writing proofs who are entirely detached from ‘real programming’. This notion is one that can be forgiven for believing but should be discouraged all the same. Dependent types are incredibly useful for ‘real programming’, and a natural extension of type systems we use on a regular basis, and I seek to explain why.

Just what is a dependent type, really?

Well, it’s… a type that depends on something. I know, I know, that’s more than a little vague. Another way to call a dependent type is a type family, a type that’s more like a group of interconnected types.

To illustrate, perhaps one of the simplest and widely used dependent types is a Vector. A Vector is just the usual name that dependently typed languages use to refer to a List that ‘keeps track’ of its length. If it makes it easier to understand, a Vector of length 0 is a different type to one with length 1, but both are in the same family. Let’s work through an example and write our own Vector type.

We (need to) use a special way to define dependent types, different than the so-called Haskell98 notation you may be used to. Like a lot of things in the functional programming world, they have a terrifying mathematical name but are ultimately a lot simpler than their name suggests. GADTs, or Generalised Algebraic Data Types, are the tool we’ll use and the standard notation for defining dependent types.

In Haskell98 notation, to simply write the set of all natural numbers (numbers 0 upwards), we might the following. A natural number is either Zero, or any successor of zero. To make things extra clear for those of you that mightn’t have encountered this before, we might denote 1 as Succ Zero and 2 as Succ (Succ Zero):

data N = 
Zero |
Succ N

In Agda, a dependently typed language, we use the aforementioned GADT notation to define our data. The same is done for dependent types in every language I know that has them, and the only differences will be very slight changes in notation. As in the previous example, we have two ways a natural number can be defined, either it’s zero, or it’s a successor of another number:

data ℕ : Set where
zero : ℕ
succ : ℕ → ℕ

We’re saying that is a Set, which you can think of as a Type if that’s easier. We’re saying that zero is of type , and succ is of type when applied to another . Simple, primitive, but a good example of a basic type.

Those amongst you more familiar with functional programming might notice that we’re defining our data types similar to how we would the types of functions. We’d define a plus function that adds two numbers together as having the following type, essentially taking in two things of type to give us something else of type :

plus : ℕ → ℕ → ℕ

This is no coincidence! It’s part of why we have so much power at our fingertips. With that slight, but hopefully interesting digression aside, we should probably do something a bit more useful than numbers. Let’s write Vectors, that type of lists of a given length that I mentioned earlier:

data Vector (A : Set) : ℕ → Set where
nil : Vector A zero
_∷_ : {n : ℕ} → A → Vector A n → Vector A (succ n)

This might look terrifying, but no need to fear, it’s all things you have seen before. The first thing to note is that we take a parameter, A, which is a Set, just like a List would in any other language. The second thing to note is that the curly brackets in {n : ℕ} signifies an implicit parameter. What we’re saying is that Vectors can hold any type, and n could be any natural number, but we don’t need to provide it explicitly (we could write it explicitly if we wanted, but there’s no point).

An empty list (I called it nil) is of course of length zero, and you can have one for any type. The underscores in _::_ denote an infix operator (i.e. you use it in the middle of things, like 1 :: [2, 3]), to construct a list, sometimes called cons, or in Haskell called _:_, where we take an element and a list, and make a list that’s one bigger. Simple!

So perhaps you’re wondering what this gives us over… List.length or similar. Well, a simple use case is it means we can write functions that take, for example, a non-empty list without needing to do the check inside the function, or a function that only takes a list of length three, or whatever else. We don’t need to say “if we get a list that isn’t length three, throw an error, we weren’t expecting this” because the fact our code compiles is proof that it can never happen.

We get this all for completely free, too. We don’t need to think about length if we don’t want, we aren’t obliged to use these as dependent types,

Cute little trick, but how is this useful?

The simplest way to describe it is that we give more power to the compiler to stop bugs because we can write what we mean more. One of the largest benefits I find of languages like Haskell is how the type annotations self-document behaviour. More than any other language, I can read a type annotation and usually know what the function is going to do. It’s sometimes easy to forget that giving type definitions for functions is optional and only helps with self-documentation and stopping ourselves writing nonsense.

To use an example, let’s look at the type annotation for append in Haskell:

_++_ :: [a] -> [a] -> [a]

Or, without syntactic sugar:

_++_ :: List a -> List a -> List a

We take two lists, and get out another list. Since we use ++ to denote it, it’s somewhat clear that it also adds them together, even without the name append that we could have chosen. Still, I am left wanting, there’s a little bit of logic we’re missing. How about if we’d used Vectors in Agda? We can write a function with a type essentially equivalent to the above, but what can we do with dependent types?

_++_ :: {A : Set} → {n m : ℕ} → 
Vector A n → Vector A m → Vector A (n + m)

Yes! Awesome! We can express that it adds them together in the type annotation! It’s a dream come true for anybody who likes self-documenting code, as we not only show more clearly what the code is doing, but also we cannot write a version of _++_ that doesn’t satisfy this predicate, meaning we cannot write nonsense. Though I won’t bore you with the implementations, we have no overhead or extra code required for our special dependent type, it is essentially the same.

It’s a very natural concept indeed. We regularly write types and functions which we attach invisible invariants to very regularly, and just kind of forget about it or store it mentally instead of in a meaningful way. Maybe it’s stored as documentation or a comment or in another file you have to look at to properly understand the code. Instead of writing functions with invisible properties, then write tests stored in another file to make sure everything works, we can be smarter than that, and write functions with visible properties, and the fact that it compiles tells us that we’re right.

Needless to say, having a better append function is just the tip of the iceberg. To digress only slightly, another benefit is being able to express true, proper equality as a dependent type, usually called _≡_ . This is one of the reasons dependently typed languages can be used for proofs. We can express equality as a testable predicate, if we choose to. That’s not to mention that this allows our compilers and IDEs to be significantly more useful, as they can discern what you might mean or want easier.

Okay, sure, but Agda? I’m not convinced that’s ready for general purpose prime time…

I can understand not wanting to write everything in Agda, and even though we can export Agda to Haskell and JavaScript, that doesn’t really cut it for a lot of people if they want a language for general-purpose programming (myself included). So, what are the alternatives?

The first alternative I’ll present, and hottest prospect for the future in my opinion is Idris, a purely functional language with dependent types fully built in. It’s new, but it keeps the power of Haskell and adds even more power to it with fully-featured dependent types. I’m an advocate for adopting new technologies showing promise, but I can understand if someone has trepidation. It’s a shame that Haskell can’t do this, isn’t it?

…well, it kinda can. The second alternative is Haskell. Despite saying that Haskell isn’t dependently typed (which, to be clear, it’s not) it can write dependent types and the same ideas we’ve been talking about. It doesn’t have all dependent types or everything, but it kind of does have enough for most ‘real world’ purposes, which is what we’re talking about here (and more functionality is actively being developed). We can write GADTs in Haskell! We don’t get some things, like_≡_ , the fancy equality I mentioned earlier, but that’s okay. We can still express dependent types like Vectors and a significant amount of other very useful types.

There are of course, other options, and feel free to have a look to see if any of those suit you, but those would be my two picks for things to look into. F* seems to be another good option for those of you that like ML, but it should also be noted that OCaml (4.0+) has similar functionality to Haskell in this department, in that it has GADTs despite not being a dependently typed language.

Though it’s beyond the scope of this post to go into much detail about it, I can recommend the talk Stephanie Weirich did at ICFP 2014 to any reader interested in dependent types in Haskell. In this talk, she uses the example of Red-Black trees, a type of self-balancing tree that balances using an invariant, and writes it in (pseudo-)Agda and converts it to Haskell, and compares it with the non-dependent way. I’d highly recommend this to anybody familiar with Haskell who is unfamiliar with or unsure about dependently typed programming.

Stephanie’s talk about dependently typed Red-Black Trees in Haskell

Runtime errors are so $PREVIOUS_YEAR

We all like to write code that works. I don’t think it’s too much of an outlandish comment to state that (purely) functional programming is a paradigm that lends itself to working code. In fact, it’s such a widely observed occurrence that the Haskell wiki has a whole page dedicated to this phenomenon: https://wiki.haskell.org/Why_Haskell_just_works.

Even if you take this property of ‘just working’ away, dependent types still leave us with richly self-documenting code, and immediately readable proofs that your code works (in some aspects) to a degree impossible without dependent types. In fact, the marriage of code and (previously implicit) behaviour that verifies — not just tests — it’s correct is something that people have tried and failed at doing for a long time. We do this whilst avoiding the pitfalls of Test Driven Design and how easy it is to design code in a way that fits the tests instead of alternatives, and avoiding the pitfalls of writing tests second, where you design the tests to fit the code.

One observation that I have found throughout my programming career is that the more ‘real’ the programming gets, the less easy it is to see exactly where it’s running. A toy program you write at home might finish with an output to STD_OUT, whereas code you write for a big website or company is a very different matter indeed. Similarly, the more ‘real’ the programming gets, the more sure you have to be that it works. One of the most exciting technologies (at least, to me) at the moment is the idea of a ‘serverless’ architecture, including but not limited Amazon’s AWS Lambda, which cements this idea.

As we move towards great technologies that boast many benefits but more ‘distance’ between our screen and the code, the more we have to be sure that our code works, especially given how much harder it is to debug code that runs but doesn’t work. We also need to be sure that code is understandable and readable without needing to ask somebody else what it means or have a seperate file for documentation open. Dependent types can improve both to a substantial degree and although I would advocate for their use even without these technologies, I’d argue that these technologies work better when we have more confidence that code that compiles is code that works. I don’t think it’s a coincidence that companies like Facebook are developing so much in languages like Haskell and OCaml, where dependent types are natural extensions.

It’s important to also note that we don’t lose anything by having the option of using dependent types. Even if there’s some philosophical reason why your append function (even for Vectors) can’t specify n + m, you’re not forced to write things that way. Having the option of specifying dependent types is strictly beneficial. You can use them in places you want to, or they’re appropriate, and not use them in places you don’t want to, or they aren’t appropriate!

So… What’s the catch?

So we’ve got these types that act as self-documenting proofs that functionality works, add clarity, add confidence our code works as well as runs. And, more than that, they make sense. Why didn’t we have these before? The short answer is, they’re a new concept, they’re not in every language, a large amount of people don’t know they exist or that this is even possible. Also, there are those I mentioned earlier, who hear about its use in research and dismiss it as purely for that purpose (let’s not forget that people write papers about languages like C and [Idealized] Algol, too). The fact I felt the need to write this article extolling their virtues should be proof enough of that.

Like object orientation and other ideas before it, it may take a while before this idea seeps down into being taught at universities and seen as standard. Functional programming has only just entered this space. The main stop-gap right now is this knowledge, and it’s the same reason you can’t snap your fingers together and have a bunch of Java devs who have never seen Haskell before writing perfect Haskell day one. Dependently typed programming is still a new concept, but that doesn’t mean you need to wait. Things we take for granted were new once, too.

I’m not arguing in favour of everybody in the world switching to a dependently typed language and doing everything possible dependently typed, that would be silly, and it encourages misuse. I am arguing in favour of, whenever possible (e.g. if you’re already using Haskell or similar) perhaps thinking whether dependent types suit what you’re writing. Chances are, there’s probably something they do suit very well indeed. They’re a truly fantastic tool and I’d argue that they will get better as time goes on due to way architecture will evolve. I think we’ll be seeing a lot more of them in the future.