Towards a simple theorem prover

Since a long time ago, I’ve been looking for a simple theorem prover: a small (say, ~1000 LOC) programming language capable of expressing and proving mathematical theorems about its own programs. If we had such a thing, then developers all around the world would be able to prove assertions about the code they build, and users would be able to trust those assertions without trusting the developers. This quest lead me to many insights and frustrations. I’m writing this post to summarise my journey so far. In the end, I’ll links to potential implementations.

What is a theorem prover, and why it matters?

Suppose we write a smart-contract that holds crypto-assets, and we want to convince people that this contract’s actual balance always matches the sum of the balances stored on internal state. With conventional languages, this must be done through either tests, which are not 100% certain, or audits. Sadly, manually analysing every possible state of a big program is not feasible.

With a theorem prover we could, instead, just publish a proof of the following statement: `sum(state.user_balances) == state.balance`. Since that's a logical proof, all possible states are covered. Once a user checks this proof on his own computer, he's apt to trust the smart contract's balance accounting never goes wrong, regardless of how complex it is, without having to read a single line of its code. Theorem provers essentially reduce human audit work from `O(N)` to `O(1)`, where `N` represents the code size / program complexity, and that's why they matter.

What is the problem with existing theorem provers?

There are many theorem provers in the market: Coq, Agda, Idris, Isabelle to name a few. But most of them share something in common: they’re big. Coq’s kernel, for example, contains about 30,000 lines of code. This is a problem for two main reasons.

First, the language implementation itself can’t verify its own consistency, which means users must either trust it, or implement another language to verify it; but then, users must trust that one. In the end, the code of at least one theorem prover must be trusted. Any minor bug on it would break everything, allowing people to prove false assertions, which would be a security catastrophe. Coq is very stable at this point, but we can never be 100% sure. Other languages such as Agda and Idris are in a worse situation, as they still often find bugs that break their logical consistencies; look for the label `false` on their Github issues.

Second, being huge means they can’t be easily ported or adapted to different platforms or use cases. For example, Coq isn’t compatible with the abstract algorithm, a very fast evaluator for functional reductions. To make it so, one would need to change the inherent logic of the type system, but making such a fundamental change on such a huge codebase would be all but impossible. Similarly, Coq, Agda, Idris won’t run natively on the browser or mobile. What if we want to check proofs on iPads? Surely one can compile it, but then we need to also make sure the compiler is bug-free (and it isn’t).

Looking for a simpler theorem prover

Considering those problems, it seems natural to look for a theorem prover that is as small as possible. If we had, for example, one with <1000 LOC, then most of those problems would be either solved or considerably improved. Auditing a small codebase is much more feasible. Moreover, people could implement it themselves, avoid trusting others. They could also port it to different platforms without needing to involve transpilers. They could adapt it to different use cases without requiring huge investments. A small, stable theorem prover could serve as a universal template for any software-related mathematical reasoning, and that’d be great. But is that even possible?

First candidate: Calculus of Constructions (CoC)

The first candidate is the Calculus of Constructions (CoC). CoC is a really, really simple theorem prover; as in, 400-JS-LOC-simple. It is clean, elegant and doesn’t look like something human-designed, like Agda, but, instead, it looks like something truly natural, like the Euler constant, or the lambda calculus. Yet, despite its simplicity, it acts like a theorem prover, due to one powerful feature: dependent functions. With them, CoC can implement absolutely any datatype you can think of, and can prove statements such as `2 + 2 == 4`, or `not(not(bool)) == id(bool)`. The most popular implementation of it, I believe, is Morte, by Gabriel Gonzales. If you're interested, he has written some articles about it, although Wikipedia also has a good writeup. CoC is itself part of a family of other simple languages, pure type systems.

What is the problem with CoC?

The characteristics above make CoC look like the perfect candidate for what I was looking for. There is, sadly, a huge small problem with it: it is not capable of proving the induction principle. The induction principle (on numbers) says that, if we have a numerical property we’re interested in proving, then, if we prove 1. that this property holds for `n = 0`, 2. that, given the assumption that this property holds for a number `k`, it also holds for `k + 1`, then we can conclude that this property holds for every number `n`.

For example, suppose our property is `forall n: IsEven(n * 2)` (or, "the double of any number is even"). Does it hold for `n = 0`? Surely, because `0 * 2 == 0`, which is even. Moreover, given an assumption it holds for `k`, does it hold for `k + 1`? Surely, because 1. `(k + 1) * 2 == 2k + 2`, 2. `2k` is even (due to our assumption), 3. any even number plus `2` is even. Thus, by the induction principle, we can conclude that the double of every number is even.

Not being capable of deriving the induction principle is a huge problem, because inductive proofs are like the for-loops of mathematics, one can’t go very far without them. For example, while CoC can prove `2 + 2 == 4`, it can't prove `2 + 3 != 4`. In fact, anything slightly more complex can't be done. It can't prove commutativity (`a + b = b + a`), associativity (`a + (b + c) == (a + b) + c`), it can't even prove that `id(x) == x`. Given that CoC fails to prove those simple mathematical statements, it obviously couldn't prove complex software invariants.

The solution: “patching” CoC

Due to this issue, people began “patching” CoC, extending it with new primitives that would enable inductive reasoning. That’s, for example, how Coq was born: it took CoC as its basis, and added a native system for inductive datatypes. The resulting system, CoIC (Calculus of Inductive Constructions) is what Coq’s kernel implements. There is a problem, though: inductive datatypes are complex. If CoC can be implemented in ~400 LOC, I’d be surprised if CoIC could be in less than 2,000. Coq’s Kernel, for example, has >30,000 LOC. This approach is thus complex, bug-prone and not portable. Moreover, it led to higher issues such as HoTT incompatibility. To me personally, it is clear that “hardcoding” a datatype system like that isn’t the best approach when we care about simplicity and portability.

Not everyone on the field agrees with this statement, but some clearly do. One in particular raised my interest. Last year, a paper called Syntax and Semantics of Cedille was released. It presents a type theory called the Calculus of Dependent Lambda Eliminations (CDLE), which, like CoIC, extends CoC with new constructs, but, unlike it, instead of adding inductive datatypes directly, it adds just a few simple and natural constructs that can, then, be used to implement inductive datatypes. About 1 year ago, I managed to implement its kernel, Cedille-Core, in a little less than 1,000 LOC. That’s a 30x reduction in relation to Coq! This, for the first time, gave me a feeling of progress on my long-standing quest. But it wasn’t the end yet.

The problem with Cedille

As soon as I finished implementing Cedille-Core, I realised its problem: while the core language itself is simple, code written on it isn’t. As an example, below is an implementation of natural numbers, plus a proof of the induction principle, on Cedille-Core. You don’t need to understand this code, just notice its size. A complete type-checker is available on the Gist if you want, though.

It is huge. For comparison, this is the equivalent proof of induction on Agda:

From 7 to 67 lines of code, that’s no small difference. The main problem of Cedille is that, while the language itself is extremely simple due to the lack of native inductive datatypes, that complexity shows up when you need to use them on your code. Given that datatypes are the bread and butter of functional programming, working directly with Cedille-Core is much less practical than Agda, Idris and the like. Imagine having to write 70 lines of code whenever you wanted to define a small C struct?

At first glance, I assumed this problem could be solved by extending the language with syntax sugars for native datatypes. That is, the user would type something similar to the Agda version, which would then be expanded to the Cedille-Core version. This would make the language much more practical to work with. But that raises the question: are those syntax sugars part of the language? If yes, then the language size grows accourdingly. If not, then what about those who need to read the proofs? After all, syntax sugars are lost after expansion. The distributed code would still be Cedille-Core, so people inspecting its proofs would see the longer version. While theorem provers allow you to dispense reading proofs, you still need to read the statement that is proven. If doing so is unpractical due to overwhelming verbosity, then this solution is not satisfactory.

Those issues made me reluctant to settle on Cedille-Core as the answer to my quest. At some point, this started looking like a problem without a solution. Datatypes are naturally complex. Inductive ones are even more so. Either that complexity must go to the language implementation, or to the code written on it. Right? But still, Cedille-Core has a different problem: it isn’t just complex, but needlessly so! In order to implement a datatype on it, you need to declare two equivalent but slightly different representations of it, and then intersect those as a third type, which is the “official” one. This approach also makes proving induction roundabout, as you need to, first, prove reflection. After a lot of experimenting, I realised that those redundancies were too overwhelming. But why are they present? Why can’t it be simplified?

Note: I also noticed a different problem with Cedille-Core, which may make it incompatible with the abstract algorithm I mentioned earlier. I won’t elaborate here, but there is more info on this commit.

Simplifying Cedille

Cedille can, indeed, be extremely simplified with a single change: replacing dependent intersections and equality primitives by self types and mutual recursion. Both approaches are actually very similar, but the later achieves two very important feats:

1. Due to mutual recursion, the declaration of the type of a datatype can refer to its constructors before they were defined. As such, we don’t need anymore two different declarations of the same type, plus a third one to merge both. We can declare the datatype in a single go.
2. Since now there is only one type involved, we don’t need to prove reflection to prove induction. This allows us to dispense all equality primitives of the language, which amount to about half of its complexity.

So, in turn, this small change makes the core language roughly 40% smaller, and the implementation of inductive datatypes inside it roughly 60% smaller. This is what the induction proof becomes (type-checker included on Gist):

This is not only much smaller; from 67 to 26 lines; but, most importantly, there is no redundancy anymore. The declaration of an inductive datatype like `Nat` includes exactly the information necessary to specify its induction principle. Of course, this isn’t as terse as Agda, for example; but this time, the added information is there because it is actually necessary. Agda merely hides it on its implementation. As an example,`Eq` reads almost like the `J` axiom, and deriving it is straightforward:

This language also solves many problems Cedille has, such as being unable to prove `1 != 0` without an extra axiom, or being possibly incompatible with the abstract algorithm. Needless to say, I really like it. Sadly, though, good things don't come for free. The researcher who introduced Self-types himself claimed they are too difficult to provide a semantics for, which is why he moved away from them and built Cedille around dependent intersections. In other words, chances are such language would be inconsistent.

I do wonder, though, if Self-types are inherently problematic, or if we just didn’t find the proper setup to give a semantics for them. And even if it is the case that any similar approach would be necessarily inconsistent, we know that “it possible to reason correctly about programs in an inconsistent language, as long as every “proof-like” term is normalized”, as elaborated in this dissertation. So, I do still have hopes this language is very close to what I’m looking for. If anyone has insights on whether it is possible to use such thing as a theorem prover, please let me know, as I do not have enough knowledge to answer that kind of question.