Formal Verification, Casually Explained

Written during an interesting time in my life

Why are we here?

What guarantees does formal verification provide? This question rests at the apex of a hierarchy of inquiry extending all the way down to how we can know anything at all!

What do we mean by software correctness?

There are precisely two different ways for a piece of software to be correct:

  • The supreme deity of the universe descends from the heavens and decrees, with all the weight of Objective Truth, that a certain piece of software is correct.
  • We have a list of things we want the software to do, and use logic to prove the software does these things.

Crafty readers will come up with, like, twenty different caveats to the second definition. I encourage it! Take a minute to unearth some of the hidden assumptions. Good? Good. I’ll talk about those hidden assumptions in greater detail below, but the point I want to make is this: there is no such thing as inherently correct software. Your wacky program could say 1 * 1 = 2, and that isn’t incorrect in any objective sense — if that’s how you want your program to behave, heck, knock yourself out. We call our list of things we want the software to do a specification. A program is only correct (or incorrect) relative to its specification; if the program is correct relative to its specification, we say it implements or refines the specification.

A door guard to The Law

It seems like we’re just kicking the can down the road. We now have the means to say whether a piece of software is correct, but how do we know the specification itself is correct? There are two ways: one, write an even higher-level specification and prove our original spec implements the higher-level spec (recursively do this as many times as you want), and two: admit the root of truth here is an idea germinated within a fallible human brain, transcribed to spec through fallible cognitive processes, with much lost in translation during transit over the fundamentally unbridgeable lacuna between mind and material world. This is obviously depressing, but take heart! Writing specifications is far better than the alternative, which is getting right down to business and spewing out a bunch of code that vaguely accomplishes whatever you remember your good idea to be. Clearly a horrifying practice not to be attempted except as an exercise in learning the symbol keys on your unlabeled mechanical keyboard. Furthermore, formal specification systems have numerous tools to perform sanity checks on your spec, from syntactic analysis to finite model checking or even formal proofs of desired properties.

Tracing cracks in the edifice

Let’s further exercise our radical doubt against this chain of truth we’re trying to build. First, as we saw above, this chain is fundamentally without an anchor; epistemologically unsatisfying, but better than the alternative. Second, proving a program implements the spec (or that a spec refines the spec) — how can we be sure the proof is without error? Third, how do we account for all the bizarre behavior of the real world — meteorites hitting servers, the arbiter problem, overly-aggressive compiler optimizations, ship anchors cutting undersea cables, and obscure floating point arithmetic errors in old Intel CPUs? Let’s talk about the third issue first; you can think of it as the far end of the proof chain, where the rubber hits the road and the program executes on a real computer shuffling real electric potentials around. Unfortunately, this end is as unmoored as the other. All we can do is present an idealized model of the world and prove our program implements the spec under the assumptions of the model. You can make this model as detailed as you like, all the way down to the electric potentials level (or beyond!) but with software it usually stops above the hardware level and safely assumes all is well below.

Since humans are endlessly fallible, even and especially in mathematical proof, we want a program to do the heavy lifting when proving implementation. There’s the obvious question: how do we know this verifier program is itself correct? It seems everywhere we turn it’s turtles all the way down. Now, many computer scientists will be familiar with the concept of a bootstrapping compiler — a compiler which compiles itself. Why, then, could a verifier not verify itself? This is a very good question, and those possessing a little math knowledge will respond with some vaguely-articulated Godelian objection. I’m intuitively skeptical of this objection, but very interested in what obstacles arise when bootstrapping a verifier; perhaps a future blog post will detail such an attempt. Suffice it to say we can construct a verifier (also called a proof assistant or proof system) which provides extremely strong guarantees that our simple implementation proofs are correct or not.

A low bar to clear

After spending all that time tearing apart our foundations, let’s pause to consider the goal of formal verification. We want very strong guarantees our software is correct, yes, but why? Because correct software provides greater utility than incorrect software. How do we ensure the correctness of conventional software? Code review, automated testing, and real-world use. It’s easy to see these provide significantly weaker guarantees of correctness. For formal verification, we need only three weak assumptions for our program to be guaranteed correct:

  • Our top-level specification accurately corresponds to our idea.
  • If our implementation proof contains errors, they are caught by our verifier program.
  • If an event occurs in the real world, its effect on our program is captured by our model.

Contrast this with a selection of assumptions required for conventional software correctness:

  • A specification exists, and all programmers have the same understanding of the specification.
  • Validation performed by automated tests conform to specification requirements.
  • If the specification is changed, the code & tests are changed, and vice versa.
  • Automated tests exercise every combination of code path and variable assignment (lol).
  • Automated tests do not contain bugs which incorrectly pass a failing condition.
  • The compiler does not contain bugs in its translation to executable code.

And so on. While formal verification may not be fully philosophically satisfying, it accomplishes our goal of writing correct software to a degree unimaginable in conventional software development.

All our enterprise brought to ruin

Now let’s get practical. Can we formally verify a piece of software, today? How difficult is it? How much does it cost? As of late 2016 (ed. note: now early 2018), these don’t all have pleasant answers. Academics say formal verification is ready for prime time; this isn’t necessarily untrue, but the economics don’t favor widespread industry use just yet. The most illustrative case study for large-scale state-of-the-art formally-verified software development is Project Ironclad from Microsoft Research. The team created a full formally-verified software stack from scratch, plus apps; they saw overhead of five lines of verification code for every line of actual code. Project costs were informally estimated an order of magnitude higher than equivalent conventional software. This is within striking distance! Formal verification is now a tooling problem — we need reliable pre-check-in validation tools supporting fast partial verification, human-friendly error messages when the verifier is confused, and (hardest of all) smarter verifiers which don’t require as much help. I expect economics to favor formal verification within a decade, strongly so when factoring in maintenance costs.


You have an idea in your head. You’re sold on formal verification. In which language should your spec be written? I’ll argue your first spec should be in English (or your informal tongue of choice). Writing, as they say, is nature’s way of letting you know how sloppy your thinking is. Mathematics, in turn, is nature’s way of letting you know how sloppy your writing is — and your second specification will be written in the language of mathematics. So, TLA+.

Then what? TLA+ unfortunately lacks the ability to recursively refine your spec down to the level of executable code. At some point it gets close enough to make the hop yourself, but this leaves a distasteful gap in our free-floating chain of perfection. The Dafny language (also from Microsoft Research) is billed as your one-stop shop for specification & verification, but Project Ironclad (and the extension Project Ironfleet) seems to be its only large-scale application. F* deserves mention, if only for someone to finally get around to writing its Wikipedia article.

Nothing has yet emerged as an obviously dominant solution. To me this is exciting; there is an obvious gap that, with the right backing and community, could give rise to something new. Something special.

Formal specification and machine learning

Nobody cares about this


If you enjoyed reading this you should go research the DAO hack and write an opinionated analysis.

Credit to @hillelogram for encouraging me to just publish this, 1.5 years later