A Gentle Introduction to Program Verification
In the previous blog posts, we introduced static program analysis and discussed how it can identify security vulnerabilities in smart contracts. In the next few blog posts, we will talk about program verification and discuss what it can add over static analysis.
As we saw in the previous blog post, static analysis is extremely useful for finding specific, well-known types of bugs, such as arithmetic overflows. In most cases, however, we want to guarantee not only that the program is free from known types of vulnerabilities but also that it is free from logical errors, that is, the kinds of errors that cause the program to behave differently than intended. To provide these stronger guarantees, we need program verification!
The first important point to understand about program verification is that it requires a specification, which is a formal description of how the program is supposed to behave. One can think of a specification as a type of contract between the program and its clients: it simply states which interactions are allowed between them. Internally, the program can be implemented in whatever way the developer pleases as long as its externally observable behavior satisfies the specification. Any behavior that deviates from that specification should be considered a logical bug, as the implementation of the program does not adhere to the promises it has made to its users.
As an example, think of an application that implements an auction. The interface of such an application will allow users to place bids and to withdraw their bids after the auction is complete. A reasonable expectation from participants in such an auction is twofold: (1) if a participant is not the winner, they should be able to withdraw their original bid (perhaps minus a small fee); and (2) the winner of the auction should receive the reward they have been promised in exchange for their bid. Internally, such an auction can be implemented in any number of ways as long as its external behavior is consistent with these expectations. Specifications allow describing such expectations in a way that is mathematically precise and machine-checkable.
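To make this concrete, the auction expectations above can be written as executable pre- and postconditions. The sketch below is a deliberately simplified, hypothetical Python model (none of these names come from a real contract); expectation (1), that a non-winner recovers their full bid, appears as the specification of `withdraw`.

```python
# A minimal, hypothetical auction model whose specification is written as
# executable pre- and postconditions (the assert statements). All names
# here are illustrative, not a real smart-contract API.

class Auction:
    def __init__(self):
        self.bids = {}        # bidder -> amount held by the auction
        self.closed = False
        self.winner = None

    def bid(self, bidder, amount):
        assert not self.closed, "precondition: auction must be open"
        self.bids[bidder] = self.bids.get(bidder, 0) + amount

    def close(self):
        self.closed = True
        # Highest bidder wins (ties broken arbitrarily in this toy model).
        self.winner = max(self.bids, key=self.bids.get) if self.bids else None

    def withdraw(self, bidder):
        assert self.closed, "precondition: auction must be closed"
        assert bidder != self.winner, "precondition: winner cannot withdraw"
        refund = self.bids.pop(bidder, 0)
        # Postcondition, expectation (1): a non-winner recovers their full bid.
        return refund

a = Auction()
a.bid("alice", 5)
a.bid("bob", 8)
a.close()
assert a.withdraw("alice") == 5   # non-winner gets their original bid back
```

Any implementation is acceptable as long as these externally visible checks hold; an implementation that, say, returned a smaller refund would violate the specification and count as a logical bug.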
So what do these specifications look like? Well, it turns out that there are many different formalisms (with varying trade-offs) for writing these specifications, and we will describe a suitable specification language for smart contracts in another blog post. But, for the time being, it suffices to keep two important points in mind:
- A prerequisite for performing formal verification is having access to the specification of the program being verified.
- The guarantees made by the verifier are only as good as the specification: that is, if the specification is wrong or incomplete, then the guarantees made by the verifier will not mean much.
Next, let’s dive a little bit into how verifiers work under the hood. As shown in the following diagram, a program verifier takes three inputs:
- Program implementation: This can be either the source code of the program or the source code compiled down to some intermediate representation (e.g., LLVM bitcode).
- The specification: As discussed, this is a formal description of the property to be verified.
- Human annotations (optional): We can ignore this for now; we will come back to this aspect of verification later in the blog post.
At a very high level, a program verifier typically incorporates two key components:
- VC generator: Using the provided specification (and optional annotations), the verifier first “compiles” the program to a logical formula, typically referred to as a “verification condition” (VC). This formula is generated in such a way that, if the VC is logically valid, then this constitutes a proof that the code satisfies the specification.
- Theorem prover: Program verifiers use an automated theorem prover (typically, an SMT solver) to check whether the generated VC is logically valid. If it is, this means the program satisfies the specification, but if it is not, it does not necessarily mean that the program has an error (more on that below).
If the theorem prover finds that the generated VC is not logically valid, the VC generator may be invoked again (for example, to perform more precise reasoning) or the verifier may simply report that it was unable to prove the property.
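To illustrate what VC generation looks like, the sketch below implements the classic weakest-precondition rule wp(x := e, Q) = Q[e/x] for straight-line assignments. A real verifier would hand the resulting formula to an SMT solver to decide validity; purely for illustration, this toy instead "checks" validity by enumerating a small range of inputs. All names and the example program are hypothetical.

```python
# A toy verification-condition generator for straight-line assignments,
# based on the weakest-precondition rule: wp(x := e, Q) = Q[e/x].
# Predicates are represented as Python functions from an environment
# (a dict of variable values) to a boolean.

def wp_assign(var, expr, post):
    # wp of "var := expr" w.r.t. postcondition "post": evaluate the
    # postcondition in the environment updated with the assigned value.
    return lambda env: post({**env, var: expr(env)})

# Program:   y := x + 1;  z := 2 * y
# Spec:      ensure z == 2*x + 2
post = lambda env: env["z"] == 2 * env["x"] + 2

# wp is computed back-to-front: wp(S1; S2, Q) = wp(S1, wp(S2, Q)).
vc = wp_assign("y", lambda e: e["x"] + 1,
               wp_assign("z", lambda e: 2 * e["y"], post))

# A real verifier asks an SMT solver whether the VC holds for ALL inputs;
# here we merely enumerate a small range, which proves nothing in general.
assert all(vc({"x": x}) for x in range(-100, 101))
```

Substituting through both assignments, the VC here simplifies to 2*(x + 1) == 2*x + 2, which is valid for every integer x, so the program satisfies its specification.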
It is worth noting that there are several reasons why verification may fail:
- Logical error: If the code does not satisfy the specification, the verifier will undoubtedly fail to prove the property!
- Insufficient human input: As explained above, verifiers typically take some form of human input, such as loop invariant annotations. While such input is often necessary to perform full verification, the annotations could be wrong or insufficient to prove the desired property. In this case, verification will fail even though there is no logical bug in the program.
- Incompleteness of theorem prover: The logical formulas generated by the verifier may belong to a logical theory that is undecidable. In this case, the theorem prover may fail to prove the validity of the VC, even though it is actually valid. This, in turn, will cause the verifier to fail.
In short, failure to prove the specification does not necessarily mean that the program has a bug; it could also be due to other reasons as explained above.
So, what kinds of inputs does a human need to provide to the verifier? In practice, the answer depends on the type of verifier used. For example, traditional deductive verifiers require users to provide suitable loop invariants as well as “method contracts” (i.e., pre- and post-conditions). Other verifiers try to infer these automatically as much as possible, but the inference may be best-effort (e.g., based on heuristics) or may diverge, causing the verifier to “time out”. Thus, while verifiers automate the proof construction process to the extent possible, some form of human input is generally unavoidable.
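As a concrete example of such annotations, the sketch below spells out the three proof obligations a deductive verifier discharges for a simple loop that sums 0 + 1 + ... + (n-1), using the invariant s == i*(i-1)/2. A real verifier proves each obligation for all states using a theorem prover; as a hypothetical illustration, we merely spot-check them on small values.

```python
# The loop being verified:
#   i, s = 0, 0
#   while i < n:
#       s += i
#       i += 1
# Postcondition: s == n*(n-1)//2
# Human-supplied loop invariant: s == i*(i-1)//2

def invariant(i, s):
    return s == i * (i - 1) // 2

# Obligation 1 (initialization): the invariant holds on loop entry.
assert invariant(0, 0)

# Obligation 2 (preservation): if the invariant holds and the guard
# i < n is true, one loop iteration (s += i; i += 1) re-establishes it.
# (Spot-checked here; a verifier proves this for all states.)
for n in range(10):
    for i in range(n):
        s = i * (i - 1) // 2          # any state satisfying the invariant
        assert invariant(i + 1, s + i)

# Obligation 3 (exit): the invariant plus the negated guard (i == n)
# implies the postcondition s == n*(n-1)//2.
for n in range(10):
    s = n * (n - 1) // 2
    assert invariant(n, s) and s == n * (n - 1) // 2
```

If the user supplies a wrong or too-weak invariant (say, s >= 0), obligation 3 can no longer be discharged and verification fails, even though the loop itself is correct.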
One subtle but important point worth noting is that our discussion so far concerns full (unbounded) verification, meaning that we want to give guarantees about every possible input. On the other hand, if we are satisfied with slightly weaker guarantees and, for example, are willing to consider only inputs up to some size, then it is possible to perform verification without any human input. Such “bounded” verifiers do not provide guarantees as strong as those of unbounded verifiers, but they still consider all possible edge cases up to some bound and have the advantage of not requiring any human input.
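A minimal sketch of the bounded approach: exhaustively run the program on every input up to a bound and check the specification on each result. `max3` and its spec below are illustrative stand-ins, not taken from any real tool.

```python
# A tiny "bounded verifier": enumerate all inputs up to a bound and check
# the specification on each. No invariants or other annotations are
# needed, but the guarantee only covers the enumerated inputs.

def max3(a, b, c):
    # Implementation under test (could be arbitrarily complex internally).
    m = a if a > b else b
    return m if m > c else c

def spec(a, b, c, result):
    # Specification: the result is at least every input, and is one of them.
    return result >= a and result >= b and result >= c and result in (a, b, c)

BOUND = 5
ok = all(spec(a, b, c, max3(a, b, c))
         for a in range(-BOUND, BOUND + 1)
         for b in range(-BOUND, BOUND + 1)
         for c in range(-BOUND, BOUND + 1))
assert ok    # the spec holds for every input triple within the bound
```

This check covers every edge case within the bound (including ties and negative values) but says nothing about inputs outside it, which is exactly the weaker guarantee bounded verification trades for full automation.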
Summary
Program verifiers are capable of proving that a given piece of code satisfies its specification mostly automatically. However, they require the user to write a formal specification describing the contract between the program and its clients, so any errors in the specification can lead to spurious proofs of correctness. In addition, full (unbounded) verifiers also typically require additional user input like loop invariants and method pre- and post-conditions. In contrast, bounded verification can be performed without any human input but, in this case, the proof is only valid for inputs up to some bounded size.
About Veridise
Veridise offers thorough and comprehensive security audits for blockchain applications. Leveraging our expertise in automated program analysis, Veridise provides state-of-the-art solutions for ensuring the security of decentralized finance applications. If you’re interested in learning more, please consider following us on social media or visiting our website: