How does Static Analysis Work?

Veridise
May 4, 2022

In the previous blog post, we discussed the basic idea behind static analysis and how it can help uncover security vulnerabilities. In this blog post, we’ll dive a little deeper into how (sound) static analysis actually works.

Static Analysis via Abstract Interpretation

At a high level, most static analyzers are based on a paradigm known as “abstract interpretation”. Just as a regular interpreter for a programming language executes the program on some specific input, an abstract interpreter symbolically executes the program over sets of inputs.

To make this discussion more concrete, consider a simple function F that takes an integer x and returns x*2 (let’s assume unbounded integers for the purposes of this discussion). A regular interpreter can tell us that F will return 2 when executed on 1, that it will return 4 when executed on 2, and so on. With abstract interpretation, however, we can ask questions about what F returns on entire sets of inputs. For example, an abstract interpreter can tell us that F will return a value in the range [2, 10] when x is in the range [1, 5], or that the return value of F is always an even integer for any input x.
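The difference between asking about one input and asking about a set of inputs can be sketched in a few lines of Python. This is our own illustration, not Veridise’s implementation: the `Interval` class and the parity labels ("even", "odd", "top" for unknown) are assumptions chosen to mirror the two questions above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Abstract value [lo, hi]: all integers x with lo <= x <= hi.
    (Python ints are unbounded, matching the assumption in the text.)"""
    lo: int
    hi: int

def f_concrete(x: int) -> int:
    # The function F from the text, run on one specific input.
    return x * 2

def f_interval(x: Interval) -> Interval:
    # Multiplying by the positive constant 2 preserves order,
    # so the new bounds are just the old bounds times 2.
    return Interval(2 * x.lo, 2 * x.hi)

def f_parity(x: str) -> str:
    # Hypothetical parity domain: "even", "odd", or "top" (unknown).
    # 2 * x is even no matter what x is, so the input parity is irrelevant.
    return "even"
```

For example, `f_concrete(1)` gives 2, `f_interval(Interval(1, 5))` gives `Interval(2, 10)`, and `f_parity("top")` gives `"even"`.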

So, how can an abstract interpreter do that? The key idea is to define the semantics of the programming language over some underlying abstract domain, where each element in the domain represents a set of concrete values. Again, the best way to understand this concept is to contrast it with a standard interpreter for a programming language. We can think of an interpreter as a program that takes as input a mapping M from variables to concrete values and a code snippet S, and produces a new mapping M’ from variables to values.
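A standard interpreter viewed as a function from (M, S) to M’ might look like the minimal Python sketch below. The tuple encoding `("add", target, left, right)` for a statement `target = left + right` is a hypothetical representation chosen only for illustration.

```python
from typing import Dict, Tuple

# A concrete state M maps variable names to integer values.
State = Dict[str, int]

# Hypothetical statement encoding: ("add", target, left, right)
# stands for `target = left + right`.
Stmt = Tuple[str, str, str, str]

def execute(m: State, s: Stmt) -> State:
    """Run statement s on state M and return the new state M'."""
    op, target, left, right = s
    if op != "add":
        raise ValueError(f"unsupported statement: {op}")
    m_prime = dict(m)  # copy so the input state M is left untouched
    m_prime[target] = m[left] + m[right]
    return m_prime
```

Calling `execute({"y": 1, "z": 2}, ("add", "x", "y", "z"))` yields `{"y": 1, "z": 2, "x": 3}`.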

The idea behind an abstract interpreter is exactly the same, except that it operates over abstract values, which are elements of an underlying so-called abstract domain. For instance, an abstract value could be an interval of the form [a, b] denoting the set of integers x such that a ≤ x ≤ b. Then, just as a standard interpreter executes the program over a concrete input, an abstract interpreter executes the program over such abstract inputs, producing a new mapping from variables to abstract values.
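To see how a single abstract input stands for many concrete inputs, here is a small Python sketch, again under our own assumptions (the `Interval` class and the `concretize` helper are illustrative names, not part of any particular analyzer). It enumerates every concrete state described by an abstract state that maps variables to intervals.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List

@dataclass(frozen=True)
class Interval:
    """Abstract value [lo, hi]: all integers x with lo <= x <= hi."""
    lo: int
    hi: int

# Concrete state: variable -> integer. Abstract state: variable -> interval.
State = Dict[str, int]
AbstractState = Dict[str, Interval]

def concretize(m_abs: AbstractState) -> List[State]:
    """List every concrete state described by an abstract state.

    Only feasible for small, finite intervals; a real analyzer never
    enumerates states like this, which is the whole point of abstraction.
    """
    names = sorted(m_abs)
    ranges = [range(m_abs[n].lo, m_abs[n].hi + 1) for n in names]
    return [dict(zip(names, values)) for values in product(*ranges)]
```

For instance, the abstract state `{"y": Interval(1, 2), "z": Interval(5, 5)}` describes exactly two concrete states: `{"y": 1, "z": 5}` and `{"y": 2, "z": 5}`.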

So what does it mean to execute a statement over abstract values? Let’s try to understand that through an example (again, assuming mathematical integers). Consider a statement like x = y+z, and suppose that our abstract values are intervals of the form [l, u]. If we know that y is in the range [a, b] and z is in the range [c, d], we can conclude that x is in the range [a+c, b+d]. This is precisely what we mean by abstractly (symbolically) executing a given statement!
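The interval rule for x = y+z translates almost directly into code. The following Python sketch is illustrative only; `abstract_add` and `abstract_assign_add` are hypothetical names, and the state representation mirrors the concrete interpreter, with intervals in place of integers.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Interval:
    """Abstract value [lo, hi]: all integers x with lo <= x <= hi."""
    lo: int
    hi: int

AbstractState = Dict[str, Interval]

def abstract_add(a: Interval, b: Interval) -> Interval:
    # If y is in [a, b] and z is in [c, d], then y + z is in [a + c, b + d].
    return Interval(a.lo + b.lo, a.hi + b.hi)

def abstract_assign_add(m: AbstractState, target: str,
                        left: str, right: str) -> AbstractState:
    """Abstractly execute `target = left + right` on abstract state m."""
    m_prime = dict(m)  # copy so the input abstract state is left untouched
    m_prime[target] = abstract_add(m[left], m[right])
    return m_prime
```

With y in [1, 3] and z in [10, 20], abstractly executing x = y+z gives x in [11, 23], and every concrete pair (y, z) from those ranges indeed sums to a value in that interval.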

Abstract Interpretation and Over-Approximations

In the previous blog post, we talked about how sound static analyzers over-approximate program behavior as opposed to performing exact reasoning. So, how exactly does abstract interpretation over-approximate program behavior?

To gain some intuition about this, consider the following piece of code:

if(...) x = x+2 
else x = x-2

where the guard of the if statement is unimportant. If we know that x was in the range [2, 4] before executing this code snippet, an abstract interpreter will correctly conclude that x must be in the range [0, 6] after executing this code snippet. Intuitively, the static analyzer reasons as follows:

  • In the then branch, after abstractly executing the increment operation, we conclude that x is in the range [4, 6]
  • In the else branch, after abstractly executing the decrement operation, we conclude that x is in the range [0, 2]
  • To capture what happens if we take either branch, we compute the smallest interval that includes both [0, 2] and [4, 6] (this is referred to as a join operation in abstract interpretation lingo). The smallest such interval is [0, 6], so we conclude that x is in the range [0, 6] after this code snippet.
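The three steps above can be sketched in Python as follows. This is a minimal illustration assuming the interval domain from the text; `add_const`, `join`, and `analyze_if_else` are hypothetical helper names. Because the guard is unknown, the analysis runs both branches and joins their results.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Abstract value [lo, hi]: all integers x with lo <= x <= hi."""
    lo: int
    hi: int

def add_const(x: Interval, c: int) -> Interval:
    # Adding a constant shifts both bounds by the same amount.
    return Interval(x.lo + c, x.hi + c)

def join(a: Interval, b: Interval) -> Interval:
    # Smallest interval that contains both a and b.
    return Interval(min(a.lo, b.lo), max(a.hi, b.hi))

def analyze_if_else(x: Interval) -> Interval:
    # The guard is unknown, so both branches are possible.
    then_result = add_const(x, 2)   # then branch: x = x + 2
    else_result = add_const(x, -2)  # else branch: x = x - 2
    return join(then_result, else_result)
```

Running `analyze_if_else(Interval(2, 4))` computes [4, 6] for the then branch, [0, 2] for the else branch, and returns their join, `Interval(0, 6)`.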

Crucially, this means the analysis considers it possible for x to be 3, even though in reality it cannot be: the then branch yields only 4, 5, or 6, and the else branch yields only 0, 1, or 2. As this example illustrates, abstract interpretation is always conservative about which states the program can be in; the goal is to get around undecidability while never missing potential program errors.
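The gap between what the program can actually do and what the analysis reports can be checked by brute force for this tiny example. The Python sketch below (our own illustration; `concrete_results` is a hypothetical helper) enumerates every starting value in [2, 4] and both branch outcomes, then compares the result with the interval [0, 6].

```python
from typing import Set

def concrete_results(lo: int, hi: int) -> Set[int]:
    """All values x can hold after the if/else, for every start value
    in [lo, hi] and both outcomes of the (unknown) guard."""
    results: Set[int] = set()
    for x in range(lo, hi + 1):
        results.add(x + 2)  # then branch
        results.add(x - 2)  # else branch
    return results

reachable = concrete_results(2, 4)
abstract = range(0, 6 + 1)  # the interval [0, 6] computed by the analysis

print(sorted(reachable))              # [0, 1, 2, 4, 5, 6]
print(3 in reachable, 3 in abstract)  # False True
print(reachable <= set(abstract))     # True: sound, but not exact
```

Every reachable value lies inside [0, 6], which is what soundness requires, but the interval also admits 3, which no execution produces.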

Summary

Most static analysis techniques are based on the paradigm of abstract interpretation, which rests on elegant mathematical foundations from lattice theory. Abstract interpretation provides a recipe for building sound static analyzers, but it can be instantiated in many ways depending on the target application domain. In particular, the choice of the underlying abstract domain depends on what kinds of vulnerabilities or bugs we want to detect.

About Veridise

Veridise offers thorough and comprehensive security audits for blockchain applications. Leveraging our expertise in automated program analysis, Veridise provides state-of-the-art solutions for ensuring the security of decentralized finance applications. If you’re interested in learning more, please consider following us on social media or visiting our website:

Website | Twitter | Facebook | LinkedIn | GitHub
