MesaTEE Security Formal Verification in Action

Baidu Security X-Lab
13 min read · Aug 9, 2019

Abstract

Memory safety is crucial to the security of software in Trusted Execution Environments (TEEs, e.g. TPM and Intel SGX). Although TEEs provide a powerful hardware foundation for trusted computing, they do not automatically guarantee the security of the software running inside them. Any memory safety issue, such as a buffer overflow, can give attackers a way into the TEE. MesaTEE ("Memory Safe TEE") introduces the concept of Hybrid Memory Safety (HMS) to guarantee the memory safety of its software. Key components are reimplemented in the memory-safe language Rust; for the remaining customized libraries, static scanning, dynamic testing, and formal verification are applied to guarantee memory safety at the code level.

Static scanning can quickly report simple vulnerabilities by analyzing coding patterns, but it suffers from relatively high false positive and false negative rates. Dynamic testing such as fuzzing can trigger issues by feeding randomly mutated inputs; its reports are accurate, but its coverage is limited and it cannot prove that a program is flawless. In security-sensitive scenarios, therefore, static scanning and dynamic testing are not enough, and formal verification is needed to mathematically prove the memory safety of the software.

Baidu Security Lab has applied formal verification to industrial-grade scenarios such as cloud computing, blockchain, and autonomous driving. For example, in the MesaTEE TPM project, we ran formal verification on the TPM core software stack (tpm2-tss version 2.1.0) to ensure that the project's ~67,000 lines of code have no security issues in the MesaTEE call scenario. During the verification, 140+ security flaws were identified, including null pointer dereferences, out-of-bounds reads and writes, and memory leaks. The verified tpm2-tss version was released on GitHub (https://github.com/mesalock-linux/tpm2-tss-verified). In addition, in the memory-safe Python interpreter MesaPy project, formal verification guarantees that the C backend code is memory-safe in all MesaPy usage scenarios; fourteen security flaws such as null pointer dereferences and integer overflows were identified and fixed during verification. The verification methodology and results were also published on GitHub (https://github.com/mesalock-linux/mesapy/tree/mesapy2.7/verification).

1. Introduction to Formal Verification

Formal verification is the process of using rigorous mathematical methods to determine whether a program complies with its security specification. The method consists of the following phases: source code preparation, abstract interpretation based value analysis [1, 2, 3] (hereinafter referred to as AI), weakest precondition verification, and program defect reporting.

1.1 Source Preparation

Ideally, the entire code base needs to be verified to guarantee safety: not only the source code of the target program itself, but also the source code of every third-party library it depends on. In reality, third-party code may be unavailable or too large, so we instead write function annotations for the unavailable functions (or for functions with complex logic) and run the verification against these annotations.

The next step is to determine the target functions to be verified and to write test cases for them. In formal verification tasks, the parameters in a test case are given as intervals so as to cover all possible values, and all usage scenarios of the function should be included, as in the following example:
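A minimal sketch of such an interval test case. The builtin name and the target function are assumptions: tis_interval is the TIS analyzer's interval builtin (Frama-C's counterpart is Frama_C_interval), and checked_add is a hypothetical function under verification.

```c
/* tis_interval yields an unknown value in [lo, hi]; the exact
 * header and name vary by tool (an assumption here). */
int tis_interval(int lo, int hi);

int checked_add(int a, int b);   /* hypothetical target function under verification */

/* Interval test case: one driver covers every value in the given ranges. */
int main(void)
{
    int a = tis_interval(-1000, 1000);   /* a ∈ [-1000, 1000] */
    int b = tis_interval(0, 255);        /* b ∈ [0, 255]      */
    checked_add(a, b);                   /* analyzed for all combinations of a and b */
    return 0;
}
```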

We can start from the project's own unit tests and integration tests and write interval versions of them. These test cases generally cover the key functions and key logic of the project, so by reusing them we can cover the verification target and ensure verification quality.

1.2 Abstract Interpretation Based Value Analysis

In this phase, the analyst sets up the analysis policy through source code annotations and then runs the abstract interpretation based Value Analysis. The analysis engine processes the source annotations, inserts Runtime Error Check annotations at risky statements, parses the program's control flow and data flow, computes all possible values of the program's variables, and verifies whether each annotated property holds. Statements whose annotated properties cannot be verified are categorized as "Undefined Behavior". In practice, the analyst may need to add extra annotations around reported Undefined Behavior to reduce false positives. The Weakest Precondition engine then verifies these manual annotations, and the remaining Undefined Behavior is reported to the analyst for final confirmation.

a. Abstract Interpretation

Abstract Interpretation approximates the computations of the program's concrete domain [3] with computations in an abstract domain [3]. The abstract semantics covers all possible situations and is a superset of the concrete program semantics, so if the abstract semantics is safe, the concrete semantics is also safe. Each abstract domain defines operation rules (a lattice) [3] over its abstract objects or variables. The most common abstract domain is the Interval Domain [3], whose operation rules resemble ordinary numerical calculation rules (as shown in the example below). Because the Interval Domain is easy to understand and relatively cheap to compute, it is what Value Analysis uses in practical program verification.
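For instance, the standard Interval Domain rules mirror concrete arithmetic:

[a, b] + [c, d] = [a + c, b + d]
[a, b] − [c, d] = [a − d, b − c]
[a, b] × [c, d] = [min(ac, ad, bc, bd), max(ac, ad, bc, bd)]

So [1, 3] + [2, 4] = [3, 7]: any concrete sum of a value from [1, 3] and a value from [2, 4] is guaranteed to land in [3, 7].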

Abstract Interpretation may over-approximate [3] variable values. To complete the analysis in a reasonable time, the analysis engine often widens the range of a variable to reduce computational complexity. A widened range may then fail a Runtime Error Check and be reported as Undefined Behavior even though the variable's actual values always stay within the legal range. This kind of false positive is handled with annotations, covered in a later section.

b. Analysis Policy

AI-based Value Analysis records the values of variables during the computation. As the analysis goes deeper into the program, the values of a variable are divided along the program's branches and computed separately in each branch, so eventually a variable may have a large number of possible values at a given point. In theory, recording every variable value during the analysis gives the most accurate result for any program, but it comes with huge space and time overhead, and it is difficult to ensure that the analysis completes in a reasonable time. That is why we need a sensible analysis policy that balances time consumption against result accuracy.

We use a "state" to represent all the possible values of a variable. In the actual analysis, a "State Level" is set as the maximum number of states that can be kept at any point; if the number of states exceeds the State Level, they are merged. In addition, the level of detail in the recorded relationships between variables also affects the analysis results. This mainly shows up in how the states of multiple returned variables are split and recorded when a called function returns ("State Split"). The sample code below shows the effect of State Level and State Split on the analysis results.
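A minimal sketch of such a sample, using the assumed tis_interval builtin from above:

```c
int tis_interval(int lo, int hi);   /* assumed analyzer builtin, as above */

void sample(void)
{
    int x = tis_interval(1, 9);     /* x ∈ [1..9] */
    int y;
    if (x < 5)
        y = 6;                      /* this branch: x ∈ [1..4] */
    else
        y = 10;                     /* this branch: x ∈ [5..9] */
L:  (void)y;                        /* analysis point L discussed below */
}
```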

If the State Level is set very small (= 1), then at L we only know x ∈ [1..9] and y ∈ [6..10]. However, if the state space is large enough and the State Level is >= 2, the values of x and y are refined and split into y == 6 ∧ x ∈ [1..4] and y == 10 ∧ x ∈ [5..9]. The larger the State Level, the finer the analysis granularity and the more accurate the result.
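The State Split effect can be sketched similarly; process and use are hypothetical helpers:

```c
#include <stdlib.h>

struct node { int v; };
int  process(struct node *hd);   /* hypothetical: returns nonzero on error */
void use(struct node *hd);       /* hypothetical */

void handle(struct node *hd)
{
    int err = process(hd);   /* (1) */
    if (err)
        free(hd);            /* (2) hd is freed only on the error path */
    if (!err)
        use(hd);             /* (3) reached only when err == 0, so hd was not freed */
}
```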

In the code above, when the State Split granularity is low, the analysis reports Undefined Behavior at (3), "use of dangling pointer hd", because it sees hd freed at (2) and therefore considers its use at (3) unsafe. However, the free at (2) only executes when err is true, and under that condition the branch at (3) is never taken. When the State Split granularity is increased, Value Analysis records the relationship between err and hd, and this false positive disappears.

Although a higher State Level and finer State Split granularity yield more accurate results, on a large and complex code base setting them too high may prevent the analysis from completing in a reasonable time. In practice, the State Level and State Split therefore need to be tuned to balance result accuracy against time cost.

c. Annotation

Annotations use a formal specification language (such as ACSL [4]) to describe program semantics and help the analysis engine generate verification constraints. In practice, due to limited analysis granularity, the Value Analysis phase cannot always produce precise results and may falsely report Undefined Behavior. The analyst then adds annotations to guide the analysis toward more accurate results. The correctness of these annotations must itself be verified, in the Weakest Precondition phase described below.

In the following example, assuming the analysis granularity is not set high, the relationship between x and y established at L1 is not recorded. At L3, the analysis may then report that "x could exceed the index range of the array T". Adding the annotation "//@ assert 0 < x < 10;" at L3 leads the analysis to eliminate that false positive.
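A sketch of the situation; the modulo computation is a hypothetical stand-in for whatever ties x to y:

```c
int T[10];

void set_entry(int y)
{
    int x = y % 9 + 1;       /* L1: whenever y >= 0, x ∈ [1..9], but a coarse
                                analysis may not record this tie between x and y */
    if (y < 0)
        return;
    //@ assert 0 < x < 10;   /* L3: manual hint; itself verified later by WP */
    T[x] = 0;                /* without the hint: "x could exceed the index range of T" */
}
```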

In addition to "assert", annotations include Function Contracts and Loop Invariants. A Function Contract describes the constraints on a function's inputs and outputs; a Loop Invariant describes the constraints that hold across loop iterations. Both help Value Analysis understand the semantics of complex code and reduce false positives during the analysis, as in the sketch below.
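A small ACSL sketch of both forms, on a hypothetical min_index function:

```c
/*@ requires n > 0;
    requires \valid_read(a + (0 .. n-1));
    assigns  \nothing;
    ensures  0 <= \result < n;
*/
int min_index(const int *a, int n)
{
    int best = 0;
    /*@ loop invariant 1 <= i <= n;
        loop invariant 0 <= best < i;
        loop assigns i, best;
        loop variant n - i;
    */
    for (int i = 1; i < n; i++)
        if (a[i] < a[best])
            best = i;
    return best;
}
```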

d. Runtime Error Check

The Value Analysis engine adds Runtime Error Checks that require the program to conform to specific properties. For example, before a memory read through a pointer p, Value Analysis inserts an annotation similar to "/*@ assert rte: mem_access: \valid_read(p); */" for the validity of the accessed address. If the value of the memory address is not within the legal range (e.g. NULL), Undefined Behavior is reported.
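Schematically, for a read through a pointer p:

```c
int read_value(int *p)
{
    /* Inserted by the engine before the access: */
    /*@ assert rte: mem_access: \valid_read(p); */
    return *p;   /* reported as Undefined Behavior if p may be NULL */
}
```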

1.3 Weakest Precondition

The annotation properties associated with Function Contracts and Runtime Error Checks are verified during the Value Analysis process. However, some assert annotations are added by the analyst to assist Value Analysis, such as "//@ assert 0 < x < 10;". Value Analysis always takes such constraints to be true, so to ensure they are self-consistent, they must be verified separately. The Weakest Precondition (WP) phase verifies these constraints.

The Weakest Precondition [5] can be formalized as "wp(S, R)": for a program position P, S is a statement starting at P, and R is the postcondition that should hold after executing S from P. The expression wp(S, R) denotes the weakest condition that must hold at P for R to hold after S. In other words, the WP solver derives the weakest precondition to be satisfied at P from R and S.
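For an assignment, the classic rule is wp(x = E, R) = R[x := E], i.e. substitute E for x in R. So wp(x = x + 1, x > 0) = (x + 1 > 0) = (x > −1): for x > 0 to hold after the increment, x > −1 must hold before it.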

Thus, for each "assert" annotation provided to Value Analysis, WP computes its weakest precondition. If every such precondition follows from the constraints already established in the context, the annotation is verified.

2. Case Study

2.1 TPM Verification Analysis

We applied these formal verification methods, using the formal verification tool TIS [6], to several industrial-level projects involving cloud computing, blockchain, and autonomous driving. Taking the TPM trusted computing core software stack tpm2-tss as an example, we verified version 2.1.0 of the project, a total of ~1,300 functions and ~67,000 lines of code, and found and fixed 140+ program issues including null pointer dereferences, memory leaks, and out-of-bounds reads/writes. Below, each class of issue is shown alongside its corresponding fix. These issues were reported to the developers, confirmed, and fixed in the latest release.

Example 1: Null pointer dereference
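Schematically, the defect and its fix look like the following (hypothetical code, not the actual tpm2-tss source):

```c
#include <stddef.h>

typedef struct { int version; } ctx_t;   /* hypothetical context type */

/* Defect: the pointer is dereferenced before the NULL check. */
int get_version_buggy(ctx_t *ctx)
{
    int v = ctx->version;   /* undefined behavior when ctx == NULL */
    if (ctx == NULL)
        return -1;          /* the check comes too late */
    return v;
}

/* Fix: validate the pointer first, then use it. */
int get_version_fixed(ctx_t *ctx)
{
    if (ctx == NULL)
        return -1;
    return ctx->version;
}
```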

Example 2: Memory leak
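A schematic of this pattern (hypothetical code):

```c
#include <stdlib.h>
#include <string.h>

/* Defect: the early-error return path forgets to free buf. */
int load_buggy(const char *src, size_t n)
{
    char *buf = malloc(n);
    if (buf == NULL)
        return -1;
    if (n == 0)
        return -1;            /* leak: buf is never freed on this path */
    memcpy(buf, src, n);
    free(buf);
    return 0;
}

/* Fix: release the buffer on every exit path. */
int load_fixed(const char *src, size_t n)
{
    char *buf = malloc(n);
    if (buf == NULL)
        return -1;
    if (n == 0) {
        free(buf);
        return -1;
    }
    memcpy(buf, src, n);
    free(buf);
    return 0;
}
```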

Example 3: Out-of-bounds read caused by a loop condition error
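A schematic of this pattern (hypothetical code):

```c
#include <stddef.h>

/* Defect: `<=` walks one element past the end of the array. */
int sum_buggy(const int *a, size_t n)
{
    int s = 0;
    for (size_t i = 0; i <= n; i++)   /* reads a[n]: out of bounds */
        s += a[i];
    return s;
}

/* Fix: use a strict bound. */
int sum_fixed(const int *a, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```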

Example 4: Erroneous free causing a memory issue
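A schematic of this pattern (hypothetical code):

```c
#include <stdlib.h>

/* Defect: the buffer is freed on the error path and then freed again. */
void teardown_buggy(char *buf, int err)
{
    if (err)
        free(buf);
    free(buf);          /* double free when err != 0 */
}

/* Fix: free exactly once and clear the pointer against later reuse. */
void teardown_fixed(char **buf)
{
    free(*buf);
    *buf = NULL;
}
```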

2.2 MesaPy Verification Analysis

MesaPy is a PyPy-based memory-safe Python interpreter. We combined several formal verification tools (Smack [7], SeaHorn [8], TIS [6]) to verify the MesaPy backend C library functions, a total of 5,044 lines of code, and found and fixed 14 issues such as null pointer dereferences and integer overflows. The following are examples of the security defects found during verification.

Example 1: Null pointer dereference
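A schematic of this pattern (hypothetical code, not the actual MesaPy source):

```c
#include <stdlib.h>
#include <string.h>

/* Defect: the malloc result is used without a NULL check. */
char *dup_buggy(const char *s, size_t n)
{
    char *out = malloc(n + 1);
    memcpy(out, s, n);    /* NULL dereference if malloc failed */
    out[n] = '\0';
    return out;
}

/* Fix: check the allocation before using it. */
char *dup_fixed(const char *s, size_t n)
{
    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s, n);
    out[n] = '\0';
    return out;
}
```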

Example 2: Integer overflow
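A schematic of this pattern (hypothetical code):

```c
#include <stddef.h>

/* Defect: len1 + len2 can wrap around before the bound check. */
int fits_buggy(size_t len1, size_t len2, size_t cap)
{
    return len1 + len2 <= cap;     /* wraps to a small value on overflow */
}

/* Fix: test for overflow without performing the overflowing addition. */
int fits_fixed(size_t len1, size_t len2, size_t cap)
{
    return len2 <= cap && len1 <= cap - len2;
}
```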

3. Formal Verification Experience Sharing

While formal verification can detect and eliminate security issues, applying it in practice comes with limitations and considerations, mainly the following:

3.1 Balance between Time and Accuracy

As mentioned repeatedly above, program verification not only needs high accuracy but must also complete within a reasonable time. In our verification work we found that, for some targets, a low State Level and State Split did not hurt analysis accuracy; for very time-costly targets, adjusting these two parameters let the analysis finish within the desired time, and with a small amount of manual confirmation the verification target could still be achieved.

In addition, if most of the verification time is determined to be spent in a specific sub-function, we can write a Function Contract for it and use the contract during verification, which can improve analysis efficiency. But as mentioned earlier, writing a contract requires a complete understanding of the function's semantics; otherwise it is counterproductive.

Another way to improve analysis efficiency is to rewrite the time-costly sub-function, as in the following example:
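A sketch of the idea, assuming a child function children that writes two ints through its parameters and whose outputs are known to stay within the (hypothetical) ranges below; tis_interval is the assumed analyzer builtin from earlier:

```c
int tis_interval(int lo, int hi);   /* assumed analyzer builtin, as above */

/* Original child function: precise, but expensive to analyze. */
void children(int *y, int *z);

/* Verification stub: over-approximates the outputs with their full ranges
 * (the ranges here are hypothetical). If the caller is safe for every value
 * in these intervals, it is safe for the real outputs as well. */
void children_stub(int *y, int *z)
{
    *y = tis_interval(0, 100);
    *z = tis_interval(-5, 5);
}
```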

Suppose that analyzing the child function children is very time-consuming. Since, by its definition, it returns two ints through its parameters, we can rewrite it with intervals: if we can verify that y and z are correct for all possible values, then they must be correct for their respective actual values.

3.2 Need the Right Amount of Manual Involvement

Formal verification is difficult to fully automate. Limited by the complexity of the constraints and the prover's solving ability, some constraints cannot be verified automatically and are reported as Undefined Behavior, which requires manual confirmation of whether they are true security issues. The amount of work in this step depends on the analysis policy: the finer the analysis granularity, the less manual work this step requires. In addition, over-approximation may introduce false positives, which also need manual intervention to eliminate.

3.3 Test Cases Affect Verification Quality

The effectiveness of the verification also depends on the quality of the test cases provided. The principle when writing test cases is to maximize branch coverage. In general, we assign intervals to parameters when writing test cases, which covers all possible values and helps improve verification coverage. For parameters with complex types (e.g. nested structures containing other structures), it is sometimes necessary to consider the function's implementation logic to decide which fields need interval values and which can stay concrete, because making every variable an interval may not produce the best verification results, and interval computations cost significant time and prolong the analysis.

3.4 What Verification Guarantees

Finally, it should be clear what formal verification in actual practice does and does not guarantee. It does not guarantee that the verified program is issue-free; it guarantees that, for a specific verification target (a specific version of the target code and specific versions of its dependencies), the code reachable from the given test cases has none of the security issues specified by the verification policy (such as buffer overflows or integer overflows). Any statement that "the software has been formally verified, so there will never be any security issues" is exaggerated and not rigorous.

4. Summary

This article explained the principles of formal verification and shared MesaTEE's experience applying it to real engineering projects, in particular the verification of the TPM core software stack and the MesaPy C backend. To help the industry avoid the problems we ran into, we listed the issues that need to be considered in formal verification. Our work shows that formal verification is costly but feasible with careful design. Compared with static scanning and dynamic testing, formal verification can identify program security issues more thoroughly and can ensure the program is safe under the given constraints.

5. References

[1]. Abstract Interpretation. http://www.cs.utexas.edu/~isil/cs389L/AI-6up.pdf

[2]. A gentle introduction to formal verification of computer systems by abstract interpretation. https://www.di.ens.fr/~cousot/publications.www/CousotCousot-Marktoberdorf-2009.pdf

[3]. Abstract Interpretation and Abstract Domains. http://www.dsi.unive.it/~avp/domains.pdf

[4]. ACSL Mini-Tutorial. https://frama-c.com/download/acsl-tutorial.pdf

[5]. Weakest Precondition. https://en.wikipedia.org/wiki/Predicate_transformer_semantics

[6]. TIS. https://trust-in-soft.com/

[7]. Smack. https://github.com/smackers/smack

[8]. Seahorn. http://seahorn.github.io/blog/
