KWasm: a new executable semantics for the blockchain
by Patrick MacKay
Runtime Verification, Inc. recently received a generous fellowship from dlab to help fund our efforts to write a semantics for WebAssembly in anticipation of the former’s usage as a programming language du jour across multiple blockchain projects. The name of this project is KWasm. This is the first of a series of blog posts that will chronicle our progress on this project. While future posts will be more technical and highlight progress and challenges, this is more introductory in nature.
So what’s included in the post exactly? We start with a company introduction followed by a brief overview of two topics near and dear to our hearts — programming language semantics and formal verification. We then look at KEVM, a semantics project similar to the one we are undertaking for web assembly, before we conclude with a shallow dive into the KWasm project including a brief introduction of the web assembly programming language itself.
Let’s get started.
Who is Runtime Verification?
Runtime Verification Inc. (RV) is a technology company headquartered in Urbana, Illinois. The company employs a novel software analysis approach to analyze computer programs both statically and as they execute, thereby identifying bugs, errors, holes, and possible exploits. Runtime Verification is the primary developer of the K-Framework. The K Framework automatically generates tools for formal analysis from a single specification of a programming language. Runtime Verification provides bespoke verification services to companies and institutions aerospace, automotive, and blockchain.
Runtime Verification was founded in 2010 and originally cut its teeth working with the R&D units of companies in the embedded space working with companies like Toyota, Boeing, and Denso. These R&D teams were responsible for writing mission critical code for planes, automobiles, and space shuttles. Code of this nature must not fail, even though it is part of complex systems interacting with the mechanical world. Lives and livelihoods depend directly on code correctness. This is where formal verification shines, which is where Runtime Verification shines.
The company was reimagined in 2017 after being approached by two co-founders of Ethereum who each asked if we had considered applying formal verification for the blockchain. We explored opportunities in the nascent industry by partnering with infrastructure builders like IOHK and the Ethereum Foundation.
Fast forward to 2019 and we provide formal verification services to not only infrastructure builders but companies, large and small, building products and services on top of said infrastructure. To date we’ve worked with IOHK, the Ethereum Foundation, Algorand, the Web3 Foundation, the Tezos Foundation, Gnosis, and MakerDAO.
Programming Language Semantics and Formal Verification — why should I care?
Here’s how Wikipedia defines “semantics”:
“Semantics is the field concerned with the rigorous mathematical study of the meaning of programming languages. It does so by evaluating the meaning of syntactically valid strings defined by a specific programming language, showing the computation involved.”
To make matters more confusing, the word “semantics” denotes both the name of the field of study, and the thing it studies. It’s similar to how “philosophy” can refer both to the field as a whole, or a specific school of thought. A semantics of a programming language, say for EVM (Ethereum Virtual Machine) is a correct by construction mathematical expression of the programming language in question. In essence, it’s a set of mathematical definitions that describes what a program is supposed to do. It may contain statements saying things like: “If cond is TRUE, then if cond then A else B = A”. As you might guess, programmers usually have strong intuitions about the semantics of languages they use. Once you have a semantic you can begin to execute and verify programs written in that programming language. Using EVM again as an example, with an EVM semantics you can begin to execute and formally verify smart contracts.
Let’s now attempt to define formal verification.
Formal verification is the process of rigorously proving that a formal model has certain desired properties. The formal model as well as the properties are defined using mathematical/logical formalisms. When the formal model is that of a program in a particular programming language and the desired properties capture requirements of that program, then formal verification becomes program verification. Examples of properties one might want to verify is “this loop terminates” or “the variable balance never goes below 0”.
Formal verification requires that we first write down of a program’s human readable specification. The specification flushes out what the program is supposed to do. Once you have the specification you can verify whether or not the program does just that. At Runtime Verification we do formal verification at the bytecode level, as opposed to on the source code itself. The result is a high level of assurance with respect to the functional correctness of the code. If the verification was done on the source code level we could not know that the bytecode that gets deployed didn’t have bugs introduced by the compiler.
So why should you formally verify a smart contract?
Smart contracts, if written for a permissionless blockchain, are published to a public main net for the whole world to see. This means exposure of your code. Given the fact smart contracts are used to store and transfer tokens, often of a significant monetary value, they are prime targets for attack. As such, you verify your contract to eliminate bugs and errors in your code that can be exploited by bad actors.
We have discovered issues in all the contracts we’ve formally verified. They have not always been issues that can be exploited, but issues clients chose to fix because they are committed to the correctness of their contracts. Contracts that appear correct can be broken with specific inputs at particular corner cases. Simply put, the human mind simply cannot keep track of all such corner cases, which is why tools based on mathematical models of the computing infrastructure, i.e. the EVM bytecode, are necessary.
Now let’s dive a bit into about KEVM, something I’ve alluded to above. KEVM is near and dear to our hearts because it’s arguably the project that got us going in blockchain space and served as a precursor to the KWasm project. It is important to mention KEVM was generously funded by Charles Hoskinson and Input Output Hong Kong (IOHK).
The KEVM is the first machine-executable, mathematically formal, human readable, and complete semantics of the Ethereum Virtual Machine. As such, KEVM is capable of passing the full EVM VMTests and GeneralStateTests testing suites, and is used in smart contract formal verification, debugging, and more. One key aspect of KEVM (which it inherits from being a K specification) is that the same semantics which is used to execute the conformance test-suites is used for formal verification, meaning that we have high confidence that the results of our verification are accurate.
KEVM has been proposed as an official specification of the EVM numerous times (most notably here: https://ethereum-magicians.org/t/jello-paper-as-canonical-evm-spec/2389), because it is written in a literate human-readable style and is testable (unlike the Yellow Paper). It’s kept up to date with the latest hard forks of the Ethereum network, so that our formal verification efforts with it are relevant for the latest Ethereum blockchain.
The biggest accomplishment of KEVM, in our view, is that it has been used by third parties for doing verification (most notably for verifying the multi-collateral Dai contracts used by the Maker Dao https://github.com/makerdao/k-dss). Of course we have used it extensively ourselves to verify contracts for our clients, but it’s nice to know that people are trying to use and improve our tools. To facilitate this, we discuss many usability improvements with third parties using KEVM, and implement (or help implement) them when they make sense for the overall K vision. It’s great that we have a tool which our K experts can use for formally verifying contracts, but even better to see the community at large trying and evaluating our tools, providing feedback, and even implementing their own tools on top of it (eg. KLab by DappHub: https://github.com/dapphub/klab).
KWasm — our plan
Before we dive into our plan for KWasm, the fully executable semantics of web assembly., let’s briefly look at web assembly itself. The text below is taken directly from https://webassembly.org/
WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications. It is efficient and fast, safe and secure, open and debuggable, and part of the open web platform.
Runtime Verification developed the KWasm semantics in anticipation of the next generation blockchains, many of which use Wasm. While KEVM is used extensively by Runtime Verification and the broader Ethereum community to verify smart contracts, the impending switch to EWasm as the target virtual machine — and development of other blockchains targeting Wasm — means tooling around the EVM needs to be retargeted to Wasm.
Interested in learning more about the switch to EWasm? Check out these articles:
Although we’ve made great progress on KWasm there is still work needed to be done to tune the semantics for symbolic execution.
Symbolic execution is executing a program, only with variables instead of concrete values in some places. Think of it like this: semantics are a description of how a program should compute its result. Computation and calculation are just two sides of the same coin. Given the program a = 2; b = 3; c = 5; print(1/ (c * (b + a))); we can compute that the program will output 25. No matter what the program looks like, we can always run it and see what it produces, just like we can compute any arithmetic expression, e.g., (2 + 3) * 5.
Symbolic execution is to computation what algebra is to calculation. Given the expression (X + (X+1)) * (X + 2), we can’t compute a result. We can only rewrite it, for example to 2X² + 5X + 2. What we can do, however, is to make deductions about the expression. And since the concrete expression, with values 2, 3 and 5, is just a special case of the algebraic expression, anything we can deduce about the algebraic expression is also valid for the concrete expression, and all expressions like it. For example, we know that for any value of X > -½, (X + (X+1)) * (X + 2) is positive. We can also figure out if there are any interesting edge cases, like if the result can be 0, which it can (at -2 and -½).
In symbolic execution, instead of fully evaluated results, we end up with symbolic results. For example, take a look at the following program. What are the dangers?
Let’s say we get past the assert. Executing this program symbolically, we realize it can return 0, c * a, or 1/ (c * (b + a)), with a, b and c unknown. Now we want to decide if this division is safe. It obviously isn’t if a, b and c can be any value. But it is rare that variables are completely unrestrained. For example, in this case the assert and if-conditions guarantee that c != 0 and that a > b >= 0, in which case, things look safe. But the programmer might have missed the possibility of overflow: if the integer values are 4 bytes, c = 2³¹, b = 0 and a = 2, we get a division by 0.
It is often hard to catch these cases with testing. In a toy example like the one above, a verifier can use a pen and paper to make sure they covered every possible result. With larger programs, it is easier (and faster) to use symbolic execution to get algebraic expressions of the possible states a program can have when it terminates. Those can then be analyzed manually or, as we do at Runtime Verification, with automated tools such as SMT-solvers (more on those in a later post).
Tuning will come in three forms: (i) tuning the Haskell backend of the K Framework to be able to handle KWasm proofs, (ii) tuning KWasm to be better suited for symbolic execution, and (iii) adding standard lemmas to the global bag that all KWasm proofs can use. All three categories will benefit all users of KWasm trying to verify smart contracts written in Wasm.
Now, let’s take a peek at the expected deliverables for this project
- Select six Wasm programs to verify using KWasm, each incrementally using more features of Wasm and more complicated program logic.
- Write K specifications for each algorithm, roughly one per month, along with updates to KWasm necessary to verify the feasibility of the algorithm.
- Integrate developed K specs into KWasm CI, so future updates to KWasm do not break the specs.
- Draft and publish online documentation updated on pushes to KWasm repository.
From the start we will (i) select the targeted “monthly” program to verify, (ii) implement said program in WASM, and finally, (iii) verify said program.
The first algorithm we will verify will be a WRC20 contract (https://gist.github.com/axic/16158c5c88fbc7b1d09dfa8c658bc363), the WASM adaption of an ERC20. ERC20 is the most basic token standard used in Ethereum, and is the building block of many other tokens. This program has no loops and no heap manipulation, so it serves as a nice first step towards ironing out issues in KWasm regarding symbolic execution, while also being highly applicable to the smart contract domain. Quick note, the WRC20 contract does not have all the features of an ERC20 (with allowances and s totalBalance function), it can just transfer tokens and query balances.
Interested in a bit more information about ERC20? Check out these resources.
- ERC20 token standard
- RV blog post on our executable specification of ERC20:
- ERC20 related interfaces, contracts, and utilities, created by the great folks at Open Zeppelin
We hope you enjoyed this introductory post. As mentioned earlier, future posts will focus on project specific updates for the KWasm project. Once again, a big thank you for Emurgo, SOSV, and dlab for their support.
Thank you for reading.