How TLA+ Can Help DeFi Protocols Deliver Bug-Free Code

Published in

Onomy Protocol

6 min readSep 13, 2021

What is TLA+?

TLA+ is a formal specification language that allows you to solve high-level conceptual problems in software engineering at the design stage. Essentially, it is a logical and philosophical language facilitating the discovery of obscure bugs in code before writing even begins. To do this, TLA+ draws upon concepts in mathematics to express specifications. These can then be used as a roadmap to build out successful, bug-free systems, no matter how complex they are.

TLA+ can’t be used to code programs, or power software. However, by adhering to the temporal logic and set theory it expounds, projects and teams can iron out the possibilities of more unusual, unforeseen bugs appearing late in their development cycle. It’s a first-order modelling language that comes with its own Integrated Development Environment (IDE). The IDE, also known as the TLA Toolbox, allows stress testing to be performed on formal specifications themselves versus implementations which may be programming language specific.

Why is TLA+ Useful?

Top level companies, including Intel, Microsoft and Amazon Web Services, have used TLA+ within their teams to specify critical systems. By conforming to the TLA+ specification, untraceable bugs that arise from complex logical interactions between different parts of the code can be avoided.

These surprise, anomalous bugs can be hard to find and very expensive to correct, as the error could permeate far and wide throughout the system before they are caught or understood. One small hole in the logic of a complex system could put the entire system in jeopardy if not remedied before placed into production. Because the errors that TLA catches are within the logic of the system rather than syntactical errors caught by compilers, the entire code in multiple systems might need to be ripped out and rewritten simply because of a single unintended behaviour, while independently, each unit was functional and effective.

How Does TLA+ Work?

It works by checking for violations of safety, using basic set theory, and liveness, by using temporal logic. It is exhaustively testable within its IDE before embarking on a sprint. One of Amazon’s key insights on TLA+ was that it “finds bugs in system designs that cannot be found through any other technique.” It can be most likened to the popular academic tool LateX — a code that helps produce documents to the required spec. The difference is TLA’s speciality is in accurately designing distributed and concurrent systems.

Why Would Crypto Projects Benefit From TLA+?

Crypto and blockchain technology are possibly the world’s most bleeding edge distributed and concurrent systems currently in development, so using TLA+ on such developments should be standard. Protocols are written everyday by any manner and any number of dev teams, sometimes at lightning speed, and it’s possible the code could be riddled with unsuspected bugs. Even best-intentioned projects could find they’d committed tough-to-spot errors that were hiding in the shadows.

If they did, funds could be at risk, hence no longer “SAFU” — sometimes by malevolent activity, but much more likely the result of simple human error. In most cases, disruptive bugs that aren’t a result of sloppy code or ill-willed backdoors are oftentimes beyond the theoretical conceptualisations of the greatest software architects when working in their own specialist language. Blockchain is new for everyone involved, and lessons are being learned everyday.

The usage of TLA+ allows projects to go from conception to production much quicker, without needing to spend significant time experimenting and stress testing in hopes of finding hidden bugs.

Hackers who can exploit unsafe code bases could leave the thriving DeFi market in ruin, if simple exploits could be suddenly found in ancient, popular and otherwise watertight protocols. For institutional adoption of decentralized finance to happen en masse — something which, for all the posturing done by banks opposing crypto, is surely going to happen sooner rather than later — then more formal procedures of assessing code bases will be needed. It could act as something of a stamp of assurance that legitimises the deployment of a protocol, and help lever the very largest whales gently onto the good ship crypto.

How Large Companies Have Used TLA+

Although large companies are notoriously secretive about their codebases and their methods, plenty of anecdotal evidence, like this lecture, exists about its use at the high echelons of company practice. Amazon Web Services even published a few papers about how they use it when deploying their systems. Intel and Microsoft have also been reported to use it regularly.

Amazon Web Services

Amazon has used TLA+ since 2011 to help solve difficult design problems in critical systems. By having access to so much customer data, their distributed systems need to be entirely fault tolerant. As these systems grow exponentially with the business, the complexity of the software grows. When reaching such high complexity, bugs that were beyond conception when the project was in its infancy can begin to rear their ugly head.

Amazon lauded the language’s ability to force writers to use precise designs. Furthermore, as it is cross-compatible, Amazon felt it was cost-efficient to apply it to multiple real-world complex systems of varying utility. To put it simply, it makes engineers who use it to write better code. Its widespread application across multiple systems and ability to deal with their exponential growth effectively means start ups could easily benefit from using TLA+.

Intel

Intel was one of the first adopters of TLA+, mentioning a use case of it in a paper as far back as 2002. It describes how the formal verification team used TLA+ to design a new cache-coherence protocol for a processor chip. It returned 45 issues with the proposed code, and it “played an important role in creating a stable microarchitecture,” and had the lowest ratio of bugs per line despite being a brand new release at the time. Intel is reported to still be using TLA+ as late as 2014.

Microsoft

Microsoft used TLA+ in developing a model for the memory coherence protocol in the Xbox 360. Despite the TLA+ pointing out a subtle but damaging bug, they had faith in their regression tests to spot any problem. They didn’t, and the bug had to be expensively ironed out over the crucial Christmas launch period. Microsoft also uses TLA+ in its development of its Azure cloud service, with Albert Greenberg saying it’s “especially well-suited for writing high-level specification of concurrent and distributed systems”.

How TLA+ Could Help Crypto Coders Build Sound Protocols

All these examples prove that TLA+ is not only effective, but it also helps developers code faster and more accurately. In the breakneck world of blockchain, with so much uncharted territory and new concurrent and distributed systems being developed and built upon everyday, using TLA+ when designing a new DeFi protocol can help it avoid falling foul to the code tripping over itself in the most obscure manner possible. By doing so, it avoids endangering the backers of the nascent DeFi protocols that promise to shake up the way we think about finance.

This excellent video serves as an introduction to TLA+, and, although not crypto-specific, it helps show how blockchain developers can easily improve code quality. By having a mastery of the language, or using an auditor on their project, DeFi protocols could bring their future development funds to fruition quicker, and thus grab an outsized share of this emerging market.

Onomy Protocol’s Involvement with TLA+

To fulfill our mission of converging traditional and decentralized finance, whilst bringing Forex on-chain, our protocol’s code must be exceptional. The level we aim to achieve is accomplished through the usage of TLA+ for creating formal specifications of our software products, whilst further seeking certification by having our code and specifications audited by trusted third parties.

In doing so, not only can we deploy a diverse suite of financial products quicker, but we can also avoid black swan events and exploits that would leave our goals without basis. Trust in code is mandatory even in trustless decentralized systems, with TLA+ verification of logic delivering the golden stamp of assurance.