Torturing Postgres: extreme autonomous testing for distributed architectures

Apr 16, 2023

Intro

After 30 years of development, the path forward is clear: multi-core architecture and multi-node deployment are critical to the future scalability of databases. Adopting a cloud-native architecture is also imperative to keeping Postgres competitive and relevant to the needs of the next generation of startups and enterprises.

Traditional Postgres development and testing have always been limited to a single node: does Postgres persist data correctly, handle concurrency without data corruption, and recover gracefully from system outages without losing data (YMMV on this)?

At OrioleDB, we are building the next generation of storage engine architectures to help transform Postgres into a scalable and resilient data management platform, using Postgres’s Table Access Method extensibility framework. https://supabase.com/blog/postgres-pluggable-strorage

After three months of building and testing OrioleDB in the final phases before its beta release, we can share lessons learned from applying novel autonomous testing techniques to Postgres in pursuit of a scalable, distributed architecture.

For background and perspective, you can watch this video “CMU Database Series: FoundationDB or: How I Learned to Stop Worrying and Trust the Database (Markus Pilman, Snowflake)” https://www.youtube.com/watch?v=OJb8A6h9jQQ

As Markus wryly notes in his introduction, “Testing a distributed system is probably a better experience than sticking a fork in your eye, but not by a whole lot…”

Background

Recently, we worked with Antithesis for a trial using their autonomous testing platform for exercising OrioleDB. Since its inception, OrioleDB has been built with a test-first mentality, and our team has extensive experience with automated testing frameworks. Our question was whether autonomous testing could augment our existing approach, allowing us to scale up (and improve!) our testing efforts — without drastically expanding our team.

The promise of autonomous testing is a world where our developers can work more efficiently and fearlessly, backed by a robust test suite that catches more than traditional methods alone can. Our experience in this trial gave us hope that this world is possible.

I was first exposed to the concepts of autonomous testing at the FoundationDB Summit 2018, hosted at the CNCF conference in Seattle (Will Wilson is one of the co-founders of Antithesis).

Autonomous Testing and the Future of Software Development — Will Wilson https://www.youtube.com/watch?v=fFSPwJFXVlw

The trial

To test OrioleDB with Antithesis, we first had to get our entire PostgreSQL fork and our existing test suite running on Antithesis’s platform. Antithesis works by creating a multiverse of program states, then intelligently exploring that multiverse: it takes note of “interesting” paths and runs further simulated test loads from them. Antithesis used our existing test suite and workloads to exercise the database and simulated adverse network conditions to put OrioleDB under stress. Its approach combines network simulation, fuzzing, automated tests, and other techniques, all within the simulated multiverse. Importantly, the multiverse means that a bug, once found, can be reproduced deterministically, and the program states leading up to a failure can be examined.
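Deterministic reproduction is easiest to see in a toy model. The sketch below shows FoundationDB-style deterministic simulation in Python: every nondeterministic choice (here, the order in which a simulated network delivers messages) comes from a seeded random number generator, so a failing seed replays the exact same execution. It is only an illustration of the idea, not how Antithesis is implemented; `simulate`, `find_failing_seed`, and the deliberately buggy replica are hypothetical.

```python
import random
from typing import Optional

NUM_WRITES = 5
EXPECTED_FINAL = NUM_WRITES * 10  # the client's last write carries value 50


def simulate(seed: int) -> tuple[list[int], int]:
    """Run one simulated scenario. Every nondeterministic choice comes from
    `rng`, so the same seed always produces the same execution."""
    rng = random.Random(seed)
    # Writes sent by a client in order; each is (sequence_number, value).
    in_flight = [(seq, seq * 10) for seq in range(1, NUM_WRITES + 1)]
    delivery_order: list[int] = []
    replica_value = 0
    while in_flight:
        # The simulated network chooses which pending message arrives next,
        # which lets it reorder messages.
        seq, value = in_flight.pop(rng.randrange(len(in_flight)))
        delivery_order.append(seq)
        # Deliberate bug in this toy replica: it applies writes in arrival
        # order instead of ignoring writes older than one already applied.
        replica_value = value
    return delivery_order, replica_value


def find_failing_seed(max_seeds: int = 1000) -> Optional[int]:
    """Try seeds until the invariant "replica ends with the last write" breaks."""
    for seed in range(max_seeds):
        _, final_value = simulate(seed)
        if final_value != EXPECTED_FINAL:
            return seed
    return None


if __name__ == "__main__":
    seed = find_failing_seed()
    print("failing seed:", seed)
    # Re-running the same seed replays exactly the same delivery order.
    print("run 1:", simulate(seed))
    print("run 2:", simulate(seed))
```

Because the seed fully determines the run, a developer can rerun a failure as often as needed while debugging. Fixing the toy replica, for example by ignoring any write whose sequence number is lower than the highest one already applied, makes `find_failing_seed` return `None`.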

The trial turned up three different classes of bugs within OrioleDB:

Errors in our existing automated tests.
Failed assertions that indicate a violation of an expectation or system guarantee.
More complex bugs within our Postgres fork that could have resulted in invalid or partial data writes.

Debugging with Antithesis

Antithesis’s platform also provided debugging tools that, combined with its ability to reproduce bugs deterministically, significantly sped up the debugging process. Its ability to produce core dumps from any state in the testing multiverse was especially useful for analyzing the various bugs.

Additionally, Antithesis has a tool called interactivity, which lets us load a state from the multiverse and inject commands directly, or even run a new autonomous campaign from that point. Injecting guest commands through interactivity helped us debug the complex bugs found under simulated network conditions.

The advantages of autonomous testing

Autonomous testing provided distinct advantages over traditional manual and automated testing methods.

Manual testing is time-consuming and error-prone; manual testers cannot cover many failure conditions, so many errors go undetected. Because manual testing is slow, it also leaves a long delay between when a bug is introduced and when it is discovered.

Automated testing runs without human guidance and can therefore run much more frequently, but compared to autonomous testing it is still limited to the conditions the test’s author thought of. Automated testing also carries heavy administrative overhead for deploying and cataloging the testing efforts. Finally, automated tests cannot recognize interesting states during a test run, the states most likely to lead to bugs, and cannot branch from those points to apply different types of tests and faults. Antithesis’s multiverse enables this kind of dynamic exploration of program states.
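The value of branching from interesting states can be shown with a toy model. The Python sketch below assumes a hypothetical system whose bug fires only after one specific 12-step sequence of operations and injected faults. It compares a baseline that always restarts from the initial state with an explorer that saves every newly reached state and tries random actions from saved states, a coverage-guided-style heuristic in which "interesting" simply means "not seen before". `ACTIONS`, `BUG_PATH`, `step`, `explore`, and `from_scratch` are illustrative names, and this is not Antithesis’s actual exploration algorithm.

```python
import random
from typing import Optional

# Hypothetical toy model: available operations and injected faults.
ACTIONS = ["write", "read", "partition", "heal", "crash", "restart"]
# Hypothetical bug: it only fires after this exact 12-step sequence.
BUG_PATH = ["write", "partition", "write", "crash", "restart", "read",
            "heal", "write", "crash", "restart", "partition", "read"]


def step(progress: int, action: str) -> int:
    """Apply one action. `progress` counts how much of BUG_PATH has happened
    in a row; any other action sends the system back to the initial state."""
    return progress + 1 if action == BUG_PATH[progress] else 0


def explore(seed: int, budget: int) -> Optional[int]:
    """Branch from saved states; return the number of steps used to hit the bug."""
    rng = random.Random(seed)
    snapshots = {0}  # states worth revisiting; 0 is the initial state
    for used in range(1, budget + 1):
        state = rng.choice(sorted(snapshots))     # restore a saved state
        state = step(state, rng.choice(ACTIONS))  # try one random action from it
        if state == len(BUG_PATH):
            return used
        snapshots.add(state)  # only states not seen before grow the set
    return None


def from_scratch(seed: int, budget: int) -> Optional[int]:
    """Baseline: every run starts at the initial state and nothing is saved."""
    rng = random.Random(seed)
    state = 0
    for used in range(1, budget + 1):
        if (used - 1) % len(BUG_PATH) == 0:
            state = 0  # begin a fresh run of len(BUG_PATH) random actions
        state = step(state, rng.choice(ACTIONS))
        if state == len(BUG_PATH):
            return used
    return None


if __name__ == "__main__":
    print("branching from saved states:", explore(seed=1, budget=5_000))
    print("restarting from scratch:    ", from_scratch(seed=1, budget=5_000))
```

In this model the explorer needs a few hundred steps on average to reach the bug, because it only has to guess one correct action at a time from its deepest saved state. Each from-scratch run, by contrast, succeeds with probability (1/6)^12, so the baseline almost never finds the bug within the same budget.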

A further advantage of autonomous testing with Antithesis is that the multiverse model allows for far more testing hours. Since many simulations are being run in parallel, our automated test suite is being exercised far more often than it would be otherwise, which is valuable even in the absence of intelligent exploration or the ability of Antithesis to simulate esoteric network conditions.

Looking to the future

What, then, does autonomous testing mean for OrioleDB?

As we move OrioleDB into being a foundation for a distributed version of PostgreSQL, the Antithesis suite allows us to exercise and test the system far more efficiently. It also cuts down on the number of engineering hours tied up in running and analyzing tests: Antithesis not only finds bugs autonomously, but it also provides powerful debugging tools and historical analysis, which empowers our engineers to understand and address bugs rapidly.

If you are already familiar with Postgres and are curious about how to contribute to the future development of the Postgres community edition, come and join us and learn how to plan for this future. Explore and ask questions at our public OrioleDB repo https://github.com/orioledb/orioledb.


Written by think(x)