Braintree Payments operates dozens of PostgreSQL clusters with over 100 terabytes of data. At this scale, even a few percentage points change in disk space growth rate can meaningfully impact the writable lifespan of a database cluster. Unfortunately, many ideas to save disk space require application changes and therefore need to be slotted into product timelines.

But today I want to focus on a technique that saved us approximately 10% of our disk space with very little effort beyond existing processes. In short, carefully choosing column order when creating a table can eliminate padding that would otherwise be needed.

This technique isn’t revolutionary: it’s been well-documented by 2ndQuadrant in On Rocks and Sand, EDB in Data Alignment in PostgreSQL, GitLab in Ordering Table Columns in PostgreSQL, the classic “Column Tetris” answer on a StackOverflow question, and I’m sure many more. What I hope we’re bringing to the table is tooling encoding these ideas so that you don’t have to re-invent the wheel (or apply the technique manually).

Below I’ll describe the rules and heuristics we apply to determine an ideal column ordering. But a list of rules sounds a lot like the definition for an algorithm. And that implies a problem space we can tackle at the systems, not people, level. Instead of sending a mass email to every engineer writing database DDL changes and expecting them to remember these rules, we authored a Ruby gem called pg_column_byte_packer to automate the solution in our development cycle. We'll talk more about that soon, but first let's take a more in-depth look a the problem space.

Image for post
Image for post
Photo by Pickawood on Unsplash

Data Alignment, Padding, and Waste

PostgreSQL’s heap storage, much like fields in C-language structs, writes columns guaranteeing alignment boundaries. For example, a column having 8-byte alignment is guaranteed to start at a byte index evenly divisible by 8 (zero-indexed). The heap storage engine automatically introduces any padding necessary to maintain this alignment.

We can introspect all kinds of system behavior and objects using PostgreSQL’s catalog tables, and alignment is no exception. Each datatype is listed in pg_catalog.pg_type, and you can determine the alignment required for any data type in the typalign column of that catalog table. PostgreSQL's documentation provides an excellent summary of how to interpret this column.

What Can We Do About It?

From a high-level perspective we can minimize the amount of space that will be lost to alignment padding by ordering each table’s columns in descending order of their data type’s alignment.

For example, suppose on our 64-bit system we have a table with two columns: a bigint column (which requires 8-byte alignment) and an integer column (which requires 4-byte alignment). If we put the integer column first we'll have the following data layout that takes up 16 bytes:

However if we put the bigint column first our data layout will only take up 12 bytes:

But there are a few other cases we want to handle at the same time:

Bundled Up in a Ruby Gem

I mentioned earlier that we’ve incorporated this set of column ordering rules into our recently open-sourced Ruby Gem pg_column_byte_packer. We implemented two complementary approaches to try to solve the problem holistically.

First, we automatically patch ActiveRecord’s migration code to re-order columns on-the-fly when all of those columns are included in a single create_table (or, safe_create_table if you're using our pg_ha_migrations gem to maintain uptime guarantees when running migrations!) call.

Second, we provide an API to re-order the columns found in CREATE TABLE statements in a SQL file generated by PostgreSQL's pg_dump utility.

Beginning to use the tool in your applications now will immediately benefit new tables. But we didn’t want to stop there, because we have many existing large tables! Of course re-ordering existing columns meant we needed to re-write tables. So we created entirely new databases and applied schemas files updated using the pg_dump SQL file modification feature described above. Finally we logically replicated all data to these new databases and transparently cutover from the old databases. This is also the foundation of how we achieve zero-downtime major version upgrades in PostgreSQL, but that's a topic for a future post!

We hope many of you will benefit from our work here. And we’d also love to see any ideas you might have for improving it.

Braintree Product and Technology

Essays on design, engineering, and product development at…

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store