Dealing with significant Postgres database bloat — what are your options?
Many scenarios can lead to an unhealthy Postgres database — some of which cannot be cleaned up by the built-in Postgres autovacuum daemon.
This article assumes you have some understanding of Postgres autovacuuming, so if that’s new to you, it’s probably better to start here.
We’ll explore your options when you discover your database has serious bloat problems (think 10s to 100s of GB of bloat) and you need to resolve it quickly. We recently hit this at Compass after discovering some code that was updating rows in a high read/write traffic table far more often than it should have been; this article covers how we cleaned up the mess.
What is “bloat”? How much do I have?
Before we go any further, it’s essential to understand what bloat is in the first place. It’s most likely what led you to this article, but understanding how it occurs is worth knowing so you can prevent it before it starts.
When an existing record is updated or deleted, Postgres keeps the previous version of the row around as a dead tuple (an update also writes a new row version). Bloat occurs when the rate at which dead tuples are created exceeds the database’s ability to clean them up automatically. Let’s imagine a scenario where an application is updating or deleting records at an average of 1,000 records per second, while the autovacuum daemon is removing dead tuples at an average rate of 800 per second. Each second, 200 tuples of bloat will accumulate. Keep in mind this is a hypothetical scenario; you can’t actually tune the autovacuum daemon to a precise removal rate like 800 dead tuples per second.
You can see how badly your database is suffering from bloat using the following command with pgextras:
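The exact pgextras command isn’t reproduced here, but if you don’t have pgextras handy, a rough stand-in for the dead-to-live tuple ratio (the “bloat score” discussed below) can be pulled straight from Postgres’s statistics views; note that n_dead_tup and n_live_tup are estimates maintained by the statistics collector:

```sql
-- Rough per-table bloat check: ratio of dead tuples to live tuples.
SELECT schemaname,
       relname,
       n_live_tup,
       n_dead_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup, 0), 2) AS dead_to_live_ratio
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```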
Additionally, there are some helpful queries to identify hotspots in your database (note you may need to change the schema depending on how your database is configured):
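The original queries aren’t shown here either, but a minimal sketch of the idea is to rank tables by update/delete churn and check when autovacuum last touched them (the public schema filter below is an assumption):

```sql
-- Hypothetical hotspot query: tables with the most update/delete churn,
-- plus when autovacuum last ran on them.
SELECT relname,
       n_tup_upd,
       n_tup_del,
       n_dead_tup,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_tup_upd + n_tup_del DESC
LIMIT 20;
```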
What problems does bloat cause?
When a database table is suffering from bloat, query performance will suffer dramatically.
Consider a query fingerprint (modified) that was taking upwards of 14 seconds to complete while table_b was suffering from severe database bloat.
In Postgres, the query planner is responsible for determining the best execution plan for a query. The planner must consider aggregate table statistics, the indices on the table, and the type of data being queried. The planner will then recommend a certain execution path to get the data in the quickest, most resource-efficient way.
When a table is bloated, the statistics that Postgres’s ANALYZE collects become skewed and inaccurate, and that is exactly the information the query planner relies on. Consider the case when a table has 350 million dead tuples but only 50 million active rows. The bloat score on this table is 7, since the ratio of dead tuples to active records is 7:1. This score is exceptionally high, so the query planner produces poor execution plans for queries against this table, leading to slow queries (because they use an inferior index, for example).
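One quick way to see this skew in practice is to compare the planner’s row estimates against reality with EXPLAIN ANALYZE; a sketch, where table_b and its account_id column are hypothetical:

```sql
-- When statistics are stale, the "rows=" estimate in the plan can be wildly
-- different from the "actual ... rows=" figure, and the planner may pick a
-- poor index or scan strategy as a result.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM table_b
WHERE account_id = 42;

-- Refreshing statistics is cheap (it only takes a share lock) and can help
-- the planner even before the bloat itself is removed.
ANALYZE VERBOSE table_b;
```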
Imagine asking for directions in your parents’ hometown while they’re using a mental model of the town from 30 years ago. Tons of stuff has changed, so their directions are only partially correct. You may eventually get there, but it’s going to be a winding, slow, frustrating trip.
After removing the database bloat, that 14-second query took just 37ms to complete, a reduction of 99.7% in execution time.
How does database bloat typically get cleaned up?
Typically, Postgres’s autovacuum daemon handles regular cleanup of this data. By default, it kicks in on a table once dead tuples exceed roughly 20% of the table’s total records (the autovacuum_vacuum_scale_factor setting). Numerous parameters can be tuned to make it more or less aggressive, but the right values depend on your database’s workload and your business rules for acceptable performance. This is especially true since the autovacuum process consumes resources that would otherwise be used for regular operation (think CPU/memory/disk IO).
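Tuning can be done globally in postgresql.conf or per table. For instance, a write-heavy table can be given a much more aggressive threshold than the 20% default; the table name and values here are purely illustrative, not a recommendation:

```sql
-- Hypothetical example: make autovacuum kick in on a hot table once dead
-- tuples exceed ~1% of the table (plus a small fixed threshold), instead of
-- the default 20%.
ALTER TABLE busy_table SET (
  autovacuum_vacuum_scale_factor = 0.01,
  autovacuum_vacuum_threshold    = 1000
);

-- Check which per-table overrides, if any, are currently in place.
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'busy_table';
```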
When bloat can’t automatically be cleaned up, what can you do?
Ok, the reason you’re all here. When you’re in a situation where bloat accumulates faster than the database can clean it up, the first and most urgent step is to resolve the root cause of the bloat. This root cause may be an over-zealous background job that’s updating records far too frequently, or a lack of rate limiting, but it’s ultimately something specific to your application.
Once you’ve stemmed the bleeding on this front, the next stage is to repair the damage. Unfortunately, when a large table (10–100+ GB) has a bloat score of 5 or more, the regular VACUUM ANALYZE VERBOSE table_name_here; command is going to take a prohibitively long time (think 4+ days, or even longer).
We identified three potential options:
- VACUUM FULL. Pros: can be faster than VACUUM ANALYZE and will eliminate all bloat in the table while also shrinking the table’s physical size on disk. Cons: requires an exclusive read/write lock on the table for the operation’s duration, which can cause application outages depending on the table. Imagine not being able to access your users table temporarily: a non-starter.
- pg_repack. Pros: very fast, and doesn’t require a read/write lock on the table. Cons: very resource-intensive, which can degrade overall database performance. It can also cause significant replication lag, as well as OOM errors when using replication slots (details below).
- pgcompacttable. Does the same thing as pg_repack, but modifies the rows in place. I won’t go into detail here, but it uses significantly fewer resources and operates much slower, so it’s not as problematic for replication slots. However, in our case, pgcompacttable was far too slow to remove the amount of bloat we had.
We decided to go with pg_repack and pay the brief performance penalty: it required no downtime and was the quickest path to resolution.
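One note before running it: pg_repack ships as both a server-side extension and a command-line client, so the extension needs to be present in the target database. A minimal setup check, assuming the pg_repack packages are already installed on the database host:

```sql
-- Install the extension in the target database (requires superuser, and the
-- pg_repack packages must already be present on the host).
CREATE EXTENSION IF NOT EXISTS pg_repack;

-- Confirm it is available and see which version is installed.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_repack';
```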
How does pg_repack work?
At a high level, pg_repack takes the following steps to remove all bloat from a table without blocking reads or writes on it (a simplified SQL sketch of the same idea follows the list).
- Create a duplicate, empty copy of the bloated table, without indices. This is important for loading data into the new table in a performant manner; bulk loads are much faster without indices.
- Create a log table to capture any changes that occur as pg_repack is doing its work, which can sometimes take multiple hours.
- Create triggers on the original table to capture the delta and insert it into the log table while the process is running.
- Copy data from the original table into the new table.
- After all data has been loaded, create the indices on the new table that were present on the old table.
- Replay data from the log table into the new table.
- Rename the old table out of the way (very fast).
- Rename the new table into place (very fast).
- Drop the old, now-renamed table.
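To make those steps concrete, here is a heavily simplified SQL sketch of the same online-rewrite idea. It is not the literal SQL pg_repack executes (the real tool also captures UPDATEs and DELETEs in its log, takes brief locks for the swap, and works through its own extension), and every table, column, and function name below is hypothetical:

```sql
-- Simplified illustration only; not what pg_repack literally runs.

-- 1. Empty copy of the bloated table, with no indexes yet.
CREATE TABLE bloated_table_new (LIKE bloated_table INCLUDING DEFAULTS);

-- 2. Log table for changes that arrive while the copy runs.
CREATE TABLE bloated_table_log (LIKE bloated_table INCLUDING DEFAULTS);

-- 3. Trigger on the original table to capture the delta.
CREATE FUNCTION capture_delta() RETURNS trigger AS $$
BEGIN
  INSERT INTO bloated_table_log SELECT NEW.*;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER bloated_table_delta
AFTER INSERT ON bloated_table
FOR EACH ROW EXECUTE FUNCTION capture_delta();  -- EXECUTE PROCEDURE on PG < 11

-- 4. Bulk-copy the live rows; dead tuples are simply left behind.
INSERT INTO bloated_table_new SELECT * FROM bloated_table;

-- 5. Build indexes only after the data is loaded (much faster this way).
CREATE INDEX ON bloated_table_new (id);

-- 6. Replay the rows that changed during the copy.
INSERT INTO bloated_table_new SELECT * FROM bloated_table_log;

-- 7-9. Swap the tables and drop the old, bloated one.
ALTER TABLE bloated_table RENAME TO bloated_table_old;
ALTER TABLE bloated_table_new RENAME TO bloated_table;
DROP TABLE bloated_table_old;
```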
What you’re left with is a brand new table with no bloat! Mission accomplished!
Replication slots and OOM errors
This particular piece is critical to consider if you’re using replication slots. Because the above process rewrites the entire table, it generates a huge volume of changes (WAL / replication lag) that must flow through your replication slots; if you’re repacking tables in the 50–100+ GB range, that volume is prohibitively large unless you have that much free memory.
In our case, we were replicating data into a Redshift instance using Fivetran. When Fivetran tried to read from the slot during the next sync, Postgres had to provide all 100GB of data because we had changed it so rapidly. The database then ran out of memory, and a fire drill ensued.
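A useful early-warning signal, before and during a repack, is how much WAL the server is being forced to retain for each slot. A rough check, assuming Postgres 10 or later:

```sql
-- How much WAL each replication slot is forcing the server to retain.
-- A slot that falls far behind during a repack is a warning sign.
SELECT slot_name,
       active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;
```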
We used the following process to avoid this scenario:
1. Drop the replication slot used by Fivetran that was causing the database to crash during pg_repack of large tables:
SELECT pg_drop_replication_slot('fivetran_replication_slot');
2. Repack the affected tables to remove severe table bloat.
3. Add back the replication slot:
SELECT pg_create_logical_replication_slot('fivetran_replication_slot', 'test_decoding');
4. Trigger a full resync in Fivetran, which can, unfortunately, take multiple days depending on the size of your data.
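Once the repack and resync are done, it’s easy to confirm that the table actually shrank and the dead tuples are gone; a quick check, with the table name as a placeholder:

```sql
-- Hypothetical post-repack check: on-disk size (including indexes and TOAST)
-- and the remaining dead/live tuple counts.
SELECT pg_size_pretty(pg_total_relation_size('public.table_b')) AS total_size,
       n_dead_tup,
       n_live_tup
FROM pg_stat_user_tables
WHERE relname = 'table_b';
```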
Performance back to normal!
After performing the above steps, we removed the severe database bloat from our tables and returned performance to normal without causing any downtime. Hopefully, these learnings can help you or your team tackle any similar situations you may be running into!
Remember: these steps are a last resort. Ideally, your autovacuum settings are tuned well enough to keep up with the bloat your application creates. But as we all know, things will go wrong, and these steps should help you in a disaster scenario.