DB Size increase in Postgres: Autovacuum

Shubham Gosain
Published in Analytics Vidhya
7 min read · Sep 6, 2021

PostgreSQL, also known as Postgres, is a free and open-source relational database management system emphasizing extensibility and SQL compliance. It is one of the most popular open-source databases, widely used for its performance and reliability, and a common alternative to Oracle’s MySQL.

To improve performance, Postgres relies on certain mechanisms that can end up causing problems under some extreme workloads. One such problem is an unexpectedly increasing database size.

DB size increasing

Let’s imagine a case. Your application has been running fine for the last 6 months against a Postgres server whose database is 100 GB in size. Suddenly you are alerted that the size of the database has doubled overnight… and it’s still increasing.

When you check the database, there is no corresponding increase in table sizes, and the row counts haven’t increased either. It’s as if something else is eating your database space without your knowledge. You have a limited disk, and you are left with only a couple of hours to fix this before running out of disk space.

Now you understand the problem statement and how critical it can be, so it’s better to address it before it gets too late.

The seemingly vicious feature sitting in your Postgres server and taking your space is called TOAST. It is actually a well-wisher trying to improve the performance of your database, but it can cause trouble if your database deletes or updates TOASTed entries too frequently.

TOAST

TOAST is a mechanism PostgreSQL uses to keep physical data rows from exceeding the size of a data block (typically 8KB). Postgres does not support physical rows that cross block boundaries, so the block size is a hard upper limit on row size. To allow user tables to have rows wider than this, the TOAST mechanism breaks up wide-field values into smaller pieces, which are stored “out of line” in a TOAST table associated with the user table.

Each table you create has its own associated (unique) TOAST table, which may or may not ever end up being used, depending on the size of rows you insert. All of this is transparent to the user and enabled by default.

When a row that is to be stored is “too wide” (the threshold for that is 2KB by default), the TOAST mechanism first attempts to compress any wide-field values. If that isn’t enough to get the row under 2KB, it breaks up the wide-field values into chunks that get stored in the associated TOAST table. Each original field value is replaced by a small pointer that shows where to find this “out of line” data in the TOAST table. TOAST will attempt to squeeze the user-table row down to 2KB in this way, but as long as it can get below 8KB, that’s good enough and the row can be stored successfully.
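If you want to see which TOAST table backs a given user table and how much space it occupies, a quick check like the one below works (the table name my_table is only a placeholder):

-- Find the TOAST table associated with a user table
SELECT reltoastrelid::regclass AS toast_table
FROM pg_class
WHERE relname = 'my_table';

-- Size of that TOAST table (includes its index)
SELECT pg_size_pretty(pg_total_relation_size(reltoastrelid)) AS toast_size
FROM pg_class
WHERE relname = 'my_table';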

What makes your database suffer

A database making too many UPDATE or DELETE requests is the root cause of the problem. If these operations touch large rows, that acts as a catalyst and makes things worse.

In PostgreSQL, whenever rows in a table are deleted, the existing row or tuple is marked as dead (it is not physically removed), and during an update, the corresponding existing tuple is marked as dead and a new tuple is inserted. So an UPDATE operation = DELETE + INSERT. These dead tuples consume unnecessary storage, and eventually you have a bloated PostgreSQL database. This storage can be reclaimed by VACUUM.
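A quick way to see how many dead tuples have accumulated, and when a table was last cleaned, is the pg_stat_user_tables view:

-- Tables with the most dead tuples, plus the last (auto)vacuum time
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;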

The storage reclaimed by VACUUM is never given back to the operating system; it is simply defragmented within the same database pages and kept as free space for future inserts into the same table. To actually return space to the underlying OS, VACUUM FULL is required, which is slower and takes an exclusive lock on the table, blocking any other read/write operation.
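As a rough sketch of the difference (again, my_table is just a placeholder):

-- Marks dead tuple space as reusable; runs alongside normal reads and writes
VACUUM my_table;

-- Rewrites the whole table and returns space to the OS,
-- but takes an ACCESS EXCLUSIVE lock, blocking reads and writes
VACUUM FULL my_table;

-- VERBOSE prints a per-table activity report, useful while investigating bloat
VACUUM (VERBOSE, ANALYZE) my_table;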

Bloat seriously affects PostgreSQL query performance. In PostgreSQL, tables and indexes are stored as arrays of fixed-size pages (usually 8KB in size). Whenever a query requests rows, the PostgreSQL instance loads these pages into memory, and dead rows cause expensive extra disk I/O during that loading. So it is essential to run VACUUM at regular intervals.

For this, the AUTOVACUUM daemon is an optional feature that automatically vacuums the database so that you don’t have to run the VACUUM statement manually. The AUTOVACUUM daemon is enabled in the default configuration. The daemon is made up of multiple processes that reclaim storage by removing obsolete data or tuples from the database. It checks for tables that have a significant number of inserted, updated, or deleted records and vacuums these tables based on the configuration settings.
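You can confirm that the daemon is enabled, and see whether any of its workers are currently running, with something like:

-- Check that autovacuum (and the statistics it depends on) are enabled
SHOW autovacuum;
SHOW track_counts;

-- See autovacuum workers that are running right now (PostgreSQL 10+)
SELECT pid, query, state
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';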

Tasks of AUTOVACUUM

There are various AUTOVACUUM configuration parameters, which makes tuning complicated. The main reason is that autovacuum has many different tasks:

  • clean up “dead tuples” left behind after UPDATE or DELETE operations
  • update the free space map that keeps track of free space in table blocks
  • update the visibility map that is required for index-only scans
  • “freeze” table rows so that the transaction ID counter can safely wrap around
  • schedule regular ANALYZE runs to keep the table statistics updated

These are all the configuration parameters for AUTOVACUUM

#------------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#------------------------------------------------------------------------------
autovacuum = on                                  # Enable autovacuum subprocess? 'on' requires track_counts to also be on.
log_autovacuum_min_duration = -1                 # -1 disables, 0 logs all actions and their durations, > 0 logs only actions running at least this number of milliseconds.
autovacuum_max_workers = 3                       # max number of autovacuum subprocesses (change requires restart)
autovacuum_naptime = 1min                        # time between autovacuum runs
autovacuum_vacuum_threshold = 50                 # min number of row updates before vacuum
autovacuum_analyze_threshold = 50                # min number of row updates before analyze
autovacuum_vacuum_scale_factor = 0.2             # fraction of table size before vacuum
autovacuum_analyze_scale_factor = 0.1            # fraction of table size before analyze
autovacuum_freeze_max_age = 200000000            # maximum XID age before forced vacuum (change requires restart)
autovacuum_multixact_freeze_max_age = 400000000  # maximum multixact age before forced vacuum (change requires restart)
autovacuum_vacuum_cost_delay = 20ms              # default vacuum cost delay for autovacuum, in milliseconds; -1 means use vacuum_cost_delay
autovacuum_vacuum_cost_limit = -1                # default vacuum cost limit for autovacuum, -1 means use vacuum_cost_limit

These parameters can be tweaked as per your database’s requirements, either globally or at the table level. For dead tuple cleanup, we do not need to reconfigure all of the AUTOVACUUM parameters.
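For example, a large, frequently updated table can be given more aggressive per-table settings without touching the global configuration; the values and the table name below are only illustrative:

-- Per-table overrides (values are illustrative, not recommendations)
ALTER TABLE my_big_table
  SET (autovacuum_vacuum_scale_factor = 0.01,
       autovacuum_vacuum_threshold = 1000);

-- Check which per-table options are currently set
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'my_big_table';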

Tuning AUTOVACUUM for dead tuple cleanup

Long-running or idle transactions can make VACUUM wait to acquire the SHARE UPDATE EXCLUSIVE lock on the table, and they also prevent dead tuples that are still visible to them from being removed. Unless such queries are fixed, tuning AUTOVACUUM will be useless.

If you cannot fight the problem at its root, you can use the configuration parameter idle_in_transaction_session_timeout to have PostgreSQL terminate sessions that stay “idle in transaction” for too long. This causes errors on the client side but may be justified if you have no other way to keep your database operational. Similarly, to fight long-running queries, statement_timeout can be used.
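These timeouts can be set globally in postgresql.conf or per database/role; for instance (mydb and the timeout values are placeholders you would adjust to your workload):

-- Terminate sessions idle inside a transaction for more than 5 minutes
ALTER DATABASE mydb SET idle_in_transaction_session_timeout = '5min';

-- Cancel any statement that runs longer than 15 minutes
ALTER DATABASE mydb SET statement_timeout = '15min';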

Tuning AUTOVACUUM to run faster

If autovacuum cannot keep up with cleaning up dead tuples, the solution is to make it work faster.

Remember, faster doesn’t mean running VACUUM more frequently. VACUUM is a resource-intensive operation, so autovacuum by default operates deliberately slowly. The goal is to have it work in the background without affecting normal database operations. But if your application creates lots of dead tuples, you have to make it more aggressive.

autovacuum_vacuum_threshold = The minimum number of dead tuples for vacuum to start (Default = 50)

autovacuum_vacuum_scale_factor = The scale factor that defines the minimum number of dead tuples for a table w.r.t. its number of rows (Default = 0.2)

These two parameters control when VACUUM is triggered on a table. As VACUUM is a resource-consuming process, it should only run when necessary. From these parameters, a per-table minimum dead tuple count is calculated, which acts as the threshold for VACUUM to trigger.

autovacuum_vacuum_threshold makes sure that VACUUM doesn’t run too frequently for small tables.

autovacuum_vacuum_scale_factor makes sure that VACUUM runs timely for large tables.

Min number of dead tuples required = autovacuum_vacuum_threshold + (number of rows in table × autovacuum_vacuum_scale_factor)
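For instance, with the default settings a table of 1,000,000 rows is vacuumed only after roughly 200,050 dead tuples have accumulated:

Min dead tuples required = 50 + (1,000,000 × 0.2) = 200,050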

autovacuum_vacuum_cost_limit : Total cost limit autovacuum can reach, shared across all autovacuum workers. (Defaults to vacuum_cost_limit = 200)

autovacuum_vacuum_cost_delay : Autovacuum sleeps for this many milliseconds each time the accumulated cleanup cost reaches autovacuum_vacuum_cost_limit. (Default = 20ms)

autovacuum_max_workers : Maximum number of workers that autovacuum will spawn for doing its tasks.

These three parameters define the cleanup speed and the time taken by VACUUM. The autovacuum daemon sleeps for autovacuum_vacuum_cost_delay, as throttling, after every autovacuum_vacuum_cost_limit worth of work done in one go.

Autovacuum can start up to autovacuum_max_workers processes that actually perform cleanup of different databases/tables. That’s useful because you don’t want cleanup of small tables to stall until the cleanup of a single large table finishes (which may take quite a bit of time because of the throttling). But remember that adding workers does not add speed: the cost limit is shared by all autovacuum workers, so each worker process only gets 1/autovacuum_max_workers of the total limit, and increasing the number of workers only makes each of them go slower.

The default values are quite conservative and were chosen with low-resource systems in mind, since autovacuum is enabled by default.
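One common adjustment on busier hardware is to raise the cost limit and lower the delay so each worker gets more done per cycle; the numbers below are only a starting point for experimentation, not a recommendation:

-- Allow more work per cycle and shorter sleeps between cycles
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 1000;
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';

-- Reload the configuration without restarting the server
SELECT pg_reload_conf();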

Final Note

Vacuum configuration differs for different system requirements. It’s better to first understand your application’s database workload, what requests it makes and how many, and decide accordingly.

Also, don’t jump straight to the conclusion that insufficient vacuuming is the reason. The root cause can be something else making vacuum wait, such as another database query holding a lock on the required table/row. It is better to rule out such cases first.
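Before blaming autovacuum, it’s worth checking for long-running or idle-in-transaction sessions and lock waits that could be holding things up, for example:

-- Sessions stuck idle inside a transaction, oldest first
SELECT pid, usename, state, xact_start, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;

-- Backends currently waiting on a lock
SELECT pid, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';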
