Optimizing Storage and Managing Cleanup in PostgreSQL

Harshit Dwivedi
Feb 13 · 9 min read

Transactions are an integral part of the PostgreSQL system; however, they come with a small price tag attached: concurrent users can be presented with different data, and not everybody will get the same data returned by a query.

In addition to this, DELETE and UPDATE are not allowed to actually overwrite data, as ROLLBACK would then not work. If you happen to be in the middle of a large DELETE operation, you cannot be sure whether you will be able to COMMIT or not.
Moreover, the old data is still visible to other sessions while you perform the DELETE, and sometimes it is even visible long after your modification has finished.

Consequently, this means that cleanup has to happen asynchronously.
A transaction cannot clean up its own mess and COMMIT/ROLLBACK might be too early to take care of dead rows.

The solution to this problem is VACUUM. The following code block provides you with a syntax overview:

test=# \h VACUUM
Command:     VACUUM
Description: garbage-collect and optionally analyze a database
Syntax:
VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE } [, ...] ) ] [ table_name [ (column_name [, ...] ) ] ]
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ table_name ]
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ table_name [ (column_name [, ...] ) ] ]

VACUUM will visit all pages that potentially contain modifications and find all the dead space.
The free space found is then tracked by the Free Space Map (FSM) of the relation.

Note that VACUUM will, in most cases, not shrink the size of a table; instead, it will track and find free space inside existing storage files.

Configuring VACUUM and autovacuum

Back in the early days of PostgreSQL projects, people had to run VACUUM manually. Fortunately, this is long gone.
Nowadays, administrators can rely on a tool called autovacuum, which is part of the PostgreSQL Server infrastructure.

It automatically takes care of cleanup and works in the background. It wakes up once per minute (the default of autovacuum_naptime = 1min can be changed in postgresql.conf) and checks whether there is work to do.
If there is work, autovacuum will fork up to three worker processes (the default of autovacuum_max_workers = 3 can be changed in postgresql.conf).
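
For reference, this is roughly what the relevant lines look like in postgresql.conf (the values shown are the stock defaults; adjust them to your workload):

autovacuum = on               # enable the autovacuum launcher (reload is enough)
autovacuum_naptime = 1min     # sleep time between autovacuum runs per database
autovacuum_max_workers = 3    # maximum number of worker processes (requires a restart)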

The main question is: when does autovacuum trigger the creation of a worker process? The answer to this question can again be found in postgresql.conf:

autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1

The autovacuum_vacuum_scale_factor setting tells PostgreSQL that a table is worth vacuuming if 20% of its data has been changed. The trouble is that if a table consists of one row, one change is already 100%. It makes absolutely no sense to fork a complete process to clean up just one row. Therefore, autovacuum_vacuum_threshold says that, on top of the 20%, at least 50 rows must have been changed. Otherwise, VACUUM won’t kick in.
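
Internally, the launcher uses the formula threshold + scale_factor * reltuples. A quick sketch of the arithmetic (the row counts are just illustrative):

-- vacuum trigger = autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
-- table with 1,000,000 rows:  50 + 0.2 * 1,000,000 = 200,050 dead rows needed
-- table with 1 row:           50 + 0.2 * 1         = just over 50 dead row versions needed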

The same mechanism is used when it comes to optimizer stats creation. We need 10% and at least 50 rows to justify new optimizer stats. Ideally, autovacuum creates new statistics during a normal VACUUM to avoid unnecessary trips to the table.
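
If you want to check whether autovacuum has already visited a table, pg_stat_user_tables is a good starting point (a minimal sketch; t_test is simply a placeholder table name):

SELECT relname,
       last_vacuum,        -- last manual VACUUM
       last_autovacuum,    -- last run performed by autovacuum
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 't_test';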

Digging into transaction wraparound-related issues

There are two more settings in postgresql.conf that are quite important to understand in order to really make use of PostgreSQL. Understanding VACUUM is key to performance:

autovacuum_freeze_max_age = 200000000
autovacuum_multixact_freeze_max_age = 400000000

To understand the overall problem, it is important to understand how PostgreSQL handles concurrency.
The PostgreSQL transaction machinery is based on the comparison of transaction IDs and the states transactions are in.

Let’s look at an example:

If I am transaction ID 4711 and if you happen to be 4712, I won’t see you because you are still running. If I am transaction ID 4711 but you are transaction ID 3900, I will see you.
If your transaction has failed, I can safely ignore all rows produced by your failing transaction.

The trouble is as follows: transaction IDs are finite, not unlimited. At some point, they will start to wrap around. In reality, this means that transaction number 5 might actually be after transaction number 800 million.
How does PostgreSQL know what was first?
It does so by storing a watermark. At some point, those watermarks will be adjusted, and this is exactly when VACUUM starts to be relevant. By running VACUUM (or autovacuum), you can ensure that the watermark is adjusted in a way that there are always enough future transaction IDs left to work with.

The autovacuum_freeze_max_age setting defines the maximum number of transactions (age) that a table’s pg_class.relfrozenxid field can attain before a VACUUM operation is forced to prevent transaction ID wraparound within the table.
This value is fairly low because it also has an impact on clog cleanup (the clog, or commit log, is a data structure that stores two bits per transaction, which indicate whether a transaction is running, aborted, committed, or still in a subtransaction).

The autovacuum_multixact_freeze_max_age setting configures the maximum age that a table’s pg_class.relminmxid field can attain before a VACUUM operation is forced to prevent multixact ID wraparound within the table.

In general, trying to reduce the VACUUM load while maintaining operational security is a good idea. A VACUUM instance on large tables can be expensive, and therefore keeping an eye on these settings makes perfect sense.
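
To keep an eye on wraparound, you can check how far each table’s relfrozenxid has already aged (a sketch; compare the numbers against autovacuum_freeze_max_age):

SELECT relname,
       age(relfrozenxid) AS xid_age   -- transactions since the table was last frozen
FROM pg_class
WHERE relkind = 'r'                   -- ordinary tables only
ORDER BY age(relfrozenxid) DESC
LIMIT 5;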

A word on VACUUM FULL

Instead of a normal VACUUM, you can also use VACUUM FULL.
However, it is worth noting that VACUUM FULL actually locks the table and rewrites the entire relation.

In the case of a small table, this might not be an issue. However, if your tables are large, the table lock can really kill you in minutes! VACUUM FULL blocks upcoming writes and therefore some people talking to your database might have the feeling that it is actually down. Hence, a lot of caution is advised.
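
For completeness, this is what the invocation looks like; VERBOSE makes the progress visible (expect an ACCESS EXCLUSIVE lock, which blocks reads as well as writes, for the whole duration):

VACUUM (FULL, VERBOSE) t_test;   -- rewrites the entire table into a new file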

To get rid of VACUUM FULL, I recommend that you check out pg_squeeze, which can rewrite a table without blocking writes.

http://www.cybertec.at/introducing-pg_squeeze-a-postgresql-extension-to-auto-rebuild-bloated-tables/

Watching VACUUM at work

After this introduction, it is time to see VACUUM in action. In most cases, VACUUM will not shrink your tables; space is usually not returned to the filesystem.
Here is an example that shows how to create a small table with customized autovacuum settings. The table is filled with 100,000 rows:

CREATE TABLE t_test (id int) WITH (autovacuum_enabled = off);
INSERT INTO t_test SELECT * FROM generate_series(1, 100000);

The idea is to create a simple table containing 100,000 rows. Note that it is possible to turn autovacuum off for specific tables.
Usually, this is not a good idea for most applications.

However, there is a corner case, where autovacuum_enabled = off makes sense.

Just consider a table whose life cycle is very short. It does not make sense to clean out tuples if the developer already knows that the entire table will be dropped within seconds. In data warehousing, this can be the case if you use tables as staging areas. VACUUM is turned off in this example to ensure that nothing happens in the background. Everything you see is triggered by me and not by some process.
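
The per-table switch can also be flipped later with ALTER TABLE (shown here for the t_test table created above):

ALTER TABLE t_test SET (autovacuum_enabled = off);  -- skip this table during autovacuum runs
ALTER TABLE t_test SET (autovacuum_enabled = on);   -- put it back under autovacuum's control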

First of all, consider checking the size of the table using the following command:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 3544 kB
(1 row)

The pg_relation_size function returns the size of a table in bytes. The pg_size_pretty function will take this number and turn it into something human-readable.
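
Note that pg_relation_size only covers the main data file of the table; if you also want to include indexes and TOAST data, pg_total_relation_size is the function to use (a quick sketch):

SELECT pg_size_pretty(pg_relation_size('t_test'))       AS table_only,
       pg_size_pretty(pg_total_relation_size('t_test')) AS table_plus_indexes_and_toast;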

Then, all rows in the table will be updated using a simple UPDATE statement as shown in the next listing:

test=# UPDATE t_test SET id = id + 1;
UPDATE 100000

What happens here is highly important for understanding PostgreSQL. The database engine has to copy all the rows. Why? First of all, we don’t know whether the transaction will be successful, so the data cannot be overwritten. The second important aspect is that a concurrent transaction might still be seeing the old version of the data.

The UPDATE operation will copy rows.

Logically, the size of the table will be larger after the change has been made:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 7080 kB
(1 row)
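
The effect of the UPDATE is also visible in the statistics: every old row version is now counted as a dead tuple (a sketch using pg_stat_user_tables; the counters are maintained even though autovacuum is disabled for this table):

SELECT n_live_tup,   -- current row versions
       n_dead_tup    -- old row versions left behind by the UPDATE
FROM pg_stat_user_tables
WHERE relname = 't_test';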

After UPDATE, people might try to return space to the filesystem:

test=# VACUUM t_test;
VACUUM

As stated previously, VACUUM does not return space to the filesystem in most cases.
Instead, it will allow space to be reused; the table, therefore, does not shrink at all:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 7080 kB
(1 row)

However, the next UPDATE will not make the table grow, because it will reuse the free space inside the table.
Only a second UPDATE will make the table grow again, because by then the free space is gone and additional storage is needed:

test=# UPDATE t_test SET id = id + 1;
UPDATE 100000
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 7080 kB
(1 row)
test=# UPDATE t_test SET id = id + 1;
UPDATE 100000
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 10 MB
(1 row)

Understanding storage is the key to performance and administration in general. Let’s run some more queries:

VACUUM t_test;
UPDATE t_test SET id = id + 1;
VACUUM t_test;

Again, the size is unchanged. Let’s see what is inside the table:

test=# SELECT ctid, * FROM t_test ORDER BY ctid DESC;
    ctid    |  id
------------+-------
 (1327, 46) |   112
 (1327, 45) |   111
 (1327, 44) |   110
 ...
 (884, 20)  | 99798
 (884, 19)  | 99797

The ctid column represents the physical position of a row on disk. Using ORDER BY ctid DESC, you will basically read the table backward in physical order.

Why should you care? The reason is that there are some very small values and some very big values at the end of the table. The following listing shows how the size of the table changes when data is deleted:

test=# DELETE FROM t_test WHERE id > 99000 OR id < 1000;
DELETE 1999
test=# VACUUM t_test;
VACUUM
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 3504 kB
(1 row)

Although only 2% of the data has been deleted, the size of the table has gone down by two-thirds.

The reason is that if VACUUM finds only dead rows beyond a certain position in the table, it can truncate the relation and return that space to the filesystem. This is the only case in which you will actually see the table size go down. Of course, normal users have no control over the physical position of data on the disk. Therefore, storage consumption will most likely stay roughly the same unless all rows are deleted.
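
If you want to watch this happening, VACUUM VERBOSE reports what it did on a per-table basis, including whether pages at the end of the relation could be truncated (a sketch; the exact output wording varies between versions):

VACUUM (VERBOSE) t_test;   -- prints removed row versions, remaining pages, and truncation details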

In real-world applications, the impact of this observation cannot be stressed enough.
There is no performance tuning without really understanding storage.

Limiting transactions by making use of snapshot too old

VACUUM does a good job and it will reclaim free space as needed. However, when can VACUUM actually clean out rows and turn them into free space? The rule is this: if a row cannot be seen by anybody anymore, it can be reclaimed. In reality, this means that everything that is no longer seen even by the oldest active transaction can be considered to be really dead.

This also implies that really long transactions can postpone cleanup for quite some time. The logical consequence is table bloat. Tables will grow beyond proportion and performance will tend to go downhill.

Fortunately, starting with PostgreSQL 9.6, the database has a nice feature that allows the administrator to intelligently limit the duration of a transaction. Oracle administrators will be familiar with the snapshot too old error.

Since PostgreSQL 9.6, this error message is also available; however, it is more of a feature than an unintended side effect of bad configuration (which it actually is in Oracle).

To limit the lifetime of snapshots, you can make use of the following setting in PostgreSQL’s configuration file, postgresql.conf:

old_snapshot_threshold = -1     # 1min-60d; -1 disables; 0 is immediate

If this variable is set, snapshots older than the threshold become invalid and queries still using them will fail with a snapshot too old error. Note that this setting is defined at the instance level; it cannot be changed inside a session.
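
A sketch of what enabling the feature might look like (6h is purely an illustrative value; the parameter accepts durations between 1min and 60d and requires a server restart):

old_snapshot_threshold = 6h     # long-running snapshots may now fail with "snapshot too old"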

By limiting the age of a transaction, the risk of insanely long transactions will decrease drastically.

If you found this article interesting, you can explore Mastering PostgreSQL 11 to master the capabilities of PostgreSQL 11 to efficiently manage and maintain your database. Mastering PostgreSQL 11 can help you build dynamic database solutions for enterprise applications using the latest release of PostgreSQL, which enables database analysts to design both the physical and technical aspects of the system architecture with ease.
