Why use autovacuum for PostgreSQL database

Know the importance of auto-vacuum in database and also how to set it up properly

Ashutosh Karna
Analytics Vidhya
4 min readApr 21, 2020

--

As a database developer or administrator, you always want your database to be at it’s best and offer optimum performance. We often face a situation when we feel like our database is not performing up to the expectation and is bit slow, and we often opt to switch from RDBMS to NoSQL Database to overcome this, but this is not the final solution. Here we are missing one point: when database gets older and older, bloat is created. Bloat is created because of dead tuples resulted from database delete and update operations. When we perform delete or update operation on rows of database tables, actually we think older data after these operation is gone forever but they still remain in the database as dead tuples unless cleaned. This may cause several issues in database. Apart from consuming extra unused space, this also slows down queries. This is when vacuum comes into play.

Vacuum basically is the process of cleaning these dead tuples which might cause bloat if left alone. Vacuum can be done manually with some sort of command, but we are interested in a daemon called auto-vacuum which triggers itself when required making our life much easier. But, there is a catch in this, the parameters needs to be set properly so that it can work out of box. In this article, we will focus on following topics:

  1. Checkout the parameters related to autovacuum
  2. Changes to be made to setup autovacuum correctly
  1. Checkout the parameters related to autovacuum

There are few things related to autovacuum which we should know about before proceeding forward.

a. Check the number of dead tuples (rows) in each table in your database tables:

The above query will give you the number of dead and live tuples in corresponding tables. If the number of dead tuples are very high, then probably vacuum have never ran in your database and you are in serious trouble.

b. Check when last vacuum and autovacuum ran in your database tables:

Using this query, you will get the bigger picture of status of vacuum in your database. If you see that last_autovacuum has been ran for long time then you need to tune your parameters, which we will discuss in next section.
Note: last_vacuum is the timestamp when last manual vacuum ran in given table.

c. Check autovacuum is turned on or not, though by default it is on:

d. Check the values of parameters set in configuration file related to autovacuum:

Just run this and check the values. We will discuss about these parameters in detail in next section.

2. Changes to be made to setup autovacuum correctly

Now that, we have checked the status of autovacuum in our database tables, it’s time to work for calculating and setting correct values so that autovacuum will work properly. Let’s understand some the parameters related to autovacuum in short:

a. autovacuum_vacuum_scale_factor: Specifies a fraction of the table size to add to autovacuum_vacuum_threshold when deciding whether to trigger a VACUUM. The default is 0.2 (20% of table size).

b. autovacuum_vacuum_threshold: Specifies the minimum number of updated or deleted tuples needed to trigger a VACUUM in any one table. The default is 50 tuples.

c. autovacuum_max_workers: Specifies the maximum number of autovacuum processes (other than the autovacuum launcher) that may be running at any one time. The default is three.

For each table, the threshold value which can trigger autovacuum is calculated as:

The default values for this parameter might be good enough for test environment but for production environment where bloating might be high, this is completely ineffective. It might be the case that if haven’t changed these default values, autovacuum might not have ran once also. Autovacuum is triggered only if the number of dead tuples in given table crosses the vacuum_threshold value calculated from above expression. So, if you have table with one million rows, and autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor are set 50 and 0.2, then autovacuum will only be triggered if dead tuples in this table is more than 200k.

Let’s checkout what values for these parameters would give better result and run autovacuum most efficient way:

You change these parameters in configuration file of postgres, i.e. postgresql.conf or set different values for different tables according to their sizes and frequency of update operations on it using following type of query:

Once, you have set up correct values, monitor this and iterate on values until you find perfect one. You can check if your autovacuum ran or not for each table and when it ran last time using the query that I have mentioned in previous section of this article.

I hope you will find this article useful. Thanks for reading. Comments and suggestions are welcome.

--

--