AlloyDB Adaptive AutoVacuum and How AlloyDB Cluster Storage Space Is Released

Somdyuti
Google Cloud - Community
7 min read · Aug 2, 2024


What This Blog Is About

All PostgreSQL users, including DBAs and developers, know the importance of AutoVacuum and the problems it solves. But AutoVacuum also comes with a few challenges. I will not discuss what AutoVacuum is or why it is important; there are numerous insightful articles on that. In this blog, I will discuss the challenges with AutoVacuum and, most importantly, how AlloyDB, Google's flagship PostgreSQL database, solves them. I will also show how AlloyDB releases storage space when database used space comes down.

Challenges with AutoVacuum

The when (minimum delay between AutoVacuum runs), what (the checks AutoVacuum performs to remove dead tuples), and how (how many resources AutoVacuum can use) of AutoVacuum are primarily determined by these seven parameters:

autovacuum_naptime: Specifies the minimum delay between autovacuum runs on any given database. In each round the daemon examines the database and issues VACUUM and ANALYZE commands as needed for tables in that database. The default is 60 seconds.

autovacuum_vacuum_scale_factor: Specifies a fraction of the table size to add to autovacuum_vacuum_threshold when deciding whether to trigger a VACUUM. The default is 0.2 (20% of table size).

autovacuum_vacuum_threshold: Specifies the minimum number of updated or deleted tuples needed to trigger a VACUUM in any one table. The default is 50 tuples.

autovacuum_analyze_scale_factor: Specifies a fraction of the table size to add to autovacuum_analyze_threshold when deciding whether to trigger an ANALYZE. The default is 0.1 (10% of table size).

autovacuum_analyze_threshold: Specifies the minimum number of inserted, updated or deleted tuples needed to trigger an ANALYZE in any one table. The default is 50 tuples.

autovacuum_vacuum_cost_delay: The amount of time that the process will sleep when the cost limit has been exceeded. The default is 2 ms.

autovacuum_vacuum_cost_limit: The accumulated cost that will cause the autovacuuming process to sleep. The default is 200.

Also, autovacuum_max_workers sets the maximum number of simultaneously running autovacuum worker processes.
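For reference, the current values of all of these parameters on a PostgreSQL or AlloyDB instance can be inspected from the pg_settings view:

```sql
-- Inspect the cluster-wide settings for the AutoVacuum parameters above.
SELECT name, setting, unit, boot_val
FROM pg_settings
WHERE name IN ('autovacuum_naptime',
               'autovacuum_vacuum_scale_factor',
               'autovacuum_vacuum_threshold',
               'autovacuum_analyze_scale_factor',
               'autovacuum_analyze_threshold',
               'autovacuum_vacuum_cost_delay',
               'autovacuum_vacuum_cost_limit',
               'autovacuum_max_workers')
ORDER BY name;
```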

The database workload differs from table to table. Some tables receive more updates than others, and some see more deletes, including bulk deletes. The AutoVacuum worker processes need memory and resources, and the availability of server resources varies with the changing database workload. Leaving these parameters at their default values for large, dynamic tables can lead to table fragmentation and performance issues. So the seven parameters above need to be adaptive at the table level, and the resources allocated to vacuum worker processes need to be dynamic for the most efficient AutoVacuum runs. That is the challenge.
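In stock PostgreSQL, the usual workaround is to override these parameters per table using storage parameters. A minimal sketch, with purely illustrative values for a large, frequently updated table:

```sql
-- Illustrative per-table overrides (example values, not recommendations):
-- vacuum after ~1% of rows change instead of the default 20%.
ALTER TABLE pgbench_accounts SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 1000
);

-- List tables that carry such overrides.
SELECT relname, reloptions
FROM pg_class
WHERE reloptions IS NOT NULL
  AND relkind = 'r';
```

Maintaining such overrides by hand, table by table and workload by workload, is exactly the burden described above.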

So the challenges are:

  1. For which tables should we change the AutoVacuum default values?
  2. What non-default values should be set for these AutoVacuum parameters on the identified tables, and what should the vacuuming frequency of the large, frequently updated tables be?
  3. How do we adjust these parameters based on the current database workload and server resource utilization?

Here AlloyDB wins: you do not need to worry. AlloyDB's Adaptive AutoVacuum takes care of all three challenges based on your workload. Most of the seven AutoVacuum parameters mentioned above are adaptive in nature: they are adjusted behind the scenes based on your workload, which makes AutoVacuum optimal. This is very similar to the automatic memory management that a few RDBMSs provide; incidentally, AlloyDB also provides automatic memory management! (More on this in another blog.)

If you want to learn more about Adaptive AutoVacuum, refer to this blog: https://cloud.google.com/blog/products/databases/alloydb-for-postgresql-under-the-hood-adaptive-autovacuum?e=48754805

The above blog describes how Adaptive AutoVacuum adjusts AutoVacuum workers and the cost limit based on the resources available in the cluster.

AlloyDB’s Adaptive AutoVacuum Adjusts to the Resource Available in the Cluster

In the rest of this blog, I will give a real-world example of Adaptive AutoVacuum in action and, most importantly, show how AlloyDB releases storage!

Adaptive AutoVacuum in Action

I will demonstrate this using the pgbench_accounts table, which is 10 GB in size. 80 million rows were inserted and then a few updates were done.
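The table size quoted here can be verified with the standard size functions, for example:

```sql
-- Heap size vs. total size (including indexes and TOAST) of pgbench_accounts.
SELECT pg_size_pretty(pg_relation_size('pgbench_accounts'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('pgbench_accounts')) AS total_size;
```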

pgbench_accounts table.
  1. AutoVacuum has not run so far, as primarily INSERTs happened in this table (which do not create dead tuples), and the changes due to the few updates were much less than autovacuum_vacuum_scale_factor, so AutoVacuum was not triggered.
  2. Delete all rows in the table. The same could have been demonstrated by deleting the bulk of the rows at the tail end of the table.
  3. There are dead tuples in “pgbench_accounts”, primarily due to this bulk delete and a few caused by the earlier updates.
  4. We can see the delete completed at 20:30 BST, which is 19:30 UTC.
Bulk Delete
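The dead-tuple counts and AutoVacuum timestamps referred to in these steps come from the pg_stat_user_tables view:

```sql
-- Tuple activity and AutoVacuum history for pgbench_accounts after the delete.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       n_tup_del,
       last_autovacuum,
       autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'pgbench_accounts';
```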

5. Verify that AlloyDB's Adaptive AutoVacuum is turned on.

AlloyDB Adaptive AutoVacuum Turned On

6. AutoVacuum kicks in within 1 minute after the delete completes (check the last_autovacuum column), and it runs for only a few seconds to clean up 80 million dead tuples! This is because AlloyDB could adaptively and dynamically allocate more resources to the AutoVacuum worker processes based on the current resource utilization of the server (behind the scenes it adjusted autovacuum_vacuum_cost_limit and autovacuum_work_mem). AutoVacuum activities are logged in AlloyDB. As seen below, it took only 37 seconds to remove the 80 million dead tuples and also truncate the empty pages (the high-water mark was reduced, which releases space at the OS level: the database shrinks). Notice the high average read and write rates AlloyDB allocated to the AutoVacuum processes.
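For readers reproducing this on stock PostgreSQL: per-run AutoVacuum log lines of this kind are controlled by the log_autovacuum_min_duration parameter (note that on AlloyDB, database flags are set through the console or gcloud rather than ALTER SYSTEM). A sketch:

```sql
-- Log every AutoVacuum run; -1 disables this logging, a positive value
-- logs only runs that take at least that many milliseconds.
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();
```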

AutoVacuum Log when Database Load is less

After AutoVacuum completes, we can see there are no dead tuples, and since we deleted all rows in the table there are no live tuples either. The table size is also 0, as all pages were truncated.
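This can be confirmed from pg_class once VACUUM has updated the statistics:

```sql
-- After truncation the relation should be back to zero pages.
SELECT pg_size_pretty(pg_relation_size('pgbench_accounts')) AS heap_size,
       relpages,
       reltuples
FROM pg_class
WHERE relname = 'pgbench_accounts';
```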

pgbench_accounts Truncated at end of AutoVacuum
AutoVacuum Kicks-In.

When the database workload is high and the server is heavily loaded, the same AutoVacuum activity took more than 3 minutes. It removed the dead tuples in 3 runs (AutoVacuum releases its lock as soon as there is a request for a conflicting lock), but it could not truncate the pages because it was not able to acquire an ACCESS EXCLUSIVE lock on the table. We can also notice the reduced average read and write rates for the AutoVacuum processes.

AutoVacuum Log when the Server is Busy

This is the magic of AlloyDB's Adaptive AutoVacuum! It adapts AutoVacuum and the resources allocated to AutoVacuum workers based on the server workload.

7. Since all rows were deleted, there were empty blocks at the end of the table, which AutoVacuum releases to the OS; this free space is then visible at the database layer.

Note: AutoVacuum acquires a SHARE UPDATE EXCLUSIVE lock on the table. Any empty blocks at the tail end of the file will be truncated, but even if there is a single live row at the end, the file cannot be truncated (this is OSS PostgreSQL behavior). A manual VACUUM or AutoVacuum can take a brief ACCESS EXCLUSIVE lock (essentially to lower the high-water mark) to remove empty blocks, provided those blocks are at the end of the table and contain no live tuples. Released empty blocks (as a result of this truncate at the end of AutoVacuum) free up space in the database. If AutoVacuum cannot truncate those empty blocks (because it fails to obtain the exclusive lock on the table), they are marked as empty for future updates on the same table, but space is not released at the database layer. Also, executing alter table pgbench_accounts set (vacuum_truncate=false, toast.vacuum_truncate=false) will prevent AutoVacuum from executing the truncate at the end (it will only acquire the SHARE UPDATE EXCLUSIVE lock), and space will not be released.
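The vacuum_truncate toggle mentioned in the note looks like this in practice:

```sql
-- Prevent (Auto)Vacuum from truncating empty tail pages for this table;
-- the blocks stay allocated for reuse instead of being returned as free space.
ALTER TABLE pgbench_accounts SET (
    vacuum_truncate       = false,
    toast.vacuum_truncate = false
);

-- Restore the default truncation behavior.
ALTER TABLE pgbench_accounts RESET (vacuum_truncate, toast.vacuum_truncate);
```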

In OSS PostgreSQL, that reclaimed space is released to the OS and we see a reduction in database used space, but the allocated disk size is still not reduced. This creates a disparity between the actual database space used and the storage allocated, and it has been a classic problem.

Here comes the Magic of AlloyDB!

AlloyDB Storage Reduction in Action

But in AlloyDB, the cluster storage reflects the actual database size. The delete was done at 8:30 PM BST, which is 7:30 PM UTC. We can see the cluster storage was 19 GB.

AlloyDB Storage Usage during the course of Delete and AutoVacuum

As AutoVacuum was able to acquire the lock and release those empty blocks, disk storage shrank by 10 GB over the next 15 minutes to reflect the actual database usage. This all happens online, without any performance or availability impact on the database.

AlloyDB Storage online shrink

What’s Next

Adaptive AutoVacuum is just one of the auto-pilot features that AlloyDB provides, making database management automatic, easy, and efficient. The other auto-pilot features, made possible by advanced machine learning algorithms, are:

Automatic memory management

Automatic storage tiering

Automatic data columnarization and query rewrite

I will discuss these, with real-world examples, in upcoming blogs. Interested in getting started with AlloyDB?


Published in Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.


Written by Somdyuti

A database learner for 21 years. DM Specialist at Google, helping customers migrate and modernize their databases to GCP databases: Cloud SQL, AlloyDB, Spanner.
