Partitioning Postgres Tables

In this post, I will describe what table partitioning is and how we can create & manage table partitions. We, at housing.com, are using the pg_partman extension for partitioning large Postgres tables.

PostgreSQL

Suppose, there is a transactions table in your database which is being used to store all the transactions done by users on your platform. After a time this table will become so big that it will start affecting your query timings. Also, this is the kind of table where keeping old data(older than a certain period) doesn’t make sense since your users might only be interested in the past few weeks or months' transactions. So you should target optimizing your queries for that data. But what does this has to do with our topic? Well, this is where table partitioning comes into the picture. In this post, I’ll be using a transactions table and then splitting it into smaller tables having monthly data. I’ll also be showing how to remove tables having old data.

What is table partitioning?

Partitioning can be defined as the splitting of one logically large table into smaller tables. These new smaller tables are called partitions and the table which is being split is called a partitioned table.

Basic idea is that you have a parent table that you can send queries to, but the data lives in many smaller child tables. Since fewer rows have to be read from using smaller child tables, this helps us in improving our queries’ performance. Also, once we do not need the data from a child table, we can just drop it or send it to the archive data store of our choice.

Starting with postgres version 10, Postgres uses declarative partitioning to implement table partitioning. This is also known as native Postgres partitioning. For older versions, we had to use triggers to implement partitions.

What is pg-partman?

Pg-Partman(Postgres Partition Manager) is a Postgres extension that helps us in creating and managing Postgres table partitions. It supports both native and trigger-based partitioning. In this post, I’ll be using native partitioning for the example.

PG Partman also recommends using native partitioning for Postgres 10+ because performance in both reads & writes is significantly better in native partitioning.

Partitioning Example Walkthrough

Suppose we have a hypothetical transactions table in the partman_example database, which will be having some data for transactions for two-three months. We’ll be partitioning this transactions table into monthly partitions. In this example, I’ll be using native partitioning based on timestamp range.

transactions table
Data in the transactions table

If you are managing your own Postgres server then you need to install pg-partman first. More about the installation can be found on pg-partman documentation.

If you are using AWS RDS, the pg_partman extension is supported on RDS for PostgreSQL versions 12.5 and higher. So you just need to enable the extension.
Other major cloud providers also have support for this extension.

Now let's enable the extension:

Enabling the pg_partman extension

Now, we’ll be creating a new partitioned table transactions_partitioned which will be the parent table for all our partitions.

creating a new partitioned table

Notice, I have not mentioned id as the primary key here because we can’t. pg_partman does have a mechanism to still apply primary/unique keys that are not part of the partition column. Just be aware that they are not enforced across the entire partition set, but only for the individual partition. This is done with a template table. To ensure that all new partitions have these keys, we need to create this template table first and add the primary key on the id column.

Creating template table

Now, Let’s create partitions set.

Creating partitions set

Here, I have used pg-partman’s create_parent function which is the main function that creates partitions set for a partitioned table. In the above method, I have mentioned transactions_partitioned as the parent table, created_at as the column that the partitioning will be based on, native as my partitioning type, and monthly as my partitions interval. Also, I have passed transactions_table_template as my template table for all the child partitions, premake value as 1 which defines how many partitions it will stay ahead of the current partition. Also, in start_partition I have passed next month’s timestamp to create a partition starting with the next month because we’ll be making our original transactions table the default partition(i.e. partition which have data that doesn’t satisfy any other partitions constraints) for the new partitioned table but Postgres will not be allowing this as it’ll be having data of current month resulting in conflict with the constraints of current month’s partition. For example, if I wouldn’t have changed the start_partition, then it would have created a partition for the current month But the current month’s data was already present in the transactions table, so it would have thrown an error when making the transactions table as default partition.

As you can see in the image below, a partition for next month has been created for the partitioned table transactions_partitioned.

Partition Set for the new partitioned table

Note: Postgres will be creating these timestamp range-based partitions using the Postgres server’s timezone. In this example, my Postgres server had the IST timezone. It’s recommended to use the same timezone in all your services and Postgres server.

Now, In partman config, update the parent table name to that of our original table and premake value to a higher value to create some extra partitions in advance. Here, I have updated it to 4, so that whenever I’ll be running a maintenance command to create new partitions, It’ll create 4 extra partitions. we’ll see this in the next few steps.

Updating the parent table in part_config

Now, the next step would be renaming the partitioned table and our original table. This step will be having many sub-steps, so we’ll be doing this inside a transaction. First, we’ll be taking ACCESS_EXCLUSIVE lock on both the original transactions table and transactions_partitioned table. Then we’ll be renaming the transactions table to the transactions_default table which will act as the default partition for our partitioned table. Then, we’ll rename our partitioned table transactions_partitioned to our original table name i.e. transactions. After that, we’ll have to rename the partition which has already been created i.e. transactions_partitioned_p2022_04 to transactions_p2022_04. Post this, we’ll attach the transactions_default table to the transactions table as the default partition.

Renaming tables

Now, we can migrate the data from our default partition to new partitions. For that, we’ll be using partman’s partition_data_proc function, which will move data from the default partition(transactions_default) to the correct partition in batches.

Migrating the data

Notice, It asks to run VACUUM ANALYZE on the parent table. It is because partman creates a lot of temporary tables in the migration step.

Now, let's see what our transaction table looks like after the partitioning.

transaction table after partitioning

See, Now it has 5 partitions attached to it. Three of them will be having the existing data, 1 partition will be for the next month and 1 partition will be the default partition which will contain data that’ll not match any of the already existing partitions conditions.

Notice one thing, our transactions table does not have id as the primary key. But we mentioned id as the primary key in our template table. Well, that’ll create id as the primary key only for the partitions table as data resides there only. We can verify the same with any of the partitions:

id as primary key in a partition table

Let's verify the data also:

Data in partition tables

Now as we can see data resides in their respective partitions. We have successfully partitioned our transactions table data.

Postgres do not automatically create new partitions. So when it’ll get data for the month in which there isn’t any partition, it’ll send that data to the default partition but we don’t want that. So for this, we’ll have to create some partitions in advance. We can do that by running the run_maintenance function. We can run this function using some cron service of our choice or using the background worker provided by pg-partman.

New partitions in advance in the transactions table

Notice, that after running the maintenance command, our table now has 4 extra partitions. This is because we had the premake value set to 4 in our part_config table of partman. So, it created 3 new partition tables.

One final thing, suppose we want to keep only the last 1 month's data in our main parent table but do not want to delete the old data as we may be archiving it to our archival store. For that, we have to update our part_config table and set retention to 1 month and retention_keep_table to true. After that, the maintenance command will take care of it. It’ll detach the partitions having data older than 1 month.

partition retention

Notice that now our transactions table doesn’t have the transactions_p2022_01 partition. It has been detached from this table but not deleted. we can verify the same below:

Final tables

Conclusion

In this article, we learned what table partitioning is and how pg-partman helps us in creating and managing the partitions of already existing large tables. Here I have tried to give an overview and basic understanding of partitioning. More details can be found in Postgres and Pg-Partman's official documentation.

Thanks for reading!

--

--