Unlocking Performance: A Deep Dive into Table Partitioning in PostgreSQL

Published in

Simform Engineering

11 min readJan 2, 2024

Benefits, implementation, and best practices for table partitioning in PostgreSQL

Welcome to our journey into the world of PostgreSQL and its powerful feature — Table Partitioning. As databases grow in size and complexity, managing large tables efficiently becomes a crucial aspect of database administration. Table partitioning emerges as a savior, offering improved performance and maintainability for large datasets.

In this blog, we will delve into the concept of table partitioning in PostgreSQL, exploring its benefits, implementation, and best practices.

Introduction to table partitioning

Table partitioning is a database performance improvement technique that involves breaking down a large table into smaller, more manageable pieces called partitions. Each partition functions as a separate table, but collectively they form a single logical table (no physical division). This approach improves query performance, maintenance, and data management.

Why table partitioning?

Partitioning can provide several benefits:

Dramatically improves query performance in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. Partitioning effectively substitutes for the upper tree levels of indexes, making it more likely that the heavily used parts of the indexes fit in memory.
When queries or updates access a large percentage of a single partition, you can improve performance by using a sequential scan of that partition instead of using an index, which would require random-access reads scattered across the whole table.
When the partitioning design accounts for the usage pattern, you can perform bulk loads and delete by adding or removing partitions. Dropping an individual partition using DROP TABLE, or doing ALTER TABLE DETACH PARTITION, is far faster than a bulk operation. These commands also entirely prevent the VACUUM overhead caused by a bulk DELETE.
Partitioning enables the database engine to scan only relevant partitions, reducing the amount of data that needs to be processed during queries.
Loading and unloading data becomes faster as operations are focused on specific partitions instead of the entire table.
Partitioning simplifies data archiving, deletion, and retention policies, as these operations can be performed on individual partitions.
Partitioning allows parallel query execution on different partitions, enhancing overall system performance.

These benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server.

Deciding on a table partitioning strategy

Not all hash-distributed or randomly distributed tables are good candidates for partitioning. If the answer is yes to all or most of the following questions, table partitioning is a viable database design strategy for improving query performance. If the answer is no to most of the following questions, table partitioning is not the right solution for that table. Always test your design strategy to ensure that query performance improves as expected.

Is the table large enough? Large fact tables are good candidates for table partitioning. If you have millions or billions of records in a table, you may see performance benefits from logically breaking that data up into smaller chunks. For smaller tables with only a few thousand rows or less, the administrative overhead of maintaining the partitions may outweigh any performance benefits that you might see.
Are you experiencing unsatisfactory performance? As with any performance tuning initiative, a table should be partitioned only if queries against that table are producing slower response times than desired.
Do your query predicates have identifiable access patterns? Examine the WHERE clauses of your query workload and look for table columns that are consistently used to access data. For example, if most of your queries tend to look up records by date, then a monthly or weekly date-partitioning design might be beneficial. Or, if you tend to access records by region, consider a list-partitioning design to divide the table by region.
Does your data warehouse maintain a window of historical data? Another consideration for partition design is your organization’s business requirements for maintaining historical data. For example, your data warehouse may require that you keep data for the past twelve months. If the data is partitioned by month, you can easily drop the oldest monthly partition from the warehouse and load current data into the most recent monthly partition.
Can the data be divided into somewhat equal parts based on some defining criteria? Choose partitioning criteria that will divide your data as evenly as possible. If the partitions contain a relatively equal number of records, query performance improves based on the number of partitions created. For example, by dividing a large table into 10 partitions, a query will run 10 times faster than it would against the unpartitioned table, provided the partitions are designed to support the query’s criteria.

Do not create more partitions than needed. Creating too many partitions can slow down management and maintenance jobs, such as vacuuming, recovering segments, expanding the cluster, checking disk usage, etc.

Partitioning types

PostgreSQL provides native support for three primary types of partitioning: Range, List, and Hash partitioning.

Range partitioning

The table is divided into non-overlapping ranges based on a specified key column or columns in range partitioning. These ranges are defined by inclusive lower bounds and exclusive upper bounds. For instance, one could partition data by date ranges or specific identifiers for business objects. For example, if one partition’s range is from 1 to 10, and the next one’s range is from 10 to 20, the value 10 belongs to the second partition, not the first.

Example:

Create a range-partitioned table named measurement that is partitioned on a logdate column:

CREATE TABLE measurement (  
  city_id int not null,  
  logdate date not null,  
  peaktemp int,  
  unitsales int ) PARTITION BY RANGE (logdate);

Create partitions. Each partition should hold one month’s worth of data:

CREATE TABLE measurement_y2021m01 PARTITION OF measurement 
FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');  
CREATE TABLE measurement_y2021m02 PARTITION OF measurement 
FOR VALUES FROM ('2021-02-01') TO ('2021-03-01'); 
CREATE TABLE measurement_y2021m03 PARTITION OF measurement 
FOR VALUES FROM ('2021-03-01') TO ('2021-04-01');

Any data inserted into the measurement table is redirected to the appropriate partition based on the logdate column.

2. List partitioning

List partitioning involves explicitly listing the key values that should be included in each partition. This method allows for a more direct assignment of values to specific partitions based on predefined lists.

You must declare a partition specification for every list value that you create. The following example creates a partitioned table named listpart that is list-partitioned on a color column:

CREATE TABLE listpart (id int, rank int, year int, color char(1), count int)  
PARTITION BY LIST (color);

Create the partitions:

CREATE TABLE listpart_red PARTITION OF listpart FOR VALUES IN ('r');
CREATE TABLE listpart_green PARTITION OF listpart FOR VALUES IN ('g');
CREATE TABLE listpart_blue PARTITION OF listpart FOR VALUES IN ('b');
CREATE TABLE listpart_other PARTITION OF listpart DEFAULT;

3. Hash partitioning

Hash partitioning involves specifying a modulus and remainder for each partition. Rows are distributed to partitions based on the hash value of the partition key, ensuring a balanced distribution of data.

A hash partitioned table uses a single hashable column as its partition key column. For hash partitions, you must declare a partition specification for every partition (modulus/remainder combination) that you want to create.

For example, create a table named hpt that is partitioned by the hash of the text column c:

CREATE TABLE hpt (a int, b int, c text) PARTITION BY HASH(c);

Create the partitions:

CREATE TABLE hpt_p1 PARTITION OF hpt FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE hpt_p2 PARTITION OF hpt FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE hpt_p3 PARTITION OF hpt FOR VALUES WITH (MODULUS 3, REMAINDER 2);

Insert some data into the partitioned table, and then display the number of rows in each partition:

Implementing table partitioning on an existing table in PostgreSQL

Prerequisites
Before implementing table partitioning, ensure you are using PostgreSQL version 10 or later, as earlier versions may lack some of the necessary features.

Table partitioning cannot be applied straight up to an already-existing table if the table is in a production environment; there will be some downtime required. You can follow the steps below:

Here, we will perform a table partitioning on an existing table finance.payment_details:

First, we need to rename the main table to the old table:

ALTER TABLE finance.payment_details
RENAME TO payment_details_old;

Create a new table with the same structure as the main table with partitioning applied:

CREATE TABLE finance.payment_details
(
    id bigint NOT NULL,
    payment_id bigint NOT NULL,
    device_global_key character varying(100) NOT NULL,
    transaction_datetime timestamp(4) without time zone,
    vend_amount integer,
    cash_discount integer,
    is_discount_from_device boolean,
    amount_auth integer,
    card_type character varying(100),
    card_number character varying(10),
    swipe_type character varying(50),
    created_datetime_utc timestamp(4) without time zone NOT NULL DEFAULT timezone('UTC'::text, now())
)PARTITION BY RANGE(created_datetime_utc);

Create partitions quarterly for the year 2022 on data (here, we need to make partitions for all the available date ranges in data):

CREATE TABLE finance.payment_details_y2022Q1 PARTITION OF finance.payment_details 
FOR VALUES FROM ('2022-01-01 00:00:00+00') TO ('2022-04-01 00:00:00+00'); 

CREATE TABLE finance.payment_details_y2022Q2 PARTITION OF finance.payment_details 
FOR VALUES FROM ('2022-04-01 00:00:00+00') TO ('2022-07-01 00:00:00+00'); 

CREATE TABLE finance.payment_details_y2022Q3 PARTITION OF finance.payment_details 
FOR VALUES FROM ('2022-07-01 00:00:00+00') TO ('2022-10-01 00:00:00+00'); 

CREATE TABLE finance.payment_details_y2022Q4 PARTITION OF finance.payment_details 
FOR VALUES FROM ('2022-10-01 00:00:00+00') TO ('2023-01-01 00:00:00+00');

Insert data into the partitioned table from the main table:

INSERT INTO finance.payment_details 
SELECT * FROM finance.payment_details_old;

At this point, data inserted into the “ payment_details “ table is automatically routed to the relevant partition based on the “ created_datetime_utc “ column.

PostgreSQL returns an error when data is inserted into the parent table that does not map to one of the existing partitions, for that you can make a default partition.

CREATE TABLE payment_details_default PARTITION OF payment_details DEFAULT;

In this example scenario, a new partition will be created for each quarter. You might write a script that generates the required DDL automatically.

Here is the script for creating new partitions at every new quarter. We will schedule this using pg_cron every quarter:

CREATE OR REPLACE PROCEDURE create_partition_quarterly()
LANGUAGE PLPGSQL
AS
$$
DECLARE
  _year NUMERIC = EXTRACT('year' FROM CURRENT_DATE) ;
  _quarter NUMERIC = EXTRACT('quarter' FROM CURRENT_DATE) ;
  _pattern TEXT = '_y'||_year||'Q'||_quarter;
  _allTables TEXT[] :=  '{"finance.payment_details"}';  --You can add more tables ( Main Table Name) here if there are any.
  _element TEXT;
BEGIN 
  --Create partition for each table through loop
  FOREACH _element IN ARRAY _allTables
   LOOP 
       EXECUTE ' CREATE TABLE '||_element||_pattern||' PARTITION OF '||_element||'
          FOR VALUES FROM (CURRENT_DATE) TO ((CURRENT_DATE) + INTERVAL ''3 month'')';
     RAISE NOTICE 'New partition created';
   
  -- Insert logs into partition logs table 
       EXECUTE ' INSERT INTO logs.partition_logs 
        VALUES ('''||_element||_pattern||''',CURRENT_DATE)';
   END LOOP;
END;
$$

Below is the scheduling script using pg_cron in PostgreSQL:

SELECT cron.schedule('0 9 1 */3 *',$$'CALL create_partition_quarterly();'$$)

We have also implemented logging functionality for monitoring newly created partitions:

CREATE TABLE logs.partition_logs   
 (
  partition_id BIGINT NOT NULL PRIMARY KEY,
  created_name TEXT NOT NULL,
  process_date DATE NOT NULL
 );

Congratulations on the successful implementation of table partitioning! This optimization enhances database performance, streamlining data management for improved scalability and efficiency.

You can use EXPLAIN to verify that the query optimizer scans only the relevant data to examine the query plan.

Partition maintenance

Adding a Partition:

To add to the partition hierarchy during table creation, use the CREATE TABLE … PARTITION OF command. For example:

CREATE TABLE msales_mfeb20 PARTITION OF msales 
FOR VALUES FROM ('2020–02–01') TO ('2020–03–01');

Indexing partitioned tables:

Creating an index on the key column(s) of a partitioned table automatically creates a matching index on each partition, and any partitions that you create or attach later will also have such an index.

To avoid long lock times, you can CREATE INDEX ON ONLY the partitioned table.

CREATE INDEX measurement_usls_202102_idx ON measurement_y2021m02 (unitsales);

Adding a default partition:

The DEFAULT keyword identifies a partition as the default partition. When it encounters data that falls outside of the boundaries of all partitions, the Database routes the data to the default partition.

CREATE TABLE msales_other PARTITION OF msales DEFAULT;

Dropping a partition:

The simplest option for removing old data is to drop the partition that is no longer necessary:

DROP TABLE measurement_y2020m02;

Detaching a partition:

You can detach a partition from your partition hierarchy using the ALTER TABLE … DETACH PARTITION command. Detaching a partition removes it from the partition hierarchy but does not drop the table. Detaching a partition that has sub-partitions automatically detaches those partitions as well.

For range partitions, it is common to detach the older partitions to roll old data out of the data warehouse. For example:

ALTER TABLE msales DETACH PARTITION msales_2021;

Limitations:

The following limitations apply to partitioned tables:

To set up a unique or primary key constraint on a partitioned table, all of the partition key columns must be included in the constraint’s columns, and the partition keys can’t include any expressions or function calls. The partition structure itself must ensure that there are no duplicates in different partitions because the individual indexes that make up the constraint can only directly enforce uniqueness within their respective partitions.
An exclusion constraint that covers the entire partitioned table cannot be created. Such a constraint can only be applied to specific leaf partitions. Once more, the inability to implement cross-partition limits is the cause of this limitation.
BEFORE ROW triggers on INSERT cannot change which partition is the destination for a new row.
It is not allowed to combine temporary and permanent relations in the same partition tree. Hence, if the partitioned table is permanent, so must its partitions, and likewise, if the partitioned table is temporary. When using temporary relations, all members of the partition tree have to be from the same session.

Best practices for table partitioning:

Choose an appropriate partition key: Select a column for partitioning that is frequently used in queries and provides a logical division of data.
Monitor and tune performance: Regularly monitor query performance and adjust partitioning strategy as needed. PostgreSQL provides tools like EXPLAIN to analyze query plans.
Keep partitions balanced: Ensure that data distribution among partitions is balanced to avoid performance issues. Periodically check and rebalance partitions if necessary.
Use constraints: Leverage constraints on partitioned tables to ensure data integrity and improve query optimization.
Consider overlapping ranges: When using range partitioning, be mindful of overlapping ranges to prevent data inconsistencies.
Regularly Vacuum and Analyze: Keep your tables well-maintained by running the VACUUM and ANALYZE commands to reclaim storage and update statistics.

Conclusion

Table partitioning in PostgreSQL is a powerful feature for managing large datasets efficiently. By carefully selecting a partition key and following best practices, you can significantly improve query performance, streamline data management, and optimize database operations. Understanding the principles and implementing table partitioning judiciously will empower you to scale your PostgreSQL database effectively.

Follow Simform Engineering to keep yourself updated with the latest trends in the technology horizon. Follow us: Twitter | LinkedIn