Efficiently RePartitioning Large Tables in PostgreSQL
By Cagri Biroglu, Database Engineer
Introduction
Partitioning a massive table that already holds data can be challenging, especially when the initial table, or “Mammoth” partition, has all or most of the data. For instance, partitioning an existing 10TB orders table into smaller, monthly partitions requires a strategy that minimizes downtime and efficiently loads data into the new partitions. In this blog post, I’ll explain how we managed such a partitioning scenario in PostgreSQL, leveraging a three-step process to create, load, and switch partitions with maximum control. This workaround helped us clean up data across different database shards, totaling approximately 100TB, all coming from a single table. It became a valuable part of our data archival strategy, particularly for append-only tables where the data does not change after being written.
Why Repartition?
- Performance Improvements: Partitioning makes querying and indexing more efficient by limiting the amount of data scanned during operations. Smaller, more focused partitions (the main subject of this blog post) improve read performance and reduce the burden on system resources (I/O, vacuum, etc.).
- Maintenance Made Easy: With partitioned tables, operations like archiving, vacuuming, and purging old data become far more manageable (see the sketch below).
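To make the maintenance point concrete: with partitions in place, purging or archiving a whole range of old data is a near-instant metadata operation rather than a long-running DELETE. A minimal sketch, assuming a hypothetical partition named invoices_0_1000000000:
-- Detach first so the data can still be archived or inspected...
ALTER TABLE public.invoices DETACH PARTITION public.invoices_0_1000000000;
-- ...then dropping it removes the old rows without a table-wide DELETE or VACUUM.
DROP TABLE public.invoices_0_1000000000;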
Why Not Use Existing Tools?
Tools like pg_partman and pg_repack are popular for managing PostgreSQL tables and partitions. However, these tools are not designed for the specific challenge of splitting an already massive, single partition into smaller, more manageable partitions. As such, we opted for a custom approach that gave us the necessary control and efficiency.
Our Working Strategy for Partitioning Existing Tables
You can read through our previous blog post — Table Partitioning at Adyen — for more in-depth details. To partition existing tables effectively, we followed a straightforward strategy:
- Rename the Existing Table: Rename the existing table to <table>_mammoth.
- Create a New Table: Create a new table with the original table name.
- Add Original as First Child: Add the renamed original table (<table>_mammoth) as the first child.
- Add Additional Partitions: Add at least one more partition to accommodate the incoming data (a sketch of this setup follows).
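A minimal sketch of this setup, using the invoices example from the rest of this post (names and boundaries are illustrative):
-- 1. Rename the existing table.
ALTER TABLE public.invoices RENAME TO invoices_mammoth;
-- 2. Recreate the original name as a range-partitioned parent. Note that any
-- unique index copied this way must include the partition key (invoiceId).
CREATE TABLE public.invoices (LIKE public.invoices_mammoth INCLUDING ALL)
PARTITION BY RANGE (invoiceId);
-- 3. Attach the renamed original as the first child, covering its full ID range.
-- (This ATTACH scans the mammoth to validate the bound unless a matching CHECK
-- constraint exists first; more on that trick in step three.)
ALTER TABLE public.invoices ATTACH PARTITION public.invoices_mammoth
FOR VALUES FROM (0) TO (10000000000);
-- 4. Add at least one more partition for incoming data.
CREATE TABLE public.invoices_10000000000_11000000000
PARTITION OF public.invoices
FOR VALUES FROM (10000000000) TO (11000000000);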
Overview of the RePartitioning Process
Once we set up the initial structure, we use the following process to split the mammoth partition:
- Create Empty Partitions: Define and create new partitions to replace the mammoth.
- Load Data: Load data from the mammoth partition into the newly created partitions in batches.
- Switch Partitions: Detach the mammoth partition and attach the new ones to complete the partitioning.
This approach avoids overwhelming resources by batching the data load, while keeping ownership and integrity constraints intact. In the following sections, we’ll detail each of these steps, which we developed for one of our largest tables. I want to credit Shailendra Chauhan for his significant contributions to this work.
Step One: Create Empty Partitions
In this initial step, we lay the groundwork by defining and creating the new partitions. The goal is to replace the mammoth partition with smaller, manageable segments, each covering a specific range of invoiceId. This ensures that the subsequent data loading and switching can proceed in an organized manner, limiting resource usage and improving system performance. First, we need to create partitions with boundaries that split the data into manageable parts. In our case, each new partition covers a specific range of transaction IDs (invoiceId) within the invoices table. To demonstrate this in a relatable way, let’s use an invoices table whose rows are child records of orders stored in an orders parent table.
CREATE OR REPLACE PROCEDURE create_empty_invoices_partition()
LANGUAGE plpgsql
AS $$
DECLARE
tableowner_for_invoices VARCHAR;
cut_off_invoiceId BIGINT;
mammoth_upper_boundary_invoiceId BIGINT;
mammoth_min_invoiceId BIGINT;
partition_size BIGINT := 1000000000;
current_partition_upper_boundary_invoiceId BIGINT;
current_partition_lower_boundary_invoiceId BIGINT;
total_no_of_partitions BIGINT;
partition_no BIGINT;
BEGIN
-- Set the role to the owner of the invoices table to ensure consistent ownership.
SELECT tableowner INTO tableowner_for_invoices FROM pg_tables WHERE tablename = 'invoices';
EXECUTE format('SET ROLE %I', tableowner_for_invoices);
-- Calculate cut-off transaction ID for X years of data.
SELECT MIN(invoiceId) INTO cut_off_invoiceId FROM orders
WHERE creation_date = (
SELECT MAX(creation_date)
FROM orders
WHERE creation_date <= NOW() - INTERVAL 'X years 1 day'
AND creation_date > NOW() - INTERVAL 'X years 2 days'
);
-- Determine the upper boundary transaction ID for the mammoth partition.
SELECT (regexp_matches(
pg_catalog.pg_get_expr(child.relpartbound, child.oid),
'.*\(\''?(.*?)\''?\).*\(\''?(.*?)\''?\).*'))[2]::BIGINT
INTO mammoth_upper_boundary_invoiceId
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
WHERE LOWER(parent.relname) = LOWER('invoices')
AND LOWER(child.relname) = LOWER('invoices_mammoth');
-- Determine the minimum invoiceId in the mammoth partition.
SELECT MIN(invoiceId) INTO mammoth_min_invoiceId FROM public.invoices_mammoth;
-- Calculate the total number of partitions to create.
total_no_of_partitions := ((mammoth_upper_boundary_invoiceId - cut_off_invoiceId) / partition_size) + 1;
-- Create partitions based on the calculated boundaries.
FOR partition_no IN 1..total_no_of_partitions LOOP
current_partition_upper_boundary_invoiceId := mammoth_upper_boundary_invoiceId - ((partition_no - 1) * partition_size);
current_partition_lower_boundary_invoiceId := GREATEST(current_partition_upper_boundary_invoiceId - partition_size, cut_off_invoiceId, mammoth_min_invoiceId);
EXECUTE format(
'CREATE TABLE public.invoices_%s_%s (LIKE public.invoices INCLUDING ALL)',
current_partition_lower_boundary_invoiceId, current_partition_upper_boundary_invoiceId
);
EXECUTE format(
'ALTER TABLE public.invoices_%s_%s ADD CONSTRAINT invoiceid_range_check
CHECK ((invoiceId >= %s::BIGINT) AND (invoiceId < %s::BIGINT))',
current_partition_lower_boundary_invoiceId, current_partition_upper_boundary_invoiceId,
current_partition_lower_boundary_invoiceId, current_partition_upper_boundary_invoiceId
);
END LOOP;
END;
$$;
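A quick way to verify the result is to call the procedure and list the boundary-named tables it created (a sanity check, not part of the procedure itself):
CALL create_empty_invoices_partition();
-- The new, not-yet-attached tables should appear with their boundary-based names:
SELECT tablename FROM pg_tables
WHERE tablename LIKE 'invoices_%' AND tablename != 'invoices_mammoth'
ORDER BY tablename;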
Step Two: Load Data into the New Partitions
The load_data_in_invoices_partition()
procedure migrates data from the mammoth partition to newly created partitions. This migration occurs in batches to ensure the process is manageable and does not overwhelm system resources. Each batch is inserted into the appropriate partition based on the specified invoiceId
boundaries.
- Batch Size: The batch size is defined as 100,000 rows, which means we load the data in manageable portions rather than all at once. This minimizes the load on the system and helps avoid potential locks or performance issues.
- Boundary Detection: Each partition’s lower and upper invoiceId boundaries are derived from its name, which follows the invoices_<lower>_<upper> convention established in step one.
- Loop for Loading Data: The data loading process is executed in a loop, where each iteration inserts a batch of rows and then commits the transaction to make the changes permanent. Committing after each batch helps minimize lock time and keeps the database responsive.
CREATE OR REPLACE PROCEDURE load_data_in_invoices_partition(partition_name VARCHAR)
LANGUAGE plpgsql
AS $$
DECLARE
batch_size BIGINT := 100000;
partition_lower_boundary_invoiceId BIGINT;
partition_upper_boundary_invoiceId BIGINT;
current_batch_lower_boundary_invoiceId BIGINT;
current_batch_upper_boundary_invoiceId BIGINT;
BEGIN
-- Derive the partition’s lower and upper boundary invoiceId from its name,
-- which follows the invoices_<lower>_<upper> convention from step one.
SELECT SPLIT_PART(partition_name, '_', 2)::BIGINT INTO partition_lower_boundary_invoiceId;
SELECT SPLIT_PART(partition_name, '_', 3)::BIGINT INTO partition_upper_boundary_invoiceId;
-- Batch load data from mammoth partition.
current_batch_upper_boundary_invoiceId := partition_upper_boundary_invoiceId;
current_batch_lower_boundary_invoiceId := GREATEST(current_batch_upper_boundary_invoiceId - batch_size, partition_lower_boundary_invoiceId);
WHILE current_batch_upper_boundary_invoiceId > partition_lower_boundary_invoiceId LOOP
-- Insert data from mammoth partition to the current partition.
EXECUTE format(
'INSERT INTO public.invoices_%s_%s SELECT * FROM public.invoices_mammoth WHERE invoiceId >= %s AND invoiceId < %s',
partition_lower_boundary_invoiceId, partition_upper_boundary_invoiceId,
current_batch_lower_boundary_invoiceId, current_batch_upper_boundary_invoiceId
);
-- Commit after each batch to minimize contention and performance impact.
COMMIT;
-- Update batch boundaries for the next iteration.
current_batch_upper_boundary_invoiceId := current_batch_upper_boundary_invoiceId - batch_size;
current_batch_lower_boundary_invoiceId := GREATEST(current_batch_upper_boundary_invoiceId - batch_size, partition_lower_boundary_invoiceId);
END LOOP;
-- Analyze the partition to optimize performance.
EXECUTE format('ANALYZE (VERBOSE) public.invoices_%s_%s', partition_lower_boundary_invoiceId, partition_upper_boundary_invoiceId);
END;
$$;
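An example invocation for a single partition might look like this (the name follows the invoices_<lower>_<upper> convention; run it outside an explicit transaction block so the per-batch COMMITs inside the procedure can take effect):
CALL load_data_in_invoices_partition('invoices_0_1000000000');
-- Verify the copy before moving on to the next range:
SELECT COUNT(*) FROM public.invoices_0_1000000000;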
Step Three: Detach the Mammoth Partition and Attach the New Partitions
In this final step, we transition from the old mammoth partition to the newly created partitions. We need to ensure that the existing mammoth partition is detached smoothly while the new partitions are correctly attached, maintaining the structure and integrity of the invoices table.
We use a method that avoids running the expensive validation process that checks data consistency across the table and its foreign key references. This approach requires locking the invoices table in EXCLUSIVE mode to guarantee data consistency. Although this method involves bending some conventional rules (the foreign keys are marked as validated directly in the catalog), the lock is crucial to ensuring no data anomalies occur during the transition.
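This is also why step one attached a CHECK constraint mirroring each partition’s bounds: when a valid constraint already proves that every row falls inside the declared range, PostgreSQL can skip the full-table scan it would otherwise run during ATTACH PARTITION. A minimal illustration of the pattern (boundaries are illustrative):
-- The valid CHECK constraint from step one already guarantees the range,
-- so this ATTACH is a quick metadata change instead of a full scan.
ALTER TABLE public.invoices ATTACH PARTITION public.invoices_0_1000000000
FOR VALUES FROM (0) TO (1000000000);
-- The now-redundant CHECK constraint can be dropped afterwards:
ALTER TABLE public.invoices_0_1000000000 DROP CONSTRAINT invoiceid_range_check;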
CREATE OR REPLACE PROCEDURE attach_new_invoices_partitions()
LANGUAGE plpgsql
AS $$
DECLARE
table_metadata RECORD;
BEGIN
-- Lock the parent invoices table to block concurrent writes during the switch.
LOCK TABLE public.invoices IN EXCLUSIVE MODE;
-- Drop any pre-existing foreign keys on the new partitions before attaching them.
FOR table_metadata IN SELECT tablename FROM pg_tables WHERE tablename LIKE 'invoices_%' AND tablename != 'invoices_mammoth'
LOOP
EXECUTE format('ALTER TABLE public.%s DROP CONSTRAINT IF EXISTS invoices_order_id_fkey', table_metadata.tablename);
END LOOP;
-- Detach the mammoth partition.
EXECUTE 'ALTER TABLE public.invoices DETACH PARTITION public.invoices_mammoth';
-- Attach the new partitions to the invoices table.
FOR table_metadata IN SELECT tablename, SPLIT_PART(tablename, '_', 2) AS lower_invoiceId, SPLIT_PART(tablename, '_', 3) AS upper_invoiceId
FROM pg_tables WHERE tablename LIKE 'invoices_%' AND tablename != 'invoices_mammoth'
LOOP
EXECUTE format('ALTER TABLE public.invoices ATTACH PARTITION public.invoices_%s_%s FOR VALUES FROM (%s::BIGINT) TO (%s::BIGINT)',
table_metadata.lower_invoiceId, table_metadata.upper_invoiceId,
table_metadata.lower_invoiceId, table_metadata.upper_invoiceId);
END LOOP;
-- Add foreign keys as NOT VALID to avoid initial validation.
FOR table_metadata IN SELECT tablename FROM pg_tables WHERE tablename LIKE 'invoices_%'
LOOP
EXECUTE format('ALTER TABLE public.%s ADD CONSTRAINT invoices_order_id_fkey FOREIGN KEY (order_id) REFERENCES public.orders (order_id) NOT VALID', table_metadata.tablename);
-- mark the constraint as validated in the catalog
EXECUTE format('UPDATE pg_constraint SET convalidated = true WHERE conname = ''invoices_order_id_fkey'' AND conrelid = %L::regclass',
table_metadata.tablename);
END LOOP;
-- Validation step for the foreign keys - ideally run separately at a less busy time.
-- Note: the catalog update above already marks the constraints as validated, so this
-- loop is effectively a no-op here; it is shown as the orthodox alternative to the
-- catalog shortcut, ensuring the foreign keys are genuinely validated.
FOR table_metadata IN SELECT tablename FROM pg_tables WHERE tablename LIKE 'invoices_%'
LOOP
RAISE NOTICE 'Validating foreign key constraint for %', table_metadata.tablename;
EXECUTE format('ALTER TABLE public.%s VALIDATE CONSTRAINT invoices_order_id_fkey', table_metadata.tablename);
END LOOP;
RAISE NOTICE 'Procedure is complete. Invoices table is fully partitioned and constraints validated.';
END;
$$;
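Run end to end, the three steps chain together as plain CALLs (a sketch; partition names follow the invoices_<lower>_<upper> convention, and the load calls must run outside an explicit transaction block because of their internal COMMITs):
CALL create_empty_invoices_partition();
CALL load_data_in_invoices_partition('invoices_0_1000000000');
-- ... one load call per partition created in step one ...
CALL attach_new_invoices_partitions();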
Bringing It All Together: A Programmatic Approach
To streamline the entire process, you can wrap all three steps into a single consolidated procedure that executes the creation, data loading, and partition switching steps sequentially. This makes it easy to run the complete partitioning process with a single command and ensures consistency across the entire operation. Better yet, we can build a self-contained test scenario like the following:
-- Step 1: Create a Database
-- Connect to PostgreSQL and create a new database for testing purposes.
CREATE DATABASE partition_test;
\c partition_test; -- Connect to the created database
-- Step 2: Create the Orders Table
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
creation_date TIMESTAMP DEFAULT NOW()
);
-- Insert sample data into Orders
INSERT INTO orders (creation_date)
SELECT NOW() - INTERVAL '3 years' + (i || ' days')::INTERVAL
FROM generate_series(1, 10000) AS i;
-- Step 3: Create the Invoices Mammoth Table
CREATE TABLE invoices_mammoth (
invoice_id SERIAL,
order_id INT REFERENCES orders(order_id),
invoiceId BIGINT NOT NULL,
invoice_amount NUMERIC(10, 2),
invoice_date TIMESTAMP DEFAULT NOW()
);
-- Insert sample data into Invoices Mammoth
INSERT INTO invoices_mammoth (order_id, invoiceId, invoice_amount, invoice_date)
SELECT
(RANDOM() * 999 + 1)::INT,
i,
(RANDOM() * 1000 + 100)::NUMERIC(10, 2),
NOW() - INTERVAL '3 years' + (i || ' days')::INTERVAL
FROM generate_series(1, 10000) AS i;
-- Step 4: Create the Parent Invoices Table
-- Adding a unique index on invoiceId (the partition key) to maintain data integrity
CREATE TABLE invoices (
invoice_id SERIAL,
order_id INT REFERENCES orders(order_id),
invoiceId BIGINT NOT NULL,
invoice_amount NUMERIC(10, 2),
invoice_date TIMESTAMP DEFAULT NOW()
) PARTITION BY RANGE (invoiceId);
CREATE UNIQUE INDEX unique_invoice_invoiceId ON invoices(invoiceId);
-- Attach mammoth partition to the parent table
ALTER TABLE invoices ATTACH PARTITION invoices_mammoth FOR VALUES FROM (0) TO (10001);
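Before running the procedure, a quick sanity check confirms the setup (the expected values follow from the inserts above):
SELECT COUNT(*) FROM invoices; -- expect 10000
SELECT MIN(invoiceId), MAX(invoiceId) FROM invoices_mammoth; -- expect 1 and 10000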
-- Step 5: Create the Procedure to Test
-- Main partitioning procedure
-- This script will set up partitions on an existing mammoth table to split it
CREATE OR REPLACE PROCEDURE repartition_invoices()
LANGUAGE plpgsql
AS $$
DECLARE
rows_added BIGINT := 0;
end_invoiceId BIGINT := 0;
current_upper_invoiceId BIGINT;
current_lower_invoiceId BIGINT := 0;
batch_lower_invoiceId BIGINT;
part_no BIGINT;
retry_attempts CONSTANT INT := 3;
BEGIN
-- Find the max invoiceId in the mammoth partition to set partition boundaries
EXECUTE 'SELECT MAX(invoiceId) FROM invoices_mammoth' INTO end_invoiceId;
-- This part generates and loads new partitions up to end_invoiceId
-- Generate and load new partitions covering IDs up to end_invoiceId
FOR part_no IN 1..(end_invoiceId / 500) + 1 LOOP
current_upper_invoiceId := part_no * 500;
-- Create the partition table, named after its boundaries (invoices_<lower>_<upper>)
EXECUTE format('CREATE TABLE invoices_%s_%s (LIKE invoices INCLUDING ALL)',
current_lower_invoiceId, current_upper_invoiceId);
-- The table is still empty, so adding the CHECK constraint is instant, and a valid
-- constraint lets the later ATTACH skip its validation scan.
EXECUTE format('ALTER TABLE invoices_%s_%s ADD CONSTRAINT invoiceId_range CHECK (invoiceId >= %s AND invoiceId < %s)',
current_lower_invoiceId, current_upper_invoiceId, current_lower_invoiceId, current_upper_invoiceId);
-- Load data into the partition in manageable batches, advancing the batch
-- lower bound past the rows already copied so no batch is repeated.
batch_lower_invoiceId := current_lower_invoiceId;
LOOP
EXECUTE format(
'INSERT INTO invoices_%s_%s SELECT * FROM invoices_mammoth WHERE invoiceId >= %s AND invoiceId < %s ORDER BY invoiceId LIMIT 10000',
current_lower_invoiceId, current_upper_invoiceId, batch_lower_invoiceId, current_upper_invoiceId
);
GET DIAGNOSTICS rows_added = ROW_COUNT;
EXIT WHEN rows_added < 10000;
-- Move the lower bound just past the last copied row before the next batch.
EXECUTE format('SELECT MAX(invoiceId) + 1 FROM invoices_%s_%s',
current_lower_invoiceId, current_upper_invoiceId) INTO batch_lower_invoiceId;
PERFORM pg_sleep(1);
END LOOP;
-- Optimize the partition's performance
EXECUTE format('ANALYZE invoices_%s_%s', current_lower_invoiceId, current_upper_invoiceId);
-- Set lower boundary for the next partition
current_lower_invoiceId := current_upper_invoiceId;
END LOOP;
-- Attaching new partitions to the invoices table, retrying on lock contention.
-- A lock_timeout makes the DDL fail fast instead of queueing indefinitely,
-- so the lock_not_available handler below can actually fire.
EXECUTE 'SET lock_timeout = ''5s''';
FOR i IN 1..retry_attempts LOOP
BEGIN
EXECUTE 'ALTER TABLE invoices DETACH PARTITION invoices_mammoth';
FOR part_no IN 1..(end_invoiceId / 500) + 1 LOOP
EXECUTE format(
'ALTER TABLE invoices ATTACH PARTITION invoices_%s_%s FOR VALUES FROM (%s) TO (%s)',
(part_no - 1) * 500, part_no * 500, (part_no - 1) * 500, part_no * 500
);
END LOOP;
EXIT;
EXCEPTION
WHEN lock_not_available THEN
RAISE NOTICE 'Retrying partition attachment...';
IF i = retry_attempts THEN
RAISE EXCEPTION 'Unable to attach tables after % attempts', retry_attempts;
END IF;
PERFORM pg_sleep(5);
END;
END LOOP;
RAISE NOTICE 'Partitioning completed successfully.';
END;
$$;
-- Step 6: Execute the Procedure to Test
CALL repartition_invoices();
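A per-partition row count is a convenient way to confirm that nothing was lost in the move (the counts should sum to the 10,000 rows inserted earlier):
SELECT tableoid::regclass AS partition, COUNT(*) AS row_count
FROM invoices
GROUP BY 1
ORDER BY 1;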
-- Step 7: Check the new structure (e.g., via \d+ invoices). Instead of the mammoth
-- partition, we have split it into the desired ranges. This is also the moment you can
-- apply data retention, by simply not creating (or not attaching) some of the ranges.
Partitions: invoices_0_500 FOR VALUES FROM ('0') TO ('500'),
invoices_500_1000 FOR VALUES FROM ('500') TO ('1000'),
invoices_1000_1500 FOR VALUES FROM ('1000') TO ('1500'),
invoices_1500_2000 FOR VALUES FROM ('1500') TO ('2000'),
invoices_2000_2500 FOR VALUES FROM ('2000') TO ('2500'),
invoices_2500_3000 FOR VALUES FROM ('2500') TO ('3000'),
invoices_3000_3500 FOR VALUES FROM ('3000') TO ('3500'),
invoices_3500_4000 FOR VALUES FROM ('3500') TO ('4000'),
invoices_4000_4500 FOR VALUES FROM ('4000') TO ('4500'),
invoices_4500_5000 FOR VALUES FROM ('4500') TO ('5000'),
invoices_5000_5500 FOR VALUES FROM ('5000') TO ('5500'),
invoices_5500_6000 FOR VALUES FROM ('5500') TO ('6000'),
invoices_6000_6500 FOR VALUES FROM ('6000') TO ('6500'),
invoices_6500_7000 FOR VALUES FROM ('6500') TO ('7000'),
invoices_7000_7500 FOR VALUES FROM ('7000') TO ('7500'),
invoices_7500_8000 FOR VALUES FROM ('7500') TO ('8000'),
invoices_8000_8500 FOR VALUES FROM ('8000') TO ('8500'),
invoices_8500_9000 FOR VALUES FROM ('8500') TO ('9000'),
invoices_9000_9500 FOR VALUES FROM ('9000') TO ('9500'),
invoices_9500_10000 FOR VALUES FROM ('9500') TO ('10000'),
invoices_10000_10500 FOR VALUES FROM ('10000') TO ('10500')
This single procedure (repartition_invoices) ensures that all the steps are executed in the correct sequence, minimizing manual intervention and making the entire process repeatable. It also helps maintain database consistency and limits downtime, especially for tables with large datasets. As you can see in the final step, there is no longer a mammoth partition, and we have created the desired ranges accordingly.
Conclusion
By applying this procedure, we reclaimed some storage space and replaced one big partition with many smaller ones. This has several advantages: faster query execution, more efficient index usage, and faster maintenance operations. This strategy covers declarative partitioning, but it is equally applicable to inheritance-based partitioning, as the only differences would be in the DDL commands used.
For this reason, it is a valuable input for a data archival strategy for append-only tables. Since the data in the historical partitions remains static, this approach ensures that we can perform partitioning without the risk of concurrent updates leading to inconsistencies.
Through these steps, it’s possible to partition an existing table with minimal impact on performance. By creating empty partitions, loading data in batches, and finally swapping the mammoth partition for the new ones, we repartitioned a very large table without significant disruption. The same method can also be used to introduce partitioning on tables that are not partitioned yet. This approach not only improves performance but also maintains data integrity.
What’s Next?
The next challenges to consider are scenarios where historical data may change during the data load phase. In such cases, we could either use triggers to track changes or, more effectively, introduce Change Data Capture (CDC) mechanisms to ensure data consistency throughout the partitioning process.
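For instance, a simple change-capture table plus a row trigger on the mammoth partition could record rows modified mid-load, so they can be replayed into the new partitions before the switch. A rough sketch of the trigger-based option (all names here are illustrative):
-- Records which invoiceIds changed during the load, and how.
CREATE TABLE invoices_mammoth_changes (
invoiceId BIGINT NOT NULL,
change_type TEXT NOT NULL,
changed_at TIMESTAMP DEFAULT NOW()
);
CREATE OR REPLACE FUNCTION track_mammoth_changes() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
IF TG_OP = 'DELETE' THEN
INSERT INTO invoices_mammoth_changes (invoiceId, change_type) VALUES (OLD.invoiceId, TG_OP);
RETURN OLD;
ELSE
INSERT INTO invoices_mammoth_changes (invoiceId, change_type) VALUES (NEW.invoiceId, TG_OP);
RETURN NEW;
END IF;
END;
$$;
CREATE TRIGGER mammoth_change_tracker
AFTER INSERT OR UPDATE OR DELETE ON invoices_mammoth
FOR EACH ROW EXECUTE FUNCTION track_mammoth_changes();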