PostgreSQL 10 Built-in Partitioning

Since I have a passing interest in partitioning in PostgreSQL, I figured I’d check out a recent commit to the development branch of PostgreSQL 10

Implement table partitioning — https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f0e44751d7175fa3394da2c8f85e3ceb3cdbfe63

Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own. The children are called
partitions and contain all of the actual data. Each partition has an
implicit partitioning constraint. Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed. Partitions
can't have extra columns and may not allow nulls unless the parent
does. Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.

Currently, tables can be range-partitioned or list-partitioned. List
partitioning is limited to a single column, but range partitioning can
involve multiple columns. A partitioning "column" can be an
expression.

Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations. The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.

Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.

After many years of waiting, one of the major features missing from PostgreSQL is finally getting its first major step forward with the inclusion of a built in partitioning option. The syntax and usage is fairly straight forward so let’s jump straight into it with the examples from the documentation (slightly modified)

The basic syntax has two parts, one for initially creating a partitioned set

CREATE TABLE table_name ( ... )
[ PARTITION BY { RANGE | LIST } ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [, ... ] ) ]

And another for adding a child table to that partition set

CREATE TABLE table_name
PARTITION OF parent_table [ (
{ column_name [ column_constraint [ ... ] ]
| table_constraint }
[, ... ]
) ] FOR VALUES partition_bound_spec

and partition_bound_spec is:

{ IN ( expression [, ...] ) |
FROM ( { expression | UNBOUNDED } [, ...] ) TO ( { expression | UNBOUNDED } [, ...] ) }

A ranged partition set is likely the most common use-case and is what pg_partman provides for time & id partitioning. Each child table is given a specific upper and lower bound of data. Once nice thing is that it can support multiple columns as well.

CREATE TABLE measurement (
logdate date not null,
peaktemp int default 1,
unitsales int
) PARTITION BY RANGE (logdate);

ALTER TABLE measurement ADD CHECK (peaktemp > 0);
CREATE TABLE measurement_y2016m07
PARTITION OF measurement (
unitsales DEFAULT 0
) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');

keith@keith=# \d+ measurement
Table "public.measurement"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------+---------+-----------+----------+---------+---------+--------------+-------------
logdate | date | | not null | | plain | |
peaktemp | integer | | | 1 | plain | |
unitsales | integer | | | | plain | |
Partition key: RANGE (logdate)
Check constraints:
"measurement_peaktemp_check" CHECK (peaktemp > 0)
Partitions: measurement_y2016m07 FOR VALUES FROM ('2016-07-01') TO ('2016-08-01')

keith@keith=# \d measurement_y2016m07
Table "public.measurement_y2016m07"
Column | Type | Collation | Nullable | Default
-----------+---------+-----------+----------+---------
logdate | date | | not null |
peaktemp | integer | | | 1
unitsales | integer | | | 0
Partition of: measurement FOR VALUES FROM ('2016-07-01') TO ('2016-08-01')
Check constraints:
"measurement_peaktemp_check" CHECK (peaktemp > 0)

You can see that unlike all partitioning methods up until now, there is no user-visible trigger involved. All the data routing is handled internally which greatly simplifies setting things up and opens up greater opportunities for optimization. Also, constraints and some other properties are automatically inherited from the parent. But since indexes cannot be defined on the parent, they have to be defined per child. Above you can see you can define different defaults per child as well as other properties.

The other partitioning type is LIST and this is for explicitly defining values to go into specific children.

CREATE TABLE cities (
city_id bigserial not null,
name text not null,
population int
) PARTITION BY LIST (initcap(name));

CREATE TABLE cities_west
PARTITION OF cities (
CONSTRAINT city_id_nonzero CHECK (city_id != 0)
) FOR VALUES IN ('Los Angeles', 'San Francisco');

keith@keith=# \d+ cities
Table "public.cities"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
------------+---------+-----------+----------+-----------------------------------------+----------+--------------+-------------
city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) | plain | |
name | text | | not null | | extended | |
population | integer | | | | plain | |
Partition key: LIST (initcap(name))
Partitions: cities_west FOR VALUES IN ('Los Angeles', 'San Francisco')

keith@keith=# \d+ cities_west
Table "public.cities_west"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
------------+---------+-----------+----------+-----------------------------------------+----------+--------------+-------------
city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) | plain | |
name | text | | not null | | extended | |
population | integer | | | | plain | |
Partition of: cities FOR VALUES IN ('Los Angeles', 'San Francisco')
Check constraints:
"city_id_nonzero" CHECK (city_id <> 0)

Here you can see the default of the sequence is inherited and an example of defining a constraint on the child. So what happens if you try to insert data and a child table is not defined for it?

keith@keith=# insert into cities (name, population) values ('Baltimore', 2003);
2016-12-09 17:36:57.999 EST [15839] ERROR: no partition of relation "cities" found for row
2016-12-09 17:36:57.999 EST [15839] DETAIL: Failing row contains (1, Baltimore, 2003).
2016-12-09 17:36:57.999 EST [15839] STATEMENT: insert into cities (name, population) values ('Baltimore', 2003);
ERROR: no partition of relation "cities" found for row
DETAIL: Failing row contains (1, Baltimore, 2003).
Time: 0.641 ms
keith@keith=# insert into cities (name, population) values ('Los Angeles', 2003);
INSERT 0 1
Time: 3.127 ms
keith@keith=# select * from cities;
city_id | name | population
---------+-------------+------------
2 | Los Angeles | 2003
(1 row)

Time: 0.525 ms

Invalid data is currently rejected. It also seems to have incremented the sequence as well (I re-ran this again to be sure). Currently I’m not sure if there’s a way around the data rejection, and defining a trigger with a exception handler is not a good idea. As I found out the hard way in pg_partman, just the existence of an exception block in a trigger will also increase xid consumption since every row inserted will increment the global xid value. If this table has a high insert rate, you can quickly reach xid wraparound. I made a suggestion to the core devs to define a child table as the default and have any data that doesn’t match a child table go there instead.

Sub-partitioning is also possible and you can easily mix and match both range and list partitioning types. Here we redefine the cities_west table to be partitioned as well

CREATE TABLE cities_west
PARTITION OF cities (
CONSTRAINT city_id_nonzero CHECK (city_id != 0)
) FOR VALUES IN ('Los Angeles', 'San Francisco') PARTITION BY RANGE (population);

CREATE TABLE cities_west_10000_to_100000
PARTITION OF cities_west FOR VALUES FROM (10000) TO (100000);

keith@keith=# \d+ cities
Table "public.cities"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
------------+---------+-----------+----------+-----------------------------------------+----------+--------------+-------------
city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) | plain | |
name | text | | not null | | extended | |
population | integer | | | | plain | |
Partition key: LIST (initcap(name))
Partitions: cities_west FOR VALUES IN ('Los Angeles', 'San Francisco')

keith@keith=# \d+ cities_west
Table "public.cities_west"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
------------+---------+-----------+----------+-----------------------------------------+----------+--------------+-------------
city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) | plain | |
name | text | | not null | | extended | |
population | integer | | not null | | plain | |
Partition of: cities FOR VALUES IN ('Los Angeles', 'San Francisco')
Partition key: RANGE (population)
Check constraints:
"city_id_nonzero" CHECK (city_id <> 0)
Partitions: cities_west_10000_to_100000 FOR VALUES FROM (10000) TO (100000)

keith@keith=# \d+ cities_west_10000_to_100000
Table "public.cities_west_10000_to_100000"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
------------+---------+-----------+----------+-----------------------------------------+----------+--------------+-------------
city_id | bigint | | not null | nextval('cities_city_id_seq'::regclass) | plain | |
name | text | | not null | | extended | |
population | integer | | not null | | plain | |
Partition of: cities_west FOR VALUES FROM (10000) TO (100000)
Check constraints:
"city_id_nonzero" CHECK (city_id <> 0)

Global indexes do not yet exist in PostgreSQL, so the issue of primary/unique keys being enforced across a partition set is still an issue. And while you can define foreign keys on individual children to other tables, defining them back to a partitioned set will not work. Also, foreign keys cannot be defined on the parent table with this new feature, so they’ll have to be set per child. Thankfully they just return an error if you try any of these things instead of allowing them and causing confusion the way most current partition setups do.

keith@keith=# CREATE TABLE fk_test (unitsales integer);
CREATE TABLE
Time: 3.818 ms
keith@keith=# CREATE TABLE measurement (
keith(# logdate date not null,
keith(# peaktemp int default 1,
keith(# unitsales int REFERENCES fk_test (unit_sales)
keith(# ) PARTITION BY RANGE (logdate);
2016-12-12 11:34:14.485 EST [5051] ERROR: foreign key constraints are not supported on partitioned tables at character 122
2016-12-12 11:34:14.485 EST [5051] STATEMENT: CREATE TABLE measurement (
logdate date not null,
peaktemp int default 1,
unitsales int REFERENCES fk_test (unit_sales)
) PARTITION BY RANGE (logdate);
ERROR: foreign key constraints are not supported on partitioned tables
LINE 4: unitsales int REFERENCES fk_test (unit_sales)
^


keith@keith=# CREATE TABLE fk_test (unitsales integer REFERENCES measurement (unitsales));
2016-12-12 11:06:51.741 EST [5051] ERROR: cannot reference partitioned table "measurement"
2016-12-12 11:06:51.741 EST [5051] STATEMENT: CREATE TABLE fk_test (unitsales integer REFERENCES measurement (unitsales));
ERROR: cannot reference partitioned table "measurement"
Time: 1.794 ms

To get rid of a partitioned set, you’d use the CASCADE option just like with an inheritance set

keith@keith=# drop table cities cascade;
NOTICE: drop cascades to 2 other objects
DETAIL: drop cascades to table cities_west
drop cascades to table cities_west_10000_to_100000
DROP TABLE

Currently the TRUNCATE command run on the parent table will truncate the entire partition set, but I did see discussion on the mailing lists of a WHERE clause to only target specific children. That would be a great feature, but even better would be a WHERE clause to the DROP TABLE command to make retention maintenance to drop old tables much easier.

You may have noticed there’s no mention of ongoing creation of new partitions when needed. This is where I see pg_partman still being useful at this time. I’ve only just barely started playing with this feature to see how I can incorporate it. My main method of determining what characteristics were inherited to children (indexes, constraints, defaults, FKs etc) was done by defining them on the parent, but the current limitations on what can be defined on the parent with this new feature make this trickier to handle. Whether I’ll keep my old methods and just add this on as a new feature or do another major code revision will have to wait to be seen. It will all depend if I can find a way to implement all existing features I’ve added to pg_partman since I’d rather not take any steps backwards if I don’t have to. Reading the relevant threads on the mailing lists, it seems they have a lot more features for partitioning to be added in, so I may wait a bit before diving into incorporating this into pg_partman too tightly since I may over complicate things on my part due to a current lack of features. This email from Robert Haas gives me hope this this will be a well maintained core feature — https://www.postgresql.org/message-id/CA%2BTgmobTxn2%2B0x96h5Le%2BGOK5kw3J37SRveNfzEdx9s5-Yd8vA%40mail.gmail.com

From the moment I started writing pg_partman, I knew built-in partitioning wasn’t too far around the corner. But we build the tools that we need at that time. I’m looking forward to seeing how this feature develops and I hope I can help out where possible to overcome any limitations encountered. Many thanks to Amit Langote and the review team!


Originally published at Keith’s Ramblings….