Table Partitioning in PostgreSQL

If you’re familiar with Postgres, you’re probably already aware it has a fantastic table inheritance feature, whereby child tables can derive columns from parent tables. What you may not know is this is also the primary feature that enables table partitioning. Unfortunately, the official docs provide a fairly simplistic example of how to go about implementing it.

This post provides a real world partitioning example, and some of the gotcha’s you may encounter.

Problem Statement

Consider a hypothetical comments table:

CREATE TABLE “comments” (
id uuid DEFAULT uuid_generate_v4() NOT NULL,
user_id uuid NOT NULL,
comment text,
created_at timestamp with time zone DEFAULT now() NOT NULL,
modified_at timestamp with time zone DEFAULT now() NOT NULL
);

The table is expected to grow at a rate of a few million new rows per month. Fairly soon, performance will suffer. You observe the queries run by the application utilize the created_at column, and decide to use the month and year for partitioning the table.

Partitioning Strategy

Upon INSERT, new comments will be routed to a comments_<YEAR>_<MONTH> table, where YEAR and MONTH are based on comment.created_at.

For example, take the following INSERT:

INSERT INTO comments 
(user_id, comment, modified_at, created_at)
VALUES
(‘fe4742c7–909e-4df2–84fc-03d40372f224’, ‘Hello World’, ‘2016–07–01 18:50:59–00’, ‘2016–07–01 18:50:59–00’);

Based on its created_at date, this row will be inserted into the comments_2016_07 table. Comments inserted for the next month would be routed to comments_2016_08, and so forth. Barring a huge surge in the app’s popularity, this should keep each table’s row count in a comfortable range.

Goals

With our strategy in hand, we can set some objectives:

  • From the application code’s standpoint, there should be no changes necessary; partitioning should be transparent.
  • Minimal administration necessary; when a new INSERT is routed to a table that doesn’t exist, the table should be created automatically.
  • After UPDATING, INSERTING, or DELETING, the RETURNING clause should continue to work as per usual (again, see goal #1).
  • Upon creation, child tables should be given proper indexes.

Implementation

First, an overview of the steps:

  1. Create a table called comments_base. All child tables will inherit from this base table.
  2. Create a view called comments. The application will perform all CRUD operations using this view. The application should ignore the existence of comments_base and all child tables.
  3. Create a function which routes inserts to comments to the appropriate child table.

Creating the base table

CREATE TABLE comments_base (
id uuid DEFAULT uuid_generate_v4() NOT NULL,
user_id uuid NOT NULL,
comment text,
created_at timestamp with time zone DEFAULT now() NOT NULL,
modified_at timestamp with time zone DEFAULT now() NOT NULL
);

Think of the base table as providing the column template from which all child tables will be derived. Excepting indexes, a child table’s definition comes exclusively from the base table.

Create the view

CREATE VIEW comments AS SELECT * FROM comments_base;

This view replaces the original comments table. You might choose to rename the original comments table to comments_base, or dump the data for insert later and simply drop.

As mentioned above, the application will perform all SELECTS, INSERTS, UPDATES and DELETEs on the view.

Create a routing function

Validation

Now, when we try our previous INSERT, a table is automatically created to house the new record:

Try a few EXPLAIN queries and observe how the CHECK constraints allow the query optimizer to eliminate scanning certain child tables.

Selecting all rows(includes every child table):

vs with a WHERE clause on created_at (eliminates tables 07-09):

Gotchas, oops, etc.

Changing created_at

Let’s say you wanted to UPDATE a record’s created_at for some reason. You’ll receive a check constraint error, because the record will no longer meet the constraint condition:

In this situation all is not lost; you could certainly DELETE and re-INSERT(if you do, use a transaction!). However, be aware that if your application is going to be performing queries like the one above, your partitioning strategy is going to make things complicated…consider re-evaluating your partitioning strategy.

INSERTs with nowhere to go

Pay close attention to lines 32 & 33 in the gist above — notice the inclusive condition on start_dt(>=), and the exclusive condition on end_dt(<). Suppose you chose an exclusive condition for both — INSERTs with created_at’s falling between those values would fail, since no table would accept them.

Additional Reading

One clap, two clap, three clap, forty?

By clapping more or less, you can signal to us which stories really stand out.