By: Regis Wilson
This document was written as an investigation into Postgres partitioning features that will be available in version 10 and 11. TrueCar uses Postgres as its primary relational backend store for user data, car makes, model information, and more. When we first began our years-long migration to AWS, the current version AWS offered for RDS Postgres was version 9. During the migration, upgrading the database version was a lower priority and new features and performance improvements we might want to take advantage of were passing us by. Fortunately, we recently completed an upgrade to Postgres version 11 which we may describe in a separate post later.
This document was originally written as an internal technical document to investigate and plan upcoming features that we would like to use and take advantage of. We also needed to decide whether we would upgrade to version 10 first or go for features directly available in 11. We decided to publish this document to share our investigation and some of the preliminary conclusions we have reached. Hopefully the reader will find it as interesting and informative as we did in choosing a version to upgrade to and which features were most important to us.
As of v11, the following investigation and testing has been performed. The design goals of the investigation are to find out:
- when and how we should best implement partitioning
- what the strategy is for migrating from where we are today to the future of partitioning
What are the major features of partitioning?
This information comes from https://www.postgresql.org/docs/current/ddl-partitioning.html and https://pgdash.io/blog/partition-postgres-11.html both of which are good, short reads.
Summary of differences between 10 and 11
Keep in mind that all features of V10 are still available in V11, so if we prefer the v10 style of inheritance (because it’s more flexible, perhaps), then that is a viable alternative when upgrading to 11. We can also use both styles of partitioning in v11 if we decide, for example, that one table is better suited to inheritance and another table is more suited for declarative partitioning.
V10 introduced partitioning by inheritance, which offers more flexibility at the cost of “manual” distribution of rows to partitions.
V11 incorporated “automatic” partitioning of rows, including distribution and even updating (to new partitions!) of rows at the expense of some inflexibility (for example, you cannot change the parent of a declared partition table and declared partition tables cannot participate in inheritance with other normal inheritance tables).
V10 introduced partitioning by Range (for example, dates, or ids) and List (keys, or major bucket names)
V11 added partitioning by Hash (modulus or round-robin)
V10 uses inheritance to copy table column information which is “loose”, in that you can still manipulate the columns (for example, add columns) on child tables (“partitions”). This looseness has some drawbacks: you must manually copy certain column operations yourself (such as UNIQUE) to the child partitions.
V11 is much stricter in that you cannot manipulate the columns of partitions (and there are some constraints on CHECK and NOT NULL and ONLY parameters [read the docs])
V11 introduced partition-level aggregation, which is off by default. (Later versions turn this on by default.) Enabling this will allow parallel workers to descend into each applicable partition (with pruning) and then catenate results.
What are the major benefits of partitioning?
The generic benefits of partitioning are:
- Segmenting data by some useful bucket (usually date) can help with backups, retention, separating “hot” and “cold” data, querying, and reporting
- If there are any requirements for retention and pruning, then partitioning can make the process simpler by bucketing data to be handled together
- Partitioned tables can be queried and written individually by, say, batch processes or for backfill process
The benefits proposed by partitioning using V10 are:
- Improved query time if queries are in a single partition (queries on a 50GB table are always faster than a 500GB table, even with indices)
- Improved deletion time because partitions can be dropped without vacuum
- Partitions can be “detached”, which allows them to be handled individually for backup/testing/reporting.
The benefits proposed by partitioning using v11 are:
- All the benefits above
- Declarative partitioning makes most of the handling of partition writes, queries, and updates “automatically”
- Query plans to partitioned tables can use parallel querying across partitions instead of serially. This is only available with a flag set in v11. For example, if a partitioned table has three segments by month, and a query spans the three segments, then it is possible the plan will show three parallel queries issued across the three partitions.
What are the major drawbacks of partitioning?
The generic drawbacks to partitioning are:
- If partitioning is not useful then it could actually be harmful. As one example, if a table is small or static or has too many buckets so that each partition is small, then partitioning badly could cause serious performance problems.
- Partitioning could create constraints or operational burdens on the table that make using it difficult or intrusive to the code abstractions.
- Additional testing and code are usually required, since partitioned tables are not completely transparent to the application.
The drawbacks to partitioning in V10 are:
- V10 partitioning is a little better than “completely manual” partitioning. For example, V10 requires one to configure an inheritance table relationship and requires a trigger and other related overhead that are easy to make mistakes with or miss. Considerable operational overhead and complexity is added by using partitioning in V10 that makes it unattractive for use today.
- Table limits could drastically alter the way partitioning is implemented. The child limits are probably close to 1000 (see docs for exact numbers), but we should never even consider growing close to that number of partitions.
The drawbacks to partitioning in V11 are:
- V11 is new and there could be unintended side effects and bugs that have not been fully flushed out yet. It is not known how many large, enterprise, or critical workloads have been moved to V11 in production.
- V11 partitioning is much improved and more “automatic” than V10, but there are additional constraints, operational burdens, and so forth that are required to support this feature.
We should prefer to use V11 partitioning over V10 partitioning. In fact, I would discourage any use of V10 inheritance partitioning unless we find ourselves using inheritance tables for other reasons and we are already familiar with it.
When and how should we use partitioning?
Partitioning is ideally suited for tables that:
- Have rows that are rarely or never updated
- Grow indefinitely
- Need to be pruned
- Are “large” (>100GB)
- Can be segmented by a small (<100) number of buckets
- Need to be archived or rotated
Partitioning is not suitable for tables that:
- Perform a lot of in-place updates (this is a possible cause of rows moving between partitions)
- Grow slowly or not at all
- Are “small” (< 100GB)
- Have keys with high cardinality
- Need to be kept online forever
- Do not have any rotation, pruning, or archival needs
- Partition keys that are most suitable:
- Primary key IDs
- Dates by either day, week, month, or year
We should prefer to use V11 partitioning over V10 partitioning. In fact, I would discourage any use of V10 inheritance partitioning unless we find ourselves using inheritance tables for other reasons and we are already familiar with it. In general, it is better to reduce the number of tools, patterns, and exceptions we use, and so, if possible, we should settle on only one method of partitioning. Even if we decide to upgrade to V10 and cannot participate in the features offered in V11, we need to be explicit in recognizing the additional cost and burden of changing to V11 partitioning.
We should strongly prefer partitioning over pruning datasets where possible. Not only are there smaller query and parallel performance benefits of using V11 partitioning, but pruning has negative effects on table locking, index recreation, and vacuuming.
Partitioning does come with additional operational overhead; namely, that new future partitions need to be created and old partitions need to be detached/removed. If these procedures and processes are not in place, it is conceivable that exceptions and outages may occur when new data are written to tables or if the old partition tables stay active indefinitely.
Partitioning tables is strongly encouraged for any tables that have a date range associated as the main key. This includes event logs and transaction tables (arranged by date). This natural “append-only” type of table can be split and managed logically and easily and are good fits.
We should consider partitioning for any large table where we can set a reasonable number of partitions, even if such a partition strategy is not obvious. I’ll give an example: Let’s say we have a list of user ids with the username as primary key and integer “userid” and “updated at” column. Partitioning by userid range might help spread out load across userid keys (let’s say by 100,000s or 1,000,000s). This strategy would allow the query planner to guess ahead to which partition a userid would be stored. This way each partition would only contain 100,000 or 1,000,000 users each (rather than tens of millions, say). Another option is to partition based on “updated date” ranges. Let’s pretend we partition by year. Every user who logs in and is updated this year would automatically migrate to the latest partition. Older partitions would now hold inactive or dead users. After three years, or say five years (or more), old partitions could be archived (with fallback logic) or moved offline.
Particularly in light of the features planned to show [feature description removed], using partitioned tables by month seems like a good strategy. This provides the benefits of smaller monthly tables, parallel searches across tables, and a good retention policy. For example, if we wanted to prune a [some data] to three months, we could keep four or five partition tables open at one time: one table for the upcoming month, one table for the current month, and three tables of historical data (the last table could be taken offline, backed up, or kept only for reporting).
Partitioning for events
This is based on an early assessment for the events usage which could benefit from partitioning. The proposed partition scheme would have the following benefits:
- Partitioning allows for constraint-based query plans that help by avoiding index lookups (indices on partition splits are inherent in the partition strategy: not using an index is more efficient that having to use an index).
- Partitioning allows for operational ease in removing old data since an old partition can merely be dropped or dissociated rather than copied or deleted and vacuumed (although it does introduce more operational costs for creating new partitions and maintaining the correct partitions).
- Append-only, date-based rows are very efficient in partitioned tables and allow for reduced I/O and locking.
The current strategy is to partition by date, and, in this case, by month. Monthly tables would be large enough to make partitioning worthwhile and also reduce operational overhead. This example is adapted from a great article here. Please read it in full. Partitioning by month would yield the following example:
The calculation for the size of each month is given by some estimates:
And I believe we can do even better. A couple of example queries could offer subpartitioning ideas. Given the following two types of query examples:
You can see that we have an opportunity (perhaps to sub-partition by parts of the WHERE clause. Assuming (just as an example) that “recipient_type” is a fixed number of at most a half-dozen types, we could subdivide like this:
Now, each month will also have sub-partitions by values in the owner_type column. Assuming that owner_type is very low cardinality, query times could be significantly improved (but we should verify that owner_type is small and static).
We are hiring! If you love solving problems please reach out, we would love to have you join us!