Andrew Tirto Kusumo
julostories
Published in
4 min readMar 18, 2020

--

Postgres Partitioned Table Saved Us From Sleepless Nights

Yes, we can finally sleep….

This describes our journey with PostgreSQL at JULO, and how thankful we are to have migrated to PostgreSQL 10. It has saved us from many sleepless nights. So without further ado, let me share about my sleepless nights!

The Problem

As one the most used RDBMS out there, PostgreSQL is very popular. PostgreSQL announced their support in Native Partitioning since version 10. We liked this functionality but held off on migration for a long time in favor of stability. Last year when version 10 became more stable and mature we finally decided to give to move from 9.5 to 10.6 because of the many benefits it offered.

The story began in 2018 when we had a scheduler to delete data in a table that is already more than 1 year old. The scheduler executed a DELETE statement on a log table where age > 1 year old. Nothing was wrong with this, until our customer base started increasing from 500 a day to around 4000 a day. BOOM! the size of table and the number of rows is now 8 times bigger. The DELETE statement which initially ran within 20–30 minutes now exploded and took at least 4 hours to execute. Not to mention we had to run VACUUM to reclaim the size of disk that was left by dead rows in PostgreSQL.

The downside of doing a DELETE statement here is that it can also partially lock the transaction that happened at that time. So we started searching for a better way to solve this problem, because we needed to keep a close eye whenever the scheduler was about to run. Of course we set the scheduler to execute in the early hours of dawn where the number of incoming users is low: AKA sleepless.

Inspiration

Inspiration always comes suddenly, so after we moved to PostgreSQL 10 we started to do research on how to implement this. The basic concept is that you will have a parent table and children tables that will inherit from the parent. The children tables will each store values only with the condition that we have declared.

Say for example that you want to create a sales table like this:

CREATE TABLE sales (
date date not null,
city_id int,
unitsales int
);

Now let’s say you want to group the sales per month, because you want to query it faster per month and you want to delete data more than 6 months. All you need to do is to just add a little bit of declaration.

CREATE TABLE sales (
date date not null,
city_id int,
unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE sales_y2020m03 PARTITION OF sales
FOR VALUES FROM ('2020-03-01') TO ('2020-04-01');
CREATE TABLE sales_y2020m04 PARTITION OF sales
FOR VALUES FROM ('2020-04-01') TO ('2020-05-01');
and so on...

After the table gets populated you can now see that the data is separated by month and you can create a scheduler to automatically DROP the table that is more than 6 months old. When you DROP the table you automatically reclaim the size on the disk without having to VACUUM it.

We learned the following three lessons:

  1. Sadly we couldn’t implement this solution through ORM managed migration. We needed to write the SQL statement manually. In JULO we use Django as the backend server and the Django migration can’t handle this yet, unless you use additional library such as django-postgres-extra.
  2. You need to select what kind of partition key is best for your case. PostgreSQL has 3 kind of partitions: RANGE, LIST, and HASH.
    - RANGE is for a range of value from x to y. (e.g. DATE BETWEEN 2020–02–01 AND 2020–02–29)
    - LIST is for a group of values that is already known. (e.g. country ID like ‘ID’, ‘JP’)
    - HASH is when you want to hash function as the partition key.
    If you want to read more about this, here is a good article.
  3. You NEED to create your own INDEX. PostgreSQL won’t automatically create it for you, you need to write your own INDEX statement and specify the INDEX key.

With these lessons learned, we made a script to automatically generate the SQL statement, since you need to create it manually. And it’s not nice if you need to create 100 child tables from the parent.

So here’s a script that maybe can help you to automatically create these tedious tasks for you. The script is only generating the RANGE partition, because this is what we use. You can index as many columns as you want in the table,although I don’t really recommend that. You can just change whatever you need to change in the script. The output is a .sql file that you can execute into the PostgreSQL.

Results

The result has been very good for us.

  • We can now sleep better.
  • We can handle more customers coming in. Bring it on! Huh!
  • We cut the process from executing a DELETE statement from more than 4 hours to only a DROP statement which takes around 30 seconds.
  • We don’t need to maintain any VACUUM after deleting.
  • Everybody is happy

--

--