I’ve had this situation crop up a few times with clients and after a discussion on #postgresql on Freenode recently, decided a blog post may be in order. The pitfalls that lead me to this solution are useful to cover and it seems a useful set of steps to have documented and be able to share again later.
There comes a time for most people when you have a table that builds up quite a lot of rows and you then realize you didn’t actually need to keep all of it. But you can’t just run a TRUNCATE because you do want to keep some of the more recent data. If it’s just a few million small rows, it’s not a huge deal to just run a simple DELETE. But when that starts getting into the billions of rows, or your rows are very large (long text, bytea, etc), a simple DELETE may not be realistic.
The first, and arguably easiest way, to deal with this would be to run the DELETE in batches instead of one large transaction. This allows you to add a pause in between the batches to help control I/O. Another side affect of such large delete operations can be an excessively high amount of WAL generation. This not only contributes to I/O, this can also dramatically increase disk space usage. When you’re trying to delete data due to disk space constraints, this can end up making things even worse before they get better. And if you’re deleting quite a lot of data from over a long period of time that didn’t receive many deletes before, you likely won’t get much disk space back at the end without doing a VACUUM FULL or pg_repack (see my discussion on bloat here). And the biggest issue of all when disk space is critical is either one of those options requires that you have at least as much disk space available as a full copy of the unbloated table would take up. So if disk space is the crucial problem most people are trying to solve with removing old data, how can we easily do this?
The first option that’s most easily done on almost any RDBMS is to make a secondary table and have new data copied/redirected there. The I/O and WAL generation of moving the smaller, required amount of data is much less than removing all the old data. Then you can just drop the old table. And most people do this via a trigger-based method: have every write to the old table also write to the new one as well. Then once you’re sure things are working, you can take a brief lock on both the old and new tables and swap their names. PostgreSQL makes this last step much easier, having transactional DDL. I’ll cover the commands to do this later since my final solution is similar to this. The main problems surrounding this come when the table you’re trying to clean up is a very high-traffic table. Doing a trigger like this basically doubles all writes which could possibly cause some I/O issues. There’s also the fact of making sure you get that trigger function code write, otherwise all writes break. Can be a little nerve wracking on critical production systems. But there is a way to avoid both the additional I/O of double writes and the headache of writing triggers.
The actual solution we came up for this involves using the often overlooked feature of table inheritance in PostgreSQL. The steps basically go like this:
- Create a new table exactly like the first one
- Set ownership/privileges on the new table to match the old table
- Have the old table INHERIT the new table
- Swap old and new table names
- Move the data you want to keep from the old table to the new one
- Drop the old table
As soon as you do step 4, all new data is immediately going to the new table. Also, since the old table is a child of the new table, all your old data is still visible from the original table name. Moving the data from the old table to the new one is nearly transparent to any users of the table, the only issue being there may be some slightly slower queries during that transition period since the planner has to account for 2 tables. You can help mitigate this slightly by placing a constraint on the new table (before the swap when it’s empty) that only allows data in the new table’s window. This allows constraint exclusions to possibly ignore the old table while you get data moved. Placing a constraint on the old table probably wouldn’t help much since it would have to lock it for validation. Yes there will be additional I/O and disk usage from WAL while you move data from the old table to the new, but this can be controlled to a much greater degree since all new data isn’t also being written twice by a trigger. You can move the data in batches with any necessary pauses to control those I/O & WAL spikes.
The one thing this method does not account for is if there are updates to data in the old table that would technically place it in the new one while you’re in the process of moving the recent data you want to keep. That update will not be moved from the old table to do the new one. But in most cases where we’ve done this, those updates weren’t that critical since, if you’d done the original method of just deleting the old data, you would’ve been deleting that data that was getting updated anyway. But it’s something to be aware of if you go querying the old table and still see “new” data after you think you’re done migrating. It’s also something to be aware of if your application thinks it updated something when the old table was there and now it’s suddenly gone. You may need to stick with the trigger method above if that’s the case then.
An example of commands to do this is below. The LIKE clause to the CREATE TABLE statement is quite useful since this can automatically include all indexes, constraints, defaults, comments, and storage options. It does NOT include ownership or privileges however, so the one critical step in this piece is definitely #2 above. You can easily see a table’s privileges with the \dp option in psql. Also, explicitly obtaining the exclusive lock on both tables before doing the name switch ensures nothing weird happens during whatever brief moment could exist between the switch.
CREATE TABLE public.notifications_new (LIKE public.notifications INCLUDING ALL);
ALTER TABLE public.notifications_new OWNER TO sysadmin;
GRANT select ON public.notifications_new TO read_only
GRANT select, insert, update, delete ON public.notifications TO app_user;
GRANT all ON public.notifications TO admin;
ALTER TABLE public.notifications INHERIT public.notifications_new;
LOCK TABLE public.notifications IN ACCESS EXCLUSIVE MODE;
LOCK TABLE public.notifications_new IN ACCESS EXCLUSIVE MODE;
ALTER TABLE public.notifications RENAME TO notifications_old;
ALTER TABLE public.notifications_new RENAME TO notifications;
COMMIT; (or ROLLBACK; if there's a problem)
Once all these steps are done, you can then begin the process of moving your more recent data out of the old table and into the new via whichever method works best for you. One easy method to batch this is a CTE query that does the DELETE/INSERT with a SELECT in a single query to limit the rows moved.
WITH row_batch AS (
SELECT id FROM public.notifications_old WHERE updated_at >= '2016-10-18 00:00:00'::timestamp LIMIT 20000 ),
delete_rows AS (
DELETE FROM public.notifications_old o USING row_batch b WHERE b.id = o.id RETURNING o.id, account_id, created_at, updated_at, resource_id, notifier_id, notifier_type)
INSERT INTO public.notifications SELECT * FROM delete_rows;
And once that’s done, you can then DROP the old table, instantly recovering all that disk space with minimal WAL traffic and zero bloat aftermath!
Originally published at Keith’s Ramblings….