PostgreSQL — Tricks or Treats! (Part 1)

In the spirit (no pun intended) of Halloween, today I’ll present you a few tricks (not a pun either) that I discovered and utilized for PostgreSQL during the development of Happ.

Count different things with FILTER

In Happ, users can upvote or downvote a post. Naturally, we’d create a table vote with the following columns: id, type (an enum accepting two values “up” and “down”) and a foreign key post_id that references a post’s id.

To count the number of upvotes/downvotes for a post with id = 42, we can simply do:

SELECT COUNT(v.id) AS up_votes -- or down_votes
FROM vote v
WHERE v.post_id = 42 AND v.type = 'up'; -- or 'down'

However, we can merge the two queries together with CASE:

SELECT COUNT(CASE WHEN v.type = 'up' THEN 1 END) AS up_votes, 
COUNT(CASE WHEN v.type = 'down' THEN 1 END) AS down_votes
FROM vote v
WHERE v.post_id = 42;

For rows not matching the type condition, the CASE expression returns NULL and COUNT only counts NOT NULL values.

In PostgreSQL, we can write something similar to the first example and therefore more intuitive with the FILTER clause:

SELECT COUNT(v.id) FILTER (WHERE v.type = 'up') AS up_votes, 
COUNT(v.id) FILTER (WHERE v.type = 'down') AS down_votes
FROM vote v
WHERE v.post_id = 42;

Easy UPSERT for PostgreSQL 9.5 and above

Happ also allows users to log in with their Facebook accounts. When someone logs in for the second time, his/her Facebook info (e.g. name and email) should be updated accordingly in the database.

In most flavors of SQL, we would have to check the user Facebook ID’s existence in the database before deciding whether to insert or update his/her profile:

IF NOT EXISTS (SELECT * FROM users u WHERE fb_id = 123456)

INSERT INTO users(fb_id, name, email)
VALUES(123456, 'John Appleseed', 'john.appleseed@apple.com');

ELSE

UPDATE users
SET name = 'John Appleseed', email = 'john.appleseed@apple.com'
WHERE fb_id = 123456;

What if we want to insert/update multiple users at once? We’d need to use auxiliary views and queries:

-- creates an auxiliary table of users being updated/inserted
WITH data(fb_id, name, email) AS (VALUES
(123456, 'John Appleseed', 'john.appleseed@google.com'),
(246802, 'Lauren Cyril', 'lauren.cyril@google.com'),
(357913, 'Ivan Smith', 'ivan.smith@google.com')
),
-- tries to update users whose fb_ids exist in the database
updated_users AS (
UPDATE users u
SET (name, email) = (data.name, data.email) FROM data
WHERE u.fb_id = data.fb_id
RETURNING u.*
)
-- now inserts the remaining (new) users
-- i.e. rows in
data that don't exist in updated_users
INSERT
INTO users(fb_id, name, email)
SELECT (fb_id, name, email) FROM data
WHERE NOT EXISTS (
SELECT * FROM updated_users
WHERE updated_users.fb_id = data.fb_id
);

The lengthy query above is called an UPSERT, which aptly means UPdate existing rows before inSERTing new ones.

Fortunately, PostgreSQL 9.5+ has native support for such queries, in the form of ON CONFLICT statements:

INSERT INTO users(fb_id, name, email)
VALUES (123456, 'John Appleseed', 'john.appleseed@google.com'),
(246802, 'Lauren Cyril', 'lauren.cyril@google.com'),
(357913, 'Ivan Smith', 'ivan.smith@google.com')
ON CONFLICT(fb_id) DO 
UPDATE SET name = EXCLUDED.name, email = EXCLUDED.email;

The first half of the query is just an ordinary INSERT statement. The second half is more interesting: it checks whether the unique constraint on column fb_id is violated, in which the corresponding row will be updated with the violating data (accessible via the EXCLUDED keyword).

Automatic “updatedAt” column

Many modern ORMs, including Mongoose, are able to add created_at and updated_at date attributes that get auto-assigned to the most recent create/update timestamp.

created_at is easy — just create a column with the default value now().

The updated_at column is more tricky. You’d have to rely on the ORM’s functionality to update the column whenever save() is called, for instance. This means if you queried the database manually, the column will not be updated.

Nevertheless, it’s possible to let PostgreSQL automate some of the tasks which used to reside in the application’s logic, in this case updating some timestamp column when a row is edited.

First, let’s define a PostgreSQL trigger function:

CREATE OR REPLACE FUNCTION update_modified_column()	
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = now();
RETURN NEW;
END;
$$ language 'plpgsql';

It’s quite easy to see the function assigns the correct value to updated_at and returns the row.

Then we only need to attach this function to tables that need an automatic updated_at column, i.e. calling update_modified_column() whenever a row is updated:

CREATE TRIGGER update_post_modtime 
BEFORE UPDATE ON posts
FOR EACH ROW EXECUTE PROCEDURE update_modified_column();

(don’t use AFTER UPDATE if you want to avoid infinite loops!)

This technique is very powerful; you can order PostgreSQL to do much more complicated tasks automagically, including sending notification events (!).


As this is getting quite long, I’ll continue covering more tips in the next blog post.