Solutions using PostgreSQL

When you’re stuck solving a problem. Source: Portal, Pinterest

This article is not meant to explain internals of PostgreSQL, nor does it describe how to write simple select * from statements. We will mostly focus on solving problems that might come up on a day to day basis but in a better way (hopefully). This article is meant to be more of a reference that tells you about approaches you might not have known about.

Each solution discussed here will have at least some sort of explanation to the advantages and possible disadvantages of that specific approach (if they exist). Each section contains its own tables representing its own contained unit and existing independently from an entire database schema.

The Index below will show you the topics covered and will help you navigate towards your topic of choice:

Multi table joins

This is the schema we will be working with, and the task is simple:

Select all transactions that were approved showing both: user’s name and manager’s name. Not all transactions require an approval by a manager.

The biggest issue here is the manager name field. And there’s a very simple and elegant way to solve this:

SQL Fiddle

The only piece of code here that might seem unfamiliar is this one:

left join managers m
join users mu on mu.user_id = m.user_id
on m.manager_id = t.manager_id

You can imagine this as a sort of subjoin (like a subquery), and you’re probably wondering “what’s wrong with a normal left join subquery?”:

left join (
select m.manager_id, u.user_name
from managers m
join users u on u.user_id = m.user_id
) mu on mu.manager_id = t.manager_id

There’s absolutely nothing wrong with it! Feel free to test them out individually and you’ll see they both have the same exact query plan and are 1:1 equivalent. The only difference is that the first line of code is much more elegant and easier to follow in bigger queries. This should make your life a lot easier writing bigger queries and debugging in the future.

What are the downsides to this approach? None.

Data integrity with functions

One of the most important things in databases is data consistency and integrity. Also, one of the best practices is to handle data integrity in your code and not let your DBMS throw its own errors when shit hits the fan. It helps in debugging as well as giving a readable and understandable error messages.

This is useful to know when you need to insert data into multiple tables, like in the following simplified registration process:

Once a user registers, we need his settings to be initialized to the agreed business logic default values.

The following function would do this:

This doesn’t handle every case (like the groups field or other problems unique to each case) but, it should give you a good hint on how to solve similar issues in your environment.

This solution has one of the most important benefits, consistency of data, knowing what happened and why. It is also important to check that manually because of how null behaves, which I covered in my two-part series explaining null: part one, part two.

Possible downsides? Mostly small amount of maintenance. When implemented correctly, maintenance shouldn’t be an issue as in many cases you won’t even touch it after creating it.

Upsert

For the purpose of this example, we’ll use the user_types table from the previous example, but we will simplify it (so it’s not related to some sort of business logic) and just stick to the basics of upsert.

Upsert is nothing new, in fact, you’ve probably already written code that functions in a similar way. Upsert is nothing more than Update — Insert combined into one function.

The on conflict keyword is PostgreSQL’s implementation of upsert. I recommend reading the linked documentation. It only works with columns that have a unique constraint (otherwise it can’t detect conflicts).

In versions before 9.5 you had to do your own version of upsert manually, and in many cases, even today, you might have to do something similar depending on the business requirements and how many things you need to check before doing an insert/update.

This is a more “manual” approach, the if not found statement works only on the query directly before it. If that query returns nothing then whatever you were looking for is “not found”. It’s pretty easy and intuitive, and it’s a good tool to have under your belt when coding.

The main benefit of this approach (talking about the first on conflict approach) is less reads done to a table which is always important as reducing disk I/O can boost performance dramatically down the line. Unlike the first implementation, using not found does require table reads. Another benefit is less maintenance, less code and functions to check when things need to be changed.

Downsides? This shouldn’t be used everywhere, as sometimes the expected behavior is to insert but it still does update. This issue only applies to the second implementation using not found as unique values wouldn’t allow you to insert duplicates anyways.

Aggregates

I am sure we’ve all written at least one sum or count , but I have come across cases where each aggregate needed to have its own order and filters. In such cases PostgreSQL thankfully offers some extremely helpful tools.

For this specific example I will be using json data type as it is a commonly supported data type in most languages and frankly, it is the simplest way to explain this as PostgreSQL has built in support for jsonb (also known as bson).

SQL Fiddle

Let’s start with the subquery, the only new thing here is the filter keyword. The filter keyword allows you to assign filters to specific aggregates, and is as far as I know unique to PostgreSQL and SQLite, but you can emulate the same behavior in other DBMS using case when. And at the main select clause you will see there’s an order by inside of an aggregation function. You are allowed to individually order aggregates without worrying about the order of other parameters, this works with all aggregation functions.

Downsides to this approach? Close to none, this will perform better than a multi-subquery solution you might have used otherwise. But it might also be harder to understand for people inexperienced with SQL.

Hopefully this article has helped you or taught you something new, I am interested in what other techniques you guys use to write your code (like reducing disk I/O, using declared variables or anything else), so do stay a while and share your thoughts and experience.

Source: Hollow Knight, Artist

--

--