Pagination With PostgreSQL
In the real world, we often have more data than can be retrieved or displayed in a single go. Such data has to be fetched in limited amounts, in the form of pages. If you search the web for server-side pagination techniques, you will find plenty of approaches. Reaching for a library is not always the best choice, as it may be using an inefficient technique under the hood. In this story, I will summarise the techniques that I find most interesting, and that cover more cases than the rest, along with their technical implementation.
Let’s pin down the problems we will be looking into
- Sort by unique column (~3000 rows)
- Sort by unique column (~300000 rows)
- Sort by multiple columns (combination of all columns unique) (~300000 rows)
- Sort by multiple columns (combination of all columns unique) with nulls allowed (~300000 rows)
Sort by unique column (~3000 rows)
One can consider this the ideal case for pagination: you sort by a column that has unique and not-null constraints, over a manageable amount of data. The easiest method of pagination, limit-offset, is also the most perilous; sadly, it is a staple of web application development tutorials. Let’s assume we want 20 rows per page. Then OFFSET can simulate the effect of a page number, and LIMIT can be used to constrain the number of records. To keep results consistent between requests, an ORDER BY clause is required. Consider the following table, namely employee
CREATE TABLE employee (
id BIGSERIAL PRIMARY KEY,
vehicle_number TEXT,
dept_id BIGINT,
type TEXT NOT NULL
);
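If you want to experiment with the examples that follow, one way to seed the table with test rows is a generate_series insert; the values below (vehicle number format, 50 departments, a single type) are made up for illustration:
-- seed ~3000 rows of sample data
INSERT INTO employee (vehicle_number, dept_id, type)
SELECT 'VH-' || n, (n % 50) + 1, 'driver'
FROM generate_series(1, 3000) AS n;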
A query to retrieve the second page using limit-offset on the employee column id would look like
SELECT vehicle_number, dept_id, type
FROM employee
ORDER BY employee.id DESC
LIMIT 20
OFFSET 20;
Downsides
The technique has two big problems: result inconsistency and offset inefficiency.
Result inconsistency: suppose a user moves from page n to n+1 while, simultaneously, a new element is inserted into page n. This will cause both a duplication (the previously final element of page n is pushed into page n+1) and an omission (of the new element). Alternatively, consider an element removed from page n just as the user moves to page n+1. The previously initial element of page n+1 will be shifted to page n and be omitted.
Offset inefficiency: large offsets are intrinsically expensive. Even in the presence of an index, the database must scan through storage, counting rows. To utilise an index we would have to filter a column by a value, but in this case we require a certain number of rows irrespective of their column values. Furthermore, rows need not have the same size in storage, and some may be present on disk but marked as deleted, so the database cannot use simple arithmetic to find a location on disk at which to begin reading results.
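You can observe this cost yourself with EXPLAIN ANALYZE once the table holds enough rows; the row counts in the plan will show all the offset rows being produced and discarded before 20 are returned:
-- the child of the Limit node must produce 200020 rows for 20 results
EXPLAIN ANALYZE
SELECT vehicle_number, dept_id, type
FROM employee
ORDER BY employee.id DESC
LIMIT 20 OFFSET 200000;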
Sort by unique column (~300000 rows)
Both downsides of limit-offset can be solved by utilising an index to filter rows by their column value; in our case, the index on employee.id. We will need some control variables to decide the page data: the direction of movement (first, next or previous) and the last column value seen.
Let’s assume we are ordering the data in descending order of employee.id. The logic combining our control variables would look like
if (direction === 'first' || direction === 'next') {
  sortingOrder = 'DESC';
} else if (direction === 'previous') {
  sortingOrder = 'ASC';
}
And final query would look like
query = '
SELECT vehicle_number, dept_id, type
FROM employee
ORDER BY employee.id DESC
'wherePart = ''if (sortingOrder == 'ASC')
wherePart += 'WHERE employee.id > ${last_column_value}'
else
wherePart += 'WHERE employee.id < ${last_column_value}'
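For example, the rendered query for a next page might look like this (2980 is a made-up last-seen id):
SELECT vehicle_number, dept_id, type
FROM employee
WHERE employee.id < 2980
ORDER BY employee.id DESC
LIMIT 20;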
Since we are utilising the index on employee.id to locate rows, this method is fast and can handle large datasets. Note that when fetching a previous page we sort ascending in SQL, so the rows must be reversed in application code before being displayed.
Sort by multiple columns (~300000 rows)
Earlier, we found a method of pagination that works well for large datasets and keeps results consistent. But it is limited to cases where you paginate on a single column with a unique constraint. A case may arise where we have no such column, but do have multiple columns whose combined values can act as a unique key.
Let’s start by extending the previous approach to a combination of two columns. Assume we have a composite index on (employee.vehicle_number, employee.dept_id) that we can leverage to upgrade our paginator. Since we are using two columns, our control variables are last_vehicle_number, last_dept_id and the direction of movement.
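For reference, such a composite index could be created like this (the index name is hypothetical):
-- one index covering both sort columns, in sort order
CREATE INDEX employee_vehicle_dept_idx
  ON employee (vehicle_number, dept_id);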
Our WHERE condition with two columns would look like
if (sortingOrder === 'ASC') {
  // previous page: rows strictly after the last (vehicle_number, dept_id) pair
  wherePart += `
    WHERE employee.vehicle_number > ${last_vehicle_number}
    OR (
      employee.vehicle_number = ${last_vehicle_number} AND
      employee.dept_id > ${last_dept_id}
    )
  `;
  orderByPart = `
    ORDER BY
      employee.vehicle_number ASC,
      employee.dept_id ASC
  `;
} else {
  // first/next page: rows strictly before the last pair
  wherePart += `
    WHERE employee.vehicle_number < ${last_vehicle_number}
    OR (
      employee.vehicle_number = ${last_vehicle_number} AND
      employee.dept_id < ${last_dept_id}
    )
  `;
  orderByPart = `
    ORDER BY
      employee.vehicle_number DESC,
      employee.dept_id DESC
  `;
}
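For concreteness, a rendered next-page query in this form might look like the following ('VH-120' and 7 are made-up last-seen values):
SELECT vehicle_number, dept_id, type
FROM employee
WHERE vehicle_number < 'VH-120'
   OR (vehicle_number = 'VH-120' AND dept_id < 7)
ORDER BY vehicle_number DESC, dept_id DESC
LIMIT 20;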
Now, if we had to extend the above query to incorporate a third column, that would increase the complexity and decrease the readability of the query. Luckily, we can use the row constructor provided by PostgreSQL: row values are compared lexicographically, which matches an ORDER BY over the same columns. Using the row constructor, the query can be rewritten as
if (sortingOrder === 'ASC') {
  // one tuple comparison replaces the whole OR chain
  wherePart += `
    WHERE (employee.vehicle_number, employee.dept_id) >
          (${last_vehicle_number}, ${last_dept_id})
  `;
  orderByPart = `
    ORDER BY
      employee.vehicle_number ASC,
      employee.dept_id ASC
  `;
} else {
  wherePart += `
    WHERE (employee.vehicle_number, employee.dept_id) <
          (${last_vehicle_number}, ${last_dept_id})
  `;
  orderByPart = `
    ORDER BY
      employee.vehicle_number DESC,
      employee.dept_id DESC
  `;
}
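The same page rendered with the row constructor is noticeably tighter (again with made-up bound values):
SELECT vehicle_number, dept_id, type
FROM employee
WHERE (vehicle_number, dept_id) < ('VH-120', 7)
ORDER BY vehicle_number DESC, dept_id DESC
LIMIT 20;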
Well, we have created an exotic paginator for a PostgreSQL backend. But it still lacks one final thing: handling null values. The columns we are sorting on may contain nulls.
Sort by multiple columns with nulls allowed (~300000 rows)
Extending the previous version of our exotic paginator to handle null values can be achieved with a little trick. We will use COALESCE together with the NULLS FIRST / NULLS LAST modifiers provided by PostgreSQL to handle any null values that reside in our DB. We will sort in such a way that nulls are grouped either at the start or at the end, and substitute matching sentinel values via COALESCE so the comparison agrees with the sort order. Here is an example
if (sortingOrder === 'ASC') {
  // nulls are treated as the smallest values: '!' and 0 are sentinels
  // assumed to sort below every real vehicle_number and dept_id
  wherePart += `
    WHERE (COALESCE(employee.vehicle_number, '!'),
           COALESCE(employee.dept_id, 0)) >
          (COALESCE(${last_vehicle_number}, '!'),
           COALESCE(${last_dept_id}, 0))
  `;
  orderByPart = `
    ORDER BY
      employee.vehicle_number ASC NULLS FIRST,
      employee.dept_id ASC NULLS FIRST
  `;
} else {
  // same sentinels with the comparison flipped, so both directions
  // agree on where the null rows sort
  wherePart += `
    WHERE (COALESCE(employee.vehicle_number, '!'),
           COALESCE(employee.dept_id, 0)) <
          (COALESCE(${last_vehicle_number}, '!'),
           COALESCE(${last_dept_id}, 0))
  `;
  orderByPart = `
    ORDER BY
      employee.vehicle_number DESC NULLS LAST,
      employee.dept_id DESC NULLS LAST
  `;
}
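As a sanity check, here is what the rendered next-page query could look like when the last row's vehicle_number was null (the sentinels and the dept_id bound of 7 are illustrative):
SELECT vehicle_number, dept_id, type
FROM employee
WHERE (COALESCE(vehicle_number, '!'), COALESCE(dept_id, 0))
      < (COALESCE(NULL, '!'), COALESCE(7, 0))
ORDER BY vehicle_number DESC NULLS LAST,
         dept_id DESC NULLS LAST
LIMIT 20;
One caveat: wrapping the columns in COALESCE means a plain index on (vehicle_number, dept_id) no longer matches the WHERE expression; an expression index over the COALESCEd values would be needed to keep the scan index-assisted.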
We have to explicitly cast the sentinel value in COALESCE to the type of the column it replaces (e.g. 0::BIGINT for dept_id). And there we have our multiple-column paginator, with the ability to tackle nulls as well.