Cursor based pagination with arbitrary ordering
If you’re already convinced you should use cursor based pagination over offset pagination and just want to know how to order data with it, scroll down to Cursor Based Pagination.
Recently I’ve been considering changing the structure of our paginated APIs to return and consume tokens instead of offsets, as I wanted to move towards cursor / keyset pagination. Below is an overview of the problems with offset pagination and how to achieve cursor pagination when ordering by an arbitrary number of fields.
Offset Based Pagination
You’ve probably seen offset based pagination in the wild before: you make a query to an API, and in the query you specify the page size or limit (which usually has some sensible max / default imposed by the API).
This approach has 3 major problems that can affect users:
- Unexpected behaviour when data is deleted
- Unexpected behaviour when data is added
- Poor performance with large data sets
Let’s take a look at an offset based pagination API:
example.com/api/books?limit=3
The above query would probably execute SQL a little like this:
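A sketch of that query, assuming a `books` table with `id`, `title` and `published` columns (the table and column names are assumptions for illustration):

```sql
-- First page: no offset needed yet
SELECT id, title, published
FROM books
ORDER BY published ASC
LIMIT 3;
```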
If we look at the underlying table, we can imagine that this call would return the first 3 rows, as you can see from the table, the response is ordered by Published date, ascending:
After you’ve received your data, you tell the API with a skip or offset parameter that you’d like to skip the first 3 records and receive the next 3:
example.com/api/books?limit=3&skip=3
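Under the hood, that call likely maps to the same query with an OFFSET added (again a sketch against the assumed `books` table):

```sql
-- Second page: skip the 3 rows the client has already seen
SELECT id, title, published
FROM books
ORDER BY published ASC
LIMIT 3 OFFSET 3;
```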
Which would return the second stack of results:
Problematic Deletes
So far so good, right? Let’s talk about what would happen if a row is deleted from the database before we make a call for the next 3 records. Let’s delete A Fire Upon The Deep from the books database and call our API again:
example.com/api/books?limit=3&skip=6
Because each call to our paginated API is an isolated database query, it has no concept of what was returned in previous calls. As such, if something is deleted from our table it can have detrimental effects for our user:
As you can see, even though A Fire Upon The Deep was in row 3 in our previous query and delivered to the user in the first call, when it was removed it pulled every record after it down one row. As a result, our user is now missing Seveneves, which was previously in row 7 and would have been included, but is now in row 6.
Problematic Inserts
We get similar problems if we were to add a record before our current payload. Let’s restore A Fire Upon The Deep and add H.G. Wells’ The Time Machine to our database:
example.com/api/books?limit=3&skip=6
Now Pandora’s Star was delivered to us in both the 2nd and 3rd pages because a new book was added to the beginning of the list. Not ideal.
Problematic Performance
The last problem with offset based pagination relates to the performance of OFFSET in most relational databases. Other articles have done a better job of explaining this than I will, but when I ran some quick performance comparisons locally I did indeed notice that using OFFSET added an extra TOP operation to the query plan compared with the cursor based approach.
Cursor Based Pagination
Cursor based pagination works by giving the user a way to identify the last record in the dataset they were querying. They can then return this identifier to the server when they make the request for the next page, and the server can subsequently return the items after it in the result set.
Let’s look at a simple example first, using the same table as before, but ordering only based on the auto-incrementing Id field, the first query to our api is identical.
example.com/api/books?limit=3
The difference would be in the response, instead of only returning the data, we return the data and a next token of some sort (in this case, the nextToken is just the ID of the last record returned):
{
data: [
{id: 1, title: 'A Fire Upon the Deep', Published: '1992-4'},
{id: 2, title: 'Project Hail Mary', Published: '2021-5'},
{id: 3, title: 'Children of Time', Published: '2015-6'}
],
nextToken: 3
}
Now, on a subsequent query we can use the token provided to us to fetch the next page of data:
example.com/api/books?limit=3&nextToken=3
Then, instead of using OFFSET
in our database, we can just use a where clause:
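A sketch of that query (same assumed `books` table), using the incoming token in the where clause:

```sql
-- 3 is the nextToken returned from the previous page
SELECT id, title, published
FROM books
WHERE id > 3
ORDER BY id ASC
LIMIT 3;
```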
Which would, predictably, return the next page:
What is noteworthy: our row counter starts from our first match, as the first 3 rows in the table no longer satisfy the where clause id > 3, where 3 was our incoming cursor.
As previous records don’t satisfy the where clause for this kind of pagination, we don’t need to worry about records being added to or removed from previous pages: they will have no effect on the data after the current cursor.
To demonstrate, let’s remove Project Hail Mary from our database and get the next page:
example.com/api/books?limit=3&nextToken=6
As you can see, Pandora’s Star
is still returned as the first item in the third page, even though a record in the previous page was removed.
Arbitrary ordering
The above is fine if we’re not ordering our results, but in the real world, we almost always want to order the results in our query, and often we need to order results by multiple fields.
While ordering is trivial when using an offset based pagination approach, when using a cursor based approach we run into problems. The API above relies on our records being ordered by their ID for our where clause ( id > nextToken
) to return the first unseen record.
Let’s order our data by Published date:
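That ordering looks like this (a sketch, same assumed columns as before):

```sql
SELECT id, title, published
FROM books
ORDER BY published ASC;
```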
If we were to use the same approach as above and take the first 5 records, the returned token would be 9. Using that in a where clause, id > 9, in the subsequent query would clearly produce incorrect results:
example.com/api/books?limit=5&nextToken=9
The solution is to include all keys we use to order by in our where clause: we want to exclude all records that are ‘smaller than’ any record we have already seen. This means that we need to add more information to our nextToken. Let’s get our first payload:
example.com/api/books?limit=3
{
data: [
{id: 11, title: 'The Time Machine', Published: '1895'},
{id: 4, title: 'The Moon is a...', Published: '1966-6'},
{id: 5, title: 'The Mote in g...', Published: '1974-11'}
],
nextToken: '5,1974-11'
}
Note we’ve also added the published date to the next token. Now, in the call to get the second page, the user can pass us the nextToken we provided them, which we can subsequently decode and use:
example.com/api/books?limit=3&nextToken=5,1974-11
Which we can use in the following query to extract our next page:
// Extract the data from the nextToken, in the real world we'd
// probably base64 this, or possibly even encrypt it.
const [lastId, lastPublished] = nextToken.split(',');
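The decoded values can then drive the next-page query. A sketch, matching the two-part where clause described below:

```sql
-- Everything strictly after the last published date we returned,
-- plus any rows that share that date but have a greater id
SELECT id, title, published
FROM books
WHERE published > @lastPublished
   OR (published = @lastPublished AND id > @lastId)
ORDER BY published ASC, id ASC
LIMIT 3;
```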
If you’re using PostgreSQL or MySQL you can use the row-value shorthand:
WHERE (published, id) > (@lastPublished, @lastId)
Which will produce the same result.
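The comment in the snippet above mentions base64-encoding the token. A minimal sketch of what that could look like; `encodeToken` and `decodeToken` are hypothetical helper names, not part of any library:

```javascript
// Pack the cursor fields into an opaque token. Base64-encoding hides the
// internal structure so clients treat the token as a black box.
function encodeToken(lastId, lastPublished) {
  return Buffer.from(`${lastId},${lastPublished}`).toString('base64');
}

// Unpack the token back into the values the where clause needs.
function decodeToken(token) {
  const [lastId, lastPublished] = Buffer.from(token, 'base64')
    .toString('utf8')
    .split(',');
  return { lastId: Number(lastId), lastPublished };
}

const token = encodeToken(5, '1974-11');
console.log(decodeToken(token)); // → { lastId: 5, lastPublished: '1974-11' }
```

If the token must never be inspectable or forgeable by clients, swap the base64 step for authenticated encryption on the server side.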
Now when we request our next page, we will get the following result:
As the dataset is ordered by date ascending, the first part of our query, published > @lastPublished, handles the majority of cases: it prevents any dates we’ve already seen from being included in the dataset. As published is not guaranteed to be unique, we need to handle the case where the page boundary falls in the middle of a group of identical published dates (as with the above example, where the last item of page 1 and the first item of page 2 are both Nov 1974). The second part of the where clause, (published = @lastPublished AND id > @lastId), ensures that rows sharing the last returned date are still included when their id is greater than the last id we returned.
Ordering on multiple fields
It should be noted that we can order on an arbitrary number of fields using this strategy. Let’s say we wanted to add (or join) another field into our results. In our example we’ll change the way we’re ordering our books: we’ll order on Average Rating first, then by Published Date, and finally by Id (we must always include a key that guarantees uniqueness when implementing keyset pagination).
Our query will now look like this:
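A sketch of the three-key version, assuming an `average_rating` column on the same `books` table:

```sql
-- Each OR branch pins the earlier keys to their last-seen values and
-- advances on the next key, mirroring the ORDER BY exactly
SELECT id, title, published, average_rating
FROM books
WHERE average_rating > @lastAverageRating
   OR (average_rating = @lastAverageRating AND published > @lastPublished)
   OR (average_rating = @lastAverageRating AND published = @lastPublished AND id > @lastId)
ORDER BY average_rating ASC, published ASC, id ASC
LIMIT 3;
```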
Which again, in MySQL or PostgreSQL, can simply be expressed as:
WHERE (average_rating, published, id) > (@lastAverageRating, @lastPublished, @lastId)
The token we returned would also need to include the last value for all three of these keys, so something like 5,33,2001-02 would be fine.
You can extend this pattern to cover as many fields as you want to order by; the order of these where clauses must be the same as the order of the ORDER BY.
A quick note on joins: if you’re joining multiple tables, you must ensure that there is some combination of unique keys. If the join is one-to-many or many-to-many, consider using the unique keys of both tables in the ORDER BY to achieve this.
And that about wraps it up. If you notice any errors in the examples above please do let me know in the comments.