Cursor based pagination with arbitrary ordering

George Gray
8 min read · Oct 2, 2022

If you’re already convinced you should use cursor based pagination over offset pagination, and just want to know how to order data with cursor based pagination, scroll down to Cursor Based Pagination.

Recently I’ve been considering changing the structure of our paginated APIs to return and consume tokens instead of offsets, as I wanted to move towards cursor / keyset pagination. Below is an overview of the problems with offset pagination, and how to achieve cursor pagination when ordering by an arbitrary number of fields.

Offset Based Pagination

You’ve probably seen offset based pagination in the wild before: you make a query to an API, specifying the page size or limit (which usually has some sensible max / default imposed by the API).

This approach has 3 major problems that can affect users:

  • Unexpected behaviour when data is deleted
  • Unexpected behaviour when data is added
  • Poor performance with large data sets

Lets take a look at an offset based pagination API:

example.com/api/books?limit=3

The above query would probably execute SQL a little like this:

Example of sql used for offset based pagination
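A sketch of such a query, assuming a books table with id, title and published columns (the exact LIMIT/OFFSET syntax varies by database):

```sql
-- Offset pagination: order by published date, skip 0 rows, take 3
SELECT id, title, published
FROM books
ORDER BY published ASC
LIMIT 3 OFFSET 0;
```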

If we look at the underlying table, we can imagine that this call would return the first 3 rows. As you can see from the table, the response is ordered by Published date, ascending:

Annotated table describing rows returned from our first paginated query

After you’ve received your data, you tell the API, with a skip or offset parameter, that you’d like to skip the first 3 records and receive the next 3:

example.com/api/books?limit=3&skip=3

Which would return the second stack of results:

Annotated table describing rows returned from our second paginated query

Problematic Deletes

So far so good, right? Let’s talk about what would happen if a row is deleted from the database before we make the call for the next 3 records. Let’s delete A Fire Upon The Deep from our books database and call our API again:

example.com/api/books?limit=3&skip=6

Because each call to our paginated API is an isolated database query, it has no concept of what was returned in previous calls. As such, if something is deleted from our table it can have detrimental effects for our user:

Annotated table describing rows returned from our third paginated query when a previous row has been deleted

As you can see, even though A Fire Upon The Deep was in row 3 in our previous query and was delivered to the user in the first call, when it was removed every record after it moved up one row. As a result, our user is now missing Seveneves, which was previously in row 7 and would have been included, but is now in row 6 and gets skipped.

Problematic Inserts

We get similar problems if we add a record before our current offset. Let’s restore A Fire Upon The Deep and add H.G. Wells’ The Time Machine to our database:

example.com/api/books?limit=3&skip=6
Annotated table describing rows returned from our third paginated query when a row has been inserted before our current offset

Now Pandora’s Star was delivered to us in both the 2nd and 3rd pages, because a new book was added to the beginning of the list. Not ideal.

Problematic Performance

The last problem with offset based pagination relates to the performance of OFFSET in most relational databases. Other articles have done a better job of explaining this than I will, but when I ran some quick performance comparisons locally, I did indeed notice that using OFFSET added an extra TOP operation to the plan compared to a cursor based approach.

Cursor Based Pagination

Cursor based pagination works by giving the user a way to identify the last record in the dataset they were querying. They can then return these identifiers to the server when they request the next page, and the server can return the items after that record in the result set.

Let’s look at a simple example first, using the same table as before, but ordering only on the auto-incrementing Id field. The first query to our API is identical:

example.com/api/books?limit=3

The difference is in the response: instead of only returning the data, we return the data and a next token of some sort (in this case, the nextToken is just the ID of the last record returned):

{
  data: [
    {id: 1, title: 'A Fire Upon the Deep', Published: '1992-4'},
    {id: 2, title: 'Project Hail Mary', Published: '2021-5'},
    {id: 3, title: 'Children of Time', Published: '2015-6'}
  ],
  nextToken: 3
}
Annotated table describing rows returned from our first api call using cursor based pagination

Now, on a subsequent query we can use the token provided to us to fetch the next page of data:

example.com/api/books?limit=3&nextToken=3

Then, instead of using OFFSET in our database, we can just use a where clause:

simple cursor based pagination sql
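A sketch of that query, using the same assumed books table, with the incoming nextToken driving the where clause:

```sql
-- Cursor pagination: return only rows strictly after the cursor
SELECT id, title, published
FROM books
WHERE id > 3        -- 3 is the incoming nextToken
ORDER BY id ASC
LIMIT 3;
```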

Which would, predictably, return the next page:

Annotated table describing rows returned from our second api call using cursor based pagination with a cursor

What is noteworthy is that our row counter starts from our first match, as the first 3 rows in the table no longer satisfy the where clause id > 3 (where 3 was our incoming cursor).

we don’t need to worry about records being added or removed from previous pages when using cursor based pagination

As previous records don’t satisfy the where clause for this kind of pagination, we don’t need to worry about records being added to or removed from previous pages when using cursor based pagination; they have no effect on the data after the current cursor.

To demonstrate, let’s remove Project Hail Mary from our database and get the next page:

example.com/api/books?limit=3&nextToken=6
Annotated table describing rows returned from our third api call using cursor based pagination with a cursor demonstrating deleting rows does not cause problems with latter pages

As you can see, Pandora’s Star is still returned as the first item in the third page, even though a record in the previous page was removed.

Arbitrary ordering

The above is fine if we’re not ordering our results, but in the real world, we almost always want to order the results in our query, and often we need to order results by multiple fields.

While ordering is trivial when using an offset based pagination approach, when using a cursor based approach we run into problems. The API above relies on our records being ordered by their ID for our where clause ( id > nextToken ) to return the first unseen record.

Let’s order our data by Published date:

The example table we’ve been using, ordered by Published

If we were to use the same approach as above and take the first 5 records, the returned token would be 9. Using that in a where id > 9 in the subsequent query would clearly produce incorrect results:

example.com/api/books?limit=5&nextToken=9
Annotated table describing rows returned when using an inappropriate cursor for ordered data.

The solution is to include all keys we use to order in our where clause

The solution is to include all the keys we order by in our where clause: we want to exclude all records that are ‘smaller than’ any record we have already seen. This means we need to add more information to our nextToken. Let’s get our first payload:

example.com/api/books?limit=3
Annotated table describing rows returned when using an appropriate cursor for ordered data.
{
  data: [
    {id: 11, title: 'The Time Machine', Published: '1895'},
    {id: 4, title: 'The Moon is a...', Published: '1966-6'},
    {id: 5, title: 'The Mote in g...', Published: '1974-11'}
  ],
  nextToken: '5,1974-11'
}

Note we’ve also added the published date to the next token. Now, in the call for the second page, the user can pass us the nextToken we provided them, which we can subsequently decode and use:

example.com/api/books?limit=3&nextToken=5,1974-11

Which we can use in the following query to extract our next page:

// Extract the data from the nextToken; in the real world we'd
// probably base64 this, or possibly even encrypt it.
const [lastId, lastPublished] = nextToken.split(',');
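The decoded values can then drive a where clause that excludes everything at or before the cursor; a sketch, using the same assumed table and parameter names:

```sql
-- Keep a row if its published date is after the cursor's, or if the
-- dates tie and its id is after the cursor's id
SELECT id, title, published
FROM books
WHERE published > @lastPublished
   OR (published = @lastPublished AND id > @lastId)
ORDER BY published, id
LIMIT 3;
```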

If you’re using PostgreSQL or MySQL you can use the shorthand

WHERE (published, id) > (@lastPublished, @lastId)

Which will produce the same result.
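In practice, as the comment in the snippet above suggests, you’d likely base64 the cursor so it’s an opaque token to clients. A minimal sketch in Node; the helper names are illustrative, not part of any library:

```javascript
// Encode the last row's ordering keys into an opaque token.
function encodeToken(lastId, lastPublished) {
  return Buffer.from(`${lastId},${lastPublished}`).toString('base64');
}

// Decode the token back into the values used in the WHERE clause.
function decodeToken(nextToken) {
  const [lastId, lastPublished] = Buffer.from(nextToken, 'base64')
    .toString('utf8')
    .split(',');
  return { lastId: Number(lastId), lastPublished };
}

const token = encodeToken(5, '1974-11');
console.log(decodeToken(token)); // { lastId: 5, lastPublished: '1974-11' }
```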

Now when we request our next page, we will get the following result:

Annotated table describing rows returned when using an appropriate cursor for ordered data with rows keyed on which where clauses omit or include them,

As the dataset is ordered by date ascending, the first part of our query, published > @lastPublished, handles the majority of cases: it prevents any dates we’ve already seen from being included in the dataset.

As published is not guaranteed to be unique, we need to handle the case where the page boundary falls in the middle of a group of identical published dates (as in the example above, where the last item of page 1 and the first item of page 2 are both Nov 1974). The second part of the where clause (published = @lastPublished AND id > @lastId) ensures that when the date being inspected is the same as the last date we returned, only rows after the last returned id are included.

Ordering on multiple fields

It should be noted that we can order on an arbitrary number of fields using this strategy. Let’s say we wanted to add (or join) another field into our results. In our example we’ll change the way we’re ordering our books: we’ll order on Average Rating first, then by Published date, and finally by Id (we must always include a key that guarantees uniqueness when implementing keyset pagination).

we must always include a key that guarantees uniqueness when implementing keyset pagination

Our query will now look like this:
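A sketch of the three-key version, following the same pattern (column and parameter names assumed): each clause allows rows that tie on every earlier key and beat the cursor on the next one.

```sql
SELECT id, title, published, average_rating
FROM books
WHERE average_rating > @lastAverageRating
   OR (average_rating = @lastAverageRating AND published > @lastPublished)
   OR (average_rating = @lastAverageRating AND published = @lastPublished AND id > @lastId)
ORDER BY average_rating, published, id
LIMIT 3;
```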

Which again, in MySQL or PostgreSQL, can simply be expressed as:

WHERE (average_rating, published, id) > (@lastAverageRating, @lastPublished, @lastId)

The token we return would also need to include the last value for all three of these keys, so something like 5,33,2001-02 would be fine.

You can extend this pattern to cover as many fields as you want to order by; the order of the where clauses must match the order of the ORDER BY.

A quick note on joins: if you’re joining multiple tables, you must ensure that there is some combination of unique keys. If the join is one-to-many or many-to-many, consider using the unique keys of both tables in the ORDER BY to achieve this.

And that about wraps it up. If you notice any errors in the examples above please do let me know in the comments.
