Cursors vs Offsets in APIs

Lorenzo Aiello
Published in The Startup
3 min read · Dec 10, 2019

When you write an API you’ll soon reach a decision point around how to format your read endpoints. Singleton read endpoints are easy because they return… well, a single record. Non-singleton read endpoints become a bit more challenging because at some point there will be too much data to return in a single response.

That’s where we start talking about some sort of pagination. Most people are familiar with the concept and it’s relatively easy to do with static datasets because you can use a page number combined with a limit on the number of records returned to get up and running quickly. This is called “offset-based pagination” and has been around forever. In fact, most databases support it natively with LIMIT and OFFSET keywords that can be used in nearly all SELECT queries.
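As a rough sketch of what that looks like in practice, here is how a page number and size might translate into LIMIT and OFFSET. The `resources` table, its columns, and the `fetch_page` helper are assumptions made up for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3


def fetch_page(conn, page, size):
    """Return one page of rows using LIMIT/OFFSET (pages are 1-indexed)."""
    offset = (page - 1) * size
    cur = conn.execute(
        "SELECT id, name FROM resources ORDER BY id LIMIT ? OFFSET ?",
        (size, offset),
    )
    return cur.fetchall()


# Hypothetical in-memory table with 25 rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO resources (id, name) VALUES (?, ?)",
    [(i, f"resource-{i}") for i in range(1, 26)],
)

page_2 = fetch_page(conn, page=2, size=10)  # rows with ids 11 through 20
```

The ORDER BY matters here: without a stable sort order, the database is free to return rows in any order, and the pages stop being meaningful.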

Below is an example of how an offset-based API would respond.

GET https://my.domain/resources?page=10&size=10

{
  "data": [...],
  "meta": {
    "total_records": 250,
    "current_page": 10,
    "total_pages": 25
  },
  "links": {
    "first": "https://my.domain/resources?page=1&size=10",
    "prev": "https://my.domain/resources?page=9&size=10",
    "next": "https://my.domain/resources?page=11&size=10",
    "last": "https://my.domain/resources?page=25&size=10"
  }
}

As you can see from the example, the inputs and outputs map pretty closely to the database queries that are likely happening under the hood.
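To make that mapping concrete, the response envelope can be derived from just the page number, the page size, and a total row count. The sketch below is one possible way to do it, not the only one; `build_offset_response` and `page_url` are hypothetical helpers, and leaving out "prev" and "next" at the edges is a design choice assumed here.

```python
import math

BASE_URL = "https://my.domain/resources"  # URL taken from the example response


def page_url(page, size):
    """Build the link for a given page (hypothetical helper)."""
    return f"{BASE_URL}?page={page}&size={size}"


def build_offset_response(rows, page, size, total_records):
    """Assemble the data/meta/links envelope for an offset-based page."""
    total_pages = max(1, math.ceil(total_records / size))
    links = {"first": page_url(1, size)}
    if page > 1:
        links["prev"] = page_url(page - 1, size)
    if page < total_pages:
        links["next"] = page_url(page + 1, size)
    links["last"] = page_url(total_pages, size)
    return {
        "data": rows,
        "meta": {
            "total_records": total_records,
            "current_page": page,
            "total_pages": total_pages,
        },
        "links": links,
    }


# Same inputs as the example request: page 10 of 250 records, 10 per page.
response = build_offset_response(rows=[], page=10, size=10, total_records=250)
```

Note that `total_records` usually means an extra COUNT query on every request, which is part of the cost of this style of response.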

Once the underlying dataset starts changing, this becomes a problem, and the more frequently the data changes, the worse it gets. Offsets don’t account for data movement, so subsequent API calls can return redundant records or, worse, omit data altogether because it moved between “pages” in between API calls.
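A small simulation shows both failure modes. This is only an illustration: a plain Python list of ids stands in for the table, and `offset_page` is a hypothetical stand-in for a LIMIT/OFFSET query.

```python
def offset_page(records, page, size):
    """Slice a list the way LIMIT/OFFSET would (pages are 1-indexed)."""
    start = (page - 1) * size
    return records[start:start + size]


# Case 1: a record is inserted between two calls (newest-first feed).
records = list(range(10, 0, -1))            # [10, 9, 8, ..., 1]
first_call = offset_page(records, 1, 3)     # [10, 9, 8]
records.insert(0, 11)                       # a new record arrives at the top
second_call = offset_page(records, 2, 3)    # [8, 7, 6]: id 8 is returned twice
duplicated = set(first_call) & set(second_call)

# Case 2: a record is deleted between two calls.
records = list(range(10, 0, -1))
first_call = offset_page(records, 1, 3)     # [10, 9, 8]
records.remove(9)                           # a record on page 1 disappears
second_call = offset_page(records, 2, 3)    # [6, 5, 4]: id 7 is never returned
skipped = {7} - set(first_call) - set(second_call)
```

In both cases the client did nothing wrong. The page boundaries simply shifted underneath it.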

To solve this problem, “cursor-based pagination” was developed. Instead of a page number, the API provides a “cursor” that references an actual record, and data is retrieved relative to that record. This allows more consistent results when working with data that has a high rate of change.

In cursor-based pagination, at least one column with unique sequential values is used as the cursor. We combine that with a count to limit the number of results and together they form a composite bookmark in the dataset.
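In SQL terms, this usually comes down to filtering on the cursor column instead of skipping rows. The sketch below assumes a hypothetical `resources` table with a sequential integer `id` as the cursor column, and a made-up `fetch_after` helper; your schema and cursor column will differ.

```python
import sqlite3


def fetch_after(conn, after, limit):
    """Return up to `limit` rows whose id comes after the cursor.

    Passing after=None starts from the beginning of the dataset.
    """
    if after is None:
        cur = conn.execute(
            "SELECT id, name FROM resources ORDER BY id LIMIT ?", (limit,)
        )
    else:
        cur = conn.execute(
            "SELECT id, name FROM resources WHERE id > ? ORDER BY id LIMIT ?",
            (after, limit),
        )
    return cur.fetchall()


# Hypothetical in-memory table with 10 rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO resources (id, name) VALUES (?, ?)",
    [(i, f"resource-{i}") for i in range(1, 11)],
)

first_batch = fetch_after(conn, after=None, limit=3)    # ids 1, 2, 3
conn.execute("DELETE FROM resources WHERE id = 2")      # data changes between calls
cursor = first_batch[-1][0]                             # id of the last row seen
second_batch = fetch_after(conn, after=cursor, limit=3) # ids 4, 5, 6
```

Because the second query starts from the last record the client actually saw, the deletion in between does not cause a record to be skipped.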

Below is an example of how a cursor-based API would respond (using UUIDs as the unique column).

GET https://my.domain/resources?after=098d5b15-e6fd-4e58-a04f-e8ddb4109dbc&limit=10

{
  "data": [...],
  "meta": {
    "from": "098d5b15-e6fd-4e58-a04f-e8ddb4109dbc",
    "to": "b6236c52-c0fc-4cd0-a4fb-2540a2befb01"
  },
  "links": {
    "first": "https://my.domain/resources?after=35592d65-5b1b-490a-bd14-09b3489c17bf&limit=10",
    "prev": "https://my.domain/resources?after=3b9ef2286-acf1-4c06-9ab2-9dca4e26dbe3&limit=10",
    "next": "https://my.domain/resources?after=b6236c52-c0fc-4cd0-a4fb-2540a2befb01&limit=10"
  }
}

In this example, you can see that the only thing that really matters is the last record in the currently returned dataset because it marks the beginning of the next request, regardless of where it exists in the larger dataset.
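Here is a minimal sketch of how the "next" link could be derived from the last record in a batch. `build_cursor_response` is a hypothetical helper, rows are assumed to be dicts with an `id` key already sorted by the cursor column, and omitting "next" when a batch comes back short is an assumption, not something the example response specifies. The "first" and "prev" links are left out to keep the sketch small.

```python
BASE_URL = "https://my.domain/resources"  # URL taken from the example response


def build_cursor_response(rows, after, limit):
    """Assemble a cursor-based envelope from one batch of rows.

    `after` is the cursor the client sent; `rows` is the batch that follows it.
    """
    if not rows:
        return {"data": [], "meta": {"from": after, "to": None}, "links": {}}

    last_id = rows[-1]["id"]
    links = {}
    # Assumption: a short batch means there is nothing left to fetch.
    if len(rows) == limit:
        links["next"] = f"{BASE_URL}?after={last_id}&limit={limit}"
    return {
        "data": rows,
        "meta": {"from": after, "to": last_id},
        "links": links,
    }


# Two-row batch using the cursor values from the example response.
rows = [
    {"id": "1c0ffee0-0000-4000-8000-000000000001"},  # made-up id for illustration
    {"id": "b6236c52-c0fc-4cd0-a4fb-2540a2befb01"},
]
response = build_cursor_response(
    rows, after="098d5b15-e6fd-4e58-a04f-e8ddb4109dbc", limit=2
)
```

The client never needs to know where these rows sit in the overall dataset. It just follows "next" until the link stops appearing.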

So why not always use cursor-based pagination if it’s so great? Cursor-based pagination takes a bit more effort to implement correctly and can be marginally slower than offset-based pagination, but it gives the user a much better experience and more predictable API behavior when the underlying data changes.

Originally published at https://lorenzo.aiello.family on December 10, 2019.

I am a cloud engineer and developer who practices DevOps while helping to innovate solutions to new and existing challenges.