Facebook’s developer page said it best:
“Cursor-based pagination is the most efficient method of paging and should always be used where possible.”
So what exactly is pagination? Pagination is the process of separating data into sets. Just like the pages of a book.
You’ve seen it used explicitly in your email inbox where you have to click on the “next” button to see the next set of 50 emails. And, more discreetly, on your Facebook newsfeed, when you scroll to the bottom of the page, and the next page magically loads like a never ending scroll.
The default version of pagination is called offset pagination and has been used for decades. But if you are using any website that deals with real-time data (and lots of it) like Facebook, Slack, Twitter, etc. you are likely seeing cursor pagination in action.
But just why are so many companies making the move from offset to cursors?
The Benefits of Cursors
Superior Real-Time Data Capabilities
Perhaps the biggest advantage of cursor pagination is its ability to handle real-time data effectively. This is because cursors do not require the data to remain static. That is to say, new items or rows can be added or removed, and each new page will still load correctly.
With default offset pagination, if the data is not static, one of two things will happen:
- the item will be skipped
- the item will be returned twice
No Skipped Data
Let’s take a look at a real life example of data being skipped. When you first land on the Facebook newsfeed, you immediately see the most recent 5 posts from your friends. You might see some Game of Thrones memes, some embarrassing photos posted by Sam while he was drunk, and a political rant. If you were to load the next page of data, then you would expect to see some new posts load: cat photos, Olivia’s Australia trip, and some inspirational Gary Vee posts. And if the original set of 5 items of data have not been altered in this time, you’ll get just what you expected on Page 2:
But let’s say that while you were still viewing Page 1, hungover Sam woke up, got wise, and decided to delete his embarrassing photos from last night. When you scroll to load more posts, suddenly you’re seeing a Page 2 that looks like:
Wait a sec! What happened to the cat photos?!
Unfortunately, offset pagination doesn’t care that the data has been modified. It’s just querying the database for the next set of 5 which it assumes will be located at an offset of 5. The database will get queried each time like this:
SELECT * FROM table ORDER BY timestamp OFFSET 0 LIMIT 5
SELECT * FROM table ORDER BY timestamp OFFSET 5 LIMIT 5
SELECT * FROM table ORDER BY timestamp OFFSET 10 LIMIT 5
SELECT * FROM table ORDER BY timestamp OFFSET 15 LIMIT 5
Where each line represents Page 1, Page 2, Page 3, Page 4, etc.
This means, you may never get to see those awesome cat photos! And worse, you would have no way of knowing that you missed them.
How cursor pagination works, is that rather than sending an offset parameter which acts like an index, it sends a cursor parameter which acts as a pointer to a specific record in the dataset to indicate where the last page left off. For example, this cursor might be a timestamp that tells the database that the last record returned had a timestamp of September 12, 4:02 PM and to return the next set of 5 items that happened before this timestamp.
SELECT * FROM table WHERE cursor > timestamp ORDER BY timestamp LIMIT 5
This way, it doesn’t matter that Sam deleted two embarrassing photos, we’d still get all the data previous to that timestamp including both cat photos. Thank god.
No Duplicated Data
In the second scenario, let’s say while you are looking at Page 1, Sam wakes up still drunk and decides to add another awesome photo from last night. By the time you want to view Page 2, offset’s going to have you seeing double.
Am I drunk now too?
No. You’re just seeing duplicated displayed data caused by the fact that the second page should really be starting at an offset of 6 instead of 5, now that 1 new record has been added.
This is probably the worst case scenario for your website because it’s immediately apparent to the user that something is terribly, terribly wrong.
Handling Big Data Sets Efficiently
Have you ever wondered how SQL’s built in OFFSET clause works? Me neither. Until I investigated Facebook’s claim that cursors are the most efficient form of pagination.
What I discovered is that an offset is simply the number of records the database should skip before selecting records.
That means it’s not just returning the next set of data you’re requesting, it’s scanning through all the previous data that comes before it. This becomes a huge problem when that offset number increases.
That fact didn’t fully compute in my brain until I ran the offset SQL query against the cursor SQL query on a dataset of roughly 7.3 million records.
Let’s zoom in on the actual user experience here. When a user navigates to the page showing the 7 millionth row:
With cursor pagination, when the user clicks next or previous to see a new page, it will take only 0.18 seconds to load.
With offset pagination, the same action will take 13 seconds! That means each page will take 13 seconds to load. That is like forever in computing years!
Now, what are the odds that a user will actually navigate to that page? It’s possible, but the short answer is: probably pretty slim.
Still, even before the 7 millionth record, the user will begin to notice a diminishing user experience since the time complexity of offset pagination is linear O(N) or more specifically, O(offset+limit). Whereas for cursor pagination, it is constant O(1), or more specifically O(limit).
So if you’ve ever wondered why the page loading seems to be getting slower the further back in history you go, it’s probably because you’re experiencing the inefficiency of offset pagination.
The Disadvantages of Cursors
So is offset pagination really dead?
Well, not necessarily. Before you start inviting your friends to the vigil, let’s explore some instances where offset might be better suited for your use. (Scoot over Rose! There’s room on that door.)
Facebook said cursor pagination should always be used where possible. But there are certainly cases where it is not possible, or at least not practical.
Limited Sort Features
Let’s say, you are tasked with building a user table that can sort on first name and last name. This could pose a problem for cursors because one requirement for the implementation is a unique sequential column (or columns) to sort by. Remember, with cursors, you have to be able to point to a specific place in the data set and say, I want records after this one record.
With this in mind, it makes sense that most cursor implementations are based on the timestamp column. Because it’s certainly sequential and if you track time to a granular enough level, one can reasonably assume the column is unique.
First names may be sequential, but it doesn’t scream unique. We all know at least 100 Johns, Michaels, and Megans. So your cursor might be pointing to a whole set of records rather than one unique record. So, based on how you implemented the cursor, you will end up skipping data or getting duplicated results. (Uh oh! Not this again.)
A better column to sort by, in the case of users, would be username or email, as we can assume these are both unique and sequential.
However, if the requirement is to support sort by first name or last name, then cursors might not be your best bet. You can try creating a unique column by concatenating the first and last name, or using a tuple of multiple columns, but this significantly slows down cursors to the point where it will be slower than offset. Because it turns out that concatenation and tuple comparison in a SQL statement both have a time complexity O(N), or more specifically O(whole set of data).
And in fact, the first page will load the slowest. This sounds weird and non-intuitive, but what is happening is that the SELECT statement is going to exclude more and more records the further out you go.
There are some tricks to get around this issue discussed here. (More on this phenomenon here.)
Cursors Can Be a Bit Trickier to Implement
While cursors are not necessarily difficult to implement, offset is super easy to implement. So if you’re looking for a quick way to implement pagination then offset is the way to go. Especially if that data is not real-time. No need to complicate things where it is not needed.
Having implemented cursors myself, I would say that it wasn’t too hard to get it working for 90% of cases. But there were definitely a lot more edge cases to consider than with offset. Again, not hard to fix, but generally not as straight forward as offset.
So recognize that it may take more time to implement cursors. And this is simply a trade off that you, your company, or your product managers will have to make.
Despite the minor setbacks when sorting by every column is necessary, when it comes to real-time data, cursor pagination is king.
Now it’s starting to make sense why companies like Facebook are making such bold statements about cursor pagination. And it may be time to ask yourself if your website could benefit from the power of cursor pagination.
If you are convinced that cursors are the best thing since sliced bread, here is an article explaining how to implement cursor based pagination in your own website!