MongoDB Pagination, Fast & Consistent

Mosius
Mosius
Oct 16, 2020 · 8 min read
Image by gOrk barrette from Pixabay

When it comes to pagination, the simplest method that comes to mind is the skip-limit, but it is not always the best solution!

The Problem with the skip-limit

Although theskip-limit is suitable for some applications, it suffers from 2 major problems

1st. Poor Performance

As mentioned in MongoDB docs, it requires scanning all the previous docs before reaching the desired one, so it becomes slower as the offset increases. So the response times decline as page numbers increase.

2nd. Data Inconsistency

The problem occurs meanwhile the user is exploring data and any doc being added or deleted simultaneously in the visited pages. it causes shifts in data which means we may face duplicated docs or not fetching some docs in the result. This problem is more likely to occur when we have a real-time system such as a feed.

What is Keyset Pagination?

The overall idea is to retrieve the next pages using the previous pages that have been fetched. In another word, you will tell what the last fetched item is and get the next page after that item. Easy, hun…?!

Any Downside?

By using the keyset method, we lose the ability to randomly access the pages. Simply saying, we actually can’t get page 3 without getting page 2 and getting page 2 without getting page 1. But the good thing is we don’t really care about that in most applications.

Let’s Explore Pages

Let’s use mongo-shell to find out the general idea. Then we go on by implementing a sample function in Python and JavaScript for more practical usage.

Data Preparation

I’ve inserted about 5k docs in the users collection with name, birth and height fields.

> db.users.count()
4940
> db.users.findOne()
{
"_id" : ObjectId("5f7f1f7d67f67ddaa622b68c"),
"name" : "Ardra",
"birth" : ISODate("1970-11-04T12:03:32.988Z"),
"height" : 159
}

Fetching 1st & 2nd Page

We can fetch the first page as we did before. I’ve also limited each page to 3 docs for sake of simplification.

> db.users.find().limit(3).pretty(){
"_id" : ObjectId("5f7f1f7d67f67ddaa622b68c"),
"name" : "Ardra",
"birth" : ISODate("1970-11-04T12:03:32.988Z"),
"height" : 159
}
{
"_id" : ObjectId("5f7f1f7d67f67ddaa622b68d"),
"name" : "Thea",
"birth" : ISODate("1985-05-16T20:41:33.861Z"),
"height" : 162
}
{
"_id" : ObjectId("5f7f1f7d67f67ddaa622b68e"),
"name" : "Selia",
"birth" : ISODate("1976-05-12T07:01:39.035Z"),
"height" : 188
}

To fetch the second page we can use a natural feature of the ObjectId which makes it always incremental. So we can fetch the next page just by fetching the documents that their _id is greater than the last doc on the first page.

> db.users.find({ _id: { $gt: ObjectId("5f7f1f7d67f67ddaa622b68e") } }).limit(3).pretty(){
"_id" : ObjectId("5f7f1f7d67f67ddaa622b68f"),
"name" : "Ellette",
"birth" : ISODate("1995-03-02T22:58:04.018Z"),
"height" : 174
}
{
"_id" : ObjectId("5f7f1f7d67f67ddaa622b690"),
"name" : "Odilia",
"birth" : ISODate("1975-04-24T23:08:09.316Z"),
"height" : 123
}
{
"_id" : ObjectId("5f7f1f7d67f67ddaa622b691"),
"name" : "Brandais",
"birth" : ISODate("1992-01-12T23:02:58.313Z"),
"height" : 184
}

That’s the basic idea!

Sorting and Tiebreaker

In many applications, we want to sort the query result based on some specified field. In that case, where we are sorting the result based on a non-unique field, there is an important point we should take care of. Consider we want to sort the result based on the user’s height.

> db.users.find().sort({ height: 1, _id: 1 }).limit(3).pretty(){
"_id" : ObjectId("5f7f1f7d67f67ddaa622b6a4"),
"name" : "Carol-jean",
"birth" : ISODate("1981-11-30T18:34:29.613Z"),
"height" : 121
}
{
"_id" : ObjectId("5f7f1f7d67f67ddaa622b791"),
"name" : "Kippie",
"birth" : ISODate("1991-03-06T22:05:31.554Z"),
"height" : 121
}
{
"_id" : ObjectId("5f7f1f7d67f67ddaa622b795"),
"name" : "Pru",
"birth" : ISODate("1995-10-21T19:44:33.719Z"),
"height" : 121
}

To get the second page, if we only try querying users that their height is greater than 121 like we did before with _id, we would probably skip some entries, as their height is equal to 121 but they haven’t been on the first page. Handling this problem requires a unique field (like _id in most cases) to act as a tiebreaker, so we can rely on it where the age values are equal.

> db.users.find({
$or: [
{ height: { $gt: 121 } },
{
height: 121,
_id: { $gt: ObjectId("5f7f1f7d67f67ddaa622b795") }
}
]
}).sort({
height: 1,
_id: 1
}).limit(3).pretty()
{
"_id" : ObjectId("5f7f1f7d67f67ddaa622b7dd"),
"name" : "Jonis",
"birth" : ISODate("1971-03-19T00:19:56.970Z"),
"height" : 121
}
{
"_id" : ObjectId("5f7f1f7d67f67ddaa622b7e0"),
"name" : "Wendye",
"birth" : ISODate("1984-08-27T09:44:45.319Z"),
"height" : 121
}
{
"_id" : ObjectId("5f7f1f7d67f67ddaa622b7e3"),
"name" : "Wilie",
"birth" : ISODate("1974-09-15T10:51:23.528Z"),
"height" : 121
}

So far, we’ve almost covered many applications and we can confidentially include any condition in our queries to get a fast and consistent result, but…

There’s an $or

In some cases where we need to declare an $or condition in the query’s root, we have to make sure that both the query and the pagination query are getting satisfied! For instance, imagine we want to find users whose names are shorter than 3 characters, or their names start with the letter X.

> db.users.find({
$or: [
{ $expr: { $lt: [{ $strLenCP: "$name" }, 3] } },
{ name: { $regex: /^X.*/i } }
]
}).sort({
height: 1,
_id: 1
}).limit(3).pretty()
{
"_id" : ObjectId("5f7f1f7d67f67ddaa622c0ae"),
"name" : "Di",
"birth" : ISODate("1988-03-27T23:22:10.843Z"),
"height" : 121
}
{
"_id" : ObjectId("5f7f1f7d67f67ddaa622c58a"),
"name" : "Jo",
"birth" : ISODate("1997-05-07T09:44:10.037Z"),
"height" : 126
}
{
"_id" : ObjectId("5f7f1f7d67f67ddaa622c3d6"),
"name" : "Xylia",
"birth" : ISODate("1971-11-30T20:43:15.103Z"),
"height" : 127
}

To fetch the next page, we will put both $or operators in an $and operator.

> db.users.find({
$and: [
{
$or: [
{ $expr: { $lt: [{ $strLenCP: "$name" }, 3] } },
{ name: { $regex: /^X.*/i } }
]
},
{
$or: [
{ height: { $gt: 127 } },
{
height: 127,
_id: { $gt: ObjectId("5f7f1f7d67f67ddaa622c3d6") }
}
]
}
]
}).sort({
height: 1,
_id: 1
}).limit(3).pretty()
{
"_id" : ObjectId("5f7f1f7d67f67ddaa622c33f"),
"name" : "Ki",
"birth" : ISODate("1994-05-13T04:20:47.427Z"),
"height" : 132
}
{
"_id" : ObjectId("5f7f1f7d67f67ddaa622bb3d"),
"name" : "Xenia",
"birth" : ISODate("1975-07-31T18:49:55.730Z"),
"height" : 133
}
{
"_id" : ObjectId("5f7f1f7d67f67ddaa622bdfc"),
"name" : "Vi",
"birth" : ISODate("1979-09-21T19:46:21.595Z"),
"height" : 133
}

Checking Performance & Benchmark

Now let’s make these 2 methods in use and compare their performance in different cases. I also seeded the users collections with about 50B docs to make results look more like a real-world scenario.

Basic Pagination

Query: None / Sort: None / Limit: 25

With no query and no sorting, although in the beginning, the skip-limit seem to respond a bit faster, we can see its response time grows linearly in further pages. as we read before that's because it has to count(O(N)) from the beginning of the collection for each page. On the other hand, we can see keyset doesn’t really depend on the page number, as it can directly(O(1)) find the last fetched document in the collection(because of indexed _id) to retrieve the next page instantly.

Sorted Queries Pagination

Before we can continue, we need to create indexes for the users fields to be able to query them more efficiently. besides that, indexes also help us to speed up sorting our queries. Here I will create indexes for both height and birth fields.

db.users.createIndex({ height: 1 })
db.users.createIndex({ birth: 1 })
Query: { birth: { $gt: ISODate(“1998–01–01”), $lt: ISODate(“2000–01–01”) } } / Sort: { birth: 1} / Limit 25

As you may notice, after applying a query with sort, the response time has increased a little bit, but its shape is relatively the same.

Worse Performance and Fix

In a situation, where we need to sort the query on a field that its values are not identical with high occasions rate(height in our case), the keyset pagination performs much worse than the skip-limit.

Query: { height: 150 } / Sort: { height: 1 } / Limit: 25

But why? The reason is that the database doesn’t use the _id index when the height is the same in all the docs. So it begins to scan(O(N)) from the beginning of the list every time it wants to find the target _id (you can read more about handling collision in a hash table).

To fix the problem, we can create a compound index on both _id and height fields.

db.users.createIndex({ _id: 1, height: 1 })
Query: { height: 150 } / Sort: { height: 1 } / Limit: 25

And again, we have an excellent result. just note that creating more indexes causes poorer performance on inserting and it also occupies more space on disk or memory.

Show Me the Code

So far we have explored the data using mongo-shell . but now let’s implement a function where we can pass our original query to get a paginated query.

Python Implementation

We can use this function by passing the query to the first parameter, and the sort option as a list to the second parameter.

query = {'height': {'$gt': 150}}
sort = ['birth', -1]
limit = 10
query, next_key_fn = generate_pagination_query(query, sort)
users = list(db.users.find(query).limit(limit).sort([sort]))
next_key = next_key_fn(users)

We can then fetch the next page by passing the next_key to the third parameter of the function.

query, next_key_fn = generate_pagination_query(query, sort, next_key)
users = list(db.users.find(query).limit(limit).sort([sort]))

JavaScript Implementation

We can use this function by passing the query to the first parameter, and the sort option as a list to the second parameter.

const query = { height: { $gt: 150 } };
const sort = ['birth', -1];
const { paginatedQuery, nextKeyFn } = generatePaginationQuery(query, sort);
const users = await User.find(paginatedQuery).limit(3).sort([sort])
.toArray();
nextKey = nextKeyFn(users);

We can then fetch the next page by passing the nextKey to the third parameter of the function.

const { paginatedQuery, nextKeyFn } = generatePaginationQuery(query, sort, nextKey);
const users = await User.find(paginatedQuery).limit(3).sort([sort])
.toArray();

Final Word

Having algorithms that perform independently by the data size is beneficial value in products that need to scale up fast. Pagination is one of the basic stuff we can optimize to create such a product.

Besides the algorithm we may use, it is important to know what queries and sorting we require, so we can define appropriate indexes and enhance the response time.

Finally, I wish you enjoyed the article and find its information useful to use in your production-level applications.

Thank you for reading.

The Startup

Get smarter at building your thing. Join The Startup’s +789K followers.

Sign up for Top 10 Stories

By The Startup

Get smarter at building your thing. Subscribe to receive The Startup's top 10 most read stories — delivered straight into your inbox, once a week. Take a look.

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information about our privacy practices.

Check your inbox
Medium sent you an email at to complete your subscription.

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +789K followers.

Mosius

Written by

Mosius

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +789K followers.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store