All Roads Lead to Pagination

Michael Azimov
Fundbox Engineering
4 min readJan 9, 2022
Photo by BENCE BOROS on Unsplash

We all have probably been there. You have an enormous table, and you have to go over all its data for some reason. You are a pretty decent software engineer, so you understand that querying it as a whole will blow up your memory usage. But what are the alternatives?

I'm part of the frameworks team at Fundbox. Our job is to help the developers in the R&D Department by building applicative and system frameworks, core services, and common packages. A few weeks ago, my team and I decided to tackle the very same question.
We wanted to give our developers a clean and straightforward way to query SQL data in pages instead of retrieving all data. The journey for a good solution was bumpy and packed with roadblocks, and in this blog post, I will try to walk you through it. Buckle up :)

The three roads

Photo by Nick Fewings on Unsplash

Road #1 — Limit & Offset

This is probably the most common way to implement pagination in SQL.
You add LIMIT and OFFSET to your query (Along with an ORDER_BY clause), and that's it — you're good to go. Is it that simple, though?

SELECT * FROM some_table 
ORDER_BY id
LIMIT 10000
OFFSET 100000

I won't dive deep into this subject, but long story short — you probably made things worse.
If you have large tables, and by large, I mean 100k rows and more, using OFFSET will result in inefficiency. The reason is rooted in how the OFFSET clause works — it scans all rows of the table, one-by-one, for each page queried.
** If you want to dive deeper, there is a great blog post by Oliver de Cramer that explains this thoroughly.

Well, this is not a good solution for us. So let's see where we end up on road #2.

Road #2 — Window Functions

A window function is a built-in SQL Function that does some operation on a particular set of rows (the "window") from the table. The collection of rows is defined by the OVER clause, which takes some SQL clauses and uses them to determine a "window" of rows.

The function we can use for pagination is ROW_NUMBER().
ROW_NUMBER() does just what it sounds like—displays the number of a given row. It starts with 1 and numbers each row according to the ORDER BY part of the window statement.
So for pagination purposes, we can do:

SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS row_number   
FROM some_table
WHERE row_number > page_num * num_in_page
LIMIT num_in_page

Window Functions are very powerful because they are efficient and pretty easy to do, so they should be your first option if you want to do pagination.
Unfortunately, not all DBs support these functions. In our case, our DB supports them but only from a later version, and an upgrade is not on the horizon.
Bummer. I guess we need to go to road #3.

Road #3 — Manual seek function

A manual seek function is just a cool (or not) name for pagination using WHERE, ORDER_BY, and LIMIT clauses.
Basically — you do the pagination manually. Let's see the most simple example:

SELECT *
FROM some_table
WHERE id > last_id
ORDER_BY id
LIMIT num_in_page

As you can see, the critical factor to building this query is the last_id parameter that represents the last row queried in the previous page (If it's the first page, there is no need for a WHERE clause).

The most significant advantages of this method are that it fully utilizes your indexes (assuming you have them, if not — go on and add them) and that it's "vanilla" SQL — supported by all DBs and DB Engines.

The downside is that it will require developers to write a lot of code to implement this, and if you recall, my team's objective was to give our developers a clean and straightforward way to query SQL data in pages instead of retrieving all data. While the "clean" part is debatable, this solution is undoubtedly not simple. But what if we can make it simple?

The road taken

So, the goal of this post was to outline the major solutions for doing SQL pagination. The three options we discussed are all valid but should be applied mindfully after reviewing the pros and cons of each option.

We decided to take road #3 and build a Manual seek function implementation, but we want to do it in a way that would be simple and easy to use by the developers. So if you are interested in hearing how we managed to do that, stay tuned for the next post in which we will deep dive into our solution.

--

--