A scalable solution to ordering data by priority or rank

It seems like such a simple request at first, “I want data to be ordered by rank or priority, and I want users to be able to re-order the data based on which is most important”. This is an extremely common scenario in app development. You will find this problem everywhere from Todo Lists and task management systems, to complex and specific use-cases. The solution might seem simple at first, depending on your language/db solution, there might even be native support for a priority queue, or perhaps an array could function here. The solution becomes much more complex when it has to scale.

In this article, we are going to go over several unscalable ways of handling this problem as well a scalable way of handling it. For the purpose of this problem, we are going to assume that we have a Todo list with 10 million items all ranked by their priority (scroll down to the solution section to skip to the meat). Obviously this is an unrealistic example, but the purpose forces us to come up with a solution that scales.

Naive Approaches

There are a number of naive approaches. They might be appropriate solutions, but will generally not scale well. The commonality that all of these approaches share is that changing the priority/rank of a single Todo item affects all other todo items before it can be returned to the user.

(1) Priority Field:

This naive approach adds an integer to each Todo item. This is ordered from 1 — n, 1 being the lowest priority and n being the highest priority.

Pros: By ordering the priority in reverse, it makes adding Todo items very easily. Simply find the total number of Todos and add one. The new Todo item will now be set to the very top of the list without affecting any other todo items

Cons: Changing the rank of a todo is very expensive. In the list of 10 million todos, if I move my brand new todo (priority = 10000000) to priority = 1, I now have to update 10 million entries just to increment each priority by 1. Changing the priority cannot be an atomic interaction and will always affect other cards causing scalability concerns, race conditions, write locking (depending on the database), and very complicated error management

(2) Array

This Naive approach is very similar to the last one, in that it makes it very easy to add a new todo with a priority, but changing the priority is expensive as it necessitates all items in the array to be incremented/decremented before an todo is moved.

Pros: You can rely on the natural data structure of the language or database to handle reordering, splicing, pushing and unshifting. Working with arrays is one of the first data structures we learn and as a result it is familiar territory

Cons: As already mentioned, re-ordering an array of 10 million items, unshifting or splicing will be very expensive operations even if they are handled under the hood. The biggest con of this approach is applying it to the actual todo items. We now have two separate data structures, one that manages ordering and the other that manages data. This creates a lot of complexity in the application code to provide the client with a clean, paginated and ordered list Todos.

SOLUTION

Now let’s get to the exciting stuff. So what is the scalable solution to this problem? The approach has one major requirement: Adding, removing, or updating a priority should only affect a single todo item. How is this possible? By utilizing floats.

Abstract away the float:

If you are using a strictly typed language, you will want to add a float64 to your todo items (more info on why we are using a 64bit float later). This float64 should be abstracted away from the client; the client does not need to know about this number. Unlike the naive solutions, here #1 is the highest priority. When the client wants to update the priority, they simply provide the integer of where they want to move the todo list. If they want to move a todo item from priority #500 to priority #3, they would simply provide the endpoint /todos/:todoID/priority/:priority where :priority is an integer of 3. The float should be 100% abstracted away from the end user

Setting a Todo to #1 (edge case):

When creating a new todo, or moving a todo to the #1 position, you simply retrieve the very first card (ordered by priority), take the priority and divide it by 2

ex. 
todo.priority = 1
todo2.priority = todo1.priority/2
todo2.priority == .5

Without updating any other todo, we have set the new priority to the first position of the sorted list. Each new todo will divide the first priority by 2

ex.
todo2.priority = 0.5
todo3.priority = todo2.priority/2
//todo3.priority == 0.25

Setting a Todo to the bottom of the list (edge case):

When moving a todo to the bottom, if the priority number provided by the client is equal to the total count of todo items, then we know it is the final item on the list. We simply retrieve the very last card of the sorted list and add 1 (let’s say there are only 10 items on the todo list to make this example succinct)

ex. 
todo10.priority = 7.5
todo3.priority = todo10.priority +1
//todo3.priority == 8.5

Setting a Todo in the middle of the list:

The meat of the solution. To update the Todo item without affecting any other todos, we must find the surrounding 2 todos of the new priority number. To do this we must determine how many todos to skip in the database query. The amount to skip will be priority-2 if the todo is moving up or priority -1 if the todo is moving down

I solve this in the following way (pseudocode).

// pseudocode
// I want the two surrounding todos, but if the todo is decreasing, 
// I'll want to shift this by one
skip = priority - 2
// because of previous comment, I take the surrounding 3 todos
limit = 3
todos = retrieveTodos(limit, skip)
firstValue = todos[0]
secondValue = todos[1]
// if the todo is decreasing its position, then skip the first item
if todo.priority <= todos[0].priority {
firstValue = todos[1], secondValue = todos[2]
}
newPriority = firstValue +
((secondValue - firstValue) / 2)
// desired priority = 6
// firstValue = 5 secondvalue = 6
// 5 + (6-5) = 5.5

Overflow Constraints

If you’re an experienced engineer, you’ve probably been biting your tongue about the obvious shortcoming of this approach. A float64 can only hold 16 digits. Eventually, the floating number will overflow and cause a runtime error, as well as breaking the priority system. If you merely add 30–40 todos in rapid succession without moving any of them, you will probably hit this limitation. That is why it is extremely important to add a check every single time a new priority is created. This check should convert the float into a string and check the length of the string. If the length of the string is 15 or more, then create an async event that causes the list to be re-scaled to the nearest integer. This should occur completely separate from the user response, utilizing whatever async system you have in place to handle async or cron jobs on your servers. Again the fact that there are floating point numbers handling priority should be completely abstracted from the Front end code, as far as they are concerned, the only thing that is important is the final ordering of the todo list. Depending on your database solution, this re-ordering should be handled in the background in a non-locking fashion. Race conditions will not be affected by users continuing to re-order the todos while this is running in the background as this solution does not affect any other cards.

Priority on a filtered list (relative priority rank)

So we’ve completed a fully scalable solution to the problem of a priority list, but now product wants us to extend it. “Now I want people to be able to filter their todo lists by hashtags, and then re-order them, when they remove the filter, that ordering should translate to the non-filtered list”. This increases the complexity of the request dramatically, and it is a very common problem as well. Any task management system needs to be filtered by a user, and the priority should be able to be updated when filtered by that user. But how should it translate when the filter is removed?

The solution is quite simple, extend the endpoint to accept a todo ID. When parsing the :priority from the url /todos/:todoID/priority/:priority, check to see if the :priority is a todoID, if it’s not, then check if it’s a valid integer.

If the :priority is a todoID, it is our responsibility to convert that to the actual non-filtered priority. We do this by taking the total count of the sorted list with a constraint of the :priorityID’s priority float. ex:

//psuedocode
extractPriority(p params) int{
if isInteger(p.priority) {
return atoi(p.priority) //atoi converts a string to an int
}
 todo = getTodo(priorityId)
sort = "priority"
 // gets the total count of the todo list that are <= todo.priority
// which equates to the desired non-relative priority number
return getTodoCount(sort, todo.priority)
}
//(continue with the rest of the solution)

Please post any questions, improvements or suggestions. Have you solved this problem in a different way? Let’s hear it