Using Tree Structures with Pagination to build Folder Navigation

Meryl Dakin
Frame.io Engineering
10 min readAug 6, 2021

This is the story of how we built out the API to support navigation through a folder tree at Frame.io. We’ll be identifying some dead ends (so you don’t have to!) and one of our solutions.

Frighteningly complex.

This post will focus on explaining the overall approach in a way that can be broadly applied to a file navigation problem, excluding query specifics. In our case, the data is not easily accessible and the end feature is intended to perform fluidly.

To illustrate the structure and experience, we’ll use screenshots of Mac’s Finder window:

The final product (if it were already built and lived inside all of our Macs)

What we need

  1. Particular Data
  2. About all the folders in a project
  3. In order of depth

We have a table in our database that contains a LOT of data about our folders. But in order to show a user their folders in a project navigation panel, we don’t need all the data about this folder, just pieces — the name of the folder, its parent, if it’s private to the current user, etc

We need that data about all of the folders in the project that the user might want to see. We don’t know immediately what they’ll want to navigate to, so we have to be smart about how we get it.

To load it properly, we want to sort it in order of depth of the project. We’ll see how that works later.

What a user expects

As a user, I have two major expectations:

  1. I can expand my folders very quickly to get to a deeply nested folder
  2. I can scroll down quickly to see folders that weren’t in my viewport when I first loaded the page

Why is that hard?

1. Expense: Complicated relationships = Slow Query 👬👭👫

We have some pretty complex relationships between our tables because of the way we’re handling access and privacy in our feature. We won’t go into depth about how that’s structured, but suffice to say this query is not going to be fast. We’d rather not cache it because we need it to be very accurate every time we request it, and we don’t want to maintain that accuracy in a cache if we don’t have to. So we are planning for this to be slow.

2. Size : Many folders = Huge payload 🐋

We also have to think about how much data we’re sending to the front end. This graph is showing an approximation of how many folders each project might have:

Thankfully for the majority of projects, we can see that they generally have less than 100 folders, which is nothing for us to send over. However, we have to think about those projects that have up to 76k folders in a project, and everything in between. Sending that much in one payload would slow the browser to a crawl. Even when we get to 1000 folders we’re going to see a hit to animation performance because it’s so much data that it has to ingest.

So how do we get around these issues?

We can manage expense and size through pagination — sending this data in chunks — so that we can maintain our Fluid UI standards.

Paginating a tree: some feelings

But how can we paginate a tree?

Here are some thoughts I found on UI forums when I was first looking into how other people do this:

Basically, a lot of “don’t do this” and not very much “if you must, here’s a solution.”

But indeed we have to imagine that at any level, there are an infinite possible number of folder children, so pagination becomes very complicated when you start trying to slice it up in any direction.

At first, we tried to think of getting the data in “vertical slices” like this:

Call # 1
Call # 2

Thankfully, our data is stored in such a way that this is easy for us to pull out. So we thought we could send over only what would be in a user’s viewport, and then update it as they scrolled or expanded.

However, this quickly (lol not really) became obvious that it would be far too complex for the front end to keep track of all the end cursors (we’re using cursor-based pagination) for the various permutations of a slice we could deliver.

We explored a few other options as well:

  • Caching the tree in Redis — may revisit this option, but at this point we’d like to avoid it if possible so that we don’t incur more complexity in maintaining the cache
  • Storing only the necessary data in Elasticsearch — this would have been really useful in speeding up the query, but it does incur a bit of lag time. It would also be extra time to reindex the folders if anything changes about the structure or access. We’d also like to avoid utilizing Elasticsearch for features that are core to the product use so that we can minimize dependencies.
  • Skeleton tree with hydration — in the initial call send over the bare bones of what needs to be there, ids and parent ids only, and then let the front end request more as needed. However pulling out this data would still be a slow operation, there may STILL be too much to send in the requisite amount of time, and we’re still needing to hydrate each parent which incurs loading time for every folder.

And finally, we explored the most basic solution: requesting data for each parent folder, one at a time.

Pros

  • Predictable pagination for every parent
  • Pretty fast individual query

Cons

  • LOTS of browser requests = stuttered animation
  • Incurs load time for every. single. folder.

On the good side, that keeps things predictable and stable, and is also fairly fast. However, it forces us to have load time for every single folder. And if we pre-fetch parent folders, it impacts animation because the browser is making so many requests and getting so much data.

So what do we need to do? Let’s go back to these original steps:

1. Get particular data…

When we say particular data, we’re talking about these things in particular that come from two tables. The Access Path is simply the lineage of the folder.

2. …about all the folders a user may want to see…

We landed on two SQL queries to get all the data that we needed. The first one asks for all the folders for a certain parent folder, so we can say give me the children for chapter two and we’ll get everything in this green box.

The second one just gets ALL of the descendants of the project.

The first one is faster because we have that parent ID immediately available on the folder. The second is slower just because of the way we’re storing that data, but it helps us not have to make a separate call for every single parent folder.

3. …in order of depth.

Now to explain this last one. Why are we doing this by depth?

We want to load up this data for a user in the order in which they’ll be seeing it. We have no way of knowing where the user will go when they load the page, but we do know that scrolling takes less work than clicking into every folder.

Scrolling so fast
Clicking takes slightly longer

So we’re optimizing for that fast immediate load time vertically, and then filling out the folders by level of depth horizontally as a user is clicking through them.

Aside: We’re using the access path we mentioned earlier to calculate how “deep” a folder is in its project. So for example, that “Squirrels” folder has an access path that looks like this: [“Chapter 2”, “Photos”, “Day 1”, “Outdoor”, “Squirrels”] , giving us a depth of 5 for Squirrels. An interesting thing that happened during this project is that we had to extend our pagination module to make this happen, because we didn’t have it stored as an attribute itself. To do this, we used Elixir’s dynamic macro, which could honestly be a blog post in itself.

Our Strategy

So now we’re going to wrap up with how we’re putting all that together in this network call. We’re using a three phase approach to get these folders, based on those two queries we talked about earlier.

  • First, we’re getting all of the top level folders so that a user has everything they need immediately to scroll down and so that the trunk of our tree is present for the rest of its children to build onto. We’re saying “get me all the children of this project’s root”
  • Then we’re getting all of the project folders up to a certain amount. Based on that data I shared earlier, most projects max out at 500 folders, so that’s how many we’re going to get in this next call. That will be enough for most users to seamlessly navigate in any direction
  • Lastly, for users with more than 500 folders, we’re going back to the original “basic” approach of getting folders for a parent — wherever they navigate, we’ll only load the children for that folder. Loading only what we need here keeps our front end from getting crushed with too much data.

Sounds great! But!

Problem!

Picky state management library named after the famously picky Greek God

We’re using Apollo GraphQL to cache our data on the front end, and Apollo doesn’t know how to merge separate calls — it will only store our tree properly if we’re using the SAME query every time. So how can we trick it into working for us?

Query Swapping

In fact, we ARE using the same GraphQL query, and then switching the SQL queries based on what the front end tells us it needs. If we get a parent ID, we know we’re giving back the children for the parent folder. If no id, we’ll give back all of the descendants up to 500.

Using this strategy gives us a little bonus as well. For this feature when a user refreshes a page, we want their opened folders to STAY open, but we still need to reload that data.

Our first children query gives us children of a particular parent.

When we give our query multiple parent IDS, like the ids of the open folders, it will give us back the children for just those folders so we can immediately show the child folders of the open parents without loading up the whole tree up to the opened folder immediately.

Evaluating this approach

We’re almost certainly not at the perfect solution for paginating a tree, but for the system and product constraints, we have more pros than cons at this point.

Benefits

🐇 Gets the bulk of the tree built very quickly for most use cases (99.9% projects only have <500 folders) — no loading after initial load

🌳 Doesn’t saturate Apollo cache with ALL of the data ever — but as a user explores their tree, those pathways will be stored in the browser and only need to reload on refresh

💾 We’re not relying on a backend cache, so access & folder data is always accurate — no extra work to keep a cache up to date

📄 Predictable pagination for loading and caching

Drawbacks

🏊 Users navigating deeply into a project with >500 folders will experience short lag time when expanding a parent folder. However, we imagine users with very deeply nested folders will be utilizing a search function rather than always navigating to them.

🐢 Initial load time (first time opening the page) will take longer than 600ms for large projects. We’re going to keep testing this as we move it into our production environment and include our access controls. If it’s terrible, we’re going to revisit the idea of caching this on the backend.

The End

That’s it, the new standard for paginating a folder tree! But really — this is a deceptively complex and often requested feature in applications. Our hope is that by showing the winding pathway we took with this approach, we’ve saved you some fruitless searching on pessimistic UI forums. We’d love to hear how you solved for this problem in your own projects!

--

--