Optimizing Drizzle ORM for performance and row reads

Angelelz
Drizzle Stories
Published in
3 min readJan 7, 2024

This will get technical, just a little bit

Row reads. Before and after.

Everything started with a question in the Drizzle discord server. There was a user asking for help troubleshooting his project using drizzle to connect to a PlanetScale database, where an important query was causing more than 20.000 row reads and returning a handful of results. I’ve been helping at the discord server as a drizzle advocate, so I saw the post and became interested because why not.

I offered some high level solutions like creating indexes in the order that was being used in the query. Here’s the link to that one if you’re interested in checking it.

Unfortunately, that wasn’t enough, so my best recommendation was to just reach out to the PlanetScale support, maybe they can check more technical details of the query and the way the planner is running it.

Unsurprisingly, a couple days later, he reported his findings in discord and I asked him to open an issue in GH. When I finally found time to take a look, I found that all the work of investigating was already done, and drizzle just needed to make a little adjustment to the way it created the query for the related tables in the relational query builder.

It was actually a change in four lines. But now some of the test were failing. It was not going to be that easy after all. I deep dived into the problem, ran a couple of manual tests including some explain analyze for some queries. I found that the problem was that the window function used to order the aggregated results was necessary, but only if no limit was being defined.

Let me explain.

The relational query builder uses aggregation of the related tables via json functions. In the case of MySql, the star of the show is json_arrayagg(). For example, for the following query:

db.query.users.findMany({
with: { posts: true },
});

Drizzle ORM, in planetscale mode, will generate the following query:

select `id`, `name`, `created_at`, coalesce((
select json_arrayagg( -- <== Notice this guy right here
json_array(`id`, `content`, `author_id`, `created_at`)
)
from `posts` `users_posts`
where `users_posts`.`author_id` = `users`.`id`),
json_array()) as `posts`
from `users`;

But if you try to order the posts by anything, for example, createdAt, this will be the query generated:

select `id`, `name`, `created_at`, coalesce((
select json_arrayagg(
json_array(`id`, `content`, `author_id`, `created_at`)
)
from (
select *, row_number() over (order by `users_posts`.`created_at` desc)
from `posts` `users_posts`
where `users_posts`.`author_id` = `users`.`id`
) `users_posts`), json_array()
) as `posts`
from `users`;

Notice that inside the subquery there is now another subquery with a window function row_number. This is necessary because Mysql doesn’t care about order inside the json_arrayagg function. The problem is that this is now a lot less performant. In the case of Augusto, the user that found the issue in the first place, the query planner was just ignoring all the indexes and just doing full table scans.

But wait a minute, he was able to order the subqueries without the window function. What gives? Well, I continued investigating and turns out that Mysql will take the order into account if you also include a limit. Isn’t it obvious?

That means that the window function is only necessary if you order by but don’t limit. I implemented this logic into the relational query builder and tested. BOOM, all tests passing. I implemented a quick test to verify this new behavior and opened the PR.

As soon as that PR is reviewed and merged, I’ll start recommending users to include a limit to their related tables when ordering them while using the Drizzle’s RQB. This is the resulting query in my PR if you also include a limit to the posts:

select `id`, `name`, `created_at`, coalesce((
select json_arrayagg(
json_array(`id`, `content`, `author_id`, `created_at`)
)
from (
select *
from `posts` `users_posts`
where `users_posts`.`author_id` = `users`.`id`
order by `users_posts`.`created_at` desc
limit 16
) `users_posts`), json_array()
) as `posts`
from `users`;

Beautiful!

And the results speak for themselves. See the row reads below:

Row reads. Before and after. The last 3 are actual data point, so small that are barely visible.

That’s it. A pretty simple fix. Just some investigation was necessary.

--

--

Angelelz
Drizzle Stories

Full stack JavaScript/Typescript freelancer 🚀. Drizzle-orm advocate.