ORM limitations

Clovis Delarue · Published in The Qonto Way · Nov 24, 2021

Programmatic database interfacing is a tricky beast to tame. We’ve had to avoid a few pitfalls over the years.

For instance, a few weeks ago, we started to see degraded response latency on our backend service, back-office. Its purpose is to give our Operations team the means to look up information and perform the daily actions needed to solve our customers’ problems. It relies on a third-party SaaS called ForestAdmin, which provides a nice, customizable web interface. They host the front-end, so the only thing we need to make available is an API that hooks into our databases in order to list and query the necessary objects (organizations, bank account details, client information…). With this solution, we keep ownership of our clients’ data, and we don’t need to develop our own user interface.

Almost 500 employees use this UI intensively every day to help our customers. That’s why this service is critical to our business.

ForestAdmin provides two different libraries to help produce that API: a Node.js package and a Ruby gem.

The package provides the API, but it provides neither the business logic nor your data storage strategy.

In our case, we chose the Ruby gem, and we kept the standard ORM that comes with Ruby on Rails: ActiveRecord.

What’s an ORM? How does it work?

An ORM (Object-Relational Mapping) is a coding technique that is really popular in many programming languages.

The main goal of an ORM is to map the objects in your codebase to their counterparts in a relational database. It gives you an abstraction to query your database using a query builder and to map the results to objects or collections. It also includes some tools to make your life easier:

  • Object-related logic like validation or serialization,
  • Managing relations between classes (abstracting the relations between tables).

The ORM often goes hand in hand with a DBAL (database abstraction layer), so you can use the same ORM functions on different relational databases without any changes in your code.

To illustrate how ORMs work, let’s have a look at a small example. Say you have a few SQL queries you want to execute to fetch some rows from a cars table.
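
For illustration, assume a hypothetical cars table with brand, color, and created_at columns; the queries could look like this:

-- Fetch every car
SELECT * FROM cars;

-- Fetch only the red Toyotas
SELECT * FROM cars WHERE brand = 'Toyota' AND color = 'red';

-- Fetch the 10 most recently added cars
SELECT * FROM cars ORDER BY created_at DESC LIMIT 10;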

With ActiveRecord, you will define a Car class that inherits from the ActiveRecord::Base class.
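
A minimal sketch of such a model, assuming the hypothetical cars table above:

class Car < ActiveRecord::Base
end

In a Rails application, models usually inherit from ApplicationRecord, which itself inherits from ActiveRecord::Base.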

Behind the scenes, ActiveRecord will introspect your database (based on your database schema, the one described by schema.rb) and create all the setters and getters on your Car class that map to your database columns.

To do the same SQL operations as above, you can use your Car model.
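
A sketch of what that looks like, keeping the hypothetical column names from the SQL example:

# Fetch every car
Car.all

# Fetch only the red Toyotas
Car.where(brand: 'Toyota', color: 'red')

# Fetch the 10 most recently added cars
Car.order(created_at: :desc).limit(10)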

An ORM also gives you abstractions to perform the CRUD (Create/Read/Update/Delete) commands in your database.
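
Still with the hypothetical Car model, the four CRUD operations boil down to:

# Create
car = Car.create(brand: 'Toyota', color: 'red')

# Read
car = Car.find(car.id)

# Update
car.update(color: 'blue')

# Delete
car.destroy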

Now that we’ve defined what an ORM is, we can focus on the database we use at Qonto.

Postgres

Postgres is an open-source relational database that uses the SQL language. It’s robust and reliable. Working with billions of rows is not an issue for Postgres. It is built to scale!

But as with any tool, if you don’t use it properly, it may not behave as expected.

Luckily for all of us, there’s an essential feature to make sure that you’re querying your database correctly: the EXPLAIN ANALYZE command.

This statement runs your SQL through a profiling system, explaining the decisions the query planner makes to execute your request.
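
Using it is as simple as prefixing the statement you want to profile; for instance, with the hypothetical cars table from earlier:

EXPLAIN ANALYZE SELECT * FROM cars WHERE brand = 'Toyota';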

There are two main operations running in your database that you can focus on to keep performance good.

On the one hand, you have table scans, which use different strategies to fetch a subset of your data according to the various filters you apply in your query.

On the other hand, you have table joins: in relational databases, you can access data in a table related to another one through a key. To achieve this, your database engine may use different strategies. We’ll look at these strategies in more detail during our investigation.

Remember, we had latencies in our back-office. Unfortunately, we can’t measure the delay on the UI side, but we can assume it’s negligible, so we’ll focus on making our backend faster.

With this assumption, the slowness has to come from the service logic or from the queries we make. Let’s dive in!

Investigation

The first thing that we did to investigate our slow back-office issues was to create a staging environment to do our tests.

To be as close as possible to our production environment, we spawned a dedicated database using a snapshot of our production database.

For the UI, ForestAdmin has a feature to create as many frontend environments as we want.

We just created a new one with the production settings and we plugged it into our previously created back-office.

So now we were ready to go!

After setting up our test environment, we started checking our monitoring tool for the slowest endpoints in production. We used our NewRelic integration to list those endpoints, and we checked the code behind each one. Some endpoints expect request parameters sent by the ForestAdmin UI, so we added the Forest view settings to our checklist.

Unfortunately, most of the endpoints were generic /index routes used by many different views in the UI, which made it hard to tell which views, with which settings, were the slowest.

At this point, the only thing we had was a list of routes with high latency. All of those routes had something in common: most of the time was spent making requests to Postgres.

Desperate situations call for drastic remedies. As a last resort, we decided to switch the Rails log level in our test environment to debug

# config/environments/staging.rb
- config.log_level = :info
+ config.log_level = :debug

and navigate through our test ForestAdmin panel to see the SQL queries made by our back-office application for each view.

Not ideal, but it did the trick! With our logs, we could identify which view in the UI was generating which SQL query 🎉

Here is an example of what we found.
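
In simplified form (the table, column, and filter names below are illustrative, not our actual schema), the query had roughly this shape:

SELECT memberships.*
FROM memberships
LEFT OUTER JOIN organizations ON organizations.id = memberships.organization_id
LEFT OUTER JOIN addresses ON addresses.id = organizations.address_id
WHERE memberships.id IN (
  -- subquery configured in the ForestAdmin UI and injected by the gem
  SELECT memberships.id
  FROM memberships
  JOIN organizations ON organizations.id = memberships.organization_id
  WHERE organizations.status = 'activated'
)
AND memberships.id = 42
LIMIT 1;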

To give some context, the subquery in this example is not generated by our backend application. For each view in ForestAdmin, you can provide a query in the UI to display only a subset of your resources (in this case, memberships with some extra fields from other joined tables). This subquery is configured in ForestAdmin rather than written by the developers maintaining the backend application, and the ForestAdmin gem then injects it into the main query using ActiveRecord’s where method.

If we go back to the query performance, this one takes around 5 seconds to execute, which is not great (especially since we run it several times per hour) 😕.

Let’s run an EXPLAIN ANALYZE to see why Postgres is struggling to execute this query.

Running just an EXPLAIN shows a breakdown of the most likely query execution plan. All steps are nested, and for each one, you'll see:

  • a relative cost
  • an estimated number of output rows
  • an estimated average row width

The ANALYZE option forces the engine to execute the query, allowing the results to include more information about the real execution time of the statement.
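
To give an idea of what the raw output looks like, here is an illustrative EXPLAIN ANALYZE on the hypothetical cars table (all numbers are made up):

EXPLAIN ANALYZE SELECT * FROM cars WHERE color = 'red';
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Seq Scan on cars  (cost=0.00..155.00 rows=42 width=72) (actual time=0.020..1.230 rows=40 loops=1)
   Filter: (color = 'red'::text)
   Rows Removed by Filter: 9960
 Planning Time: 0.110 ms
 Execution Time: 1.300 ms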

For a more readable display of these stats, you can use the website https://explain.depesz.com/: just copy and paste your EXPLAIN ANALYZE output into it, and this is what you get:

Summary of our EXPLAIN ANALYZE query results

In our case, Postgres spent more than 2 seconds on the Hash Left Join and more than 1 second doing sequential scans.

We discussed these two kinds of operations earlier. Let’s dig deeper into them now.

How does PostgreSQL scan tables?

Three different strategies can be used to scan a table:

  • sequential scans (seq scan): Postgres needs to check each row of the table, one by one
  • index scans: Postgres uses an index to perform the scan. As indexes are trees, the search is more efficient than a sequential scan.
  • bitmap index scans: a middle ground between a sequential scan and an index scan. Postgres first uses the index to build a bitmap of the matching rows, then fetches the corresponding table pages in a single pass.

On large tables, seq scans lead to bad performance, but they can easily be improved by adding indexes on the scanned fields. Keep in mind that this comes at the cost of decreased write performance, though, so you shouldn't add more indexes than necessary.
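
For example, if the planner keeps doing sequential scans on a column you filter on a lot, adding an index on it is usually enough (the table and column names below are illustrative):

CREATE INDEX index_memberships_on_organization_id
  ON memberships (organization_id);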

In our case, we are not adding indexes, as the performance of those scans is good enough and we don’t want to degrade the write performance of our tables.

How is PostgreSQL handling joins between tables?

There are three different join strategies:

  • Nested Loop: generally the least efficient join. For each row of the first table, Postgres looks up the matching rows in the other table.
  • Merge Join: each relation is sorted on the join attributes before the join starts, then Postgres scans the two relations in parallel and combines matching rows to form joined rows.
  • Hash (Left/Right) Join: Postgres fetches the rows from the first table and stores them in an in-memory hash, using the join key as the hash key. It then fetches the rows from the second table and checks whether they are present in the hash.

You want to avoid nested loop joins on huge tables as much as possible, as they come at a considerable cost. In our query, Postgres uses a Hash Left Join, which is good news, so let's see why it is still struggling to perform those joins.

If we go back to our query, what’s happening here?

We see that Postgres joins all the memberships, organizations, and addresses tables before filtering on the membership ID.

Those tables are pretty big, so it builds 623,380 rows with all the columns of those tables, and then applies the filter to return only 1 row. It would be more efficient to filter on the id first and then perform the joins.

The question here is: is it possible to filter on ids first and then perform the joins to get something more optimized?

The answer is… yes! Ideally, we would like to force Postgres to evaluate the filtering subquery first and only then run the joins. In SQL, there’s something called a CTE (Common Table Expression) that can do the trick.

A CTE defines a temporary table that exists just for the duration of your query, and you can use it as you would any other table.

Let’s rewrite our previous query, but this time using a CTE instead of a subquery.
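
Keeping the same illustrative table and filter names as in the simplified query above, the rewrite could look like this:

WITH filtered_memberships AS (
  -- the subquery that used to be injected inline, now evaluated first
  SELECT memberships.id
  FROM memberships
  JOIN organizations ON organizations.id = memberships.organization_id
  WHERE organizations.status = 'activated'
)
SELECT memberships.*
FROM memberships
LEFT OUTER JOIN organizations ON organizations.id = memberships.organization_id
LEFT OUTER JOIN addresses ON addresses.id = organizations.address_id
WHERE memberships.id IN (SELECT id FROM filtered_memberships)
AND memberships.id = 42
LIMIT 1;

Note that since PostgreSQL 12 the planner is allowed to inline CTEs, so if you rely on the CTE being evaluated first, you can write WITH filtered_memberships AS MATERIALIZED (…) to keep that behavior.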

And let’s rerun an EXPLAIN ANALYZE:

The hash left join that was taking ages is gone, and the query now takes ~500ms to execute.

So it’s possible to make this query 10 times faster with a CTE. How do we do that in our application? The issue is that ActiveRecord does not natively support CTEs in its query builder 😕

If you want to do this in Ruby, you have several options:

  1. Directly write your raw query using Arel, the SQL AST manager used by ActiveRecord
  2. Use the activerecord-cte gem, which adds CTE support to ActiveRecord
  3. Manually execute the first subquery and inject the result into the main query (see the sketch after this list)
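
As an illustration of the third option, using only standard ActiveRecord methods (the Membership model and the filter are hypothetical):

# Run the filtering subquery on its own and keep only the ids…
filtered_ids = Membership
  .joins(:organization)
  .where(organizations: { status: 'activated' })
  .pluck(:id)

# …then inject the result into the main query, so the joins only
# have a small set of rows to work with.
Membership.where(id: filtered_ids)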

In our specific case, there’s a built-in solution in ForestAdmin, the Smart Segments!

If you remember, the subquery is defined in the ForestAdmin UI.

With smart segments, you can move that definition back to the backend side, behind a dedicated endpoint, and ForestAdmin will use the endpoint response (i.e., a list of ids or primary keys) and inject it into the main query.
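
A smart segment with the forest_liana gem looks roughly like this (the model and the filter are hypothetical, and the exact shape may vary between gem versions):

class Forest::Membership
  include ForestLiana::Collection

  collection :Membership

  segment 'Activated memberships' do
    # Run the filtering query on the application side first…
    ids = Membership
      .joins(:organization)
      .where(organizations: { status: 'activated' })
      .pluck(:id)

    # …and return the ids for ForestAdmin to inject into the main query.
    { id: ids }
  end
end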

Basically, it’s doing the same thing as a CTE but on our application side.

ORM takeaways

After seeing the last example, you might wonder: why doesn’t the ORM switch to a CTE on its own? Couldn’t it even split the query execution if needed?

The answer is simple: ORMs (or rather, their query builders) are not that smart. They add an abstraction layer so that you can query your data easily, but behind the scenes they just craft straightforward SQL queries.

Keep in mind that ORMs are neither magic nor smart: they don’t make any optimization for you, and they have no idea how the query planner will handle the query.

Does it mean that you should never use an ORM?

Of course not. There are real benefits to using an ORM! It is easy to use and makes your code readable even by someone who doesn’t know SQL. So if you are working with simple SQL queries, you can use an ORM without any problem.

For more complex queries, such as the joins and the subquery we saw previously, the ORM cannot anticipate how the query planner will behave, so you should check the query builder’s output and run EXPLAIN ANALYZE on it before deploying those queries to production.

Conclusion

The query planner’s behavior is sometimes challenging to anticipate, and ORMs can easily hide performance issues. Thankfully, EXPLAIN ANALYZE is your friend.

You should always test the queries built by your query builder to make sure they will not hurt your application’s performance.

As a best practice, you can go further and monitor your slow queries to spot them before they degrade your application too much.

About Qonto

Qonto is a finance solution designed for SMEs and freelancers, founded in 2016 by Steve Anavi and Alexandre Prot. Since our launch in July 2017, Qonto has made business financing easy for more than 200,000 companies.

Business owners save time thanks to Qonto’s streamlined account set-up, an intuitive day-to-day user experience with unlimited transaction history, accounting exports, and a practical expense management feature.

They have more control, whilst being able to give their teams autonomy via real-time notifications and a user-rights management system.

They have improved visibility on cash-flows through tools such as smart dashboards, transaction auto-tagging, and cash-flow monitoring.

They also enjoy stellar customer support at a fair and transparent price.

Interested in joining a challenging and game-changing company? Consult our job offers!
