Resolving Riddling n+1 Requests

Published in

First Resonance Engineering

6 min readMay 25, 2020

This blog post comes to you live from First Resonance HQ, delivering hardware with confidence. Here at First Resonance we make a product known as ion, it is a factory operating system that allows engineers and technicians to design Procedures, and allow technicians to execute said procedures within a Run. ion provides traceability, planning and version control along with other features.To learn more about ion, visit our website. Our data model has many relationships: Procedures have steps. Steps have content, relationships to other steps, attachments, fields. Fields have types, and values, and it never ends. This could result in dozens of queries to the database, keeping the user waiting for their data for seconds. In this post, I talk about different strategies to improve the network efficiency and timeliness of the loads. Below is an example of a Procedure.

We use SQLAlchemy as our ORM and, along with most ORMs, it has an n + 1 query problem, issuing an individual query for each child object in a one-to-many relationship. This is especially problematic for a GraphQL API as it exposes the nested structure of data. An example within our application ion, is the structure of Procedures. Procedures have a set of steps which define execution instructions and those steps have a set of fields for data capture. Querying out this nested structure with n + 1 queries starts the network on a Tour de France to gather all the data needed to fulfill a user’s request, and if the network is not properly doping it can cause significant slowness. But fret not my friend, for this problem has a solution. Before I delve deeper into the problem or the solution I’d like to thank this post, for its excellent description of the problem and code to help solve it. I borrowed portions of the code and ideology from that post and would like to give the proper credit.

Let’s start with an example of a procedure that has four steps and seven step fields. A simple query such as the one below will make twelve round trips to the database. One for the Procedure, four for the steps, and seven for the step fields.

Loading Strategies

I know what you are thinking, “Wow that sure is a lot of queries; there must be something we can do about that.” Well boy do I have a surprise for you — it’s a little secret I like to call loading strategies (or just loading strats for the initiated). All the SQLAlchemy loaders and their most common use cases are given in the following documentation. If we wanted to reduce everything to a single query we could use the joined load strat; however, a significant amount of outer left joins can be costly to a database. And we cannot perform an inner joined load because we want to query procedures without steps and steps without fields. Given how poorly outer left joins scale when a database grows, we should avoid this method for posterity’s sake: we don’t want our children waiting 2 minutes for a single query to return.

Outer join query

select procedures.*, steps.*, fields.*
from procedures
left outer join steps on procedures.id = steps.procedure_id
left outer join fields on steps.id = fields.step_id
where procedures.id = 1;

But do not shed tears, for there is another viable option when it comes to querying children objects of many to one relationships., the most viable being select-in load. Now, unlike joinedload, this option does not lower the amount of queries from twelve to one, but rather twelve to three. It submits three queries one for each data type using all the parent object IDs in an in query.

Select in query:

select * from procedures where id = 1;
select * from steps where id in (1, 2, 3, 4);
select * from fields where id in (1, 2, 3, 4, 5, 6, 7);

Now that we’ve chosen a loading strategy, we will move into how we use this strategy.

Eager Vs Lazy Loading

Now that you are abreast on the possible loading strategies, it is time to discuss when these queries are made. Luckily the SQLAlchemy ORM is like a swiss army knife in that it offers many flexible options for loading data. But also like a swiss army knife most of the time you are just using a single tool, and you are very confused on why a pocket knife has a nail file. First, we will discuss lazy loading, a strat which can be defined on the ORM level, by adding the lazy parameter to defined relationships.

ORM models with lazy loading:

class Procedures(Base):
    id = Column(Integer, primary_key=True, autoincrement=True)
    steps = relationship(‘Steps’, back_populates=’procedure’,
                         lazy=’selectin’)class Steps(Base):
    id = Column(Integer, primary_key=True, autoincrement=True)
    fields = relationship(‘Fields’, back_populates=’step’,
                         lazy=’selectin’)
    procedure_id = Column(Integer, ForeignKey(‘procedures.id’))
    procedure = relationship(‘Procedures’, back_populates=’steps’)class Fields(Base):
    id = Column(Integer, primary_key=True, autoincrement=True)
    step_id = Column(Integer, ForeignKey(‘steps.id’))
    steps = relationship(‘Steps’, back_populates=’fields’)

This implementation issues the query the first time that object is accessed within the ORM. This strat works great for the first depth, for instance this only submits one query as opposed to 4 when accessing a Procedure’s steps. However, this strat does not perform well when it comes to deeper relationships as it will submit four queries to get fields, one for each step, as opposed to a single query to get fields.

Eager loaded relationships:

query_result = session.query(Procedures)\
 .options(selectinload(‘steps’).selectinload(‘fields’))\
 .filter(Procedures.id == 1)

Eager loading on the other hand can be overly greedy. We would not want to eager load every procedures’ steps and fields if we are just trying to list Procedures and their titles. In order to create a balance between being overly greedy or overly slovenly, we will need to build something on the GraphQL API layer that can create a balance between the two.

I now proudly present the ion data loader.

The Ion Data Loader

The ion data loader code base started from the post mentioned above. The essential idea is that we can coalesce the requests by examining the GraphQL query and eager load only those relationships which are requested. I shall now explain how this was accomplished. If you have any questions along the way feel free to raise your hand and ask. The ion data loader is implemented by creating a new EagerSAObjectType class which inherits the graphene SQLAlchemy class SQLAlchemyObjectType and overrides its get_query method. The new get_query method, by default eager loads every requested relationship with the select-in join method. If a separate loading strategy is more applicable the EagerSAObjectType exposes an additional attribute which can be set in the model’s meta class. This additional property, known as load_strategies, allows a user to inner join relationships if that is more effective. The load_strategies property also allows a user to define joins across multiple tables. In order to not make the class seem to push or overly forthcoming no other additional attributes are exposed. The load_strategies parameter comes in handy while specifying the join paths for ion’s entity relationships, such as for file attachments. To get a step’s file attachments, we must first join on the step with the entity table than the entity table with the file attachments table. While this middle join can be hidden on the model using an associative property, in order to effectively join the tables we must specify the entire path.

Specifying complex join path:

class Step(EagerSAObjectType):
  class Meta:
    model = Steps
  load_strategies = {
    ‘assets’: {
      ‘join_path’: (‘entity’, ‘file_attachments’),
      ‘load_type’: selectinload
     }
   }

Conclusion

The ion data loader is generic enough to be applied to any project using graphene sqlalchemy. You can see the entire EagerSAObjectType class here . It effectively more than halved the response time of our ion API, and most notably had a huge effect on large queries with many subselects in the query, such as that up top which selects Procedures, steps and fields.

That, ladies and gentlemen, brings this blog entry to an end. I hope you enjoyed the wild ride.

The full code from this post can be found here.

Resolving Riddling n+1 Requests

Loading Strategies

Eager Vs Lazy Loading

The Ion Data Loader

Conclusion

Written by Connor