Keeping your APIs lean and Queries Fast — Part I
This is the very first instalment of my learnings at shipmnts while developing Rest full APIs for various micro services we have . As we are developing a cross platform application which is data intensive the APIs we develop are supposed to be lean, to the point and the amount of queries we do are supposed to be less, highly optimised and yes off-course they are supposed to be fast.
In this article I will try to share my learning around two questions which all Ruby On Rails developers face some or the other time in their development lifecycle.
How to combat N+1 queries?
Let’s address the elephant in the room — N+1 queries are the biggest nightmare of any developers life. They eat up the processing power of your servers and they make the overall experience of your user slow. N+1 queries problem arise when you have 1 to many relationships, many to many relationship etc. and you want to preload all the related records when you select records which refer to the related records.
To combat N+1 queries problem rails comes loaded with three very powerful functions includes
(references a new weapon for ROR warriors), eager_load
and preload. Includes is really powerful and its an additional level of abstraction on eager_load
and preload
. If you delve deeper into the implementation of includes function you can easily decipher that it internally delegates to eager_load or preload (see it here) to do the work depending on the type of query you have written. Lets first define what each of these functions do conceptually.
#eager_load
function executes a big LEFT OUTER JOIN
SQL query for each table which is being eager loaded and loads all attributes of the preloaded tables as dynamically named columns into the memory.
#preload
function executes distinctSELECT
queries for each related table to preload data of all related tables.
#includes
as I already mention it delegates the job to #preload
or #eager_load
depending on the presence or absence of WHERE
clause condition on one or more of the preloaded tables.
Let’s take two API use cases and try to see them in action with live data
Endpoint should return all activities where Joel is participating user and preload his company details and all users of his company
Its very clear from the use case that I will either combine #joins
and #includes
functions or I will combine #includes
and #references
functions to get the resultant data. The only difference between the two is first one will use INNER JOIN
and the later one will use LEFT OUTER JOIN
. Now for experimentation purpose I am also doing the same thing using #eager_load
the only difference being I am not using references
with it.
If you see the results you can easily decipher that includes called #eager_load
automatically when it saw the conditional statement on the preloaded table. But we had to use references additionally to tell #includes
to call #eager_load
internally and perform LEFT OUTER JOIN
on which preloaded table. Whereas we only passed the arguments once to eager_load
function.
Endpoint should return all activities where Joe activities where Joel is participating user and preload all of the companies and users participating over those activities
As you can clearly see the above the queries only returned count as 1 when we did this activities.first.collaborating_companies.count
that means includes will not be able to do what we want in this case as it will always call eager_load
in all cases whenever it sees a conditional statement on preloaded tables and we will only get one company and its users. So let’s see preload
whether it can do what we want or not
As you can easily see two separate SELECT
queries were executed to load all participating companies and then all participating users of those companies. Thus fulfilling our use case.
Lets try to address the second question and then we will conclude with key takeaway points
JOIN Vs Includes — When To use what?
As you may clearly observed joins
and includes
with references
did the same thing with an exception of the type of join they performed. So how we should when we should use what. So heres my take on that. If you just want to return parent table information basis on a WHERE
clause on the related table attribute you will perform use join as it doesn’t implicitly execute a big query to preload all the attributes of the related table. So no additional data getting loaded . If you want to return related table information too then you should use includes
in combination with references
Key Takeaways
If you love abstraction more use
includes
but beware it may give you errors if not used with precautions (try to run the same query we did above without references see what happens?) but if you know your API use cases clearly use#preload & #eager_load
as it will help you control the amount of data to be transmitted in the API and control over the SQL queries getting executed
eager_load
will only preload the related table entries which satisfy the conditional clause. This will definitely bring back the example we discussedwhere("particpating_users.user_id = '1' ")
If you want your server log queries to be readable for debugging purposes go for
preload
in combination withjoins
as it ends up making distinct readable queries instead of doing one gigantic query which we saw in case when we usedeager_load
and when we usedjoins
in combination withincludes
In the next part I will be sharing more on Initial Sync APIs, Delta APIs and how to design serializers to serve customised data to different API consumers?