Keeping your APIs lean and Queries Fast — Part I
This is the very first instalment of my learnings at shipmnts while developing Rest full APIs for various micro services we have . As we are developing a cross platform application which is data intensive the APIs we develop are supposed to be lean, to the point and the amount of queries we do are supposed to be less, highly optimised and yes off-course they are supposed to be fast.
In this article I will try to share my learning around two questions which all Ruby On Rails developers face some or the other time in their development lifecycle.
How to combat N+1 queries?
Let’s address the elephant in the room — N+1 queries are the biggest nightmare of any developers life. They eat up the processing power of your servers and they make the overall experience of your user slow. N+1 queries problem arise when you have 1 to many relationships, many to many relationship etc. and you want to preload all the related records when you select records which refer to the related records.
To combat N+1 queries problem rails comes loaded with three very powerful functions
includes (references a new weapon for ROR warriors),
eager_load and preload. Includes is really powerful and its an additional level of abstraction on
preload. If you delve deeper into the implementation of includes function you can easily decipher that it internally delegates to eager_load or preload (see it here) to do the work depending on the type of query you have written. Lets first define what each of these functions do conceptually.
#eager_load function executes a big
LEFT OUTER JOIN SQL query for each table which is being eager loaded and loads all attributes of the preloaded tables as dynamically named columns into the memory.
#preload function executes distinct
SELECT queries for each related table to preload data of all related tables.
#includes as I already mention it delegates the job to
#eager_load depending on the presence or absence of
WHERE clause condition on one or more of the preloaded tables.
Let’s take two API use cases and try to see them in action with live data
Endpoint should return all activities where Joel is participating user and preload his company details and all users of his company
Its very clear from the use case that I will either combine
#includes functions or I will combine
#references functions to get the resultant data. The only difference between the two is first one will use
INNER JOIN and the later one will use
LEFT OUTER JOIN . Now for experimentation purpose I am also doing the same thing using
#eager_load the only difference being I am not using
references with it.
If you see the results you can easily decipher that includes called
#eager_load automatically when it saw the conditional statement on the preloaded table. But we had to use references additionally to tell
#includes to call
#eager_load internally and perform
LEFT OUTER JOIN on which preloaded table. Whereas we only passed the arguments once to
Endpoint should return all activities where Joe activities where Joel is participating user and preload all of the companies and users participating over those activities
As you can clearly see the above the queries only returned count as 1 when we did this
activities.first.collaborating_companies.count that means includes will not be able to do what we want in this case as it will always call
eager_load in all cases whenever it sees a conditional statement on preloaded tables and we will only get one company and its users. So let’s see
preload whether it can do what we want or not
As you can easily see two separate
SELECT queries were executed to load all participating companies and then all participating users of those companies. Thus fulfilling our use case.
Lets try to address the second question and then we will conclude with key takeaway points
JOIN Vs Includes — When To use what?
As you may clearly observed
references did the same thing with an exception of the type of join they performed. So how we should when we should use what. So heres my take on that. If you just want to return parent table information basis on a
WHERE clause on the related table attribute you will perform use join as it doesn’t implicitly execute a big query to preload all the attributes of the related table. So no additional data getting loaded . If you want to return related table information too then you should use
includes in combination with
If you love abstraction more use
includesbut beware it may give you errors if not used with precautions (try to run the same query we did above without references see what happens?) but if you know your API use cases clearly use
#preload & #eager_loadas it will help you control the amount of data to be transmitted in the API and control over the SQL queries getting executed
eager_loadwill only preload the related table entries which satisfy the conditional clause. This will definitely bring back the example we discussed
where("particpating_users.user_id = '1' ")
If you want your server log queries to be readable for debugging purposes go for
preloadin combination with
joinsas it ends up making distinct readable queries instead of doing one gigantic query which we saw in case when we used
eager_loadand when we used
joinsin combination with
In the next part I will be sharing more on Initial Sync APIs, Delta APIs and how to design serializers to serve customised data to different API consumers?