Keeping your APIs lean and Queries Fast — Part I

This is the very first instalment of my learnings at shipmnts while developing Rest full APIs for various micro services we have . As we are developing a cross platform application which is data intensive the APIs we develop are supposed to be lean, to the point and the amount of queries we do are supposed to be less, highly optimised and yes off-course they are supposed to be fast.

In this article I will try to share my learning around two questions which all Ruby On Rails developers face some or the other time in their development lifecycle.

How to combat N+1 queries?

Let’s address the elephant in the room — N+1 queries are the biggest nightmare of any developers life. They eat up the processing power of your servers and they make the overall experience of your user slow. N+1 queries problem arise when you have 1 to many relationships, many to many relationship etc. and you want to preload all the related records when you select records which refer to the related records.

Example we will be referring in the article

To combat N+1 queries problem rails comes loaded with three very powerful functions includes (references a new weapon for ROR warriors), eager_load and preload. Includes is really powerful and its an additional level of abstraction on eager_load and preload. If you delve deeper into the implementation of includes function you can easily decipher that it internally delegates to eager_load or preload (see it here) to do the work depending on the type of query you have written. Lets first define what each of these functions do conceptually.

#eager_load function executes a big LEFT OUTER JOIN SQL query for each table which is being eager loaded and loads all attributes of the preloaded tables as dynamically named columns into the memory.

#preload function executes distinctSELECT queries for each related table to preload data of all related tables.

#includes as I already mention it delegates the job to #preload or #eager_load depending on the presence or absence of WHERE clause condition on one or more of the preloaded tables.

Let’s take two API use cases and try to see them in action with live data

Endpoint should return all activities where Joel is participating user and preload his company details and all users of his company

Its very clear from the use case that I will either combine #joins and #includes functions or I will combine #includes and #references functions to get the resultant data. The only difference between the two is first one will use INNER JOIN and the later one will use LEFT OUTER JOIN . Now for experimentation purpose I am also doing the same thing using #eager_load the only difference being I am not using references with it.

If you see the results you can easily decipher that includes called #eager_load automatically when it saw the conditional statement on the preloaded table. But we had to use references additionally to tell #includes to call #eager_load internally and perform LEFT OUTER JOIN on which preloaded table. Whereas we only passed the arguments once to eager_load function.

Endpoint should return all activities where Joe activities where Joel is participating user and preload all of the companies and users participating over those activities

As you can clearly see the above the queries only returned count as 1 when we did this activities.first.collaborating_companies.count that means includes will not be able to do what we want in this case as it will always call eager_load in all cases whenever it sees a conditional statement on preloaded tables and we will only get one company and its users. So let’s see preload whether it can do what we want or not

As you can easily see two separate SELECT queries were executed to load all participating companies and then all participating users of those companies. Thus fulfilling our use case.

Lets try to address the second question and then we will conclude with key takeaway points

JOIN Vs Includes — When To use what?

As you may clearly observed joins and includes with references did the same thing with an exception of the type of join they performed. So how we should when we should use what. So heres my take on that. If you just want to return parent table information basis on a WHERE clause on the related table attribute you will perform use join as it doesn’t implicitly execute a big query to preload all the attributes of the related table. So no additional data getting loaded . If you want to return related table information too then you should use includes in combination with references

Key Takeaways

If you love abstraction more use includes but beware it may give you errors if not used with precautions (try to run the same query we did above without references see what happens?) but if you know your API use cases clearly use #preload & #eager_load as it will help you control the amount of data to be transmitted in the API and control over the SQL queries getting executed
eager_load will only preload the related table entries which satisfy the conditional clause. This will definitely bring back the example we discussed where("particpating_users.user_id = '1' ")
If you want your server log queries to be readable for debugging purposes go for preload in combination with joins as it ends up making distinct readable queries instead of doing one gigantic query which we saw in case when we used eager_load and when we used joins in combination with includes

In the next part I will be sharing more on Initial Sync APIs, Delta APIs and how to design serializers to serve customised data to different API consumers?