Make your query better
SQL Performance tips and tricks
SQL stands for Structured Query Language. This query language lets you access and manipulates databases.
We will execute different queries to get the desired data.
Every app or project even a small one store its data in some database. And the database easily can be the performance bottleneck of your app.
In this article, I will walk through several quick and easy ways to make your queries faster.
In the article, I assume you are using PostgreSQL DB but all these tips can perform well in any relational DB and are suitable with the SQL and DB standards.
The tips we will see in this article are:
- Check yourself with explain.
- SQL clause order of execution.
- Table partitions.
- Indexing your data.
- Window function.
- Special data types.
Always check your queries
After writing your query or even on an old query with performance issues before starting fixing the SQL and tables structure you would like to know how exactly the DB runs this query step by step.
The EXPLAIN statement is supposed to do exactly that. The EXPLAIN statement returns the execution plan which the PostgreSQL planner generates for a given statement.
This statement shows how the table is scanned, by index scan or sequential scan and which tables are used, what kind of join algorithm will be used, and more.
Some useful data that the EXPLAIN statement return is the start cost (before returning the result), the total cost of your query, the estimated number of rows that scan each stage in the execution plan, and the size these rows took to load.
EXPLAIN SELECT * FROM film;
EXPLAIN SELECT * FROM film WHERE film_id = 100;
SQL order of execution for better performance
SQL has a lot of clauses and each one perform a different function such as:
The clause function executes in a specific order and executes on the data collected till this moment.
For better performance, we would like to perform each clause function on as least rows as possible.
For instance, we would like to order the data after we filter it, for ordering fewer data rows.
FROM and JOIN perform first as they collect the whole set of data we working on within the query. Then the WHERE clause is supposed to filter the data, to leave us only with the relevant data. And only then do we SELECT and ORDER the data we want to return.
Partition your data
Your tables can store a lot of data and be very large, but often on a large table, you need only a specific part of this data.
Let's assume you have a GPS tracking table and you want to query only the GPS track from your city every time but still store all the other data as well. In this scenario, your query will load all the tables and then filter the data, but if the filter is regular use and the table is very large why should we load the whole table each time, for this scenario we can use partitions.
If we split the table to partition, one for our city and one for the other ones, from now on each time we run our query it will load only the specific partition with much fewer data in it.
With partitioning the DB split the one large table into smaller physical pieces.
Partitioning provides several benefits:
- Query performance can be improved dramatically in certain situations and reducing index size
- When queries or updates often (large percentage uses) use a single partition, performance can be improved by taking advantage of a sequential scan of that partition instead of accessing the whole table.
- Bulk loads and deletes can be accomplished by adding or removing partitions, faster than a bulk operation. These commands (adding partition and removing one) also entirely avoid the VACUUM overhead caused by a bulk DELETE.
- Seldom-used data can be migrated to cheaper and slower storage media (as s3).
Indexing your data
Indexes are a common way to enhance database performance. An index allows the DB server to find and retrieve specific rows faster. But indexes also add overhead to the database systems as a whole, they should be used sensibly.
EXPLAIN SELECT * FROM film WHERE film_id = 100;
Let’s take a look at this query, if you run this query without any index the DB would scan the entire film table, row by row, to find the all matching entries.
And if the film table has many rows it could be inefficient.
But if you index the table by the film_id column the DB might find the entries in a few levels in the search tree only.
How it works
An index is a copy of selected column data designed to enable efficient search. Index normally includes a link to the original row of data which was copied, to retrieve the row efficiently.
The regular search, without an index, is a linear search that in the average case will take O (n), an index is a data structure that improves the performance of lookup. There are many different data structures to use for this purpose, they are different in many ways and may have different tradeoffs such as lookup performance, index size, and index update performance. Many of these data structures are designed to O (log (n)) lookup efficiency and in some scenarios, some of them possibly achieve O (1) performance.
Window function or analytic function is a function that uses the values from one or multiple rows to return a value for each row.
On the other hand aggregation function, such as group by and sum, return a single value for multiple rows.
When to use it
When you need to perform some aggregation over the data but still have to scan the whole rows for more analyzing or displaying.
If you will try this kind of operation with a regular aggregation function you will have to use subqueries to scan the data over again, which will harm the query performance and will make it much complicated.
Let’s take a look at a simple window function that helps us compare employee’s salaries to the average salary of their department
SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
The OVER keyword reveals to us that this is a window function.
The results of this query will be:
Window functions are evaluated after aggregation.
Special Data Types
In this section, we will take a look at some data types that if you will treat them right you credit for better performance and better functionality.
When it becomes to geo data all you need to do is to store a collection of points, you may even store it as text or JSON that contain this data but in this way, it will harm your performance and you will lose a lot of functionality.
Every modern DB has a special geo type for this scenario that can index the points data for better performance and let you use geo libs for complicated functionality such as union, calculate area, etc.
In PostgreSQL, you will use point/line/polygon data type and may add a PostGIS extension.
Storing JSON in the DB can be helpful or necessary sometimes. Many times we, as developers, relate to JSON data as string/text but when it comes to storing it in the DB this is not the best way.
Every modern DB has a special data type for JSON that support indexing and compression of the JSON, they even support querying nested object inside the JSON.
In PostgreSQL, for example, you can store JSON data as text data type but you won’t be able to query a nested property in this JSON, you can store it as JSON data type but it will not compress and indexed and that may harm your performance and full your storage, the best way to store JSON data in PostgreSQL is in JSONB data type that contains all the features, indexing, compression, query nest properties.
In a nutshell
The DB and the SQL queries are part of the main business of any modern app, and these days any app must have excellent performance. In the article, we take a look at some ways to make your queries faster.
The most important part is to know your queries and DB, unknown queries and misconfiguration may cause performance issues.
Hope you enjoy reading the article and it will help you write better code and make yours better.