ERP Tech Talk #1: Optimize Your SQL Query
Last week, we finally held our very first ERP Tech Talk. How cute is that? One person shares a topic with the team, another person who’s been chosen randomly should post a summary. The topic can be anything related to ERP. As much as it felt quite heart-racing for me, guessed that I was fortunate enough to be the one to summarize it for the first time.
The session was presented by Evika Rachma Yuniasari, she had been part of IT Paragon’s DAI (Data and Applied Intelligence) prior to joining ERP team. And thus, she could see that we might practice querying in not so very good state while in fact we do use it a lot inside ERP e.g. script makings, adjusting Odoo NBM databases, developments, and other works on a daily basis. So, regarding her data analyst experience, she shared these tips with us on how to optimize SQL query, started from something we’re all most familiar with: basic clauses it is!
Why Should We Optimize It?
There are two relational database management systems that we often use in IT Paragon; MySQL (employee database, ProcessMaker, BrandCommerce) and PostgreSQL (Odoo NBM). For how important and crucial these databases are, we surely can’t let bad-written query slows the entire system down. Here are some other reasons why we should optimize it:
- Improves the speed of data retrieval
- Reduces the amount of wear on the hardware
- Allows the system to service more queries in the same amount of time
- Server could run more efficiently
Basic Clauses and How to Write Them Efficiently
- SELECT
This clause lets you choose the fields that you want to display on your chart and the information you want to pull from database.
Beware of using SELECT *
SELECT * (or select all) is used to obtain all the data from a table. If the data has tons of fields and rows, it will tax the database resources and slow the execution time of the query. It is much better to use SELECT along with the specific fields required.
Example: This query below displays all the data in the table res_partner when only the id and name was required.
Fix it like this: Instead of “select all”, specify the statement to obtain the desired result.
This allows you to query only the fields that contain the needed information. It should speed things up immediately.
Avoid SELECT DISTINCT for large tables
This is used to obtain distinct results from a query by eliminating the duplicates. As all the duplicate fields are grouped together, it will increase the run time of the query.
As an alternative, GROUP BY can be used to get the same results.
2. WHERE
The WHERE clause allows you to filter your query to be more specific according to the condition mentioned in the statement.
Don’t use WHERE for creating joins
When using WHERE for creating joins, the number of rows obtained is the product of the number of rows of the two tables. This is not good as more database resources are required.
Use INNER JOIN instead, as it only combines the rows from both tables which satisfy the required condition.
Use WHERE instead of HAVING to filter without aggregate functions
HAVING doesn’t allow the usage of indexes which slows the execution time of the query.
So it is better to use WHERE whenever possible.
Avoid OR, AND, NOT operations if possible
In the case of large databases, it is better to find replacements for those operations to speed up the execution time of the query.
The example below gives exactly same results.
Avoid wildcard prefixes “%” in LIKE searches
They should not be used at the beginning of the pattern as a full table scan is required to match the pattern which consumes more database resources.
So, avoid the wildcard characters at the beginning of the pattern and only use them at the end if possible.
3. ORDER BY
This clause allows you to sort the database according to the fields you have defined in the SELECT statement.
Avoid using column number for sorting
Using number might be easier, but at the same time it’s confusing. It could also affect the result if there’s any change on the database order.
Better specify the field desired.
Avoid sorting with a mixed order
In this example, you could just remove the “ai.date ASC” part since the default sorting is ASC (ascending). No need to mixed them up.
4. LIMIT
The LIMIT clause is used to limit the number of results you get or if you want to only displays the number of records specified. It is better to define the limit especially if you work on a large database.
May your SQL wisdom be levelling up now. We’re so proud you read the whole article, congratulations! See you again on the next ERP Tech Talk.
References:
- IT Paragon’s ERP Tech Talk #1 by Evika Rachma Yuniasari
- https://www.geeksforgeeks.org/what-are-the-best-ways-to-write-a-sql-query/