10 best practices to improve your SQL Queries

Ashok Choubey
7 min read · Mar 26, 2023


SQL (Structured Query Language) is a powerful language for working with relational databases, allowing you to select, manipulate (insert, update, delete), and analyze stored data. However, writing efficient, well-tuned SQL queries can be challenging, especially when working with large tables or complex data models. Here we cover 10 best practices to improve query performance, from table partitioning, appropriate join types, and normalization to result caching and analytic functions. With these best practices, you can write faster, more efficient, fine-tuned SQL queries that leave more server resources free for other processes.

1. Table Partitioning

In a database, a table or an index can be subdivided into smaller pieces called partitions. This process of breaking a large record set into smaller chunks is called table partitioning.

Benefits of partitioning:

· Queries perform better because only the relevant partitions need to be scanned, not the whole table.

· Administrators can manage smaller chunks of data more effectively.

· Maintenance is easier on the small partitions than the whole large table.

· If one partition becomes unavailable due to some issue while the rest remain online, queries against the available partitions can still run, improving overall database availability.

Types of partitioning:

· Hash Partitioning — Spreads data evenly over partitions by applying a hash function to the partition key column. Possible usage: data with no logical groupings.

· Range Partitioning — Used when there are logical ranges of data. Possible usage: dates, age groups, scores, part numbers, and serial numbers.

· List Partitioning — Groups discrete, unordered values together into partitions. Possible usage: a list of states partitioned into regions.

· Composite Range-Hash Partitioning — Range partitions first, then spreads the data within each range into hash partitions. Possible usage: range partition by date of birth, then hash partition by name.

· Composite Range-List Partitioning — Range partitions first, then spreads the data within each range into list partitions. Possible usage: range partition by date of birth, then list partition by state.
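As a sketch of the idea, here is a range-partitioned table in Oracle syntax (table and column names are illustrative, not from a real schema):

```sql
-- Range partitioning: each partition holds one year of sales data.
CREATE TABLE sales (
  sale_id    NUMBER,
  sale_date  DATE,
  amount     NUMBER(10,2)
)
PARTITION BY RANGE (sale_date) (
  PARTITION sales_2021 VALUES LESS THAN (DATE '2022-01-01'),
  PARTITION sales_2022 VALUES LESS THAN (DATE '2023-01-01'),
  PARTITION sales_max  VALUES LESS THAN (MAXVALUE)
);

-- A query filtering on the partition key can be satisfied by
-- scanning only the sales_2022 partition (partition pruning).
SELECT SUM(amount)
FROM   sales
WHERE  sale_date >= DATE '2022-01-01'
AND    sale_date <  DATE '2023-01-01';
```

Because the WHERE clause restricts the partition key, the optimizer prunes all other partitions and reads only the one year of data it needs.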

2. Use Appropriate Join Types

Joins are an essential part of SQL queries, allowing you to combine data from multiple tables. However, not all join types are created equal, and choosing the right join type can have a significant impact on query performance.

For example, consider the following query:

SELECT orders.order_id, customers.customer_name
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id;

This query uses an inner join, which returns only the rows that have matching values in both tables. Inner joins are efficient when the join columns are indexed, but joining large tables can still be slow and resource-intensive.

In some cases, you may reach for a different join type, such as a left join, or for a subquery. For example, to retrieve customers who have placed at least one order, you might write a left join like this:

SELECT orders.order_id, customers.customer_name
FROM customers
LEFT JOIN orders ON customers.customer_id = orders.customer_id
WHERE orders.order_id IS NOT NULL;

This query uses a left join to retrieve all customers along with any matching orders, and then filters out rows where order_id is null. Note that filtering the outer-joined side on NOT NULL makes this logically equivalent to the inner join above, and most modern optimizers will execute the two identically. The practical lesson is to express exactly the result you need and then check the execution plan: an unnecessary outer join, or a join where only existence matters, can prevent the optimizer from choosing the cheapest plan.
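When only existence matters (you don't need any order columns), a semi-join written with EXISTS can be cheaper than a full join, because the database can stop probing as soon as it finds one match; a sketch using the same tables:

```sql
-- Semi-join: return each customer who has at least one order.
-- The engine can stop scanning orders for a given customer as
-- soon as a single matching row is found, and no duplicate
-- customer rows are produced.
SELECT c.customer_id, c.customer_name
FROM   customers c
WHERE  EXISTS (
  SELECT 1
  FROM   orders o
  WHERE  o.customer_id = c.customer_id
);
```

Unlike a plain join, this form also avoids returning one row per order for customers with many orders, so no DISTINCT step is needed.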

3. Index Your Tables

Indexes are a powerful tool for optimizing SQL queries, allowing the database engine to quickly locate and retrieve data based on specific columns. Without an index, the database engine must scan through every row in a table to find the relevant data, which can be slow and resource-intensive.

To create an index, specify the column or columns you want to index in a CREATE INDEX statement. For example, to index the customer_id column of an orders table, you could use the following SQL statement:

CREATE INDEX idx_orders ON orders (customer_id);

By creating an index on the customer_id column, the database engine can quickly locate and retrieve orders for a specific customer, without having to scan through every row in the table.

However, it’s important to create indexes only where your query patterns justify them: every index must be maintained, so too many indexes slow down DML statements (INSERT, UPDATE, DELETE). As a general rule, you should only index columns that are frequently used in query predicates and have a high cardinality (i.e., contain many distinct values).

Index types:

· B-tree index (unique, non-unique, composite)

· Function-based index

· Bitmap index

· Domain index
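For illustration, the first three index types above can be created like this (Oracle syntax; the email and order_status columns are hypothetical):

```sql
-- B-tree index on a single column (the default index type).
CREATE INDEX idx_orders_cust ON orders (customer_id);

-- Composite B-tree index: useful when queries filter on both columns.
CREATE INDEX idx_orders_cust_date ON orders (customer_id, order_date);

-- Unique index: also enforces uniqueness of the column.
CREATE UNIQUE INDEX idx_customers_email ON customers (email);

-- Function-based index: supports case-insensitive lookups such as
-- WHERE UPPER(customer_name) = 'SMITH'.
CREATE INDEX idx_customers_uname ON customers (UPPER(customer_name));

-- Bitmap index: suited to low-cardinality columns in read-mostly
-- (e.g. data warehouse) tables; expensive under concurrent DML.
CREATE BITMAP INDEX idx_orders_status ON orders (order_status);
```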

4. Optimize Your Table Structure

The structure of your database tables can have a significant impact on query performance. By optimizing your table structure, you can reduce the amount of data that needs to be processed, speeding up queries.

One common optimization technique is to normalize your tables, which involves breaking up large tables into smaller, more specialized tables. For example, instead of storing all customer information in a single table, you might create separate tables for customers, orders, and order items.

Another technique is to use data types that are appropriate for your data. For example, if you’re storing dates, use the DATE data type instead of a text string; string-typed dates are more difficult and resource-intensive to compare, sort, and manipulate, and they defeat many optimizer features.
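Both points can be sketched together: an order_items table split out from orders, with proper types throughout (table and column names are illustrative):

```sql
-- Normalized design: order header and order lines in separate
-- tables, with appropriate data types (DATE for dates, NUMBER
-- for quantities and money -- not strings).
CREATE TABLE orders (
  order_id    NUMBER PRIMARY KEY,
  customer_id NUMBER NOT NULL,
  order_date  DATE   NOT NULL           -- DATE, not VARCHAR2
);

CREATE TABLE order_items (
  order_id   NUMBER       NOT NULL REFERENCES orders (order_id),
  line_no    NUMBER       NOT NULL,
  product_id NUMBER       NOT NULL,
  quantity   NUMBER       NOT NULL,
  unit_price NUMBER(10,2) NOT NULL,     -- NUMBER, not a string
  PRIMARY KEY (order_id, line_no)
);
```

Queries that only need order headers never touch the (much larger) line-item table, and each column's type matches how it is searched and sorted.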

5. Use Subqueries

Subqueries are a powerful tool for working with SQL queries, allowing you to nest one query inside another to retrieve more specific data. Used well, a subquery can reduce the amount of data that needs to be processed, improving query performance.

For example, consider the following query:

SELECT customer_name
FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders
                      WHERE order_date >= '2022-01-01');

This query uses a subquery to retrieve all customer names for customers who have placed an order since January 1st, 2022. By using a subquery instead of a join, we can potentially reduce the number of rows that need to be processed, improving query performance.

6. Analytic Functions

Oracle provides analytic (window) functions for computation, aggregation, and ranking over a record set without collapsing the rows. They are implemented efficiently inside the database engine and typically return results much faster than equivalent self-join formulations, and they make reporting computations far easier to express. Commonly used analytic functions include:

· RANK

· DENSE_RANK

· ROW_NUMBER

· NTILE

· LAG

· LEAD

· LISTAGG
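A typical use of these functions, ranking each customer's orders by amount in a single pass over the table (the amount and order_date columns are assumed for illustration):

```sql
-- RANK and LAG computed per customer, without any self-join:
-- PARTITION BY restarts the window for each customer.
SELECT customer_id,
       order_id,
       amount,
       RANK() OVER (PARTITION BY customer_id
                    ORDER BY amount DESC)     AS amount_rank,
       LAG(amount) OVER (PARTITION BY customer_id
                         ORDER BY order_date) AS prev_amount
FROM   orders;
```

The pre-analytic alternative would be a correlated subquery or self-join per computed column, each re-reading the table; the window form reads it once.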

7. Use UNION, UNION ALL to Combine Results

The UNION and UNION ALL operators combine the results of multiple SELECT statements into a single result set, which can simplify your queries. UNION returns a de-duplicated record set, while UNION ALL returns all rows from the combined queries, including duplicates. Because UNION ALL skips the duplicate-elimination step, it is cheaper; prefer it whenever duplicates are acceptable or cannot occur.

For example, consider the following two queries:

UNION

SELECT customer_name, order_date
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
WHERE order_date >= '2022-01-01'
UNION
SELECT customer_name, payment_date
FROM customers
JOIN payments ON customers.customer_id = payments.customer_id
WHERE payment_date >= '2022-01-01';

UNION ALL

SELECT customer_name, order_date
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
WHERE order_date >= '2022-01-01'
UNION ALL
SELECT customer_name, payment_date
FROM customers
JOIN payments ON customers.customer_id = payments.customer_id
WHERE payment_date >= '2022-01-01';

The first query uses UNION to combine orders and payments into a single de-duplicated result set; the second uses UNION ALL to return the same rows without the duplicate-elimination step, which is faster when duplicates do not matter.

8. Use LIMIT and OFFSET

The LIMIT and OFFSET keywords allow you to retrieve only a specific number of rows from a query, starting at a specific offset. By limiting the rows returned, you reduce the amount of data that needs to be transferred and processed, improving query performance.

For example, consider the following query:

SELECT customer_name, order_date
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
WHERE order_date >= '2022-05-01'
LIMIT 20
OFFSET 30;

This query skips the first 30 rows and retrieves only the next 20. By using LIMIT and OFFSET, we reduce the amount of data that needs to be returned and processed, improving query performance.
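Note that LIMIT/OFFSET is MySQL and PostgreSQL syntax; Oracle (12c and later) expresses the same paging with the standard row-limiting clause:

```sql
-- Same paging in Oracle 12c+ syntax: skip 30 rows, return the
-- next 20. An ORDER BY is needed for the paging to be stable.
SELECT customer_name, order_date
FROM   customers
JOIN   orders ON customers.customer_id = orders.customer_id
WHERE  order_date >= DATE '2022-05-01'
ORDER  BY order_date
OFFSET 30 ROWS FETCH NEXT 20 ROWS ONLY;
```

Keep in mind that large offsets still force the database to read and discard the skipped rows, so very deep paging may call for a different strategy (e.g. keyset pagination).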

9. Use Result cache

Most modern database systems support query caching, but it’s important to be aware that the cache is only effective for frequently used queries. If a query is only run once or infrequently, it may not be worth caching the results.

Result caching was introduced in Oracle 11g and has since emerged as one of the most effective performance features in modern database applications. It can be used in both SQL and PL/SQL. The result set of a cached query is stored in the database server's result cache; the next time the same query runs, it is not executed again and the output is served directly from the cache.

On a database server with the result cache enabled, if a SQL query or PL/SQL function is executed in cache mode, Oracle stores the result set in the result cache component of memory.

Oracle Result cache

The cached result is shared by the server across all subsequent executions of the query or function call, in any session. This sharing is far more efficient than repeatedly re-executing the same work.
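In SQL, result caching is requested per statement with the RESULT_CACHE hint (assuming the server's result cache is enabled and sized; the query itself is just an example):

```sql
-- First execution computes and caches the aggregate; subsequent
-- identical executions, in any session, read from the result
-- cache until DML on orders invalidates the cached result.
SELECT /*+ RESULT_CACHE */
       customer_id, COUNT(*) AS order_count
FROM   orders
GROUP  BY customer_id;
```

This works best for expensive queries over relatively static tables, since any DML on a dependent table invalidates the cached result.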

10. Gather Table Stats

Some queries may take longer to execute because of full table scans caused by a changed query plan. Oracle's cost-based optimizer (CBO) generates plans from table and index statistics; if statistics are missing or stale, the optimizer can fall back to full table scans even on busy OLTP tables, and query execution time rises accordingly. Statistics are gathered with DBMS_STATS.GATHER_TABLE_STATS, or with the older ANALYZE TABLE command.

One might need to collect fresh statistics on high-activity OLTP tables to make sure vital execution plans don’t change. It’s the act of not refreshing statistics that most often causes execution plans to suddenly change.

Analyze Table: Statistics need to be periodically generated.

ANALYZE command syntax & example:

ANALYZE <TABLE | INDEX> <name>
<COMPUTE | ESTIMATE | DELETE> STATISTICS
[SAMPLE <n> <ROWS | PERCENT>];

ANALYZE TABLE emp ESTIMATE STATISTICS SAMPLE 30 PERCENT;

DBMS_STATS: Successor of the ANALYZE command

DBMS_STATS.GATHER_INDEX_STATS(<owner>, <index>, <partition>, <estimate percent>)

DBMS_STATS.GATHER_TABLE_STATS(<owner>, <table>, <partition>, <estimate percent>)

DBMS_STATS.DELETE_INDEX_STATS(<owner>, <index>)

DBMS_STATS.DELETE_TABLE_STATS(<owner>, <table>)

SQL> exec dbms_stats.gather_table_stats('scott', 'emp', null, 30);
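In practice DBMS_STATS is usually called with named parameters, which avoids positional mistakes; a typical call might look like this (schema and table names are illustrative):

```sql
-- Gather table statistics with a 30% sample; cascade => TRUE
-- also gathers statistics on the table's indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',
    tabname          => 'EMP',
    estimate_percent => 30,
    cascade          => TRUE
  );
END;
/
```

On current Oracle releases, letting estimate_percent default to DBMS_STATS.AUTO_SAMPLE_SIZE is generally preferred over a fixed percentage.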

In conclusion, optimizing SQL queries is a critical skill for any database professional. With these best practices, you can write faster, more efficient, fine-tuned SQL queries and make your databases more effective. Remember to always test your queries in a lower (test) environment first, and monitor and trace query performance to confirm that a change actually delivers higher performance.

Follow me for more such content on SQL, Data Engineering, Data modeling, and database design topics.

Happy reading…


Ashok Choubey

Data Architect | Azure Cloud | IBM Cloud | Database Architect | Subject Matter Expert | Cloud Data Engineering | Data Analytics | Data Science