Gwynn Group — May 24, 2019
Over the past 18 years of working on software projects that required a DBMS as the backend storage, we’ve had our fair share of SQL related performance issue due to the way we wrote our queries. Below are some of the tips we’ve learned through experience and through reading SQL books and magazines on how to write a more efficient query.
THEORY BEHIND QUERY RUN TIME
MS SQL is just another piece of software that runs on a computer, thus subject to the same limitations as all software — it can only process as much information as its hardware is capable of handling. There are many ways to optimize a DBMS to improve performance. In this brief session, we will concentrate on performance tips when writing our queries. Ultimately, we want to reduce the number of calculations a DBMS must perform to reduce the time it takes to execute our query.
Some of the things we need to keep in mind when writing our queries are as follows:
- Table Size: The larger the table, the longer it takes to read through the data.
- Joins: If your joins substantially increase the row count of the result set, your query is likely to be slow.
- Aggregations: Combining multiple rows to produce a result requires more computation than simply retrieving those rows.
Reducing Table Size
Filter the table to return only what you need. Try to use indexed fields for the predicate as non-indexed fields will cause a table scan which can dramatically reduce query performance. For example, if you’ve got time series data, limiting to a small time window can make your queries run much more quickly as we will be doing an index seek versus a table scan.
Simplifying the Joins
Try to reduce the table size before joining it to another table. For example, the query below joins information about college sports teams onto a list of players at various colleges:
There are 26,298 rows in players. That means that 26,298 rows need to be evaluated for matches within the other table. However, if the players table was pre-aggregated, you could reduce the number of rows that need to be evaluated in the join. First, let’s look at the aggregation:
The above query returns 252 results.
Dropping the aggregated query in a subquery and then joining to it in the outer query will reduce the cost of the join substantially:
In this particular case, you won’t notice a huge difference because 30,000 rows isn’t too hard for the database to process. However, if you were talking about hundreds of millions of rows or more, you’d see a noticeable improvement by aggregating before joining. When you do this, make sure that what you’re doing is logically consistent — you should worry about the accuracy of your work before worrying about run speed.
13 QUERY PERFORMANCE TIPS
1. Owner/Schema Name
Remember to prefix object names (i.e. table, stored procedure, view) with its owner/schema name.
Reason: If the owner/schema name is not provided, SQL Server’s engine tries to find it in all schemas until the object finds it. SQL Server engine will not search for the table outside of its owner/schema if the owner/schema name is provided.
2. The * Operator
Do not use the * operator in your SELECT statements. Instead, use column names.
Reason: SQL Server scans for all column names and replaces the * with all the column names of the table(s) in the SQL SELECT statement. Providing column names avoids this search-and-replace and enhances performance.
3. Nullable Columns
Avoid using NOT IN when comparing with nullable columns. Use NOT EXISTS instead.
Reason: When NOT IN is used in the query (even if the query doesn’t return rows with the null values), SQL Server will check each result to see if it’s null or not. Using NOT EXISTS will not do the comparison with nulls. Also, NOT EXISTS returns only two states (true or false), while NOT IN can return up to three states (true, false, NULL) and might not give you the result you were expecting.
4. Table Variables and Joins
Avoid using table variables in joins. Use temporary tables, CTEs (Common Table Expressions) or derived tables in joins instead.
Reason: Even though table variables are very fast and efficient in a lot of situations, the SQL Server engine sees it as a single row. Due to this reason, SQL will produce an execution plan that will perform horribly when used in joins.
5. Stored Procedure Names
Do not begin your stored procedure’s name with sp_.
Reason: When the stored procedure is named sp_ or SP_, SQL Server always checks in the system/master database even if the owner/schema name is provided. Providing a name without SP_ to a stored procedure avoids this unnecessary check in the system/master database in SQL Server.
6. Use SET NOCOUNT ON
Use SET NOCOUNT ON with DML operations.
Reason: When performing DML operations (i.e. INSERT, DELETE, SELECT, and UPDATE), SQL Server always returns the number of rows affected. In complex queries with a lot of joins, this becomes a huge performance issue. Using SET NOCOUNT ON will improve performance because it will not keep track of the number of rows affected.
7. Avoid Using GROUP BY, ORDER BY and DISTINCT
Avoid using GROUP BY, ORDER BY, and DISTINCT whenever possible.
Reason: When using GROUP BY, ORDER BY, or DISTINCT, the SQL Server engine creates a work table and puts the data on the work table. After that, it organizes this data in the work table as requested by the query, and then it returns the final result.
8. Check Indexes
There should be indexes on all fields used in the WHERE and JOIN portions of the SQL statement.
Reason: When fields are not indexed, SQL Server will typically do a full table scan and this may reduce performance. Unless the table is very small, a table scan tends to yield the worst performance out of all the types of database reads.
9. Use Same Datatype on JOIN and WHERE Clauses
This is easier said than done depending on your permissions to make changes to the schema.
Reason: When joining or comparing two fields with different datatypes, SQL must do an on-the-fly conversion of the field before it can do a comparison, even if the fields are indexed. If mismatched datatypes are unavoidable, try to cast the larger datatype to the smaller datatype whenever possible.
10. Avoid Using Calculated Fields in JOIN and WHERE Clauses
This can be done by creating a field with the calculated values used in the join on the table. See below.
Performance can be improved by adding a column with the year and month in the sales table. The updated SQL statement would be as follows:
Reason: When calculated fields are used, SQL must do an on-the-fly computing of the field before it can do a comparison even if the fields are indexed.
11. Avoid Multiple Joins in a Single Query
Try to avoid writing a query using multiple joins that include outer joins, cross apply, outer apply and other complex subqueries.
Reason: When the joins are complex, the number of possible execution plans exponentially increases. This, in turn, will reduce the choices for the query optimizer to decide the join order and join type. Sometimes, the query optimizer is forced to use nested loop joins, irrespective of the performance consequences for queries with excessively complex cross apply or subqueries. Also, due to time and performance constraint, SQL Engine will try to generate the “good enough” execution plan versus trying to generate the best possible execution plan.
12. Avoid Multi-Statement Table Valued Functions (TVFs)
Multi-statement TVFs are more costly than inline TVFs.
Reason: SQL Server expands inline TVFs into the main query like it expands views, but evaluates multi-statement TVFs in a separate context from the main query and materializes the results of multi-statements into temporary work tables. The separate context and work table makes multi-statement TVFs costly.
13. Use Data Compression Whenever Possible
If your SQL Servers are I/O bound and not CPU bound, turning on data compression will actually improve performance in your environment.
Reason: A majority of the time, it takes longer to read from the disk subsystem than it does to decompress the data. Since the bottleneck is already at the I/O level, reducing the size of the table or index on the disk will reduce the time required to read those data.
Linked server four-part queries are also called distributed queries. Using distributed queries, you can refer tables on different data sources/servers in a single query. Query optimizer creates an execution plan by looking at the query and breaking it into remote and local queries. Local queries are executed locally and data for remote queries are collected from the remote servers, scrubbed locally, combined together and presented to the end user as a single record set.
If you look at the execution plan by running SET STATISTICS PROFILE ON, you can see that for executing a distributed query, SQL1 sends a request to SQL2 to send the statistics information for table emp in the database test.
Another disadvantage of distributed query is that, though you have a WHERE clause, you might notice that when the query is sent to retrieve the rows of a table in the remote server, SQL Server will just send a SELECT * FROM the remote table and then locally filter out the necessary data after applying the predicates.
Execute the specified pass-through query on the specified linked server. SQL Server sends pass-through queries as un-interpreted query strings to an OLE DB data source. That is, SQL won’t apply any kind of logic on the query and won’t try to estimate what that query would do. Instead, it would simply pass the specified query as it is to the target linked server. Open queries are useful when you are not referencing multiple servers in one query. It’s generally fast, as SQL does not break it into multiple operations and does not perform any local action on the output received.
In OPENQUERY, SQL Server sends the complete query to remote server SQL2 and resources of the SQL2 are spent in processing the query like parsing the SQL statements, generating a plan and filtering the rows as per predicates. Then, the final result set is sent to SQL1 for further processing.
Like what you read? Be sure to give us a clap and comment below how our SQL performance tips worked for you!