SQL is easy to follow and write quick and handy queries on shot. Here is the catch, Well some queries might take time, some might crash and some queries look similar but vary in the computational time. Here’s why sql tuning is a must, if your tables are in production ;) Here are the 6 things to keep in mind.
Yes, when you begin creating tables, procedures or writing queries. Are the requirements clear like who, why, when, where and what questions duh? Are the requirements specific ? Are all the individuals and stakeholders involved in developing this query?
Indexes are paramount for good databases and application performance. Poorly designed indexes might be a nightmare sometimes.
Time : When a query hits a particular column in the Database and the same column is indexed. It searches in O(log(n)) time B-Tree rather than O(n) time.
PreCompute : SQL sorts the data one by one. Everything for that column is sorted.
Creating indexes for all the columns would give memory issues and duhh.. Big no no! Creating not a single index would result in the queries running slower and performance issues will be an impact.
Here is the guide to create useful database indexes https://www.dbta.com/Columns/DBA-Corner/Top-10-Steps-to-Building-Useful-Database-Indexes-100498.aspx
3. Normalize or DeNormalize, Which one ?
Another way to optimize the performance is denormalization. So, first let’s talk about normalization. Normalization is putting each and independent chunk into a single place.
Well, retrieving different chunks would require more time, since it has to go to different places to find it. This can be slow, Right ?
So, denormalization is the opposite of normalization. In other words, all specific chunks for the queries to retrieve in a place (or) one chunk in many places depends on the design. This gives a huge impact on query performance. The only reason to denormalize the database is to increase the performance. But not to forget the rule, never denormalize the database unless it is affecting the performance. Every denormalization decision should be documented and reasoned for the same. Such that future decisions are easier to take.
4. SQL Clauses.
SELECT : Using SELECT * would process unwanted data. Avoid SELECT *, use specific fields which are required. It pulls up required details from the table. Avoid SELECT DISTINCT, internally a GROUPing is required, which takes huge computation power and time. Use more and more fields to retrieve and identify distinctness.
JOINS : Avoid Joins when needed. However when the requirement is there. Avoid cartesian joins, make inner joins. In some DBMS machines the where joins (cartesian joins) we do are automatically inner joined. In those systems there is no difference.
WHERE AND HAVING : use WHERE statements and avoid HAVING statements. Since, HAVING is calculated after the WHERE clause. Yes, we use HAVING with an aggregate function, But do it only when it is required.
WILDCARDS : The wildest search is not an efficient search. So, use a wildcard at the end of the phase. For example:- name like ‘some%’. The advantage here is the combinations are few.
LIMIT AND TOP : Limit the query results or retrieve the top records for the sample operation or testing cases.
TEMP TABLES : it is great to use, but it will increase the complexity of the query. Highly recommended not to use, If the code can be written like that.
AVOID COUNT() : The most common mistake any developer will do is to check how many records are there. This should be avoided. COUNT() is inefficient because it scans the whole table and counts all the things which satisfy that condition.
HUGE QUERIES IN NON-BUSINESS HOURS : Suppose we need more data for analytics or we need data prediction. At the non business hours, We can run the loop queries, CROSS_JOINS, SELECT *, SELECT DISTINCT, multiple queries etc.
COMING TO THE MORE ABSTRACT PART!
5. Free space and Compression.
Free space(fill factor) can be used to leave a portion empty for the newly added data. A parameter PCTFREE, where the DBA specifies the amount of space for the future requests. Another parameter is FREEPAGE. The benefits are inserts are faster, properly clustered data, expandable and better in concurrency. There are some disadvantages like disk storage is higher, scans take longer, fewer rows require more I/O operations. So, the DBA should monitor free space and request based on frequency of inserts, modification, random access by a query, type of processing, row migration, page split etc.
Compression can be used to shrink the database. By compressing the data, the database requires less disk storage. Compression takes DBA to analyse. On one hand, we have disk savings and I/O cost. On another hand, we have additional CPU cost for compress and decompress. But, it is not an option for every database index or table. For smaller chunks of data, it might be possible that a compressed file will be larger than an uncompressed file.
6. Slow queries.
The first solution would be to fetch quickly from the table and do the Math (sorting, aggregating, grouping, finding) in the application layer. And then here comes the tools. There are many tools to analyse the slowness of a query. Here taking an example of SQL Server.
SQL SERVER EXAMPLE
Resource Usage : use the System Monitor to measure the performance. Database Tuning Advisor: It recommends to take a particular action. For example, remove or add an index.
DMV: find the slower queries with the help of DMV(Dynamic management views). They provide all the information such as reads, time, CPU time etc. APM : One of the greatest features of Application Performance Management is ability to track all the SQL queries. Like, How many queries are fired ? What’s the average time ? What are the transactions? Etc.
EXTENDED EVENTS: check diagnostic data