Query Optimization techniques and factors
Common pitfalls and various factors to be considered for optimization
Applying indexing for query optimization is considered an easy and a go to technique by most of the backend developers.But rarely do they know that random indexing does not necessarily produce faster results to the queries.
Let’s look into the common pitfalls that are faced when random attributes are considered when indexing and the factors that need to be considered during tradeoff.
An index is an Ordered representation of the indexed data.
The person who can optimize queries by creating a right index is the person who knows what the queries are.
Since the table often contains millions of rows, searching for a particular record or a range of records include comparing each record attributes with the required values. Thereby consuming more time and huge amount of RAM.
To overcome this problem, indexing is applied.
Indexing can be applied on multiple columns based on our query. This indexing uses B-Tree data structure for searching right record with the required value. B-Tree is a self-balancing search tree.All the nodes to the right of the root node are greater than the nodes to the left. The main idea of using B-Trees is to reduce the number of disk accesses. Most of the tree operations (search, insert, delete, max, min, ..etc ) require O(h) disk accesses where h is the height of the tree.
The leaf nodes are connected to each other in a doubly linked list manner. Thereby allowing access to the previous nodes from the current node.
Additionally, the database stores one extra piece of information for every value called ROWID. It is a database internal identifier that identifies a particular row in the memory table.It is not your primary key but a database internal value.
Each leaf node may either point to the address of a single page in the secondary memory table or only particular row in that table.
Now that we know how indexes are stored in the memory, let’s move on to various factors that has to be considered while forming indexes.
Note: For getting an execution plan
EXPLAIN SELECT *
FROM ( );
The types column from the above executed query result tells us how the database is going to access our data.
ACCESS TYPES
> CONST / EQ_REF:
Performs a B-Tree traversal to find a single value in the index tree. This is basically performing binary search. This type is used when we can guarantee uniqueness of our result set.
In this type, the B-Tree is traversed until the required value is found in the leaf node. Since only a single unique value is searched and the complexity reduces by half for each traversal to the child node, this is super fast.
Hence, stop optimizing the query as it can get no more faster.
>REF / RANGE :
This type is a index range scan. It performs a B-Tree traversal to find the starting point and scans values from that point on. It scans all the leaf nodes connected to each other through the linked list until it finds the node with value greater than the required range.
Hence, it limits the total number of rows that the database has to inspect to perform a query
> INDEX :
Also known as the Full Index scan. It starts at the first leaf node and traverses through the entire index. So, the entire table is bought into memory utilizing more time and space.
> ALL :
Also known as a Full Table scan. It does not use an index at all. It loads every column of every row from the table and then scans through all of them and emits or discards accordingly.