SQL Performance Is Not Just Queries

Ajitesh Singh
Ula Engineering
9 min read · May 3, 2022
Photo by Sunder on Unsplash

NOTE: This blog focuses on the MySQL engine, but several of the points apply to other SQL engines as well.

Those of you building and operating systems know how important databases are and how much their performance affects your entire architecture. Imagine how bad your product would look if the catalog page does not load, or a customer is not able to place orders, and in the background you realize that the DB is the culprit. DB fixes and upgrades take a lot of time, and in the meanwhile you might lose customers, money, and transactions, so it's extremely important to understand how you should operate to keep your SQL performance in shape.

The usual advice people give is to write nice queries and keep a nice, hefty machine underneath. But it's important to get the system right much before that, so in this article I have listed a few pointers to keep in mind for a good SQL architecture.

Structure of Table

Whenever you create a table, be mindful of its structure: you can always rewrite bad queries, but you cannot easily fix an incorrect table structure later. So when creating a table, think through situations like the following before taking decisions.

  • Read/Write Ratio [ OLTP/OLAP ]: What use is this table serving? Is it for analytical purposes [ read-heavy ], or will it take heavy write operations, such as storing logs or event data [ more writes/updates ]? How does this information affect performance? Consider a case where we store data mostly for analytical purposes, such as users' browsing history, to analyze patterns on it. A rule of thumb in this case is to keep the data in fewer tables with more columns: this requires fewer joins and union operations and performs better than spreading the data across many tables, where indexing and other optimizations start failing at scale.
  • Data Type of Columns: What kind of data are you going to store in the table? Keep the data types as small as each case allows, because smaller types use less space on disk, in memory, and in the cache. They generally require fewer CPU cycles and give better performance. An example is comparing the DATETIME and TIMESTAMP columns: they store the same kind of data, but TIMESTAMP occupies significantly less storage space and also provides auto-update capabilities. On the flip side, its range of allowed values is much smaller (it ends in 2038), which limits its usage.
  • NULL IS NOT GOOD: You might have read this in multiple places; if you haven't, let me guide you through it. Don't keep NULL as the default value of your columns unless absolutely necessary. The reason is that NULLable columns make index creation and lookup operations more complicated than required, and they use more storage space and extra CPU processing. Although the performance improvement from moving NULL to NOT NULL is not that significant, it still helps at large scale.
  • Break it if it's Big: Always try to keep data in a form that is queryable and index-worthy. Suppose you want to search for all customers living in Bengaluru, and you made the table as such:
[User table: a single address column holding the complete address]

Now you see the table has one address column holding the complete address of a user, leading to LIKE queries for fetching users by their address.

Instead, if you split the address out into its own table, fetching all users from Bengaluru means a join, but if your tables have proper indexes this still performs better than the LIKE queries above.

An important note at the same time: do not apply this logic everywhere. Judge whether the situation really calls for separating the data out, and only then think of doing it.
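The two designs can be sketched roughly like this (the table and column names are my own, not from the original tables):

```sql
-- Design 1: the full address in one column -- finding a city needs a LIKE scan.
CREATE TABLE users (
  id      BIGINT PRIMARY KEY,
  name    VARCHAR(100),
  address VARCHAR(500)          -- e.g. "221B Baker Street, Bengaluru, 560001"
);
SELECT * FROM users WHERE address LIKE '%Bengaluru%';  -- cannot use an index

-- Design 2: the address split out and indexed -- the join can use the city index.
CREATE TABLE addresses (
  user_id BIGINT,
  line1   VARCHAR(200),
  city    VARCHAR(100),
  pincode CHAR(6),
  KEY idx_city (city)
);
SELECT u.*
FROM users u JOIN addresses a ON a.user_id = u.id
WHERE a.city = 'Bengaluru';     -- index lookup instead of a full scan
```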

Index Can Never Go Wrong

Any time you face an issue with your SQL performance, every blog you might have read, and any person who has worked on high-scale systems, will ask the same question: “Are the indexes in place?” And I won't deny the fact, since indexing is one of the most important aspects you need to keep in mind to level up your system game.

Each query's performance is measured by how much it costs to find the matching rows in a table. To find rows we normally have WHERE conditions, so let's first understand how they execute.

So once you run a query with a WHERE clause, this is roughly what the SQL engine does:

  • Apply the indexable conditions as an index lookup, eliminating non-matching rows early
  • Pass the rows retrieved from the index lookup up to the server layer [ without scanning the whole table ]
  • Apply the remaining WHERE conditions at the server layer to filter the data

With the above steps it's clear that, if the indexing is correct, we read just the required data and filter it, rather than performing a complete table scan.
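As a rough sketch (the table and index names here are assumptions of mine), EXPLAIN shows whether a query will take the index-lookup path or fall back to a full scan:

```sql
CREATE TABLE orders (
  id          BIGINT PRIMARY KEY,
  customer_id BIGINT,
  status      VARCHAR(20),
  KEY idx_customer (customer_id)   -- secondary index on the lookup column
);

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- type: ref, key: idx_customer   -> index lookup
EXPLAIN SELECT * FROM orders WHERE status = 'SHIPPED';
-- type: ALL, key: NULL           -> full table scan; status is not indexed
```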

Adding to this are JOINs: proper indexing on the join columns improves join performance as well. For example, joining on two columns that have the same size and type, and that are indexed, will give better performance.
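A minimal sketch of that advice, with hypothetical tables: the join column has exactly the same type on both sides and carries an index.

```sql
CREATE TABLE users (
  id   BIGINT PRIMARY KEY,
  name VARCHAR(100)
);

CREATE TABLE payments (
  id      BIGINT PRIMARY KEY,
  user_id BIGINT,               -- same type and size as users.id
  amount  DECIMAL(10,2),
  KEY idx_user (user_id)        -- lets the join probe an index per row
);

SELECT u.name, p.amount
FROM users u
JOIN payments p ON p.user_id = u.id;
```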

But not every table needs an index. Suppose a table is always fetched in full, with no WHERE conditions; in such cases reading sequentially performs better, as there is no use for an index-lookup operation, and it minimizes disk seeks.

There are also cases where we over-index, leading to slower insert and update operations.

The reason is that every insert must also maintain each index: the new entry may land at a random location within an index, and keeping everything in sync for correct retrieval requires random disk access and more CPU cycles than necessary, while page splits in the clustered index add their own cost.
Too many indexes also make the CPU cache less effective, since the working set grows and fewer lookups are served from the cache.

Configuring the Right Storage Engine

Before going into why choosing the right storage engine is important, let's first understand what a storage engine is.

So quoting the textbook definition here

The storage engine (or database engine) is a software component of a database management system responsible for storing, retrieving, and managing data in memory and on disk, designed to capture a persistent, long-term memory of each node

To put it in basic terms, it's the part of your DBMS that handles all the data operations, so it goes without saying that your system is only as good as your storage engine's performance.

Although there are a lot of storage engines available for MySQL, we'll discuss the two most popular ones: InnoDB and MyISAM.

By default, MySQL uses InnoDB due to its full transaction support. InnoDB has certain advantages: complete ACID support, row-level locking, crash recovery [ rollbacks ], and multi-version concurrency control ( non-locking reads ). It is also the only stock engine that provides foreign key referential integrity constraints.

Alternatively, MyISAM is often referred to as the original storage engine. It had its own benefits, such as full-text search and table-level locking, although since MySQL 5.6 InnoDB has gained full-text search as well, leaving InnoDB as the default choice and the rest as alternatives to it.
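Choosing the engine is just a clause on the table definition (the table here is illustrative):

```sql
CREATE TABLE audit_log (
  id  BIGINT AUTO_INCREMENT PRIMARY KEY,
  msg TEXT
) ENGINE=InnoDB;   -- the default engine since MySQL 5.5

-- The Engine column confirms which engine backs the table
SHOW TABLE STATUS LIKE 'audit_log';
```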

But now the question arises: what are the common configurations for InnoDB, and how do you use them?

Adjusting the Buffer Pool

The buffer pool is InnoDB's in-memory store for cached data and indexes. The innodb_buffer_pool_size variable denotes the amount of memory you allocate to the InnoDB buffer pool. The larger the pool, the more InnoDB performs like an in-memory storage engine. A general rule of thumb here is to not assign more than 70–80% of your system RAM to it. That rule is not ideal at large scale, though, and you might want to increase the percentage, because with that configuration you would end up leaving most of your memory idle [ by large scale I mean systems with more than 100 GB of RAM ].
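A sketch of checking and resizing the pool at runtime (online resize is supported since MySQL 5.7; the 48 GB figure is only an example):

```sql
SELECT @@innodb_buffer_pool_size;    -- current size in bytes

-- Roughly 75% of a 64 GB host
SET GLOBAL innodb_buffer_pool_size = 48 * 1024 * 1024 * 1024;

-- Progress of the online resize operation
SHOW STATUS LIKE 'Innodb_buffer_pool_resize_status';
```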

Another approach here is to understand the internals of InnoDB and then decide. Quoting how High Performance MySQL describes the process:

- Start with total RAM available.

- Subtract suitable amount for the OS needs.

- Subtract suitable amount for all MySQL needs (like various MySQL buffers, temporary tables, connection pools, and replication related buffers).

- Divide the result by 105%, which is an approximation of the overhead required to manage the buffer pool itself.

It’s a more nuanced approach to deciding the pool size. Although these approaches have guided fine systems, as with every concept in software engineering there is no one-size-fits-all policy. It requires a good amount of fine-tuning, monitoring, and adjusting before you arrive at the right size.
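As a worked example of the quoted steps (the numbers are made up): on a 64 GB host, reserving 4 GB for the OS and 4 GB for other MySQL needs leaves 56 GB, and dividing by 1.05 for the buffer pool's own management overhead gives roughly 53 GB:

```sql
SELECT ROUND((64 - 4 - 4) / 1.05, 1) AS buffer_pool_gb;  -- 53.3
```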

Configuring Number of Concurrent Threads

Setting a limit on the number of threads that InnoDB processes concurrently is an important knob for handling bottlenecks. InnoDB uses OS threads for all its processing and transactions; on today's well-performing systems this is rarely a problem, but it can still cause issues. When InnoDB receives a new request from a user session, if the number of threads concurrently executing is at a pre-defined limit, the new request sleeps for a short time before it tries again. A request that cannot be rescheduled after the sleep is put in a first-in/first-out queue and eventually processed. Threads waiting for locks are not counted in the number of concurrently executing threads.

You can limit the number of concurrent threads by setting the configuration parameter innodb_thread_concurrency. Its default value is 0, meaning that by default there is no limit on the number of concurrently executing threads.

The correct values of these variables depend on your environment and workload. Try a range of different values to determine what value works for your applications. Before limiting the number of concurrently executing threads, review configuration options that may improve the performance of InnoDB on multi-core and multi-processor computers.
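A sketch of inspecting and capping the limit at runtime (the value 32 is arbitrary; tune it against your own workload):

```sql
SELECT @@innodb_thread_concurrency;            -- 0 = unlimited (the default)

SET GLOBAL innodb_thread_concurrency = 32;     -- cap threads executing inside InnoDB
SET GLOBAL innodb_thread_sleep_delay = 10000;  -- microseconds to sleep before a retry
```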

Using Adaptive Hash Index

The adaptive hash index makes your DB behave more like an in-memory database. Per the MySQL documentation, with the right combination of workload and sufficient memory for the buffer pool, it can speed up lookups without sacrificing transactional features or reliability. It can be enabled with the innodb_adaptive_hash_index option at server startup. Think of it as a cache: on a “hit” we get much better lookup performance; on a “miss” performance gets slightly worse (checking a hash table for matches is fast, but not free). At the same time there is the cost of maintaining this cache, which grows with the number of rows added or updated in the data store. If the hit ratio is low, the maintenance cost is too high for the benefits. And if the data becomes very hot [ too many concurrent hits ], the hash index itself can turn into a contention point, and we might need to disable it.
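A sketch of toggling and observing it at runtime (the variable is dynamic, so no restart is needed):

```sql
SELECT @@innodb_adaptive_hash_index;           -- ON by default

SET GLOBAL innodb_adaptive_hash_index = OFF;   -- disable if it becomes a contention point

-- The "INSERT BUFFER AND ADAPTIVE HASH INDEX" section reports
-- hash searches/s vs non-hash searches/s, i.e. the hit ratio
SHOW ENGINE INNODB STATUS\G
```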

I’ll end the blog here, since it has already covered a fair amount of information [ but surely not all of it; topics like limiting the amount of data fetched and read-ahead are left out ].

A word of caution: no configuration should be set aimlessly or without proper monitoring. Please go through each reference item below, and then take better calls.

Reference Reads and Sources

https://dev.mysql.com/doc/refman/8.0/en/optimization.html

https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/index.html

https://www.virtual-dba.com/blog/thread-concurrency-explained-not-a-fifo/
