Impact of log file size on General Purpose Managed Instance performance
Azure SQL Database Managed Instance is SQL Server database engine hosted in Azure cloud. General Purpose tier of Managed Instance is placed on Azure infrastructure where SQL Server process is placed on stateless VM, and data files are stored on Azure premium storage.
One of the important thing you will need to be aware of is that IO performance of Azure Premium disks depend on the disk size, as it is shown in the following table
Since General Purpose Managed Instance places all database files on Azure Premium disks, performance of each database file depends on the file size.
You can find more information about the storage layer of General Purpose Managed Instance in this article.
If you have the issues with file IO performance, then you might fix the issues by pre-allocating bigger log files to get more IOPS/throughput.
Pre-allocation of files is not some new technique. This is well known technique even in SQL Server because with file pre-allocation, you are avoiding auto-growth and decreasing a number of VirtualLog files(VLF) in your log file. In Azure this has additional effect on performance because it provides you with more IOPS/throughput.
In this article you will see how increasing the file size improves performance.
In this experiment I’m executing the following T-SQL script that inserts a random number in a single column table in the infinite loop:
DROP TABLE IF EXISTS th1;CREATE TABLE th1(object_id bigint);while(1=1) begin insert into th1(object_id) values (<i>);end
This query is executed on an empty database placed on Managed Instance (<i> is replaced with a random integer value).
I’m using Query Performance Insight library to analyze performance of Managed Instance. This library contains TSQL views and procedures that represent the wrappers around standard Query Store and system views and enable easier performance analysis.
With the installed QPI library, I need to take a snapshot of wait and file statistics because SQL Server/Managed Instance gather only accumulated file statistics data. The following two procedures take the snapshots of file/wait statistics in your database:
exec qpi.snapshot_file_stats;exec qpi.snapshot_wait_stats;
I’m running the workload and I’m trying to find what are the top wait statistics on my instance using the following query:
As you can see, my script is log-write heavy and top wait statistic is Tran Log IO/WRITELOG indicating that the bottleneck is on log write write.
Now, I can get the file statistics on the database files:
I have a small log file with the size less than 1GB and I’m getting around 350 IOPS. With the smaller file I can insert around 1.5 MB/s in this scenario.
In theory, IOPS value should be equal to the IOPS limit value shown in the table above if we are reaching the remote file IO limits. However, in practice this value is smaller due to the following reasons:
- IO operation that is counted in Managed Instance is not always the same as IO operation counted by Premium disk. Managed Instance will count any IO operation with any size as a single IO, while Azure Premium disk can break it into several IO operation so it might count higher IO rate while comparing it with IO limits.
- If the client (Managed Instance in this case) of Azure Premium disk is reaching the IO/throughput limit for the given file size, Azure Premium disk will “throttle” the client. Throttling means that Azure Storage will allow higher ratio for some short time, but then it will slow down the subsequent requests. In average, IOPS value would be lower than a limit.
Although the IOPS value might not be exact match with the table above, it is still a good metric that can tell you how many IOPS you are using.
In the second part of the experiment, I’m executing the same query on a database with 300GB log file. I have increased log file size using the following T-SQL statement:
alter database LP15
modify file (name='log', size = 300GB)
By looking on the wait statistics I can see that Tran Log IO/WRITELOG is still dominant wait statistic so I’m still getting the log throttling:
However, when I look at the IOPS in log file, I can see that I’m getting near to 1000 IOPS:
My workload is again write-heavy and still hitting the log write limit, but it runs much faster, and I can insert 3.7 MB/s instead of 1.5 MB/s as in the previous case.
Finally, I’m increasing the log file size to 1200GB using the following command:
alter database LP20
modify file (name='log', size = 1200GB)
And analyzing file statistics again:
With the bigger log file I’m getting more IOPS (1300 in this case) and better performance with 5MB/s throughput.
Knowing infrastructure characteristics where Azure SQL Database Managed Instance is running is very important if you want to understand and troubleshoot performance issues on Managed Instance. In this example you could see how the file size of the logs improves the performance of Managed Instance.
In this experiment, I have improved performance of workload by increasing log size. In some other cases, you might need to increase data file size if this is the bottleneck.