Measuring file IO performance characteristics using QPI library

One of the probably most important metric that you need to measure and monitor is IO performance characteristics. In most of the cases performance of your database depends on the speed of underlying storage and files where the data is stored.

If you want to be able to quickly troubleshoot some performance issues you need to know how to quickly determine IO performance of your files.

Measuring the file performance and comparing performance characteristics between SQL Server database engines in Azure cloud and your on-premises environment might be tricky and require some better knowledge of SQL server system objects and views. In this post, you will see how to use one open-source library that can help you to analyze and compare file performance.

SQL Server and Azure SQL Managed Instance enable you to measure IO characteristics of your database files using sys.dm_io_virtual_file_stats data management object. However, you must be aware that this DM returns cumulative values that should be sampled and you would need to calculate the differences in IO stats between two period of times. If you need to extract information from this object, you should read the following articles:

DO NOT just read values from this view. This view contains data accumulated since the time when SQL Server or Managed Instance service is started, so you would see average values. Average values might be misleading and you should take the snapshots and compare performance between two points in time.

If you don’t have prepared scripts you can use the following open-source library: https://github.com/JocaPC/qpi where you have some useful prepared views that can help you analyze your file latency. This is a helper T-SQL library that enables you to:

  1. Take the snapshots of your sys.dm_io_virtual_file_stats
  2. Calculates the values such as IO throughput, latency, etc. Calculations are same/similar as in Paul & Erin scripts.
  3. Enables you to go back in past and examine the latency of the previous snapshot — this is a small additional functionality compared to Paul & Erin scripts.

Prerequisite — install QPI

As a first step you should install QPI library on your Managed Instance/SQL Server. Go to installation section and choose the version of QPI library depending on your SQL Server version. Only versions higher than SQL Server 2016 are supported because it uses SQL Server 2016 temporal tables to store the history of IO statistics.

This is plain T-SQL script that you can review to make sure that there is nothing dangerous inside.

This script will add a set of views/procedures in QPI schema in your database.

Analyzing io statistics

First you need to take a snapshot of the current values in sys.dm_io_virtual_file_stats DM object using the following command:

EXEC qpi.snapshot_file_stats;

This is the baseline for IO statistics and you can get the cumulative/average values until this point.

It would be good to create SQL Agent job that periodically takes the snapshot of your file statistics by executing the command above. You can find the sample script that creates QPI Agent job on QPI site.

Now you need to keep your workload running until the moment where you want to measure IO performance using the following view:

SELECT * FROM qpi.file_stats;

In the results of this view you can find information about the size of each file, IOPS, throughput, latency, etc.

You can also get the IO statistics in the current database:

SELECT * FROM qpi.db_file_stats;

The results are shown on the following image:

Most of the columns are self-explanatory, but there is difference in the following two:

  • read_latency_ms/write_latency_ms represent average latency to complete request end-to-end.
  • read_io_latency_ms/write_io_latency_ms represent average latency to complete request in IO subsystem. Ideally it should be same as read_latency_ms/write_latency_ms. If there is a difference, then SQL Server Database Engine cannot catch-up procession IO requests or introduces some IO throttling.
If you are using General Purpose service tier and you see a longer latency, you might try to increase the size of the file that has high latency. IO and throughput of the files in general Purpose tier depend on the file size so with bigger files you are getting more IO resources that can improve performance of your workload.

You can also get the file statistics from some point in time in the past (make sure that you had done snapshot before this pint in time):

SELECT * FROM qpi.file_stats_as_of( <some date> );

If you are comparing IO performance results on Managed Instance and your on-prem or Azure SQL VM (IaaS) try to compare the results from this query in order to get the consistent results and fair comparison.

Conclusion

QPI is a set of useful scripts that can help you to more easily analyze performance of your database files.

QPI is open-source library and if you find any issue in the functions feel free to post the bug submit or pull request with a fix.