HammerDB is one popular and easy-to-use tool that a lot of people are using to benchmark performance of the databases. In the previous article, I have used 100 HammerDB users to measure performance of the workload on TPCC database with 5000 warehouses, and the conclusion was that in this configuration we can reach up to 160-200K transaction/minute and that probably the main bottleneck is the fact that 40GB memory that is available on 8-core Managed Instance is not enough to cache the data for 500GB database. In this article I will run the same experiment on the equivalent 8-core Business Critical instance.
WARNING: HammerDB is not official standard and don’t have precise specification of prerequisites for running the tests. You might get the different results if you repeat the same test that depend on the number of core of your client VM, network bandwidth between Managed Instance and VM, number of other databases on Managed Instances, etc. There are many factors that can affect the results that are outside of the scope of this article. Do not rely on the performance numbers described in this article as performance targets based on official benchmark.
HammerDB is running continuously on a VM in the same networks as Managed Instance (different subnet) and using 30% of CPU on the client VM, so it don’t have CPU bottleneck on the client.
The results of the initial run are shown on the following figure:
HammerDB can achieve 260–430K transactions per minute. Here are some interesting comparisons of the results in this experiment and results from the experiment with General Purpose instance:
- Performance of HammerDB workload on Business Critical Managed Instance is 2–3x better than the performance of the same General Purpose Managed Instance that achieved 160–200K transactions/minute (with the same number of cores/memory).
- On General Purpose Managed Instance I had to pre-allocate both data and log files to achieve optimal performance (which might increase the price of the instance). Without pre-allocation, I initially got 16–20K transactions/minute, so without any configuration you would get 15–20x better performance on Business Critical instance.
In order to examine the potential bottlenecks in this case, I have analyzed wait and file statistics using QPI library:
The dominant wait statistics are related to AlwaysOn replication and fetching data from files into the memory.
I’m investigating are there some IO bottlenecks using the following query:
EXEC qpi.snapshot_file_stats;--> wait some time while the workload is runningSELECT type,
instance_write_throughput_MBps = SUM(write_mbps),
instance_throughput_MBps = SUM(throughput_mbps),
instance_iops = SUM(iops)
group by type
Here I can see that instance write throughput on the instance is near to 32 MB/s, which is the instance log limit (4MB/s per core or 32MB/sec for 8-core instance). However, I don’t see INSTANCE_LOG_GOVERNOR as a dominant wait statistic so probably it don’t hit the limit. I see that instance is using 7K IOPS which is not close to instance data/log throughput 11K IOPS for 8-core instance.
I’m using the following query to check the resource usage:
select top 100 start_time, avg_cpu_percent, virtual_core_count, sku, hardware_generation,
iops = io_requests/datediff(second, start_time, end_time)
order by start_time desc
Instance is using up to 90% of the CPU which might be the limit, and I might need more CPU.
I have also checked do I have enough memory for this type of workload and I found that the limit is far below the the minimum:
Low PLE value might explain high PAGEIO wait statistic because Managed Instance is fetching and writing pages from the buffer pool into local SSD.
The last thing that I checked is what is the impact of backup on the performance:
Unlike General Purpose Instance where performance dropped significantly while the backup was running, here I see smaller impact on performance.
This article shows (unofficial) performance results of running HammerDB on 500GB TPCC database.
The main finding is that Business Critical instance is giving you 2–3x better performance for this type of workload compared to equivalent General Purpose instance.
Since this is unofficial test that is not done in some controlled lab environment with well defined resource definitions of client VM, network bandwidth, you might get the different results. If you run similar tests using HammerDB let me know what are the results and bottlenecks that you see and I will reference your experiment from this article.