Burst Balance EBS metric in AWS
Finally, a way to monitor the burst balance of gp2 volumes
AWS recently introduced a CloudWatch metric for monitoring the I/O credits of your EBS General Purpose SSD (gp2) volumes.
As you may know, General Purpose SSD (gp2) EBS volumes operate on a concept of I/O credits. Think of I/O credits as money a disk spends to buy I/O operations (reads or writes). Each operation costs 1 I/O credit.
When you create a disk, it is given an initial balance of 5.4 million I/O credits. These credits are enough to sustain a burst of highly intensive I/O at the maximum rate of 3000 IOPS (I/O operations per second) for 30 minutes. When the balance is drained, the volume is throttled to its baseline rate, which for a small disk under heavy load can make it feel almost non-responsive.
The balance is replenished at a baseline rate calculated from the EBS size: every volume earns credits at a rate of 3 credits per GB of disk size per second, with a minimum of 100 credits per second for volumes smaller than 33.33 GB.
if size < 33.33 GB:
    rate = 100
else:
    rate = size * 3
So if you create a disk of, say, 300 GB, you get an initial 5.4 million I/O credits. Let’s say you drained the balance on some I/O-intensive workload. Your disk will accumulate credits at a rate of 900 per second, so an otherwise idle disk needs 6000 seconds (100 minutes) to get back to 5.4 million.
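The arithmetic above can be sketched in a few lines of Python. The function and constant names are my own invention for illustration, and the numbers are simply the ones quoted in this post.

```python
# Figures quoted in this post for gp2 volumes; all names are illustrative only.
INITIAL_CREDITS = 5_400_000   # I/O credits a new volume starts with
CREDITS_PER_GB = 3            # credits earned per second for each GB of size
MIN_BASELINE = 100            # floor that applies to small volumes


def baseline_rate(size_gb: float) -> float:
    """Credits earned per second by a volume of the given size."""
    return max(MIN_BASELINE, size_gb * CREDITS_PER_GB)


def refill_seconds(size_gb: float) -> float:
    """Seconds an idle volume needs to go from an empty to a full balance."""
    return INITIAL_CREDITS / baseline_rate(size_gb)


print(baseline_rate(300))    # 900
print(refill_seconds(300))   # 6000.0
```

Using max() is just a compact way of writing the small-volume exception: below 33.33 GB the product of size and 3 drops under 100, so the floor takes over.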
It is easy to spot that with disks of 1000 GB or more, credits accumulate at least as fast as they can be spent, so the balance never drains. Creating 1 TB EBS volumes is a neat trick to get steady IOPS performance at a lower price than Provisioned IOPS disks.
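To see where the 1000 GB break-even point comes from, here is a rough sketch that compares the burst rate with the refill rate. It assumes the 3000 IOPS ceiling and 3 credits per GB quoted in this post, and the helper names are hypothetical. Because it counts the credits earned while bursting, the durations come out somewhat longer than the plain 30 minutes mentioned earlier, which ignores the refill.

```python
import math

INITIAL_CREDITS = 5_400_000   # full burst balance, as quoted in this post
BURST_IOPS = 3000             # maximum burst rate, as quoted in this post


def baseline_rate(size_gb: float) -> float:
    """Credits earned per second: 3 per GB, never less than 100."""
    return max(100, size_gb * 3)


def burst_seconds(size_gb: float) -> float:
    """How long a full balance lasts under a sustained 3000 IOPS load."""
    net_drain = BURST_IOPS - baseline_rate(size_gb)  # credits lost per second
    if net_drain <= 0:
        return math.inf  # the refill keeps up, so the balance never drains
    return INITIAL_CREDITS / net_drain


for size_gb in (100, 500, 1000):
    print(size_gb, burst_seconds(size_gb))
# 100 2000.0
# 500 3600.0
# 1000 inf
```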
Until now you could monitor the rate at which your credits were drained using the Write Throughput and Read Throughput CloudWatch metrics for EBS. That was quite tedious, however: you had to add them together, they did not give you an absolute balance, and they did not take the burst into account.
Fortunately, AWS took a step in the right direction and recently announced a new metric for monitoring the burst credits of your EBS volumes. The metric is called Burst Balance (BurstBalance in CloudWatch) and is expressed as a percentage (0–100%).
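Since the metric is a percentage, it can help to translate a reading back into credits and into time left under load. The sketch below assumes that 100% corresponds to the 5.4 million credits mentioned earlier and that load is capped at 3000 IOPS. The function names are hypothetical, not part of any AWS API.

```python
import math

FULL_BALANCE = 5_400_000   # credits assumed to correspond to 100%
BURST_IOPS = 3000          # burst ceiling quoted in this post


def credits_from_percent(burst_balance_percent: float) -> float:
    """Convert a Burst Balance reading (0-100) into I/O credits."""
    return FULL_BALANCE * burst_balance_percent / 100


def seconds_until_empty(burst_balance_percent: float,
                        size_gb: float,
                        load_iops: float) -> float:
    """Estimate how long the remaining credits last under a steady load."""
    baseline = max(100, size_gb * 3)               # credits earned per second
    net_drain = min(load_iops, BURST_IOPS) - baseline
    if net_drain <= 0:
        return math.inf                            # balance is not shrinking
    return credits_from_percent(burst_balance_percent) / net_drain


# A 300 GB volume at 20% under a full 3000 IOPS burst:
print(round(seconds_until_empty(20, 300, 3000)))   # 514
```

In other words, on a 300 GB disk a 20% reading leaves under nine minutes of full-speed bursting, which is worth keeping in mind when picking an alert threshold.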
With Burst Balance you can see how far your EBS volume is from draining (or refilling) its full burst balance. With that information you can set up alerts that notify you when the balance drops to 20%, so that I/O problems in the app can be fixed or the infrastructure scaled in time. Before that it was not so easy, sometimes even apparently well adjusted disk size
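An alert like the one described above could be set up with boto3 roughly as follows. This is a minimal sketch, assuming boto3 is installed and AWS credentials and a region are configured. The alarm name scheme, the 5-minute period, and the use of the Minimum statistic are my own choices, and any volume ID or SNS topic ARN you pass in is up to you.

```python
def burst_balance_alarm_params(volume_id, threshold_percent=20.0,
                               sns_topic_arn=None):
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    params = {
        "AlarmName": f"ebs-burst-balance-low-{volume_id}",  # made-up naming scheme
        "Namespace": "AWS/EBS",
        "MetricName": "BurstBalance",
        "Dimensions": [{"Name": "VolumeId", "Value": volume_id}],
        "Statistic": "Minimum",        # react to the lowest reading in the period
        "Period": 300,                 # seconds covered by each datapoint
        "EvaluationPeriods": 1,
        "Threshold": threshold_percent,
        "ComparisonOperator": "LessThanOrEqualToThreshold",
    }
    if sns_topic_arn:
        # Notify this SNS topic when the alarm fires.
        params["AlarmActions"] = [sns_topic_arn]
    return params


def create_burst_balance_alarm(volume_id, **options):
    """Create (or update) the alarm in the currently configured AWS account."""
    import boto3  # third-party AWS SDK, imported here so the helper above works without it

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**burst_balance_alarm_params(volume_id, **options))
```

Calling create_burst_balance_alarm with a real volume ID, and optionally sns_topic_arn set to a topic you own, would register the alarm. Keeping the parameters in a separate function makes it easy to inspect them before anything is sent to AWS.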
This metric is what the community was waiting for; I certainly was.