Burstable Performance in AWS

Amer Ather
9 min read · Dec 29, 2022


AWS offers cloud resources (compute, network, and storage) that can achieve higher performance for a limited duration at a much lower cost. For example, low-cost T-series instances running in performance burst mode can outperform more expensive fixed-performance instances (M5, R5, etc.).

The “burstable” performance model is built on the idea that most workloads do not use system resources at peak levels all the time, but instead reach peak usage in bursts.

Credits are earned while system resources are not stressed, and the accumulated credits are then spent when needed to achieve top performance. When credits are exhausted, performance is throttled to a baseline level, typically much lower than the burst. Amazon CloudWatch keeps track of the credit balance and usage pattern.
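
As a quick illustration, the following minimal sketch (assuming boto3 is installed, AWS credentials are configured, and the instance ID is replaced with a real T-series instance in your account) pulls the CPUCreditBalance metric that CloudWatch publishes for burstable instances:

```python
# Minimal sketch: fetch the recent CPU credit balance for a burstable instance.
# Assumes boto3 is installed and AWS credentials/region are configured.
from datetime import datetime, timedelta, timezone

import boto3

INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical instance ID

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",   # credits currently in the bucket
    Dimensions=[{"Name": "InstanceId", "Value": INSTANCE_ID}],
    StartTime=now - timedelta(hours=3),
    EndTime=now,
    Period=300,                      # 5-minute granularity
    Statistics=["Average"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1))
```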

Optimum performance at a lower price is the goal when building solutions on public cloud infrastructure. A careful review of key system metrics (CPU, storage, network) and an understanding of workload characteristics can help achieve that objective. For example:

  • Low-cost T-series instances can use a full CPU core during performance burst mode to achieve almost double the compute power of fixed-performance instances (M5, R5, etc.) that are based on hyper-threaded (HT) CPUs.

Note: a CPU core has two hyper-threads (HT). Each HT is counted as a vCPU by AWS.

  • Instance families like R5 and M5 offer network throughput bursts of up to 10 Gbps on smaller instance types (R5.large, R5.xlarge, ..).

For example, an R5.large can burst up to 10 Gbps for a limited duration.

  • Smaller EBS volumes can achieve higher IOPS (GP2) or throughput (ST1) when running in IO burst mode.

A 100 GB GP2 or GP3 volume can achieve an IO burst of 3,000 IOPS, sufficient for roughly 30 minutes of sustained IO. A 2 TB ST1 (hard disk) volume can achieve an IO burst of 500 MB/s for a duration of 60 minutes.

  • EFS (Amazon's managed NFS service) also uses a credit system: a small EFS file system with 100 GB of stored data allows instances to attain an IO burst of 100 MB/s for up to 72 minutes each day. Baseline throughput is 5 MB/s once credits are exhausted.

Compute Burst

The T-series instance family offers a low-cost compute burst service built around a CPU credit allocation model. T-series instances offer guaranteed baseline performance that is a fraction of a full CPU core's capacity, depending on the instance size.

T-series instances, however, can deliver much higher compute performance when running in CPU burst mode.

For example, a T2.micro instance runs at a baseline performance of 10% of a CPU core's capacity. When a T2.micro has low CPU usage, it earns CPU credits at a rate of 6 credits per hour (one credit equals one minute of full-core usage), which accumulate into a credit bucket. The bucket can hold a maximum of 24 hours' worth of credits. Credits are spent when the instance requires more CPU core capacity (up to 100% of a core can be used during burst). All newly launched T2 instances receive an initial supply of 30 CPU credits, enough to sustain full usage of a single core for 30 minutes.

Caution: credits expire and are removed from the bucket 24 hours after being earned if not used. During performance burst mode, the oldest credits earned within the last 24 hours are applied first.
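
To make the credit arithmetic concrete, here is a small worked example using the T2.micro figures above; the numbers are illustrative, not an AWS API:

```python
# Worked example of the T2 credit model using the t2.micro figures above.
# One CPU credit = one minute of a full core.
EARN_PER_HOUR = 6.0               # t2.micro earns 6 credits/hour (10% baseline)
SPEND_FULL_CORE = 60.0            # bursting at 100% of a core spends 60/hour
MAX_BALANCE = 24 * EARN_PER_HOUR  # bucket caps at 24 hours of earnings (144)

net_drain = SPEND_FULL_CORE - EARN_PER_HOUR   # 54 credits/hour while bursting
burst_hours = MAX_BALANCE / net_drain         # ~2.7 hours of full-core burst
print(f"Full bucket: {MAX_BALANCE:.0f} credits; "
      f"sustained full-core burst: {burst_hours:.1f} hours")
```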

The Linux kernel offers a metric, %steal (reported as “st” by mpstat and vmstat), that can be used as an early indicator of CPU credit exhaustion.

%steal is the percentage of time an instance vCPU is not running because the hypervisor has not scheduled it on a physical CPU. If the metric reports 70% CPU steal time, only 30% of the physical CPU core is being used by the instance. %steal is often thought of as reporting noisy neighbors in a multi-tenant public cloud environment, where other tenants may have stolen an instance's CPU resources. On AWS, however, it is only reported on instances that offer burst performance.

For the T-series instance family, %steal means the instance was unable to burst because its CPU credits were exhausted.

Figure: %steal CPU utilization across the T-series instance family while running a CPU load with burst credits exhausted.
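
One way to watch for this on a live instance is to sample %steal directly from /proc/stat, the same counter mpstat and vmstat read. A minimal sketch (Linux only; run on the instance itself):

```python
# Minimal sketch: sample %steal over a short interval by reading /proc/stat.
import time

def cpu_times():
    with open("/proc/stat") as f:
        # First line: "cpu user nice system idle iowait irq softirq steal ..."
        return [int(x) for x in f.readline().split()[1:]]

t0 = cpu_times()
time.sleep(5)
t1 = cpu_times()

deltas = [b - a for a, b in zip(t0, t1)]
total = sum(deltas[:8])   # user..steal; skip guest fields (counted in user)
steal = deltas[7]         # 8th field is steal time
print(f"%steal over 5s: {100.0 * steal / total:.1f}%")
```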

The duration of a CPU burst depends on how fast an instance earns credits and on the outstanding credit balance. For example, a T3.2xlarge earns credits faster than a T3.large and thus can burst for a longer period.

Figure: T2.xlarge %usr CPU utilization drops as burst credits are slowly exhausted.

Network Burst

AWS offers instance types with varying network capabilities. Instance network throughput is advertised as Low, Moderate, High, up to 10 Gbps, 10 Gbps, or 20 Gbps. AWS instances support the Enhanced Networking feature, which allows even a smaller instance to achieve higher network throughput and lower latencies.

The Enhanced Networking feature allows a native driver to run on an instance with direct memory access (DMA) to a subset of NIC hardware resources via the PCIe SR-IOV extension, which helps achieve low-latency networking due to low virtualization overhead.
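
To verify whether Enhanced Networking is enabled on an instance, one option is to query the EC2 API. A sketch using boto3 (the instance ID is a placeholder):

```python
# Sketch: check whether Enhanced Networking is enabled on an instance.
# Assumes boto3 credentials are configured and a real instance ID.
import boto3

INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical instance ID
ec2 = boto3.client("ec2")

# ENA support flag (used by most current instance families)
desc = ec2.describe_instances(InstanceIds=[INSTANCE_ID])
instance = desc["Reservations"][0]["Instances"][0]
print("ENA enabled:", instance.get("EnaSupport", False))

# Intel 82599 VF support (older enhanced-networking path) is exposed
# via the sriovNetSupport instance attribute.
attr = ec2.describe_instance_attribute(
    InstanceId=INSTANCE_ID, Attribute="sriovNetSupport"
)
print("sriovNetSupport:", attr["SriovNetSupport"].get("Value", "not set"))
```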

AWS offers the network burst feature on smaller instance sizes (large, xlarge, 2xlarge, 4xlarge). These instances use a network credit model, similar to the CPU credit model used for T-series instances and the IO credit model used for EBS GP2, ST1/SC1, and EFS storage.

An instance accumulates network credits during periods of low or no network traffic. Accumulated credits allow these smaller instances to achieve a peak of 10 Gbps network throughput for a limited duration. Once credits are exhausted, network throughput is throttled down to a baseline level that is much lower than the peak.

The credit system for network throughput works best for bursty workloads, such as Hadoop jobs or large file transfers, that require higher network throughput for short periods of network activity.
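
To detect when an instance is being throttled at its network limits, the ENA driver exposes "allowance exceeded" counters via ethtool. A sketch, assuming a Linux instance with the ENA driver and the ethtool utility installed (the interface name varies, e.g. ens5 on Nitro instances):

```python
# Sketch: read ENA "allowance exceeded" counters, which increment when an
# instance is throttled at its network limits.
import subprocess

IFACE = "eth0"  # adjust to your interface name

out = subprocess.run(
    ["ethtool", "-S", IFACE], capture_output=True, text=True, check=True
).stdout

for line in out.splitlines():
    if "allowance_exceeded" in line:
        print(line.strip())
```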

Storage Burst

Amazon offers burstable IO performance on EBS volumes of types GP2/GP3 and ST1/SC1. The EFS file system (AWS managed NFS service) also offers a performance burst feature.

EBS GP2/GP3 volumes are optimized for higher IOPS and lower latency. These volume types use an IO credit model based on IOPS. Credits are accumulated during low or no IO activity, and allow the instance to achieve higher IOPS (burst) on a volume. The duration of the burst depends on the fill rate of the IO credit bucket, which is controlled by the size of the volume.

The larger the volume, the faster IO credits are earned. The credit bucket holds a maximum of 5.4 million IO credits per volume, and earned credits can be spent at a maximum rate of 3,000 IOPS, with IO sizes ranging from 16 KB to 256 KB.

Once burst IO credits are exhausted due to sustained IO load, IOPS and throughput are throttled down to a baseline level. Baseline IOPS also depends on the volume size (3 IOPS per GB). The IO credit model applies to GP2 volumes of 1 TB or less; larger volumes do not need to burst, since their baseline is already at least 3,000 IOPS. A maximum baseline of 10,000 IOPS is reached with a 3.3 TB GP2 volume (newer GP2 volumes scale up to 16,000 IOPS, but the credit model is unchanged).
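
Putting the GP2 rules above into code makes the behavior easier to see. The sketch below uses the figures quoted in this article (3 IOPS per GB baseline, a 5.4 million credit bucket, 3,000 IOPS burst); note the burst duration comes out slightly above the 30-minute figure because credits keep accruing at the baseline rate while bursting:

```python
# Sketch of the GP2 credit model using this article's figures.
BUCKET = 5_400_000
BURST_IOPS = 3_000

def gp2_baseline_iops(size_gb: int) -> int:
    return max(3 * size_gb, 100)   # 3 IOPS/GB, with a 100 IOPS floor

def gp2_burst_minutes(size_gb: int) -> float:
    baseline = gp2_baseline_iops(size_gb)
    if baseline >= BURST_IOPS:
        return float("inf")        # >= 1 TB: baseline covers the burst rate
    net_drain = BURST_IOPS - baseline   # credits spent per second, net of refill
    return BUCKET / net_drain / 60

for size_gb in (100, 250, 500):
    print(f"{size_gb} GB: baseline {gp2_baseline_iops(size_gb)} IOPS, "
          f"~{gp2_burst_minutes(size_gb):.0f} min of 3,000 IOPS burst")
```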

Difference between GP2 and GP3 Volume Types:

Unlike GP2 volumes, where performance scales with volume size, a GP3 volume's performance can be scaled independently of its size. GP3 IOPS and throughput are not constrained by volume size.

A GP3 volume offers four times higher maximum throughput, 1,000 MB/s (1 GB/s), compared to GP2, which maxes out at 250 MB/s.

A smaller GP3 volume can achieve much higher IOPS and throughput than GP2 by paying an additional fee.

GP3 may be a better alternative to EBS ST1/SC1 (hard disk) volumes, considering ST1/SC1 are constrained by IOPS and thus struggle to achieve higher throughput unless larger block sizes are used and data is accessed sequentially.

GP3 volumes deliver a consistent 3,000 IOPS and 125 MB/s without any additional fee. Thus, when no additional IOPS or throughput are provisioned, a GP3 volume behaves much like a GP2 volume.

There are IOPS and throughput limits at both the per-volume and per-instance level. If your workload requires more IOPS or throughput than a single volume offers, consider striping (RAID 0) across multiple volumes to achieve higher IOPS and throughput, up to the instance-level limit.

For example, an m4.4xlarge instance supports a maximum of 16,000 IOPS and 250 MB/s of throughput, while each GP2 volume provides 3,000–10,000 IOPS and a maximum throughput of 160 MB/s. To achieve 250 MB/s or 16,000 IOPS, one can provision two EBS volumes and configure them as RAID 0.
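
A small hypothetical helper capturing that sizing logic, using the per-volume and per-instance figures from this example (they vary by volume generation and instance type):

```python
# Hypothetical helper: how many identical volumes to stripe (RAID 0) to hit
# a target, capped by the instance's own EBS limits.
import math

def volumes_needed(target_iops, target_mbps,
                   vol_iops=10_000, vol_mbps=160,     # per-GP2-volume caps
                   inst_iops=16_000, inst_mbps=250):  # m4.4xlarge caps
    iops = min(target_iops, inst_iops)
    mbps = min(target_mbps, inst_mbps)
    return max(math.ceil(iops / vol_iops), math.ceil(mbps / vol_mbps))

print(volumes_needed(16_000, 250))   # -> 2 volumes in RAID 0
```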

Caution: smaller instance types may not achieve the maximum throughput or IOPS offered by a single GP2 volume. For example, an m4.xlarge allows a maximum IO throughput of 93 MB/s, or 6,000 IOPS, to a GP2 volume.

For a 100 GB GP2 volume, refilling an empty credit bucket at its baseline rate of 300 credits per second takes about five hours; a 1 TB volume, earning 3,000 credits per second, refills in about 30 minutes. A full bucket sustains the maximum burst of 3,000 IOPS with IO sizes of 16–256 KB.

To achieve the maximum IOPS a volume supports, an application should use smaller IO sizes; to achieve maximum throughput, it should use larger IO sizes.

EBS ST1/SC1 Volumes

EBS ST1/SC1 volumes are designed for bursty, streaming (sequential IO pattern) workloads. Unlike general purpose (GP2/GP3) volumes that offer higher IOPS and lower latencies, ST1/SC1 volumes are backed by magnetic disks and optimized for higher throughput. For ST1/SC1, the IO credit model is based on MB/s rather than IOPS.

ST1/SC1 volumes offer higher throughput (up to 500 MB/s) than GP2 volumes (160 MB/s).

When burst credits are exhausted, ST1/SC1 volume throughput drops to the baseline level. The difference between ST1 and SC1 is that ST1 offers a higher base/burst throughput combination than SC1.

The IO bucket fill time is the same for all volume sizes, since both the bucket size and the fill rate increase linearly with volume size (the bucket holds 1 TiB of credits per TiB of volume, refilled at 40 MiB/s per TiB):

Fill time = max IO credits in bucket / fill rate
500 GB: 0.5 TiB / 20 MiB/s ≈ 436 minutes
1 TB: 1 TiB / 40 MiB/s ≈ 436 minutes
12 TB: 12 TiB / (40 × 12) MiB/s ≈ 436 minutes

Figure: breakdown of ST1 volume base and burst throughput and duration.
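
The same fill-time math in code, showing that the result is independent of volume size:

```python
# The ST1 fill-time math from above: the bucket holds 1 TiB of credits per
# TiB of volume, refilled at the baseline rate of 40 MiB/s per TiB, so the
# fill time is constant regardless of volume size.
MIB_PER_TIB = 1024 * 1024

def st1_fill_minutes(size_tib: float) -> float:
    bucket_mib = size_tib * MIB_PER_TIB   # maximum credits, in MiB
    fill_rate_mib_s = size_tib * 40       # baseline refill rate
    return bucket_mib / fill_rate_mib_s / 60

for size in (0.5, 1, 12):
    print(f"{size} TiB: ~{st1_fill_minutes(size):.1f} minutes")  # ~436.9 each
```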

EFS File System:
EFS (Elastic File System) is a managed file system service from Amazon that complies with the NFSv4 protocol and standards. EFS offers shared storage that can be accessed concurrently by hundreds or even thousands of EC2 instances spanning multiple Availability Zones (AZs) within an AWS region.

Unlike local filesystems (xfs, ext4, ..) that can only be mounted on a single server, EFS is a distributed file system that can span an unconstrained number of storage servers, enabling it to grow elastically to petabyte scale and allowing massively parallel access from EC2 instances to shared data.

This distributed storage design means applications requiring concurrent access to the same data from multiple EC2 instances can drive substantial levels of aggregate throughput and IOPS with high levels of availability and durability.

EFS also uses a credit system to determine burst throughput and duration. Throughput scales with the amount of data stored on the EFS file system. A new file system with no data stored gets an initial credit balance of 2.1 TB, which offers a burst throughput of 100 MB/s for 6.1 hours. Once the initial burst credits are exhausted, credits are earned at a rate determined by how much data is stored on the file system.

Caution: the distributed architecture and cross-AZ replication used by EFS may result in relatively higher latency for attribute-intensive file operations when compared to local filesystems. The EFS sweet spot is read/write operations in large block sizes (1–4 MB) performed sequentially.

For example, an EFS file system with 100 GB of stored data earns enough burst credits to achieve 100 MB/s of IO throughput for 72 minutes each day. When no burst credits are available, throughput is throttled to 5 MB/s.

The baseline rate is 50 MB/s per TB of storage used (equivalently, 50 KB/s per GB). In other words, a 100 GB EFS file system can burst at 100 MB/s for 5% of the time if it stays inactive for the remaining 95% of a 24-hour period.
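
The EFS duty-cycle arithmetic in a short sketch (burst rate and baseline figures as quoted above):

```python
# Sketch of the EFS burst duty cycle: credits accrue at the baseline rate
# (50 MB/s per TB stored), so the fraction of time a file system can
# sustain the 100 MB/s burst equals baseline / burst.
def efs_burst_minutes_per_day(stored_gb: float, burst_mbps: float = 100.0) -> float:
    baseline_mbps = 0.05 * stored_gb          # 50 KB/s per GB stored
    duty_cycle = baseline_mbps / burst_mbps   # fraction of each day at burst
    return duty_cycle * 24 * 60

print(efs_burst_minutes_per_day(100))  # 100 GB stored -> 72.0 minutes/day
```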

Caution: EFS aggregates throughput across all NFS clients accessing the same file system when calculating the 100 MB/s limit. That means if 10 NFS clients are concurrently transferring files to the same EFS file system, each one gets 10 MB/s of IO throughput.

Unlike EBS GP2 and ST1/SC1 storage, which use a dedicated EBS network link, EFS transfers data over the instance's regular network. Thus the maximum throughput an instance can achieve on an EFS file system also depends on the instance's network throughput limit.

Originally published at http://techblog.cloudperf.net.
