Living the Golden Rule (of Cloud Architecture) #4: Fractional CPUs & the AWS T-Series Instances

Thomas Soares
Software Architecture in the Clouds
6 min read · Jul 1, 2019

Suppose you’re trying to follow my Golden Rule of Cloud Architecture and avoid paying for unused capacity, but you’ve got a workload that isn’t large enough to keep an instance with a single CPU busy full-time. You could try an approach like using Functions-as-a-Service, but if you wanted to stick with using a “normal” compute instance, what options do you have?

If you’re running on AWS, one of the options you have is to use a T-series instance (T2, T3, or T3a). AWS calls these “burstable general-purpose instances,” but effectively what they allow you to do is pay for just a fraction of a CPU (based on CPU time) — which can be quite useful in cases where you don’t have enough of a workload to warrant paying for a “full” CPU. There is a bit more to the T-series instances than that, however. If all they offered was the possibility to pay for just 40% of the CPU, for example, then they would be modestly interesting as a potential cost-savings tool. But their “bursting” ability, as AWS calls it, makes them (potentially) quite a bit more interesting — and useful.

The full details of how the T-series instances work are a bit byzantine, and if you look at the EC2 Instance Types page, it is easy to just lump them in with all of the other “normal” instance types. There are some important differences though, which I’ll try to describe here.

With the T-series instances, you get guaranteed access to a fixed, “baseline” percentage of the CPU time. The baseline capacity is use-it-or-lose-it: if your process is sitting idle and doesn’t use your full baseline allocation, it goes to waste. The cheaper the instance type, the smaller the baseline percentage you get, starting at just 5% for the smallest size, the nano (t2.nano, t3.nano, or t3a.nano). So if you choose a t2.nano instance, for example, you’re buying a guarantee of at least 5% of that CPU’s time — and at a much lower price than you’d pay for a whole CPU (just $0.0058 per hour for a t2.nano in us-east-1). The amount of RAM you get also varies by size, starting at 512 MB for the nano and 1 GB for the micro, and so on. So you can vary both the amount of CPU and the amount of RAM that you’re paying for.
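
To put that price in perspective, here’s a quick back-of-the-envelope calculation. The hourly rate is the one quoted above, so verify current pricing before relying on it:

    # Rough monthly cost of a t2.nano at the us-east-1 on-demand rate above.
    hourly_rate = 0.0058             # USD/hour, as quoted; may have changed
    monthly = hourly_rate * 24 * 30  # roughly a month of continuous uptime
    print(f"${monthly:.2f}/month")   # about $4.18 for a guaranteed 5% of a CPU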

Again, if that’s all there was to it, it would be modestly interesting. The problem is that you rarely have a workload that equates to a steady, constant percentage of a CPU — workloads tend to vary over time, and even if you can buy just X% of a CPU, it can be tough to use exactly that X% continually.

That’s where the T-series instances start to become more interesting: in addition to the fixed baseline capacity, you also get a certain number of “credits” that give you access to CPU time above and beyond your baseline. And that’s where it gets byzantine. But the basic idea is that you earn credits over the course of the day and spend them as needed. You could spend all of your available credits in a single sustained burst of 100% CPU utilization, or in a bunch of smaller bursts of 50–60% utilization spread out over the day. Or you could run at a steady utilization that spends credits at the same rate you earn them, effectively giving you continuous CPU capacity above the baseline — with a 10% baseline, for example, you could use a total of 20% of the CPU continuously.

If you run out of credits — for example, if you pin the CPU at 100% for long enough — then you are throttled to your baseline percentage until you earn more credits. There is a cap on the number of credits you can bank (equal to the number you’d accrue in a single day), but up to that cap you can hold on to credits indefinitely — you don’t lose them just because you didn’t spend them right away.
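
To make the mechanics concrete, here’s a small, simplified model of the credit accounting in Python. This is my own sketch, not AWS’s exact algorithm: I assume credits accrue continuously at the hourly rate up to a one-day cap, all CPU usage spends credits at one credit per 100 percent-minutes (a unit explained below), and an empty balance throttles you to the baseline:

    def simulate(baseline_pct, credits_per_hour, demand_pct_per_minute):
        """Yield the CPU% actually delivered each minute, given demanded CPU%.

        One credit buys 100 percent-minutes. All usage spends credits, and
        credits accrue at a rate that matches the baseline, so running at
        exactly the baseline holds the balance steady; an empty balance
        throttles you to the baseline.
        """
        max_credits = credits_per_hour * 24   # a day's worth (the cap)
        balance = max_credits                 # assume a full bank to start
        earn = credits_per_hour / 60.0        # credits accrued per minute
        for demand in demand_pct_per_minute:
            # CPU% that this minute's accrual plus the bank can cover:
            affordable_pct = (balance + earn) * 100
            cpu = min(demand, max(baseline_pct, affordable_pct))
            balance = max(min(balance + earn - cpu / 100.0, max_credits), 0.0)
            yield cpu

    # t2.micro settings: 10% baseline, 6 credits/hour. Pin the CPU at 100%
    # and the throttle kicks in after 160 minutes (see the arithmetic below).
    trace = list(simulate(10.0, 6, [100.0] * 180))
    print(trace[159], trace[160])             # roughly 100.0, then 10.0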

As I said, it is a bit byzantine. But let’s work through an example using the t2.micro for illustration. The baseline CPU performance is 10%, so that’s what you always have access to, use-it-or-lose-it. You earn 6 credits per hour (accrued per-millisecond), for a total of 144 credits per day — which is also the maximum that you can save up with the t2.micro. Each credit gives you access to 100 “percent-minutes” of compute time. The percent-minute is an unusual unit, but it simply means that for one credit (100 percent-minutes) you can use 100% of the CPU for one minute, 1% of the CPU for 100 minutes, and so on. So you have access to an additional 14,400 percent-minutes of CPU time per day, which works out to a continuous 10% of additional CPU utilization above the baseline. Effectively, with the t2.micro you could have a continuous 20% CPU utilization all day long. And if you wanted to max out the CPU by adding an additional 90% utilization to your 10% baseline, you could have 160 minutes of continuous 100% utilization before you ran out of credits and got throttled back to your baseline of 10%.
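
If you want to double-check those figures, the arithmetic is straightforward:

    # Reproducing the t2.micro figures from the paragraph above.
    credits_per_hour = 6
    credits_per_day = credits_per_hour * 24             # 144, also the bank cap
    pct_minutes_per_day = credits_per_day * 100         # 14,400 percent-minutes
    extra_continuous = pct_minutes_per_day / (24 * 60)  # 10% above the baseline

    # At 100% utilization you spend 1 credit per minute while still earning
    # 6/60 of a credit, for a net drain of 0.9 credits per minute:
    burst_minutes = credits_per_day * 60 / (60 - credits_per_hour)
    print(credits_per_day, extra_continuous, burst_minutes)  # 144 10.0 160.0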

Of course, all of this is in theory. We’re talking about shared instances here, and there may be other users vying for compute time on the same physical host that you’re on. So there are no real guarantees that you can “burst” at any given time and max out the CPU, but then again that probably doesn’t matter to you. What may matter to you more is that responsiveness may vary — you may not be able to grab the amount of CPU you need exactly when you need it, which could be a problem if response time is a sensitive issue for you.

But what you definitely do get is some flexibility in making use of the CPU. In the t2.micro example, you’re basically getting 20% of the CPU but with some degree of flexibility in when you use it — which helps to deal with time-variant workloads. You’d want to use at least 10% of the CPU on a continual basis, but you have flexibility in using the remainder. As long as you average out to 20% utilization over the course of a day, you’ve maxed out your allocation — which is great. If you don’t manage to use your full allocation, that isn’t too bad — you can potentially bank some credits for future use, and if you do waste some of your baseline capacity, at least it isn’t costing you much.

You have some additional options when using the T-series as well. There are (currently) three different classes of T-series: T2, T3, and T3a. The T2 series was the original “burstable” class. The newer T3 series features “the Intel Xeon Platinum 8000 series (Skylake-SP) processor with a sustained all core Turbo CPU clock speed of up to 3.1 GHz, and deliver up to 30% improvement in price performance compared to T2 instances” (according to the AWS EC2 T3 Instances page), while the T3a instances “feature the AMD EPYC 7000 series processor with an all core turbo clock speed of up to 2.5 GHz… [offering] up to 10% savings.” So you have some options in price/performance that you can evaluate to find the best fit for your needs.

Additionally, you have an option to use “unlimited” mode, which basically allows you to pay extra in order to use CPU capacity beyond your baseline + credit allocation, as needed. So if you occasionally really need more than your normal allocation, you have the flexibility to get it rather than being throttled.
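
For what it’s worth, the credit mode is just a setting on the instance. Here is a minimal sketch using boto3, assuming credentials and a default region are already configured; the AMI ID and instance ID below are placeholders:

    import boto3

    ec2 = boto3.client("ec2")

    # Launch a T3 instance in unlimited mode (the AMI ID is a placeholder).
    ec2.run_instances(
        ImageId="ami-xxxxxxxxxxxxxxxxx",
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
        CreditSpecification={"CpuCredits": "unlimited"},
    )

    # Or flip an existing instance back to standard (throttled) mode; the
    # instance ID here is also a placeholder.
    ec2.modify_instance_credit_specification(
        InstanceCreditSpecifications=[
            {"InstanceId": "i-0123456789abcdef0", "CpuCredits": "standard"}
        ]
    )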

The T-series definitely isn’t suitable for all types of workloads — don’t think of it as just a cheaper alternative to M- or C-series instances, for example. But for cases where your workload doesn’t warrant a beefier instance and you want to have some flexibility in CPU utilization, the T-series can be quite useful. They can be complicated to use optimally, and you’ll likely need to do some experimentation to find the best fit for your workload. But if you make the effort, they can potentially be quite helpful in living the Golden Rule.
