Cutting Cloud Costs with AWS Fargate Spot

Vincent Van Gestel
VRT Digital Products
Dec 15, 2022

At VRT, we recently migrated several microservices to a container-based solution on Amazon ECS (you can read more about this here). A one-to-one resource migration (CPU, memory and instance count) can, however, lead to a substantial bill, as Fargate pricing lies significantly above on-demand EC2 pricing. The intended approach to limiting costs is to reduce the resource footprint of each container and to scale more aggressively. Additionally, if the context allows for it, one can cut costs dramatically by switching the capacity type from on-demand to spot. This blog post details how we save over a thousand dollars per month using this simple configuration change.

What are spot instances?

When spinning up an EC2 instance or ECS task, you can choose between different levels of commitment for that instance. The default type is “on-demand”, which means no commitment at all. An on-demand instance runs with the desired specifications (instance type or CPU/memory configuration) for as long as the user wants, accruing costs every second at the current on-demand rate for that instance type (these rates tend to be fairly stable). Working with on-demand instances is easy because no advance planning is needed: you simply launch what you need, when you need it. As expected, on-demand pricing is the highest rate available, as it gives the user the most flexibility.

Increasing commitment is one way to reduce costs. Buying instance hours in advance (known as reserved instances) gives you a better rate, but it also requires you to plan your capacity ahead of time. Poor planning can easily cancel out any gains from the improved rates.

One can also go the opposite route and accept even less commitment. Enter spot instances, the excess capacity of the AWS hardware. The primary goal of a cloud provider running as many expensive machines as AWS does is to keep these machines as busy as possible at all times. From AWS’s perspective, unused resources are a pure loss. In order to limit this loss, AWS puts these excess resources on the market at a massive discount. These discounted instances are known as spot instances. They behave just like any other instance, but can be reclaimed by Amazon at any time (with a two-minute warning). To give an example of the potential savings, the following table shows the pricing per vCPU and per GB of memory of a regular Fargate task versus a Fargate spot task at the time of writing (eu-west-1 region). Note that spot pricing fluctuates over time, with savings of up to 70%; you can find the current pricing for your region on the AWS pricing page.

+----------------------+-----------+--------------+---------+
| Resource (per hour)  | Fargate   | Fargate Spot | Savings |
+----------------------+-----------+--------------+---------+
| vCPU                 | $0.04048  | $0.01265462  | 69%     |
| Memory (GB)          | $0.004445 | $0.00138957  | 69%     |
+----------------------+-----------+--------------+---------+
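To make the arithmetic behind the table explicit, here is a minimal Python sketch. The prices are the ones listed above; the spot price is about 31% of the regular price, which corresponds to a saving of roughly 69%. The function names and the task size of 0.5 vCPU and 1 GB are hypothetical and only serve as an illustration.

```python
# Hourly prices in USD, copied from the table above
# (eu-west-1, at the time of writing).
FARGATE_VCPU = 0.04048
FARGATE_SPOT_VCPU = 0.01265462
FARGATE_GB = 0.004445
FARGATE_SPOT_GB = 0.00138957


def savings_percent(regular: float, spot: float) -> float:
    """Percentage saved by paying the spot price instead of the regular price."""
    return (1 - spot / regular) * 100


def hourly_task_cost(vcpu: float, memory_gb: float,
                     vcpu_price: float, gb_price: float) -> float:
    """Hourly cost of one task: vCPU and memory are billed separately."""
    return vcpu * vcpu_price + memory_gb * gb_price


# Hypothetical task size (0.5 vCPU, 1 GB), chosen only for illustration.
regular = hourly_task_cost(0.5, 1, FARGATE_VCPU, FARGATE_GB)
spot = hourly_task_cost(0.5, 1, FARGATE_SPOT_VCPU, FARGATE_SPOT_GB)

print(f"vCPU savings:   {savings_percent(FARGATE_VCPU, FARGATE_SPOT_VCPU):.1f}%")
print(f"memory savings: {savings_percent(FARGATE_GB, FARGATE_SPOT_GB):.1f}%")
print(f"task per hour:  ${regular:.6f} regular vs ${spot:.6f} spot")
```

Because the same discount applies to both vCPU and memory here, the saving per task is the same percentage regardless of the task size.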

Spot instances for EC2 and Fargate also differ slightly. In the case of EC2, spot capacity is governed by a supply-and-demand market. You specify how much you are willing to pay for a spot instance, and if capacity matching your requirements is available at that price, you get the instance. Fargate currently doesn’t have this “bidding” system, which makes it easier to grasp. A common piece of advice is to avoid spot instances in production environments, because there may be no instances available for your specified constraints, especially if those constraints are strict. In our experience, this potential shortage is not a problem on Fargate.

How to use spot instances?

In order to use spot instances in your ECS cluster, you first need to enable them as a capacity provider. In CloudFormation, this is done by adding the following snippet:

"ECSCluster": {
  "Type": "AWS::ECS::Cluster",
  "Properties": {
    "CapacityProviders": [
      "FARGATE",
      "FARGATE_SPOT"
    ],
    […]
  }
}

You can now launch tasks on either the regular Fargate instances, or on a Fargate spot instance.

To determine which type to use, ECS uses a system of weights. The ratio between the weights, compared with the ratio of currently running tasks, determines which type is used for the next task. Consider for example the following weights:

"ECSService": {
  "Type": "AWS::ECS::Service",
  "Properties": {
    "CapacityProviderStrategy": [
      {
        "Base": 0,
        "CapacityProvider": "FARGATE",
        "Weight": 2
      },
      {
        "Base": 0,
        "CapacityProvider": "FARGATE_SPOT",
        "Weight": 3
      }
    ],
    […]
  }
}

A ratio of 2 to 3 means that for every 2 regular tasks, 3 spot tasks are launched. With 5 desired tasks, we would be running 2 on regular Fargate and 3 on spot instances. Setting the weight of one provider to 0 results in only the other provider being used. At least one provider must have a weight greater than 0. In order to maintain a minimum number of tasks on a specific provider, one can set a base value (only one provider in a strategy can define a base). This base is satisfied first; any additional tasks are split according to the weight ratio. For example, if we had specified a base of 2 for our regular Fargate provider and a desired task count of 7, then the base would claim 2 regular tasks and the remaining 5 would be split 2 to 3, so we would be running 4 regular Fargate tasks and 3 spot tasks.
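The base and weight rules can be sketched in a few lines of Python. This is not the actual ECS scheduler, only a simplified model of the behaviour described above: the function name `split_tasks` is hypothetical, and the way leftover tasks are rounded when the count is not a multiple of the weights is an assumption of this sketch.

```python
def split_tasks(desired, strategy):
    """Return {provider: task count} for a desired count and a strategy.

    `strategy` mirrors the CapacityProviderStrategy items shown earlier.
    """
    weighted = [p for p in strategy if p.get("Weight", 0) > 0]
    if not weighted:
        raise ValueError("at least one provider needs a weight greater than 0")

    counts = {p["CapacityProvider"]: 0 for p in strategy}
    remaining = desired

    # Step 1: satisfy the base of each provider first.
    for p in strategy:
        base = min(p.get("Base", 0), remaining)
        counts[p["CapacityProvider"]] += base
        remaining -= base

    # Step 2: hand out the rest one task at a time, always to the provider
    # that stays closest to its weighted share (assumption: ties go to the
    # provider listed first).
    by_weight = {p["CapacityProvider"]: 0 for p in weighted}
    for _ in range(remaining):
        nxt = min(
            weighted,
            key=lambda p: (by_weight[p["CapacityProvider"]] + 1) / p["Weight"],
        )
        by_weight[nxt["CapacityProvider"]] += 1
        counts[nxt["CapacityProvider"]] += 1
    return counts


strategy = [
    {"CapacityProvider": "FARGATE", "Base": 0, "Weight": 2},
    {"CapacityProvider": "FARGATE_SPOT", "Base": 0, "Weight": 3},
]
print(split_tasks(5, strategy))  # {'FARGATE': 2, 'FARGATE_SPOT': 3}

strategy[0]["Base"] = 2
print(split_tasks(7, strategy))  # {'FARGATE': 4, 'FARGATE_SPOT': 3}
```

Both calls reproduce the two examples from the text: 2 regular and 3 spot tasks for 5 desired tasks, and 4 regular and 3 spot tasks once a base of 2 is set with 7 desired tasks.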

Our experience with spot instances

There are many warnings in the AWS documentation regarding sudden spot instance termination and the possibility of running out of spot capacity. Spot usage should therefore be limited where needed, so that a critical service never fails or becomes unavailable. As such, it is very important to identify where and how to best utilize spot instances. Any service that cannot handle sudden termination (usually the case for stateful services, which honestly should be avoided in modern microservice architectures anyway) is likely not suited for spot instances. Luckily our first use case, the microservices managing our audio workflows, can handle termination.
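What “handling termination” looks like depends on the service, but a common pattern is to stop picking up new work once a stop signal arrives and to finish the job in progress. The Python sketch below illustrates that pattern, assuming the interruption reaches the container as a SIGTERM, which is how ECS stops a task. The function names `install_sigterm_handler` and `drain` are hypothetical, and this is not a description of how our audio services are implemented.

```python
import signal
import threading


def install_sigterm_handler() -> threading.Event:
    """Register a SIGTERM handler and return an event that is set on SIGTERM.

    Must be called from the main thread, as Python requires for signal.signal.
    """
    stop_requested = threading.Event()

    def _handle(signum, frame):
        # Only record the request; the worker loop decides when to stop.
        stop_requested.set()

    signal.signal(signal.SIGTERM, _handle)
    return stop_requested


def drain(jobs, stop_requested):
    """Run jobs in order until none are left or a stop has been requested.

    Returns (results of finished jobs, jobs that were not started), so that
    unstarted work can be handed back to a queue for another task to pick up.
    """
    done = []
    pending = list(jobs)
    while pending and not stop_requested.is_set():
        job = pending.pop(0)
        done.append(job())
    return done, pending
```

A service would call `install_sigterm_handler()` once at startup and pass the returned event to its worker loop. The job that is running when the signal arrives still completes, so it should comfortably fit within the warning period.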

Our services are spread over three different environments: a development, a staging and finally a production environment. Given that outages caused by spot instance replacements are relatively rare and localized to a specific microservice, we decided to maximize our savings in our development and staging environments by running purely on spot instances. When we checked in with our development team after 3 months of using spot instances, they claimed not to have even noticed the switch. Our costs, on the other hand, did notice the switch, dropping by several hundred dollars per month (we already terminate running instances in these environments outside of working hours to reduce costs, otherwise the impact would have been even larger).

In our production environment we settled on running half of the tasks on spot instances. For most of the microservices in this setup, the replicas exist primarily to guarantee high availability rather than to meet any significant performance requirements. Keeping these “backups” on spot instances ensures their monetary impact remains limited. Below you can also see a graph visualizing every spot instance replacement event, spread across the different services. While this is of course fairly volatile in nature, we currently experience fewer than 1 replacement per service per day, which is completely acceptable.

[Figure: Spot replacements by service over one week]

Conclusion

When you’re running many services, the costs of ECS (and especially Fargate) can ramp up quite significantly. If your systems can scale and handle terminations well, then utilizing spot instances can meaningfully reduce your AWS bill. Make sure to set a base value for a non-spot provider to avoid any unnecessary downtime. If downtime is less critical, for example in a development environment, then you can even entertain the idea of running only on spot instances to minimize instance costs. We currently experience fewer than 1 replacement per service per day which, depending on the service’s utilization in a development or staging environment, can go by completely unnoticed.
