Real-life AWS infrastructure cost optimization strategy

How we keep a good Amazon Web Services billing hygiene at Teads

Damien Pacaud
Oct 30, 2017 · 9 min read

AWS recently announced its new per second billing for its EC2 instances and EBS volumes. This is perfect timing to talk about cost optimization. After a short intro we will guide you through some real world examples and best practices that we use at Teads to optimize our infrastructure costs.

The cloud computing opportunity and its traps

One of the advantages of cloud computing is its ability to fit the infrastructure to your needs, you only pay for what you really use. That is how most hyper growth startups have managed their incredible ascents.

Most companies migrating to the cloud embrace the “lift & shift” strategy, replicating what was once on premises.

You most likely won’t save a penny with this first step.

Main reasons being:

Once in the cloud, you need to “cloudify” your infrastructure.

Then, and only then, will you have access to virtually infinite computing power and storage.

Watch out, this apparent freedom can lead to very serious drifts: over provisioning, under optimizing your code or even forgetting to “turn off the lights” by letting that small PoC run more than necessary using that very nice r3.8xlarge instance.

Essentially, you have just replaced your need for capacity planning by a need for cost monitoring and optimization.

The dark side of cloud computing

At Teads we were “born in the cloud” and we are very happy about it.

One of our biggest pain today with our cloud providers is the complexity of their pricing.

It is designed to look very simple at the first glance (usually based on simple metrics like $/GB/month or $/hour or, more recently, $/second) but as you expand and go into a multi-region infrastructure mixing lots of products, you will have a hard time tracking the ever-growing cost of your cloud infrastructure.

For example, the cost of putting a file on S3 and serving it from there includes four different lines of billing:

Our take on Cost Optimization

The limit of cost optimization for us is when it drives more complexity in the code and less agility in the future, for a limited ROI.
This way of thinking also helps us to tackle cost optimisation in our day to day developments.

Overall we can extend this famous quote from Kent Beck:

“Make it work, make it right, make it fast” … and then cost efficient.

Billing Hygiene

It is of the utmost importance to keep a strict billing hygiene and know your daily spends.

In some cases, it will help you identify suspicious uptrends, like a service stuck in a loop and writing a huge volume of logs to S3 or a developer that left its test infrastructure up & running during a week-end.

You need to arm yourself with a detailed monitoring of your costs and spend time looking at it every day.

You have several options to do so, starting with AWS’s own tools:

Trusted Advisor cost section - Courtesy of AWS
Example of a Cost Explorer report — AWS documentation

Then you have several other external options to monitor the costs of your infrastructure:

Building our own monitoring solution

At Teads we decided to go DIY using the detailed billings files.

We built a small Lambda function that ingests the detailed billing file into Redshift every day. This tool helps us slice and dice our data along numerous dimensions to dive deeper into our costs. We also use it to spot suspicious usage uptrends, down to the service level.

This is an example of our daily dashboard built with chart.io, each color corresponds to a service we tagged
When zoomed on a specific service, we can quickly figure out what is expensive

On top of that, we still use a spreadsheet to integrate the reservation upfronts in order to get a complete overview and the full daily costs.

Now that we have the data, how to optimize?

Here are the 5 pillars of our cost optimization strategy.

1 - Reserved Instances (RIs)

First things first, you need to reserve your instances. Technically speaking, RIs will only make sure that you have access to the reserved resources.

At Teads our reservation strategy is based on bi-annual reservation batches and we are also evaluating higher frequencies (3 to 4 batches per year).

The right frequency should be determined by the best compromise between flexibility (handling growth, having leaner financial streams) and the ability to manage the reservations efficiently.
In the end, managing reservations is a time consuming task.

Reservation is mostly a financial tool, you commit to pay for resources during 1 or 3 years and get a discount over the on-demand price:

2 - Optimize Amazon S3

The second source of optimization is the object management on S3. Storage is cheap and infinite, but it is not a valid reason to keep all your data there forever. Many companies do not clean their data on S3, even though several trivial mechanisms could be used:

The Object Lifecycle option enables you to set simple rules for objects in a bucket :

Also, don’t forget to set up a delete policy when you think you won’t need those files anymore.

Finally, enabling a VPC Endpoint for your Amazon S3 buckets will suppress the data transfer costs between Amazon S3 and your instances.

3 - Leverage the Spot market

Spot instances enables you to use AWS’s spare computing power at a heavily discounted price. This can be very interesting depending on your workloads.

Spot instances are bought using some sort of auction model, if your bid is above the spot market rate you will get the instance and only pay the market price. However these instances can be reclaimed if the market price exceeds your bid.

At Teads, we usually bid the on-demand price to be sure that we can get the instance. We only pay the “market” rate which gives us a rebate up to 90%.

It is worth noting that:

4 - Data transfer

Back in the physical world, you were used to pay for the network link between your Data Center and the Internet.

Whatever data you sent through that link was free of charge.

In the cloud, data transfer can grow to become really expensive.

You are charged for data transfer from your services to the Internet but also in-between AWS Availability Zones.

This can quickly become an issue when using distributed systems like Kafka and Cassandra that need to be deployed in different zones to be highly available and constantly exchange over the network.

Some advice:

5 - Unused infrastructure

With a growing infrastructure, you can rapidly forget to turn off unused and idle things:

Trusted Advisor can help you in detecting these unnecessary expenses.

Key takeaways

Thank you for reading. This article was inspired by the talks I made during the #2 AWS Montpellier Meetup and Devops D-Day conference.

Devops D-Day 2017 — Marseille

If you like working on big cloud infrastructures and growth challenges, feel free to contact us, we are constantly looking for great teammates.

If you want to know more about Engineering at Teads:

Teads Engineering

150+ innovators building the future of digital advertising

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store