Cactus Tech Blog
How moving test workloads to Spot saved us over 50% of our AWS spend

About a year ago, we started looking for new ways to reduce our AWS costs, which kept climbing along with our ever-growing infrastructure.

We had already explored and exhausted the standard recommendations, such as right-sizing, Reserved Instances, and Savings Plans (including Compute Savings Plans), but we were still not happy with the savings we achieved, or with the sizeable bills we had to pay.

We saw that we were incurring around $700 per month for our test (non-critical) workloads. Cutting this cost seemed like an easy win, so I explored Spot Instances with the aim of reducing this bill.

As this was an ambitious task involving significant changes to every developer's, QA engineer's, and architect's workflow, there were quite a few challenges and some initial push-back from the software development team. I stuck to my guns: the cost-benefit math was sound and indicated this was the right thing to do.

Why did we switch?

We knew the demand (number of servers, instance types, volume sizes) for our test infrastructure beforehand. By launching Spot Instances through Auto Scaling Groups, we could serve this demand from AWS's unused compute capacity at a steep discount. Since this is a test workload, we can tolerate a slow start if an instance is reclaimed by AWS. In the case of a re-balance, i.e. Spot capacity being snatched from us, the Auto Scaling Group initiates a new request and maintains the desired capacity for our web applications.
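The setup described above can be sketched as an Auto Scaling Group that runs entirely on Spot via a mixed instances policy. Every name here (launch template, group, subnets) is a placeholder, not our actual configuration, and the `aws` call is left commented as a sketch:

```shell
# Sketch: an all-Spot mixed instances policy for an ASG (all names are placeholders).
cat > mixed-instances-policy.json <<'EOF'
{
  "LaunchTemplate": {
    "LaunchTemplateSpecification": {
      "LaunchTemplateName": "test-workload-lt",
      "Version": "$Latest"
    }
  },
  "InstancesDistribution": {
    "OnDemandBaseCapacity": 0,
    "OnDemandPercentageAboveBaseCapacity": 0,
    "SpotAllocationStrategy": "capacity-optimized"
  }
}
EOF

# aws autoscaling create-auto-scaling-group \
#   --auto-scaling-group-name test-workload-asg \
#   --min-size 1 --max-size 2 --desired-capacity 1 \
#   --vpc-zone-identifier "subnet-aaaa,subnet-bbbb" \
#   --mixed-instances-policy file://mixed-instances-policy.json
```

With `OnDemandPercentageAboveBaseCapacity` set to 0, every instance the group launches is Spot, and the group automatically replaces any instance AWS reclaims.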

Are there any other differences between spot and on-demand?

Spot Instances behave exactly like On-Demand Instances while they are running. However, they are susceptible to interruptions from AWS when Spot capacity or pricing changes. We minimized the impact of these interruptions by launching our Spot requests through Auto Scaling Groups.
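AWS also publishes a two-minute interruption notice at a documented instance-metadata path, which a user-data or cron script can poll. A minimal sketch (the `drain`/`continue` strings are placeholders for whatever clean-up a given workload needs):

```shell
# Decide what to do based on the HTTP status of the interruption-notice query.
interruption_action() {
  # 200 means AWS has published an instance-action (stop/terminate) notice;
  # until then the path returns 404.
  if [ "$1" = "200" ]; then
    echo "drain"      # placeholder: stop accepting work, flush state to EBS
  else
    echo "continue"
  fi
}

# Query the EC2 instance metadata service for a pending Spot interruption.
STATUS=$(curl -s --max-time 2 -o /dev/null -w '%{http_code}' \
  http://169.254.169.254/latest/meta-data/spot/instance-action || true)
interruption_action "$STATUS"
```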

We also tweaked our user-data scripts to automatically attach and mount persistent EBS volumes, so the stack survives re-balancing. This can be visualized in the diagram below.

Automatic attaching to Persistent EBS volume in case of Spot Interruption.

User data snippet for mounting to persistent EBS volumes:

apt-get update && apt-get install -y awscli

# Look up this instance's ID and Availability Zone from the EC2 metadata service
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
EC2_AVAIL_ZONE=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)

# Attach the persistent EBS volume that lives in this instance's AZ
if grep -q "1a" <<< "$EC2_AVAIL_ZONE"; then
  aws ec2 attach-volume --region us-east-1 --volume-id <vol-id-subnet-a> \
    --instance-id "$INSTANCE_ID" --device /dev/sdh
else
  aws ec2 attach-volume --region us-east-1 --volume-id <vol-id-subnet-b> \
    --instance-id "$INSTANCE_ID" --device /dev/sdh
fi
sleep 10

# The volume attached as /dev/sdh shows up in the guest as /dev/xvdh;
# mount it and point /var/www and /var/log at the persistent copies
mkdir -p /data
mount /dev/xvdh1 /data
mv /var/www /var/www_bak
ln -s /data/var/www /var/www
mv /var/log /var/log_bak
ln -s /data/var/log /var/log
chown -R www-data:www-data /data/var/www
chown -R root:syslog /data/var/log

Note: Replace <vol-id-subnet-a> and <vol-id-subnet-b> with the corresponding Volume IDs in your account.

This project was successfully implemented, and it gave us the following benefits:

  1. Around 50–60% cost savings compared to On-Demand (see the Cost Savings Summary below)
  2. Running Spot Instances 100% of the time (i.e., 730 hours/month) is cheaper than running On-Demand Instances for 12 hours/day
  3. Innovation and positive disruption
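To sanity-check the second point, here is the arithmetic with illustrative numbers. The $0.0416/hour On-Demand rate and the 70% average Spot discount are assumptions for a small instance class, not our actual bill:

```shell
awk 'BEGIN {
  od_rate       = 0.0416   # assumed On-Demand $/hr (illustrative, small-instance class)
  spot_discount = 0.70     # assumed average Spot discount vs On-Demand

  od_monthly   = od_rate * 12 * 30                    # On-Demand, 12 h/day for a month
  spot_monthly = od_rate * (1 - spot_discount) * 730  # Spot, running 24x7

  printf "on-demand (12 h/day): $%.2f/mo   spot (24x7): $%.2f/mo\n", od_monthly, spot_monthly
}'
```

Even running round the clock, the Spot instance comes out cheaper than the part-time On-Demand one under these assumptions.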
Cost Savings Summary

Saving 60% month on month is an impressive feat and every Cloud/DevOps engineer's dream.

Implementing this solution taught me a lot, especially about fail-over and redundancy. It literally put me on the spot to think through edge cases and non-invasive approaches to vet the whole infrastructure without causing major inconvenience to the development teams that had to endure this transition.

The meme below sums it up perfectly (posted on Slack on a random Friday after the completion of this project).

Transition to Spot

Stay tuned to learn more about such cool and innovative concepts and projects, and drop me a line if you need any assistance getting started on your journey into the cloud.

By the way, we at Cactus Communications are also hiring rockstar DevOps engineers (and for many other engineering roles). Check out our jobs page, or apply directly via our LinkedIn page.


Omkar Kadam


I'm a DevOps Engineer by profession who likes to solve complex engineering problems with unconventional, out-of-the-box, innovative solutions.
