Thinking Serverless: Slash Your AWS EC2 Bills by Over 50%

Yasir Ali
Dec 18, 2018 · 3 min read

Notes from the Field

Summary

Serverless architectures are application designs that incorporate third-party “Backend as a Service” (BaaS) services, and/or that include custom code run in managed, ephemeral containers on a “Functions as a Service” (FaaS) platform. Essentially, your data heavy-lifting is done via virtual services that can start up as needed and then shut down.

Data Environment on the Cloud

A typical data analytics, data lake or data warehouse environment will roughly have the following architecture.

This provides a fairly scalable setup with a low Total Cost of Ownership (TCO), especially for organizations that have migrated from an on-premise, 'old-school' enterprise architecture to the cloud.

However, in working with clients we have consistently seen that EC2 costs constitute a large percentage of an organization's cloud bill. High-performance computing (such as EC2) is never cheap, but organizations generally have the following options to reduce cost:

Spot Instances: ridiculously inexpensive because there's no commitment from the AWS side.

Reserved and Spot Instances are fairly good options for non-critical jobs that can wait. For production environments these options typically do not work.

Why Serverless?

Most analytics-related data transformation and computation is done via scheduled or pre-defined jobs. This typically involves ingesting raw data and running it through code written in Python, Ruby, R and the like, or through pre-compiled algorithms written in C++ or Java. The EC2 uptime these transformations need is never 24/7. This is the scenario where serverless computing or databases, via Glue, DynamoDB, Lambda and similar services, should be the preferred choice.
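To make the idea concrete, here is a minimal sketch of a transformation job written as a Lambda-style function in Python. The `handler(event, context)` signature is the usual shape of a Python Lambda entry point; the `raw_csv` payload field and the `region`/`amount` columns are made up for illustration and are not part of any AWS convention.

```python
import csv
import io
from collections import defaultdict


def transform(raw_csv: str) -> dict:
    """Sum the `amount` column per `region` (hypothetical column names)."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(raw_csv)):
        totals[row["region"]] += float(row["amount"])
    return dict(totals)


def handler(event, context):
    """Lambda-style entry point.

    Compute is only consumed while this call runs, instead of an
    instance idling between scheduled jobs.
    `event["raw_csv"]` is an assumed payload field for this sketch.
    """
    return {"totals": transform(event["raw_csv"])}


if __name__ == "__main__":
    # Local dry run with a tiny in-memory sample; no AWS account needed.
    sample = "region,amount\neast,10.5\nwest,4\neast,2.5\n"
    print(handler({"raw_csv": sample}, None))
```

In a real job the raw data would more likely come from object storage than from the event itself, but the shape stays the same: a short-lived function that starts, transforms and exits.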

Setup

The scripting setup needs some rethinking. Cataloguing can be an annoying step, but it pays dividends in the long run by keeping a handle on your data transfers. Upkeep and maintenance are not much more onerous than for typical always-on computing solutions. The cost savings are immense: depending on the size of the data being processed, we have seen EC2 costs decrease by 30–60%.
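The arithmetic behind savings like these can be sketched in a few lines of Python. This is a back-of-the-envelope model, not a pricing calculator: the hourly rates and job hours in the example are made-up placeholders, and it assumes the pay-per-use service can be expressed as an effective hourly rate.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def always_on_cost(hourly_rate: float) -> float:
    """Monthly cost of an instance that runs around the clock."""
    return hourly_rate * HOURS_PER_MONTH


def pay_per_use_cost(job_hours: float, hourly_rate: float) -> float:
    """Monthly cost when you are billed only while jobs run."""
    return job_hours * hourly_rate


def savings_fraction(job_hours: float, always_on_rate: float,
                     pay_per_use_rate: float) -> float:
    """Fraction of the always-on bill saved by paying per use."""
    baseline = always_on_cost(always_on_rate)
    return 1 - pay_per_use_cost(job_hours, pay_per_use_rate) / baseline


def break_even_hours(always_on_rate: float, pay_per_use_rate: float) -> float:
    """Job hours per month above which always-on becomes cheaper."""
    return HOURS_PER_MONTH * always_on_rate / pay_per_use_rate


if __name__ == "__main__":
    # Hypothetical figures: 200 job hours a month, and a pay-per-use
    # rate 50% higher than the always-on rate.
    print(f"savings: {savings_fraction(200, 0.10, 0.15):.0%}")
    print(f"break-even: {break_even_hours(0.10, 0.15):.0f} hours/month")
```

The point of the model is that the saving depends on how little of the month your jobs actually run: even at a higher per-hour price, paying only for job hours wins until usage approaches the break-even point.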

Illustration of a Serverless ETL
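One common shape for a serverless ETL, not necessarily the exact one illustrated here, is a small function that fires when a raw file lands in S3 and starts a Glue job to transform it. The Python sketch below assumes that shape. The Glue client is passed in so the code can be dry-run without AWS; in a deployment it would be `boto3.client("glue")`. The job name and the `--source_path` job argument are hypothetical.

```python
from urllib.parse import unquote_plus


def make_handler(glue_client, job_name: str):
    """Build a Lambda-style handler that starts one Glue run per new file.

    `glue_client` is expected to behave like boto3.client("glue").
    """
    def handler(event, context):
        run_ids = []
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            response = glue_client.start_job_run(
                JobName=job_name,
                # `--source_path` is a made-up argument for this sketch.
                Arguments={"--source_path": f"s3://{bucket}/{key}"},
            )
            run_ids.append(response["JobRunId"])
        return {"started": run_ids}

    return handler


class FakeGlueClient:
    """Stand-in for the real client so the sketch runs locally."""

    def __init__(self):
        self.calls = []

    def start_job_run(self, JobName, Arguments):
        self.calls.append((JobName, Arguments))
        return {"JobRunId": f"run-{len(self.calls)}"}


if __name__ == "__main__":
    fake = FakeGlueClient()
    handler = make_handler(fake, "nightly-transform")  # hypothetical job name
    event = {"Records": [{"s3": {"bucket": {"name": "raw-data"},
                                 "object": {"key": "2018/12/sales+east.csv"}}}]}
    print(handler(event, None))
    print(fake.calls)
```

Nothing in this flow stays running between files: the trigger function and the Glue job both start on demand and shut down when the work is done.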

Serverless computing obviously does not work in all scenarios, but it can be one aspect of your overall infrastructure. Some of our clients use EC2 On-Demand, Reserved and Spot Instances alongside DynamoDB, MongoDB and Glue at the same time, depending on their business needs.

yali@predictdata.io
