Notes from the Field
Serverless architectures are application designs that incorporate third-party “Backend as a Service” (BaaS) services, and/or that include custom code run in managed, ephemeral containers on a “Functions as a Service” (FaaS) platform. Essentially, your heavy data lifting is done by managed services that start up when needed and shut down when the work is finished.
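Here is a minimal sketch of the FaaS half of that definition: a Python function as AWS Lambda would invoke it, with an `event` and a `context` argument. The handler name `lambda_handler`, the `{"records": [...]}` event shape, and the cleaning rules are assumptions for illustration. They do not describe any particular pipeline.

```python
import json


def lambda_handler(event, context):
    """Hypothetical FaaS handler that cleans a batch of raw records.

    Lambda passes (event, context) to a Python handler; the
    {"records": [...]} event shape used here is an assumption.
    """
    cleaned = []
    for record in event.get("records", []):
        # Normalise field names to lower case before validating.
        normalised = {key.lower(): value for key, value in record.items()}
        # Skip records that carry no id at all.
        if normalised.get("id") is None:
            continue
        cleaned.append(normalised)
    return {
        "statusCode": 200,
        "body": json.dumps({"processed": len(cleaned), "records": cleaned}),
    }


if __name__ == "__main__":
    # Local smoke test; Lambda would supply a real context object.
    sample = {"records": [{"ID": 1, "Name": "a"}, {"Name": "missing id"}]}
    print(lambda_handler(sample, None))
```

The container running this code exists only for the duration of the invocation. Nothing is billed while no events arrive, and that is where the cost argument below comes from.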
Data Environment on the Cloud
A typical data analytics, data lake, or data warehouse environment on the cloud will roughly have the following architecture.
This setup is fairly scalable and has a low Total Cost of Ownership (TCO), especially for organizations that have migrated from an on-premises, ‘old-school’ enterprise architecture to the cloud.
However, in working with clients we have consistently seen that EC2 costs make up a large percentage of an organization's cloud bill. High-performance computing such as EC2 is never cheap, but organizations generally have the following options to reduce cost:
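Reserved Instances: discounted in exchange for a one- or three-year usage commitment from the customer.
Spot Instances: ridiculously inexpensive because there is no commitment from the AWS side; AWS can reclaim the capacity at short notice.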
Reserved and Spot Instances are fairly good options for non-critical jobs that can wait. For production environments, these options typically do not work.
Most analytics-related data transformation and computation is done via scheduled or predefined jobs. This typically involves ingesting raw data and running it through code written in Python, Ruby, R, etc., or through precompiled algorithms written in C++ or Java. The EC2 uptime these transformation jobs need is rarely, if ever, 24/7. This is the scenario where serverless compute (AWS Glue, Lambda) or serverless databases (DynamoDB) should be the preferred choice.
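The uptime argument reduces to simple arithmetic. This sketch assumes both options can be expressed as a flat hourly rate, which is a simplification: Lambda actually bills per request and per GB-second, and Glue per DPU-hour. The rates in the example are hypothetical parameters, not AWS prices. An always-on instance is billed around the clock. Serverless compute is billed only while the job runs, so it wins as long as the job's daily busy hours stay below a break-even point, even if its effective hourly rate is higher.

```python
def busy_hours_per_day(runs_per_day: int, minutes_per_run: float) -> float:
    """Hours per day a scheduled job actually spends computing."""
    return runs_per_day * minutes_per_run / 60


def always_on_cost(hourly_rate: float, days: int = 30) -> float:
    """An always-on instance is billed for every hour, busy or idle."""
    return hourly_rate * 24 * days


def pay_per_use_cost(busy_hours: float, hourly_rate: float, days: int = 30) -> float:
    """Serverless compute is billed only while the job is running."""
    return hourly_rate * busy_hours * days


def break_even_busy_hours(ec2_hourly_rate: float, serverless_hourly_rate: float) -> float:
    """Busy hours per day above which the always-on instance becomes cheaper."""
    return 24 * ec2_hourly_rate / serverless_hourly_rate


if __name__ == "__main__":
    # Hypothetical rates and schedule, for illustration only.
    ec2_rate, serverless_rate = 0.40, 0.80
    hours = busy_hours_per_day(runs_per_day=4, minutes_per_run=30)
    print(f"always-on:   ${always_on_cost(ec2_rate):.2f}/month")
    print(f"pay-per-use: ${pay_per_use_cost(hours, serverless_rate):.2f}/month")
    print(f"break-even:  {break_even_busy_hours(ec2_rate, serverless_rate):.1f} busy hours/day")
```

With these made-up numbers, the job computes for 2 hours a day against a break-even of 12. That is the typical profile of the scheduled transformation jobs described above.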
Moving to serverless requires some rethinking of the scripting setup. Cataloguing your data can be an annoying step, but it pays dividends in the long run by keeping a handle on your data transfers. Upkeep and maintenance are not much more onerous than for typical always-on computing solutions. The cost savings are immense: depending on the size of the data being processed, we have seen EC2 costs decrease by 30–60%.
Serverless computing obviously does not fit every scenario; it can be one part of your overall infrastructure. Some of our clients use EC2 On-Demand, Reserved, and Spot Instances, DynamoDB, MongoDB, and Glue at the same time, depending on their business needs.
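The guidance in these notes can be condensed into a rough rule of thumb. This is a hypothetical helper, not an established tool. `choose_compute`, its parameters, and the single `threshold_hours` cut-off are all assumptions. In practice, the threshold would come from your own pricing and workload.

```python
from enum import Enum


class ComputeOption(Enum):
    SERVERLESS = "Serverless (e.g. Glue, Lambda)"
    SPOT_OR_RESERVED = "EC2 Spot or Reserved"
    ON_DEMAND = "EC2 On-Demand"


def choose_compute(busy_hours_per_day: float, can_wait: bool,
                   threshold_hours: float) -> ComputeOption:
    """Hypothetical rule of thumb distilled from these notes."""
    if not 0 <= busy_hours_per_day <= 24:
        raise ValueError("busy_hours_per_day must be between 0 and 24")
    # Scheduled jobs that run well short of 24/7: pay only for runtime.
    if busy_hours_per_day < threshold_hours:
        return ComputeOption.SERVERLESS
    # Long-running but non-critical work that can tolerate waiting.
    if can_wait:
        return ComputeOption.SPOT_OR_RESERVED
    # Long-running, production-critical work.
    return ComputeOption.ON_DEMAND


if __name__ == "__main__":
    print(choose_compute(2, can_wait=False, threshold_hours=12).value)
    print(choose_compute(20, can_wait=True, threshold_hours=12).value)
    print(choose_compute(24, can_wait=False, threshold_hours=12).value)
```

Mixed estates like the ones our clients run would apply a rule like this per workload rather than once for the whole organization.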