Serverless Aggregations on AWS Lambda with Node.js Codebase
The Simple Aggregation Use Case
Whenever an upstream server flushes its daily summaries into a CSV file, the data has to be imported into a database.
Then, once a day, all data for the past year has to be aggregated and unloaded into a new CSV file.
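The following is a minimal in-memory sketch of the yearly aggregation. It assumes each daily summary row has a `date` (`YYYY-MM-DD`), a `key` and a numeric `value`, and it treats "the past year" as the 365 days ending on a given date. The row shape and the `aggregatePastYear` name are hypothetical. Once the data is in a database, this would normally be a SQL query, so the sketch only pins down the expected result.

```javascript
'use strict';

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical row shape: { date: 'YYYY-MM-DD', key: string, value: number }.
// Sums `value` per `key` over the 365 days ending on (and including) `asOf`
// and returns the result as CSV text with a header line.
function aggregatePastYear(rows, asOf) {
  const end = Date.parse(asOf + 'T00:00:00Z');
  const start = end - 364 * DAY_MS; // 365 calendar days, both ends inclusive
  const totals = {};
  rows.forEach(row => {
    const day = Date.parse(row.date + 'T00:00:00Z');
    if (day >= start && day <= end) {
      totals[row.key] = (totals[row.key] || 0) + row.value;
    }
  });
  // Sort keys so the output file is stable from run to run.
  const lines = Object.keys(totals).sort().map(key => key + ',' + totals[key]);
  return ['key,total'].concat(lines).join('\n');
}
```

For example, with `asOf` set to `'2017-06-01'`, rows dated from `2016-06-02` through `2017-06-01` are summed and everything else is ignored.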
AWS Lambda
AWS Lambda offers a great interface for developing policies, the reactions to events. Once the picture of events is drawn, some contexts turn out to be complex, close to a huge data flow, and the arrival of a piece of data may also trigger a couple of policies. Hosting such a picture yourself might require complex schemes. Serverless has become popular because it needs almost no setup time, and the event-driven paradigm makes it even better by letting you build a discipline around events and policies. Each policy becomes a unit, so it can be tested, debugged and monitored on its own.
Here is a diagram for an import scenario based on AWS Lambda and Amazon S3 file events.
An S3 "object created" notification is an event with at least one policy. An AWS Lambda function can be attached directly to an S3 bucket as an event handler, or subscribed to an SNS topic. A couple more event types are supported as well. The Pub/Sub way allows attaching more than one event handler, so several actions can be processed for a single event, whereas a single event handler tends to get bloated with several responsibilities. Let's start with the S3 event.
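The same S3 notification can arrive either directly from the bucket or wrapped in an SNS message, so one handler can accept both. The sketch below assumes the standard S3 notification shape (`Records[].s3.bucket.name`, `Records[].s3.object.key`). It also assumes that SNS delivers that notification as a JSON string in `Records[].Sns.Message`. The `objectsFromEvent` helper is hypothetical and leaves the actual import out.

```javascript
'use strict';

// Hypothetical helper: extracts { bucket, key } pairs from an S3 notification,
// whether it arrives directly from the bucket or wrapped in an SNS message.
function objectsFromEvent(event) {
  const result = [];
  (event.Records || []).forEach(record => {
    if (record.Sns) {
      // SNS carries the original S3 notification as a JSON string.
      const inner = JSON.parse(record.Sns.Message);
      result.push.apply(result, objectsFromEvent(inner));
    } else if (record.s3) {
      result.push({
        bucket: record.s3.bucket.name,
        // Object keys arrive URL-encoded, with spaces encoded as '+'.
        key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '))
      });
    }
  });
  return result;
}
```

A notification without `Records` (such as the test message S3 sends when a topic is first configured) simply yields an empty list.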
Similarly, a CloudWatch scheduled rule produces periodic events. Unlike in an imperative style, an event can be re-published through a Pub/Sub service, so several handlers may interpret it, each in its own context. This can be used to invert some dependencies: the producer of an event does not need to know which handlers react to it.
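A scheduled rule delivers an event that carries the time it fired. The following minimal sketch derives the aggregation window from that timestamp rather than from the wall clock. It assumes the standard CloudWatch Events scheduled-event fields (`source`, `detail-type`, `time`), and the `windowFromSchedule` helper name is hypothetical.

```javascript
'use strict';

const DAY_MS = 24 * 60 * 60 * 1000;

// Formats a timestamp (ms since epoch) as a UTC 'YYYY-MM-DD' string.
function isoDate(ms) {
  return new Date(ms).toISOString().slice(0, 10);
}

// Hypothetical helper: derives the aggregation window from a scheduled event,
// i.e. the 365 days ending on the day before the rule fired (UTC).
function windowFromSchedule(event) {
  if (event.source !== 'aws.events' || event['detail-type'] !== 'Scheduled Event') {
    throw new Error('Not a scheduled event');
  }
  const firedDay = Date.parse(event.time.slice(0, 10) + 'T00:00:00Z');
  const to = firedDay - DAY_MS;
  const from = to - 364 * DAY_MS;
  return { from: isoDate(from), to: isoDate(to) };
}
```

Because the window comes from `event.time`, a re-published or replayed event produces the same window no matter when a handler processes it.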
AWS Lambda functions can run in private VPC networks and use IAM roles, which makes security management even easier. An IAM role is similar to a user, except that it has no credentials (password or access keys) associated with it. Instead of logging in, a user or a service assumes the role. Redshift assumes a role to read and write Amazon S3 objects. AWS Lambda assumes a role that allows it to set up network interfaces and perform actions inside a VPC. Access is fully role-based.
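When a role is attached to the Redshift cluster, the SQL that loads and unloads data can reference the role instead of embedding access keys. The sketch below defines two hypothetical helpers that build Redshift `COPY` and `UNLOAD` statements with the `IAM_ROLE` parameter. The table, bucket and role ARN are placeholders. Sending the statements to Redshift, for example through a PostgreSQL-compatible client, is left out.

```javascript
'use strict';

// Escapes a value for use inside a single-quoted SQL string literal.
function quote(value) {
  return "'" + String(value).replace(/'/g, "''") + "'";
}

// Hypothetical helper: loads an uploaded CSV object (with a header row) into
// a table; Redshift reads S3 through the role. `table` is assumed trusted.
function copyStatement(table, bucket, key, roleArn) {
  return 'COPY ' + table +
    ' FROM ' + quote('s3://' + bucket + '/' + key) +
    ' IAM_ROLE ' + quote(roleArn) +
    ' CSV IGNOREHEADER 1;';
}

// Hypothetical helper: unloads a query result as a single comma-delimited
// file under an S3 prefix; Redshift writes to S3 through the role.
function unloadStatement(query, bucket, prefix, roleArn) {
  return 'UNLOAD (' + quote(query) + ')' +
    ' TO ' + quote('s3://' + bucket + '/' + prefix) +
    ' IAM_ROLE ' + quote(roleArn) +
    " DELIMITER AS ',' PARALLEL OFF;";
}
```

Quotes inside the unloaded query are doubled, because `UNLOAD` takes the query as a string literal.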
The solution can be deployed either into private VPC subnets or with a public network interface attached. The choice depends on which internal and external resources the Lambda functions need to use. For example, a private VPC setup allows interacting with corporate network servers over a VPN connection.
A private subnet needs an Amazon S3 VPC endpoint so that it can reach S3 while otherwise staying private. If the business needs web crawling, or a fixed IP identity for some protected Internet resource, a NAT gateway can be set up for a set of VPC subnets. A VPC also lets functions use several availability zones transparently, which improves resilience.
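An S3 endpoint can be added to the stack as a chunk of CloudFormation. The sketch below writes it as a plain JavaScript object that mirrors the JSON form of an `AWS::EC2::VPCEndpoint` resource. The `s3GatewayEndpoint` helper and the `Vpc` / `PrivateRouteTable` logical IDs are hypothetical and would have to match resources defined in the actual stack.

```javascript
'use strict';

// Hypothetical helper: returns a CloudFormation resource (JSON form) for an
// S3 gateway endpoint. A gateway endpoint adds a route to S3 into the given
// route tables, so functions in those private subnets reach S3 without a NAT.
function s3GatewayEndpoint(vpcId, routeTableIds) {
  return {
    Type: 'AWS::EC2::VPCEndpoint',
    Properties: {
      VpcEndpointType: 'Gateway',
      // Plain string, not a template literal: CloudFormation itself resolves
      // ${AWS::Region} through Fn::Sub.
      ServiceName: { 'Fn::Sub': 'com.amazonaws.${AWS::Region}.s3' },
      VpcId: vpcId,
      RouteTableIds: routeTableIds
    }
  };
}

// Print a Resources fragment that could be merged into the stack template.
console.log(JSON.stringify({
  Resources: { S3Endpoint: s3GatewayEndpoint({ Ref: 'Vpc' }, [{ Ref: 'PrivateRouteTable' }]) }
}, null, 2));
```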
The Serverless Framework offers the best developer experience for building and deploying AWS Lambda infrastructure with a Node.js codebase. It has great documentation and massive community support. The solution can be customized with chunks of CloudFormation configuration in the `resources` section of `serverless.yml`.
The following instructions are based on a Serverless Framework setup with the Node.js v6.10 programming model.
Here is an implementation of the use case defined above. See the README for setup instructions.