AWS Knowledge Series: Batch Processing using AWS Lambda
In this article we will talk about how you can leverage AWS CloudWatch Rules and AWS Lambda to build a reliable, flexible, low-cost and serverless batch processing pipeline. Typically, batch processing involves running a scheduling engine and AWS EC2 instances. AWS Batch provides a nice abstraction and allows you to do batch processing at scale. However, it is not entirely serverless: AWS Batch spins up EC2 instances to run your batch jobs, and most of the time you may have to leave a minimum EC2 capacity running even when you are not doing any batch processing. This effectively means you are spending money even when no work is being done, so AWS Batch is not truly serverless. What I will demonstrate today is how to do batch processing using serverless technologies like AWS Lambda, with AWS CloudWatch rules as our scheduling engine.
Batch Use Case
In my previous article about SES, I spoke about how you can use SES and AWS Lambda to send periodic emails to your user population.
AWS Knowledge Series: Simple Email Service
AWS Simple email service (SES) is Amazon’s answer to services like Mailchimp that allows you to efficiently manage your…
We will look at some code in this article and see how batch processing with AWS Lambda can be used for this purpose. So the problem statement is as follows:
- You have a large user population to whom you want to send weekly / monthly emails
- You want to personalise the emails with user-specific data, so before you send out an email you need to fetch the user data that will be used to personalise the message
- For simplicity we will assume that users have numeric identifiers from 1 to N, where N is a large enough number (> 100K).
High Level Approach
The following diagram shows a very high level overview of how the batch processing is implemented using AWS Lambda.
The batch execution is scheduled via a CloudWatch rule. To create a new rule, open the AWS Console and navigate to CloudWatch, select Rules from the left panel and hit Create Rule. The following screen will be shown.
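The console steps above can also be scripted. Here is a minimal sketch using boto3; the rule name, target id and weekly schedule are illustrative assumptions, not values prescribed by AWS:

```python
def create_schedule_rule(lambda_arn, schedule="rate(7 days)"):
    """Create a CloudWatch Events rule that triggers the worker-creation Lambda.

    The names "weekly-email-batch" and "batch-worker-creator" are illustrative.
    """
    import boto3  # imported lazily so the module loads without AWS credentials

    events = boto3.client("events")
    rule = events.put_rule(
        Name="weekly-email-batch",
        ScheduleExpression=schedule,  # cron(...) expressions also work
        State="ENABLED",
    )
    # Point the rule at the worker-creation Lambda. Note that the Lambda also
    # needs a resource-based permission allowing events.amazonaws.com to invoke
    # it (lambda add-permission), omitted here for brevity.
    events.put_targets(
        Rule="weekly-email-batch",
        Targets=[{"Id": "batch-worker-creator", "Arn": lambda_arn}],
    )
    return rule["RuleArn"]
```

Creating the rule in the console works just as well; scripting it is mainly useful if you deploy the whole setup through automation.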
Batch Worker Creation Lambda Function
The following listing gives you the gist of what the batch worker creation Lambda function should be doing.
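As a concrete illustration, here is a minimal Python sketch of such a function, assuming users are numbered 1 to N as described above. TOTAL_USERS, NUM_CHAINS and the function name "BatchLambda" are placeholders of my choosing, not values from the article:

```python
import json

TOTAL_USERS = 100_000  # N: illustrative placeholder
NUM_CHAINS = 10        # number of concurrent worker chains: tune for your load


def make_chains(total_users, num_chains):
    """Split the index range 1..total_users into contiguous slices,
    one per worker chain."""
    size = -(-total_users // num_chains)  # ceiling division
    chains = []
    start = 1
    while start <= total_users:
        end = min(start + size - 1, total_users)
        chains.append({"start": start, "end": end})
        start = end + 1
    return chains


def handler(event, context):
    """Worker-creation Lambda: fan out one asynchronous invocation per chain."""
    import boto3  # lazy import so the module loads without AWS credentials

    lam = boto3.client("lambda")
    for chain in make_chains(TOTAL_USERS, NUM_CHAINS):
        # InvocationType "Event" = asynchronous; each call starts one chain
        lam.invoke(
            FunctionName="BatchLambda",
            InvocationType="Event",
            Payload=json.dumps(chain),
        )
```

Each chain then works through its own slice independently, which is what makes the number of chains a simple knob for controlling parallelism.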
Batch Lambda Function
The following listing gives you the gist of what the batch Lambda function should be doing.
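In the same spirit, a minimal sketch of the batch Lambda itself: process a small slice of users, then asynchronously re-invoke yourself with the next starting index until the chain's range is exhausted. BATCH_SIZE, process_user and the function name "BatchLambda" are illustrative placeholders:

```python
import json

BATCH_SIZE = 50  # users handled per invocation; tune to stay within the Lambda timeout


def next_start(start, end, batch_size):
    """Index the next invocation should begin at, or None if the chain is done."""
    nxt = start + batch_size
    return nxt if nxt <= end else None


def process_user(user_id):
    """Placeholder: fetch this user's data and send the personalised email via SES."""
    pass


def handler(event, context):
    start, end = event["start"], event["end"]

    # Process this invocation's slice of the chain's range.
    for user_id in range(start, min(start + BATCH_SIZE - 1, end) + 1):
        process_user(user_id)

    nxt = next_start(start, end, BATCH_SIZE)
    if nxt is not None:
        # Chain not finished: asynchronously invoke ourselves with the next slice.
        import boto3  # lazy import

        boto3.client("lambda").invoke(
            FunctionName="BatchLambda",
            InvocationType="Event",
            Payload=json.dumps({"start": nxt, "end": end}),
        )
```

The end-of-chain condition (`next_start` returning None) is exactly the edge case the "Gotchas" below warn about: get it wrong and the chain either stops early or never stops.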
So, as you can see, by implementing just two Lambda functions we can create a configurable, scalable batch processing “engine”. With this approach there is no need to keep costly EC2 instances running.
Few “Gotchas” you should be aware of
- Test, test, test. I cannot stress this enough. You must ensure that all your “worker chains” reach the end of execution irrespective of the kind of errors they encounter along the way.
- Capture and analyse the errors that result in “unprocessed” items, so you understand how to make the execution more robust.
- Test with different numbers of concurrent worker chains. More worker chains means your batch finishes faster, but it also leads to more throttling exceptions and other errors, and puts more pressure on your production traffic. You have to ensure that even at the busiest hour of your production environment, your batch does not fail, and that the batch execution does not negatively impact the functioning production system.
- Make sure that the “Retry attempts” setting under the asynchronous invocation configuration for “BatchLambda” is set to zero. This is a very important setting. The default value is 2, so if “BatchLambda” fails with some unhandled exception, it will be executed two more times! You don’t want that happening at all, as it will do the same work three times, e.g. send the same email to the same user thrice!
- Test the edge cases around the worker chain ending properly, especially if you are not using integer indexes as shown above. A wrong edge-case condition will result in the worker chain either never ending or not processing the entire range, both of which are bad.
- One of the scariest issues with this approach is a “runaway” worker chain that refuses to end due to a wrongly programmed edge condition (e.g. an infinite loop in the worker chain). In such cases the execution of “BatchLambda” will just keep happening. This, combined with the retry attempts left at their default value, is a recipe for a high AWS bill! So what’s the solution? The answer is simple: comment out the “BatchLambda” asynchronous self-invocation towards the end of the function and re-deploy it. A few minutes after deployment, sanity will return and all executions will stop.
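On the retry point above: if you prefer enforcing the zero-retry setting from code rather than the console, the asynchronous invocation configuration can be set via the API. A minimal boto3 sketch, with the function name "BatchLambda" assumed from the examples above:

```python
def disable_async_retries(function_name="BatchLambda"):
    """Set async retry attempts to zero so a failed run is never repeated."""
    import boto3  # lazy import so the module loads without AWS credentials

    boto3.client("lambda").put_function_event_invoke_config(
        FunctionName=function_name,
        MaximumRetryAttempts=0,  # default is 2: a failure would otherwise run twice more
    )
```

Baking this into your deployment scripts ensures nobody forgets to flip the setting after a redeploy.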
That’s it for this article. Let me know if you have an alternate approach for batch execution using AWS Lambda.