My goal was to run ETL on objects that already existed in an S3 bucket, as opposed to running ETL on the fly as new objects are uploaded. S3 event notifications are only generated for newly uploaded objects, so a Lambda cannot be triggered by objects that are already in the bucket. That ruled out the S3 + Lambda integration.
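
As a rough illustration of the alternative to S3 events, the objects already in a bucket can be enumerated explicitly and then handed to whatever does the ETL. The sketch below is only an assumption about how that listing step might look with boto3's `list_objects_v2` paginator. The function name `iter_existing_keys` and the bucket and prefix names are hypothetical, not taken from the pipeline discussed here.

```python
from typing import Any, Iterator


def iter_existing_keys(s3_client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield the key of every object already stored under bucket/prefix.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    # list_objects_v2 returns at most 1000 keys per call; the paginator
    # follows the continuation tokens so every page is visited.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page that holds no objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   for key in iter_existing_keys(boto3.client("s3"), "my-bucket", "raw/"):
#       print(key)  # or enqueue the key for processing
```

Passing the client in as an argument keeps the function easy to exercise with a stub, which is handy when checking the listing logic without touching a real bucket.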

For step 6, I did not consider using another SQS queue and Lambda function. Could you explain how an extra queue and Lambda would have helped there? It might lead to an interesting optimization strategy!

