Building Serverless Data Lake on AWS with Terraform
No servers, no problems
It’s been a long time since my last article, but finally I have something to share.
The last two months were real hell for me: a short-term project that was supposed to be easy and quick ended up as always, with the famous programmer's credo: “We do these things not because they are easy, but because we thought they were going to be easy”.
The project was to build a data lake from scratch, with complete freedom to find the best solution. As it is 2021 and modern problems require modern solutions, a serverless data lake is a hell of a good idea.
Security
As we are dealing with data, our first concern should be security. To create a secure cloud environment, it is recommended to follow the Operational Best Practices for CIS AWS Foundations Benchmark v1.3 Level 2 and, in a perfect scenario, implement all of them. For a one-time evaluation of the implementation, the Prowler score is a good indicator (Prowler is a security tool for AWS security best-practice assessments, audits, incident response, continuous monitoring, hardening and forensics readiness). Prowler is a really great open source tool and I recommend checking your cloud environment with it (or you can set up AWS…