How to get started with AWS Lake Formation?
This is my first attempt at writing about the emerging technologies around us. Yes friends, I’m talking about AWS. Last week I participated in the AWS Online Summit and thought I would write about a few of the interesting features AWS introduced.
“AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis. A data lake enables you to break down data silos and combine different types of analytics to gain insights and guide better business decisions.”

So I’m going to walk you through how to set up a data lake with AWS Lake Formation.
Step 1. Assign data lake administrator
The first step in creating your data lake in Lake Formation is to define one or more administrators. Administrators have full access to the Lake Formation system and control the initial data configuration and access permissions.

After opening AWS Lake Formation in the console, go to “Permissions” at the bottom left of the navigation panel.
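For those who prefer scripting over the console, here is a minimal sketch of the same step using boto3. The account number and user name are hypothetical, and I'm assuming your AWS credentials are already configured. One thing to be careful about: `put_data_lake_settings` replaces the existing settings, so in a real account you would probably read the current ones with `get_data_lake_settings` first and merge.

```python
def build_admin_settings(admin_arns):
    """Build the DataLakeSettings payload that names the administrators."""
    return {
        "DataLakeAdmins": [
            {"DataLakePrincipalIdentifier": arn} for arn in admin_arns
        ]
    }


def assign_admins(admin_arns):
    """Send the settings to Lake Formation (needs AWS credentials)."""
    # boto3 is imported here so the payload builder works without AWS access.
    import boto3

    client = boto3.client("lakeformation")
    # Caution: this overwrites the current settings, including the admin list.
    client.put_data_lake_settings(
        DataLakeSettings=build_admin_settings(admin_arns)
    )


# Hypothetical ARN, for illustration only.
settings = build_admin_settings(
    ["arn:aws:iam::111122223333:user/datalake-admin"]
)
print(settings)
```

Calling `assign_admins([...])` with your own ARNs would apply the change; the `print` above only shows the payload that would be sent.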



Step 2. Now it’s time to set up the data lake, which is created in 3 stages.

Stage 1. Register your Amazon S3 Storage
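The registration can also be sketched in boto3. The bucket name and prefix below are made up, and I'm assuming the service-linked role is acceptable for your setup (you could pass a `RoleArn` instead).

```python
def build_register_request(bucket, prefix=""):
    """Build the arguments for registering an S3 location with Lake Formation."""
    clean_prefix = prefix.strip("/")
    path = f"{bucket}/{clean_prefix}" if clean_prefix else bucket
    return {
        "ResourceArn": f"arn:aws:s3:::{path}",
        # Assumption: let Lake Formation use its service-linked role.
        "UseServiceLinkedRole": True,
    }


def register_location(bucket, prefix=""):
    """Register the S3 path (needs AWS credentials)."""
    import boto3  # lazy import so the builder runs without AWS access

    client = boto3.client("lakeformation")
    client.register_resource(**build_register_request(bucket, prefix))


# Hypothetical bucket and prefix, for illustration only.
request = build_register_request("my-datalake-bucket", "zipcode/")
print(request["ResourceArn"])
```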

Step 3. Once the S3 path is registered, we need to create a database in the data catalog. It holds the table definitions for the objects stored in S3, so that analytics services (Athena, Redshift, etc.) can query them.
Stage 2. Create a Database for data cataloging
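As a rough sketch, the catalog database can be created through the Glue API in boto3. I'm using the database name from Step 5 in lowercase, since the Glue catalog stores names in lowercase; the description text is my own placeholder.

```python
def build_database_input(name, description=""):
    """Build the DatabaseInput payload for the Glue Data Catalog."""
    database_input = {"Name": name.lower()}
    if description:
        database_input["Description"] = description
    return database_input


def create_catalog_database(name, description=""):
    """Create the database in the Glue Data Catalog (needs AWS credentials)."""
    import boto3  # lazy import so the builder runs without AWS access

    glue = boto3.client("glue")
    glue.create_database(
        DatabaseInput=build_database_input(name, description)
    )


# The description is a placeholder, not something from the console walkthrough.
database_input = build_database_input("Zipcode-db", "Catalog for zipcode data")
print(database_input)
```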

Step 4. By design, the database is created in the AWS Glue Data Catalog, so we need to grant the required permissions to Glue.
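Here is a minimal sketch of that grant in boto3. The role ARN is hypothetical, and the choice of `CREATE_TABLE` and `ALTER` is my assumption about what a crawler role needs on the database; adjust it to your own design.

```python
def build_database_grant(role_arn, database, permissions):
    """Build a grant_permissions request scoped to one catalog database."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Database": {"Name": database}},
        "Permissions": list(permissions),
    }


def grant_to_glue_role(role_arn, database, permissions):
    """Apply the grant in Lake Formation (needs AWS credentials)."""
    import boto3  # lazy import so the builder runs without AWS access

    client = boto3.client("lakeformation")
    client.grant_permissions(
        **build_database_grant(role_arn, database, permissions)
    )


# Hypothetical role ARN; the permission list is an assumption.
grant = build_database_grant(
    "arn:aws:iam::111122223333:role/glue-crawler-role",
    "zipcode-db",
    ["CREATE_TABLE", "ALTER"],
)
print(grant)
```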


Step 5. Create and run an AWS Glue crawler to catalog the data in Zipcode-db. The crawler scans the registered S3 path and creates the table definitions in the database.
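The crawler can be sketched in boto3 as well. The crawler name, role ARN, and S3 path below are all hypothetical; only the database name comes from this walkthrough.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build the create_crawler arguments for a single S3 target."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_run_crawler(name, role_arn, database, s3_path):
    """Create the crawler and start one run (needs AWS credentials)."""
    import boto3  # lazy import so the builder runs without AWS access

    glue = boto3.client("glue")
    glue.create_crawler(
        **build_crawler_request(name, role_arn, database, s3_path)
    )
    glue.start_crawler(Name=name)


# Hypothetical names and paths, for illustration only.
crawler = build_crawler_request(
    "zipcode-crawler",
    "arn:aws:iam::111122223333:role/glue-crawler-role",
    "zipcode-db",
    "s3://my-datalake-bucket/zipcode/",
)
print(crawler["Targets"])
```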


Step 6. Once the table exists, we need permission to read it.


We can also restrict permissions by user role (Data Analyst, Data Scientist, or Business Analyst). This addresses many concerns around cost management (only the permitted columns are accessed), security, and protection against malicious attacks or data theft.
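To make the idea of restricted access concrete, here is a small sketch of a `SELECT` grant in boto3 that covers either the whole table or only selected columns. The analyst ARN, table name, and column names are hypothetical.

```python
def build_select_grant(principal_arn, database, table, columns=None):
    """Build a SELECT grant for a whole table or only the listed columns."""
    if columns:
        # Column-level grant: the principal can read only these columns.
        resource = {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        }
    else:
        # Table-level grant: the principal can read every column.
        resource = {"Table": {"DatabaseName": database, "Name": table}}
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        "Permissions": ["SELECT"],
    }


def grant_select(principal_arn, database, table, columns=None):
    """Apply the grant in Lake Formation (needs AWS credentials)."""
    import boto3  # lazy import so the builder runs without AWS access

    client = boto3.client("lakeformation")
    client.grant_permissions(
        **build_select_grant(principal_arn, database, table, columns)
    )


# Hypothetical analyst, table, and columns, for illustration only.
analyst_grant = build_select_grant(
    "arn:aws:iam::111122223333:user/data-analyst",
    "zipcode-db",
    "zipcodes",
    columns=["zipcode", "city"],
)
print(analyst_grant["Resource"])
```

Leaving out `columns` gives the full-table version of the same grant, which would fit a role such as the Data Engineer.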


Conclusion:
Amazon S3 is the foundation for data lakes. We can keep our data lake private, encrypt everything, and secure specific access (Data Analyst, Data Scientist, Data Engineer, etc.) to and from that data lake. This improves performance through parallelized access and horizontal scaling. The same architecture can also be leveraged to improve data governance, data management, and efficiency.
References:
- Data Lake Formation — https://aws.amazon.com/lake-formation/
- AWS Summit Online Takeaways