How to get started with AWS Lake Formation

This is my first effort to write about the emerging technologies around us. Yes friends, I'm talking about AWS. Last week I participated in the AWS Online Summit and thought of writing about a few interesting features introduced by AWS.

“AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis. A data lake enables you to break down data silos and combine different types of analytics to gain insights and guide better business decisions.”

Architecture of a data lake for zip codes in New York City

So I'm going to walk you through how to build a data lake with AWS Lake Formation.

Step 1. Assign data lake administrator

The first step in creating your data lake in Lake Formation is to define one or more administrators. Administrators have full access to the Lake Formation system and control the initial data configuration and access permissions.
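This step can also be done programmatically. Below is a minimal sketch of the request body that Lake Formation's PutDataLakeSettings API expects when assigning a data lake administrator; the account ID and IAM user ARN are hypothetical placeholders, not values from this walkthrough.

```python
def data_lake_settings(account_id: str, admin_arn: str) -> dict:
    """Build the kwargs for lakeformation.put_data_lake_settings().

    Pass the returned dict as
    boto3.client("lakeformation").put_data_lake_settings(**kwargs)
    to register the administrator.
    """
    return {
        "CatalogId": account_id,  # the AWS account that owns the Data Catalog
        "DataLakeSettings": {
            "DataLakeAdmins": [
                {"DataLakePrincipalIdentifier": admin_arn}
            ]
        },
    }

# Hypothetical account and IAM user, for illustration only.
kwargs = data_lake_settings(
    "123456789012", "arn:aws:iam::123456789012:user/datalake-admin"
)
print(kwargs["DataLakeSettings"]["DataLakeAdmins"])
```

Separating "build the request" from "send the request" like this keeps the shape of the API call easy to inspect and test without AWS credentials.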

AWS Lake Formation in the service panel

After opening AWS Lake Formation, go to “Permissions” at the bottom left of the panel.

Add the administrator as an admin and database creator
The administrator has been updated successfully in AWS Lake Formation

Step 2. Now it’s time to set up the data lake, which is created in three stages.

Stage 1. Register your Amazon S3 Storage

Choose your S3 bucket to set up the data lake
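Registering the S3 location can likewise be sketched as a call to Lake Formation's RegisterResource API. The bucket and prefix below are hypothetical, assumed for the New York City zip-code example:

```python
def register_s3_location(bucket: str, prefix: str = "") -> dict:
    """Build kwargs for lakeformation.register_resource()."""
    arn = f"arn:aws:s3:::{bucket}"
    if prefix:
        arn += f"/{prefix}"
    return {
        "ResourceArn": arn,
        # Let Lake Formation use its service-linked role to read the
        # location, instead of supplying a custom RoleArn.
        "UseServiceLinkedRole": True,
    }

# Hypothetical bucket holding the raw zip-code data.
print(register_s3_location("nyc-zipcode-datalake", "raw"))
```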

Step 3. Once the S3 path is registered, we need to create a database where all the objects will be cataloged, so they can be queried by services such as Athena, RDS, Redshift, etc.

Stage 2. Create a Database for data cataloging

Step 4. Now, as per the design, the database is created through the AWS Glue Data Catalog, so we need to grant the permissions for Glue.

Zipcode-db is a database that has the AWSGlueServiceRole permission
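Creating the catalog database can be sketched through the Glue CreateDatabase API. The database name follows the walkthrough; the description is a hypothetical placeholder:

```python
def create_database_input(name: str, description: str = "") -> dict:
    """Build kwargs for glue.create_database().

    Pass as boto3.client("glue").create_database(**kwargs).
    """
    return {
        "DatabaseInput": {
            "Name": name,  # Glue database names are lowercase
            "Description": description,
        }
    }

kwargs = create_database_input(
    "zipcode-db", "Catalog database for NYC zip-code data"
)
print(kwargs["DatabaseInput"]["Name"])
```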

Step 5. Create and run an AWS Glue crawler to load the data into Zipcode-db

After a successful run of the crawler, a zipcode table has been populated
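The crawler itself can be defined through the Glue CreateCrawler API and then started with StartCrawler. A sketch, with a hypothetical role ARN and S3 path:

```python
def crawler_definition(name: str, role_arn: str,
                       database: str, s3_path: str) -> dict:
    """Build kwargs for glue.create_crawler().

    After creating the crawler, run it with
    glue.start_crawler(Name=name).
    """
    return {
        "Name": name,
        "Role": role_arn,          # a role with AWSGlueServiceRole permissions
        "DatabaseName": database,  # tables are written into this catalog database
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }

# All values below are hypothetical placeholders.
kwargs = crawler_definition(
    "zipcode-crawler",
    "arn:aws:iam::123456789012:role/GlueCrawlerRole",
    "zipcode-db",
    "s3://nyc-zipcode-datalake/raw/",
)
print(kwargs["Targets"])
```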

Step 6. Once the table is there, we need permission to read it.

We can also restrict permissions per user role (Data Analyst, Data Scientist, or Business Analyst). This resolves many concerns around cost management (only permitted columns are accessed), security views, and protection against malicious attacks or data theft.

A data analyst has been granted read access to two columns
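Column-level grants like the one above map to Lake Formation's GrantPermissions API with a TableWithColumns resource. A sketch; the principal ARN and the two column names are hypothetical, since the screenshot's exact values aren't in the text:

```python
def column_grant(principal_arn: str, database: str,
                 table: str, columns: list) -> dict:
    """Build kwargs for lakeformation.grant_permissions()."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": columns,  # only these columns become readable
            }
        },
        "Permissions": ["SELECT"],  # read-only access
    }

# Hypothetical: a data-analyst role granted read on two columns.
grant = column_grant(
    "arn:aws:iam::123456789012:role/data-analyst",
    "zipcode-db", "zipcode", ["zip_code", "borough"],
)
print(grant["Permissions"])
```

Because anything outside ColumnNames is invisible to the grantee, this is how the cost and security benefits described above are enforced in practice.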

Conclusion:

Amazon S3 is the foundation for data lakes. We can keep our data lake private, encrypt everything, and secure specific access (Data Analyst, Data Scientist, Data Engineer, etc.) to and from that data lake. This improves performance through parallel access and horizontal scaling. This architecture can also be leveraged to improve data governance, data management, and efficiency.

References:

  1. Data Lake Formation — https://aws.amazon.com/lake-formation/
  2. AWS Summit Online Takeaways


You can have data without information, but you cannot have information without data — Technical Lead at Lumiq.ai ( AWS, GCP , Azure, & Snowflake)
