Data Analysis using AWS Glue, Athena, and QuickSight

Prachi Khanna
3 min read · Aug 3, 2020


Technologies used:

AWS Glue: An AWS service that extracts, transforms, and loads (ETL) data, making it easier to move data between other AWS services.

AWS Glue Crawler: A crawler inspects the input data and creates a schema and tables from the dataset. It can be set up to run on demand or on a schedule.

AWS Athena: Athena can access the tables created by the crawler and run SQL queries and DDL statements against them. Athena can also work directly on data in S3, creating tables over files in JSON, CSV, Parquet, or Apache ORC format.

QuickSight: In simple terms, it's a tool for visualizing data and sharing dashboards and stories with users. Athena integrates with Amazon QuickSight for easy data visualization.
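
To make the Athena point above concrete, here is a minimal sketch of the kind of DDL Athena accepts for a table defined directly over Parquet files in S3. The helper name, database, table, columns, and bucket are all hypothetical, chosen only for illustration; a crawler would normally generate the equivalent table definition for you.

```python
def build_external_table_ddl(database, table, columns, s3_location):
    """Build an Athena CREATE EXTERNAL TABLE statement over Parquet files.

    columns is a list of (name, athena_type) pairs.
    s3_location should point at a folder (prefix), not a single file.
    """
    column_sql = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_sql}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical names, used only for illustration.
ddl = build_external_table_ddl(
    database="sales_db",
    table="orders",
    columns=[("order_id", "bigint"), ("customer", "string"), ("amount", "double")],
    s3_location="s3://example-input-bucket/orders/",
)
print(ddl)
```

The resulting statement can be pasted into the Athena query editor. For CSV or JSON data the STORED AS clause would be replaced by a suitable ROW FORMAT, which is not shown here.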

Let's get started with the setup:

1. Go to the AWS Glue console, create a crawler, and provide the input bucket/folder name.

1a. Create an IAM role to be used by the crawler.

1b. Once the crawler is created, run it; it will in turn create tables in the Glue Data Catalog. These tables can be accessed by AWS services such as Athena and Redshift Spectrum for further analysis.
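
The same crawler setup can also be scripted instead of clicked through. The following is a rough sketch using boto3, assuming AWS credentials are already configured; the crawler name, role ARN, database, and bucket path are placeholders, and the role is assumed to also have read access to the input bucket.

```python
# Trust policy that lets the Glue service assume the crawler's IAM role (step 1a).
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def build_crawler_request(name, role_arn, database, s3_path, schedule=None):
    """Build the keyword arguments for glue.create_crawler (step 1)."""
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule:
        # Optional cron expression; without it the crawler runs on demand only.
        request["Schedule"] = schedule
    return request


def create_and_run_crawler(request):
    """Create the crawler, then start it once (step 1b)."""
    import boto3  # imported lazily so the builders above work without AWS access

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Placeholder values, for illustration only.
crawler_request = build_crawler_request(
    name="example-crawler",
    role_arn="arn:aws:iam::123456789012:role/example-glue-crawler-role",
    database="sales_db",
    s3_path="s3://example-input-bucket/raw/",
)
print(crawler_request)
```

Calling create_and_run_crawler(crawler_request) would perform the actual AWS calls. The trust policy is shown as a plain dictionary; it would be passed as JSON when creating the role in IAM.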

2. Go to AWS Athena to query the tables or to create views on top of them.
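
As an illustration of step 2, here is a small sketch that creates a view and submits it to Athena through boto3. The view, table, column names, and the S3 location for query results are assumptions made up for this example; the console query editor works just as well.

```python
def build_view_sql(view, source_table, columns, where=None):
    """Build a CREATE OR REPLACE VIEW statement over an existing table."""
    sql = f"CREATE OR REPLACE VIEW {view} AS SELECT {', '.join(columns)} FROM {source_table}"
    if where:
        sql += f" WHERE {where}"
    return sql


def build_query_request(sql, database, output_location):
    """Build the keyword arguments for athena.start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request, poll_seconds=1.0):
    """Submit the query and wait until Athena reports a final state."""
    import time

    import boto3  # imported lazily so the builders above work without AWS access

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)


# Hypothetical view over a crawler-created table.
view_sql = build_view_sql(
    view="large_orders",
    source_table="orders",
    columns=["order_id", "customer", "amount"],
    where="amount > 1000",
)
query_request = build_query_request(
    view_sql, database="sales_db", output_location="s3://example-athena-results/"
)
print(view_sql)
```

Calling run_query(query_request) would execute the statement and return the query ID together with its final state.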

3. Go to Amazon QuickSight.

3a. Grant QuickSight access to S3 and Athena. Be sure to include the S3 bucket name when granting access; otherwise QuickSight will return a 403 error.

3b. Add AWS Athena as a data source and provide the database and table name.

3c. Import the data into SPICE, which acts as a local cache for QuickSight and helps it respond faster.

3d. Once the data is imported, creating high-quality visualizations is a matter of drag and drop.
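
Steps 3b and 3c can be approximated in code as well. The sketch below builds the request payloads for the QuickSight create_data_source and create_data_set calls in boto3, with the dataset imported into SPICE. The account ID, IDs, ARN, workgroup, and column list are placeholders; permissions for QuickSight users are left out, so treat this as a starting point rather than a complete setup.

```python
def build_athena_data_source_request(account_id, data_source_id, name, workgroup="primary"):
    """Build the keyword arguments for quicksight.create_data_source (step 3b)."""
    return {
        "AwsAccountId": account_id,
        "DataSourceId": data_source_id,
        "Name": name,
        "Type": "ATHENA",
        "DataSourceParameters": {"AthenaParameters": {"WorkGroup": workgroup}},
    }


def build_spice_data_set_request(
    account_id, data_set_id, name, data_source_arn, database, table, columns
):
    """Build the keyword arguments for quicksight.create_data_set (step 3c).

    columns is a list of (name, quicksight_type) pairs, e.g. ("amount", "DECIMAL").
    """
    return {
        "AwsAccountId": account_id,
        "DataSetId": data_set_id,
        "Name": name,
        "PhysicalTableMap": {
            "athena-table": {
                "RelationalTable": {
                    "DataSourceArn": data_source_arn,
                    "Schema": database,  # the Glue/Athena database name
                    "Name": table,
                    "InputColumns": [
                        {"Name": col, "Type": col_type} for col, col_type in columns
                    ],
                }
            }
        },
        # SPICE imports a copy of the data instead of querying Athena each time.
        "ImportMode": "SPICE",
    }


def create_in_quicksight(request, operation):
    """Send one request, e.g. operation='create_data_source' or 'create_data_set'."""
    import boto3  # imported lazily so the builders above work without AWS access

    quicksight = boto3.client("quicksight")
    return getattr(quicksight, operation)(**request)


# Placeholder identifiers, for illustration only.
source_request = build_athena_data_source_request(
    "123456789012", "athena-source", "Athena sales data"
)
data_set_request = build_spice_data_set_request(
    account_id="123456789012",
    data_set_id="orders-spice",
    name="Orders (SPICE)",
    data_source_arn="arn:aws:quicksight:us-east-1:123456789012:datasource/athena-source",
    database="sales_db",
    table="orders",
    columns=[("order_id", "INTEGER"), ("customer", "STRING"), ("amount", "DECIMAL")],
)
print(data_set_request["ImportMode"])
```

Data source creation is asynchronous on the QuickSight side, so in practice the dataset request would be sent only after the data source has finished creating.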

Advantages of using the above setup for data analysis:

  1. AWS Athena and AWS Glue are serverless, so you pay only for what you use.
  2. AWS Glue supports writing data in Parquet format, a columnar format that lets Athena run queries much faster because it scans less data.
  3. Glue jobs can be scheduled to fetch new data as it is updated on S3.
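
For the scheduling point, one possible approach is a scheduled Glue trigger. The sketch below builds a daily cron expression in the format Glue expects and the payload for create_trigger in boto3. The trigger and job names are hypothetical, and the Glue job itself is assumed to exist already.

```python
def daily_cron(hour, minute=0):
    """Return a Glue cron expression that fires once a day at the given UTC time."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Field order: minutes, hours, day-of-month, month, day-of-week, year.
    return f"cron({minute} {hour} * * ? *)"


def build_trigger_request(trigger_name, job_name, schedule):
    """Build the keyword arguments for glue.create_trigger."""
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        "Schedule": schedule,
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,  # activate the trigger as soon as it is created
    }


def create_trigger(request):
    """Register the trigger with Glue."""
    import boto3  # imported lazily so the builders above work without AWS access

    return boto3.client("glue").create_trigger(**request)


# Hypothetical trigger that runs an existing job every day at 02:00 UTC.
trigger_request = build_trigger_request(
    trigger_name="nightly-orders-refresh",
    job_name="orders-to-parquet",
    schedule=daily_cron(2),
)
print(trigger_request["Schedule"])
```

A scheduled trigger runs at fixed times rather than reacting to each upload, so the schedule should be chosen to match how often new files arrive in the bucket.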
