Querying Data in S3 with Amazon Athena

Ruchir B Pandya
7 min readDec 18, 2023

In this blog we will attempt to query and analyze data stored in an S3 bucket using SQL statements. The lab aims to help users understand the process of creating a table in Amazon Athena, configuring the necessary settings, and executing SQL queries.

We will practice using Amazon Athena to create a table with a CSV file present in the S3 bucket and query using SQL statements.

What is Amazon Athena?

  • Amazon Athena is a serverless query service provided by AWS.
  • It allows users to analyze data stored in Amazon S3 using standard SQL queries.
  • Athena uses Presto, an open-source distributed SQL query engine, as its underlying technology.
  • It supports various data formats such as CSV, ORC, JSON, Avro, and Apache Parquet.
  • Users can access Athena through the AWS Management Console, ODBC/JDBC drivers, or API.
  • Athena is a pay-per-query service, meaning users are only charged for the queries they run.
  • It offers an interactive query editor within the AWS Management Console for easy data analysis.
  • Users can create tables, define schemas, and execute SQL statements to explore and query their data.
  • Athena integrates with other AWS services like AWS Glue for metadata management and data cataloging.
  • It provides quick query execution and scales automatically based on the volume and complexity of the data.

Architecture Diagram

Credits: Whizlab

Task 1: Setup workgroup

In this task, we are going to create a workgroup in Amazon Athena. The workgroup allows users to define specific configurations and settings for their query execution environment. By creating a workgroup, users can customize parameters such as query result location, encryption settings, and query execution options.

Note: As a prerequistie enure you have a S3 bucket for your account in the region you are performing this hands on activity

Navigate to Services menu in the top, then click on Athena in the Analytics section.

In the sidebar, under Administration select Workgroup and then click on Create Workgroup button.

Provide details to create a workgroup:

  • Workgroup Name: Enter your choice of name
  • Description: Enter your choice of description

Analytics engine:

  • Choose the type of engine : Select Athena SQL
  • Upgrade Query engine: Select Automatic

Query result configuration: Select the S3 bucket, that you have created… by clicking on Browse S3 button.

Leave other settings as default and click on the Create Workgroup button.

Now it will list all the Workgroups.

Task 2: Create a database in Glue

In this task, we are going to create a database in AWS Glue, a fully managed extract, transform, and load (ETL) service. The database serves as a container for organizing and storing metadata related to the data tables used in Athena. It enables efficient data cataloging and data management for the subsequent tasks.

  1. Navigate to Services menu at the top, then click on AWS Glue in the Analytics section.
  2. In the left sidebar, Under Data catalog, Click on Databases
  3. Click on the Add database button

In the pop-up menu, enter the database name of your choice and click on the Create database button.

The database is now created.

Task 3: Create a table in Glue

In this task, we are going to create a table in AWS Glue, which represents the structure and schema of the data stored in the S3 bucket.

In the left sidebar, Under Data catalog, Click on Tables.

To create a table, click on the Add table button

In the Set table properties section, do the following:

  • Enter the Table name as whiz-sample-table *(Your choice of name)
  • Database: select whizgluedatabase *(Your created database)

In the Data store section, do the following:

  • Select the type of source: S3 (default)
  • Data location is specified in: my account (default)
  • Include path: Select the S3 bucket name starting with your S3 bucket
    Make sure you add / in the end.

In the Choose a data format section, do the following:\

  • Select Classification as CSV
  • Choose delimiter as Comma: ,
  • Click on the Next button.

In the Schema section, we will add 2 columns by clicking on Add button.

  • Column # : 1
  • Column name: Enter Expense_Type and Column Type: Select string
  • Click on the Save button below.
  • Click on the Add button again.
  • Column #: 2
  • Column name: Enter Expense_Category and Column Type: Select string
  • Click on the Save button below.
  • After adding both the columns, click on the Next button.

Review the configuration of the table and click on the Create button. The table is now created.

Task 4: Query table in Athena

In this task, we are going to use the SQL query capabilities of Amazon Athena to interact with and analyze the data stored in the created table. By executing SQL statements in the Athena Query Editor.

  1. Navigate to Services menu in the top, then click on Athena in the Analytics section.
  2. In the left sidebar, Click on Query Editor
  3. Switch the WhizWorkgroup*(Your created workgroup) at the top right.
  1. In the left sidebar, under data source, select the database as whizgluedatabase*(Your created database)
  2. Then you will see our table, whiz-sample-table*(Your created database)
  3. To preview the data of whiz-sample-table table, click on : and select the Preview Table.
  1. Query editor will automatically generate the SQL statement for querying the first 10 columns.
  2. The result of the query is shown below.
  1. To get the results of all expenses types under expense_category of Food, paste the following SQL statement into the query editor
SELECT * FROM "whizgluedatabase"."whiz-sample-table" WHERE expense_category = 'Food'

Note: To execute the queries through the keyboard directly, use the shortcut Ctrl + Enter (For windows) or Tab + Enter (For Mac)

Fun Facts:

Athena’s serverless nature allows users to analyze data stored in S3 without the need for any infrastructure provisioning or management. This means that users can focus solely on querying and analyzing their data without worrying about setting up and maintaining servers or clusters. The pay-per-query pricing model of Amazon Athena ensures cost efficiency, as users are only charged for the actual queries they run. This flexibility and cost-effectiveness make Amazon Athena a powerful and convenient tool for data analysis tasks in AWS.

Task 6: Delete AWS Resources

Deleting the database and table in Glue

  1. Navigate to Services menu at the top, then click on AWS Glue in the Analytics section.
  2. By default, you will be able to see the tables present. In the left sidebar, Under Data catalog, Click on Databases.
  3. Databases will be present here, our database is whizgluedatabase.
  4. To delete the database, perform the following tasks:
  • Select the database, whizgluedatabase
  • Choose Delete from the menu.

Confirm the deletion on the pop-up by clicking on the Delete button. The database is successfully deleted now.

Deleting workgroup

  1. Navigate to Services menu in the top, then click on Athena in the Analytics section.
  2. Click on the workgroup.
  3. Select WhizWorkgroup and Click on Actions
  4. Click on Delete
  5. Confirm the deletion on the pop-up by entering workgroup name i.e., WhizWorkgroup and click on the Delete button

Completion and Conclusion

  1. We have successfully created and configured a workgroup in Amazon Athena.
  2. We have successfully created a table in Amazon Athena with S3
  3. We have successfully queried the SQL statements in the Query editor of Athena.

--

--