Amazon Athena

2 min read · Jan 23, 2023

Amazon Athena is a serverless query service for analyzing data stored in Amazon S3 buckets
Uses standard SQL to query files
Built on the Presto engine
Data is analyzed directly from the S3 bucket without moving it
Supports different formats such as CSV, JSON, ORC, Avro, and Parquet
Pricing is based on a fixed amount per terabyte of data scanned
Athena is commonly used with Amazon QuickSight to create reports and dashboards
Use cases for Athena include ad-hoc queries, business intelligence, analytics, reporting, and analyzing logs from AWS services
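
To make the per-terabyte pricing concrete, here is a minimal sketch of a cost estimate. The price of $5 per TB, the rounding up to the nearest megabyte, and the 10 MB per-query minimum are assumptions for illustration; check the current Athena pricing page for your region. The function name `estimate_query_cost` is hypothetical.

```python
import math

# Assumed values for illustration only; confirm against the Athena pricing page.
PRICE_PER_TB_USD = 5.00   # assumed price per terabyte scanned
MIN_BILLED_MB = 10        # assumed per-query minimum
BYTES_PER_MB = 1024 ** 2
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost in USD of one query from the bytes it scanned."""
    # Round up to whole megabytes, then apply the assumed per-query minimum.
    billed_mb = max(math.ceil(bytes_scanned / BYTES_PER_MB), MIN_BILLED_MB)
    return billed_mb * BYTES_PER_MB / BYTES_PER_TB * PRICE_PER_TB_USD


# Hypothetical comparison: a full scan of 1 TB versus a query that reads
# only a tenth of it.
full_scan = estimate_query_cost(1024 ** 4)
partial_scan = estimate_query_cost(1024 ** 4 // 10)
print(f"full scan:    ${full_scan:.2f}")
print(f"partial scan: ${partial_scan:.2f}")
```

Because the bill follows the bytes scanned, anything that reduces the data a query has to read also reduces its cost.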

Athena — Performance Improvement

To improve Athena performance, use a columnar data format so that queries scan only the columns they need; the recommended formats are Apache Parquet and ORC, and a service such as AWS Glue can convert data to these formats
Compressing data also helps by reducing the amount of data retrieved
Partitioning datasets by columns can also improve performance by narrowing down the specific location of data to be retrieved in S3.
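
The effect of partitioning can be sketched in a few lines of Python. This is only an illustration of the idea, not how Athena is implemented: it assumes a Hive-style `column=value/` folder layout, and the bucket name, table location, and helper names (`partition_prefix`, `prune`) are hypothetical.

```python
from typing import Dict, List


def partition_prefix(table_location: str, partitions: Dict[str, str]) -> str:
    """Build the Hive-style S3 prefix (column=value/...) for the given partition values."""
    parts = "".join(f"{column}={value}/" for column, value in partitions.items())
    return f"{table_location.rstrip('/')}/{parts}"


def prune(keys: List[str], prefix: str) -> List[str]:
    """Keep only the objects under the partition prefix; the rest need not be read."""
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical bucket and layout, partitioned by year and month.
objects = [
    "s3://example-logs/events/year=2022/month=12/part-0000.parquet",
    "s3://example-logs/events/year=2023/month=01/part-0000.parquet",
    "s3://example-logs/events/year=2023/month=01/part-0001.parquet",
    "s3://example-logs/events/year=2023/month=02/part-0000.parquet",
]

prefix = partition_prefix("s3://example-logs/events/", {"year": "2023", "month": "01"})
selected = prune(objects, prefix)
print(prefix)
print(len(selected), "of", len(objects), "objects to scan")
```

A query that filters on the partition columns (here, year and month) only has to read the objects under the matching prefix, which is where the performance gain comes from.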

Athena — Federated Query

Amazon Athena supports Federated Query, which allows querying data from multiple sources, not just S3
A Data Source Connector is a Lambda function that runs the federated query against another service
Data sources can be on AWS or on-premises, including CloudWatch Logs, DynamoDB, RDS, ElastiCache, DocumentDB, Redshift, Aurora, SQL Server, MySQL, HBase on EMR, and on-premises databases
Results of the query can be stored in S3 buckets for later analysis
Federated Query allows joining and combining data from multiple sources in a single query
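
As a rough sketch of what such a join might look like, the snippet below builds a SQL statement that combines a table in S3 with a table reached through a connector. All names are assumptions for illustration: `mysql_catalog` stands for a data source you would have registered for a connector, and the databases, tables, and columns are hypothetical. The helper `federated_join_sql` only builds the query text; it does not submit it to Athena.

```python
def federated_join_sql(s3_table: str, external_table: str, join_key: str) -> str:
    """Build a SQL join between an S3-backed table and a federated table.

    Both table names are fully qualified as catalog.database.table.
    """
    return (
        "SELECT o.order_id, c.name\n"
        f"FROM {s3_table} AS o\n"
        f"JOIN {external_table} AS c\n"
        f"  ON o.{join_key} = c.{join_key}"
    )


# Hypothetical names: 'awsdatacatalog' holds the S3 table definitions, and
# 'mysql_catalog' is an assumed name for a registered connector data source.
sql = federated_join_sql(
    s3_table="awsdatacatalog.sales.orders",
    external_table="mysql_catalog.crm.customers",
    join_key="customer_id",
)
print(sql)
```

The only difference from an ordinary join is the catalog prefix on each table, which tells Athena which source, and therefore which connector, serves the data.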

Written by Praveen Pukkala