Amazon Athena
Jan 23, 2023
- Amazon Athena is a serverless query service for analyzing data stored in Amazon S3 buckets
- Uses standard SQL to query files
- Built on the Presto engine
- Data is analyzed directly from the S3 bucket without moving it
- Supports different formats such as CSV, JSON, ORC, Avro, and Parquet
- Pricing is based on a fixed amount per terabyte of data scanned
- Athena is commonly used with Amazon QuickSight to create reports and dashboards
- Use cases for Athena include ad-hoc queries, business intelligence, analytics, reporting, and analyzing logs from AWS services
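Because pricing depends on the amount of data scanned, a rough cost estimate is simple arithmetic. The sketch below is only an illustration: the price per terabyte is an assumed placeholder (actual prices vary by region and over time), and the function name is hypothetical.

```python
# Rough Athena cost estimate: the charge is proportional to data scanned.
# PRICE_PER_TB_USD is an assumed figure for illustration only; check the
# current pricing for your region before relying on it.
PRICE_PER_TB_USD = 5.00
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_scanned: int,
                        price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the estimated cost in USD for a query scanning bytes_scanned."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Scanning half a terabyte at the assumed price.
half_tb = 512 * 1024 ** 3
print(f"{estimate_query_cost(half_tb):.2f} USD")
```

Anything that reduces the bytes scanned reduces the bill in the same proportion.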
Athena — Performance Improvement
- To improve Athena performance, store data in a columnar format. Apache Parquet and ORC are the recommended formats, and a service such as AWS Glue can convert data to them
- Compressing data also helps by reducing the amount of data retrieved
- Partitioning datasets by column can also improve performance by narrowing down the S3 locations that have to be read
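To see why partitioning helps, the following sketch simulates partition pruning locally. It does not call Athena. The object listing, sizes, and helper names are all hypothetical; the only assumption about Athena is that data is laid out in Hive-style `name=value` folders, so a filter on the partition columns can skip whole prefixes.

```python
from typing import Dict, List, Tuple

# Hypothetical object listing: (S3 key, size in bytes). Keys use Hive-style
# partition folders (year=.../month=...).
OBJECTS: List[Tuple[str, int]] = [
    ("logs/year=2022/month=12/part-0.parquet", 400),
    ("logs/year=2023/month=01/part-0.parquet", 300),
    ("logs/year=2023/month=01/part-1.parquet", 200),
    ("logs/year=2023/month=02/part-0.parquet", 100),
]


def partition_values(key: str) -> Dict[str, str]:
    """Extract name=value partition segments from an object key."""
    return dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)


def bytes_scanned(objects: List[Tuple[str, int]], **wanted: str) -> int:
    """Sum the sizes of objects whose partition values match every filter."""
    total = 0
    for key, size in objects:
        values = partition_values(key)
        if all(values.get(name) == value for name, value in wanted.items()):
            total += size
    return total


print(bytes_scanned(OBJECTS))                            # no filter: 1000
print(bytes_scanned(OBJECTS, year="2023", month="01"))   # pruned: 500
```

In this toy listing, filtering on `year` and `month` halves the data that has to be read, which is the effect partitioning aims for on real datasets.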
Athena — Federated Query
- Amazon Athena supports Federated Query which allows querying data from multiple sources, not just S3
- A Data Source Connector is a Lambda function that runs federated queries against other services
- Data sources can be on AWS or on-premises, including CloudWatch Logs, DynamoDB, RDS, ElastiCache, DocumentDB, Redshift, Aurora, SQL Server, MySQL, HBase on EMR, and on-premises databases
- Results of the query can be stored in S3 buckets for later analysis
- Federated Query allows joining and combining data from multiple sources
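As a minimal sketch of the points above, the snippet below builds the request for a query that joins two data sources and writes its results to S3. The catalog, database, table, and column names and the bucket are hypothetical, and the catalogs are assumed to be already registered as data sources backed by connectors. The request is only built, not sent.

```python
from typing import Any, Dict


def build_federated_query_request(output_location: str) -> Dict[str, Any]:
    """Build parameters for a query joining two (hypothetical) catalogs."""
    # All catalog, database, table, and column names below are placeholders.
    query = (
        "SELECT o.order_id, c.name "
        'FROM "mysql_catalog"."sales"."orders" AS o '
        'JOIN "dynamodb_catalog"."default"."customers" AS c '
        "ON o.customer_id = c.customer_id"
    )
    return {
        "QueryString": query,
        # Query results are written to this S3 location for later analysis.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


request = build_federated_query_request("s3://example-athena-results/federated/")
print(request["QueryString"])

# With AWS credentials configured, the request could be submitted with boto3:
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
```

Keeping the request construction separate from the boto3 call makes the query easy to inspect before anything is run against real data sources.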