Amazon Athena

Praveen Pukkala
Jan 23, 2023


  • Amazon Athena is a serverless query service for analyzing data stored in Amazon S3 buckets
  • Uses standard SQL to query files
  • Built on the Presto engine
  • Data is analyzed directly from the S3 bucket without moving it
  • Supports different formats such as CSV, JSON, ORC, Avro, and Parquet
  • Pricing is a fixed rate per terabyte of data scanned by each query
  • Athena is commonly used with Amazon QuickSight to create reports and dashboards
  • Use cases for Athena include ad-hoc queries, business intelligence, analytics, reporting, and analyzing logs from AWS services
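
Because pricing depends on the data scanned, a rough cost estimate is simple arithmetic. The sketch below is only an illustration: the function name, the default price of 5 USD per terabyte, and the use of a binary terabyte (1024^4 bytes) are assumptions, so check the current pricing for your region before relying on the numbers.

```python
def estimate_scan_cost(bytes_scanned: int, price_per_tb_usd: float = 5.0) -> float:
    """Estimate the cost of one query from the number of bytes it scanned.

    price_per_tb_usd is an assumed example rate, not an official figure.
    """
    bytes_per_tb = 1024 ** 4  # assumption: binary terabyte
    return bytes_scanned / bytes_per_tb * price_per_tb_usd


# Scanning a full terabyte versus half of it under the assumed rate.
full_scan_cost = estimate_scan_cost(1024 ** 4)
half_scan_cost = estimate_scan_cost(1024 ** 4 // 2)
print(full_scan_cost, half_scan_cost)
```

Anything that reduces the bytes scanned (columnar formats, compression, partitioning) lowers the estimate proportionally.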

Athena — Performance Improvement

  • To improve Athena performance, use a columnar data format so that queries read only the columns they need; the recommended formats are Apache Parquet and ORC, and a service such as AWS Glue can convert existing data to them
  • Compressing data also reduces the amount of data retrieved
  • Partitioning datasets by column also improves performance by narrowing down the S3 locations that a query has to read
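
To make the partitioning idea concrete, here is a minimal sketch of a Hive-style `key=value` folder layout, which is one common way to partition data in S3. The bucket name, the table path, and the helper names `partition_prefix` and `prune` are hypothetical; the filter only imitates the effect of partition pruning and is not how Athena is implemented.

```python
def partition_prefix(table_root: str, **partitions: str) -> str:
    """Build a Hive-style prefix such as <root>/year=2023/month=01/."""
    parts = "/".join(f"{key}={value}" for key, value in partitions.items())
    return f"{table_root.rstrip('/')}/{parts}/"


def prune(keys: list, prefix: str) -> list:
    """Keep only the objects that live under the requested partition."""
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical objects in a table partitioned by year and month.
keys = [
    "s3://example-bucket/logs/year=2023/month=01/a.parquet",
    "s3://example-bucket/logs/year=2023/month=02/b.parquet",
    "s3://example-bucket/logs/year=2022/month=01/c.parquet",
]
prefix = partition_prefix("s3://example-bucket/logs/", year="2023", month="01")
print(prune(keys, prefix))
```

A query that filters on `year` and `month` only needs the objects under the matching prefix, so the other files are never read.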

Athena — Federated Query

  • Amazon Athena supports Federated Query which allows querying data from multiple sources, not just S3
  • A data source connector is a Lambda function that runs the federated query against the other service
  • Data sources can be on AWS or on-premises, including CloudWatch Logs, DynamoDB, RDS, ElastiCache, DocumentDB, Redshift, Aurora, SQL Server, MySQL, HBase on Amazon EMR, and on-premises databases
  • Results of the query can be stored in S3 buckets for later analysis
  • Federated Query allows joining and combining data from multiple sources in a single query
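
As a rough illustration of joining sources, the sketch below builds the SQL text for a query that joins an S3-backed table with a table reached through a connector. Everything named here is an assumption for the example: the helper `federated_join_sql`, the catalog name `example_mysql` (standing in for whatever name the connector was registered under), and the database, table, and column names. It only produces a string; submitting it to Athena is left out.

```python
def federated_join_sql(s3_table: str, external_catalog: str,
                       external_table: str, join_key: str) -> str:
    """Return SQL joining a Glue Data Catalog table with a connector table.

    Names are interpolated directly, so pass only trusted identifiers.
    """
    return (
        "SELECT s.*, e.*\n"
        f'FROM "AwsDataCatalog".{s3_table} AS s\n'
        f'JOIN "{external_catalog}".{external_table} AS e\n'
        f"  ON s.{join_key} = e.{join_key}"
    )


# Hypothetical names: orders stored in S3, customers in a MySQL database.
sql = federated_join_sql(
    s3_table="sales.orders",
    external_catalog="example_mysql",
    external_table="crm.customers",
    join_key="customer_id",
)
print(sql)
```

The point of the sketch is the catalog prefix: each table is addressed through its own catalog, so a single statement can combine both sources.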
