Pricing & Cost Optimization in Athena

kuldeep singh
2 min read · Aug 9, 2020

--

Introduction to Athena:

Athena was launched back in 2016 and closely follows the AWS pay-as-you-go model. It's a serverless, interactive query service that makes it easy to analyse data in Amazon S3 using standard SQL, with no infrastructure to manage. It's built on top of Presto to provide full standard SQL support, and it supports a wide range of data formats such as CSV, JSON and Apache Parquet (columnar storage).

Pricing:

Athena charges you for the amount of data scanned when a query runs.

  • CREATE TABLE DDL statements are free.
  • You're charged $5 per TB of data scanned (more than reasonable), with a minimum charge of 10 MB per query. This doesn't include the cost of the storage used in S3 (Athena query results are also saved in S3).
  • You're not charged for failed queries, but if you cancel a query midway you're still charged for the amount of data scanned up to that point.

Breaking down those costs, $5 per terabyte equals $0.0048828125 per gigabyte (taking 1 TB as 1,024 GB). So you would probably still be paying less at the end of the month than you do for your coffees.
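To make the arithmetic concrete, here is a minimal Python sketch of the pricing rules listed above. The function name `estimate_query_cost` is made up for illustration, and the binary units (1 TB = 1,024 GB) are an assumption chosen to match the per-gigabyte figure in this post. Check the pricing page for the exact billing rules in your region.

```python
# Rough cost estimate for a single Athena query, based on the rules in this post.
PRICE_PER_TB_USD = 5.0                   # rate quoted in this post
BYTES_PER_MB = 1024 ** 2                 # assumption: binary units
BYTES_PER_TB = 1024 ** 4
MIN_BYTES_PER_QUERY = 10 * BYTES_PER_MB  # 10 MB minimum charge per query


def estimate_query_cost(bytes_scanned: int) -> float:
    """Return the estimated cost in USD for one query that scanned data."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    # Anything below the minimum is billed as if 10 MB had been scanned.
    billable_bytes = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable_bytes / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    print(estimate_query_cost(1024 ** 3))   # one gigabyte scanned
    print(estimate_query_cost(500 * 1024))  # tiny query, billed at the minimum
```

DDL statements and failed queries are free, so they would simply not be passed through a function like this.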

Cost Optimization:

There are several ways to cut this cost for data-heavy usage. Compressing the data when you store it means each query ends up scanning less data.

Columnar storage formats such as Parquet and ORC allow Athena to read only the columns relevant to the query and avoid a full data scan.

Combining columnar storage with compression also gives a significant performance boost when querying.
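As a back-of-the-envelope illustration of why these two techniques pay off, the sketch below estimates the scan cost before and after optimisation. The compression ratio and the fraction of columns read are hypothetical inputs, not measured values, and the helper names are invented for this example. Real savings depend on your data and queries.

```python
# Hypothetical estimate of how compression and column pruning reduce scan cost.
PRICE_PER_TB_USD = 5.0     # rate quoted in this post
BYTES_PER_TB = 1024 ** 4   # assumption: binary units


def estimate_scanned_bytes(raw_bytes: float,
                           compression_ratio: float = 1.0,
                           column_fraction: float = 1.0) -> float:
    """Estimate bytes scanned after compression and column pruning.

    compression_ratio: raw size divided by compressed size (assumed value).
    column_fraction: share of the data held by the columns the query reads
                     (assumed value, between 0 and 1).
    """
    if compression_ratio <= 0 or not 0 <= column_fraction <= 1:
        raise ValueError("invalid compression_ratio or column_fraction")
    return raw_bytes / compression_ratio * column_fraction


def scan_cost_usd(bytes_scanned: float) -> float:
    """Convert bytes scanned to USD at the per-TB rate (ignores the 10 MB minimum)."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    raw = 1024 ** 4  # 1 TB of uncompressed row-oriented data
    before = scan_cost_usd(estimate_scanned_bytes(raw))
    # Assumed: 4x compression, and the query touches a quarter of the columns.
    after = scan_cost_usd(estimate_scanned_bytes(raw, 4.0, 0.25))
    print(f"full scan: ${before:.4f}, optimised: ${after:.4f}")
```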

As a best practice, remember to delete your Athena query results, which are stored in S3, to avoid unnecessary storage cost. This can be handled with an S3 expiration policy on the bucket.
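One way to set up such an expiration policy is through boto3, sketched below. The bucket name, prefix, rule ID and 30-day retention are placeholders chosen for illustration, and the example assumes boto3 is installed and AWS credentials are configured. Note that `put_bucket_lifecycle_configuration` replaces any lifecycle configuration already on the bucket, so merge in existing rules first if there are any.

```python
# Sketch: expire Athena query results in S3 after a fixed number of days.
def build_expiration_config(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under a prefix."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": "expire-athena-query-results",  # placeholder rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_expiration_config(bucket: str, prefix: str, days: int) -> None:
    """Apply the rule to the bucket (overwrites the bucket's existing lifecycle rules)."""
    import boto3  # imported here so the config builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_expiration_config(prefix, days),
    )


if __name__ == "__main__":
    # Placeholder prefix; print the config rather than calling AWS.
    print(build_expiration_config("athena-results/", 30))
```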

Resource links:

  • https://aws.amazon.com/athena/pricing/
  • https://aws.amazon.com/athena/features/
