What is AWS Athena

John Thuma
Published in DataSeries
2 min read · Apr 7, 2019

AWS Athena

AWS Athena is a query service that lets you analyze data stored in Amazon’s popular Simple Storage Service (Amazon S3) and other AWS services. At first blush, using Athena is very simple. First, you point Athena at the location in S3 where your data lives. Next, you create a table over that data, either with the built-in Table Creation Wizard or by writing your own data definition language (DDL) in the Hive dialect. Finally, you run your ANSI SQL query.
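To make those three steps concrete, here is a minimal sketch in Python. It assumes the boto3 library and valid AWS credentials for the part that actually talks to Athena. The table name, columns, bucket paths, and the helper names `build_create_table_ddl` and `start_query` are all hypothetical and exist only for illustration. The DDL assumes comma-delimited text files; other formats need a different storage clause.

```python
def build_create_table_ddl(table, columns, s3_location):
    """Return a Hive-dialect DDL statement for an external table over CSV files."""
    column_list = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_list}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )


def start_query(sql, database, output_location):
    """Submit a statement to Athena and return its query execution ID."""
    # Imported lazily so the DDL helper above works without boto3 installed.
    import boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Hypothetical table, columns, and bucket, for illustration only.
ddl = build_create_table_ddl(
    "web_logs",
    [("request_time", "string"), ("status", "int"), ("bytes_sent", "bigint")],
    "s3://example-bucket/web-logs/",
)
query = "SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status"

print(ddl)
```

With credentials in place, you would call `start_query(ddl, "default", "s3://example-bucket/athena-results/")` to create the table and then `start_query(query, ...)` to run the SQL. Athena runs queries asynchronously, so the returned ID is what you would use to check status and fetch results.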

Other nice features of Athena include:

Integration with AWS Glue: Glue provides a data catalog that acts as a unified metadata repository across various data sources. Glue can scan your data sources and build up a library of data sets available to Athena. See the AWS Glue documentation to learn more.

Low Data Preparation: Athena has no ETL requirement, which means you don’t have to curate data or build a data warehouse.

Works with a variety of data formats: Supported formats include Avro, CSV, ORC, JSON, and Parquet.

Pay by the Query: There is no EC2 instance to set up, and you are charged $5 per terabyte of data scanned by your queries. If you compress or partition your data, you can save up to 90% of the Athena costs.
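The pricing arithmetic is easy to sketch. The snippet below uses the $5 per terabyte rate quoted above and assumes a decimal terabyte (10^12 bytes). It ignores any per-query minimums or rounding rules, so treat it as a rough estimate rather than a billing calculator. The 200 GB and 20 GB figures are made-up numbers that show what a 90% reduction in scanned data would mean.

```python
PRICE_PER_TB_USD = 5       # rate quoted in this article
BYTES_PER_TB = 10 ** 12    # assumption: decimal terabyte


def estimate_query_cost(bytes_scanned):
    """Estimate the cost in USD of a query that scans the given number of bytes."""
    return bytes_scanned * PRICE_PER_TB_USD / BYTES_PER_TB


# Hypothetical query that scans 200 GB of uncompressed, unpartitioned data.
full_scan = estimate_query_cost(200 * 10 ** 9)

# Same query if compression and partitioning cut the scanned data by 90%.
reduced_scan = estimate_query_cost(20 * 10 ** 9)

print(f"Full scan:    ${full_scan:.2f}")
print(f"Reduced scan: ${reduced_scan:.2f}")
```

Because the charge is tied to bytes scanned rather than rows returned, anything that shrinks the data Athena has to read lowers the bill in direct proportion.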

John Thuma
DataSeries

Experienced Data and Analytics guru. 30 years of hands-on keyboard experience. Love hiking, writing, reading, and constant learning. All content is my opinion.