Amazon Athena— The New Serverless Data Analytics Tool

Vishal Padghan
Edureka
Published in
8 min readJan 11, 2019
AWS Athena — Edureka

Data analysis is a very complex process and there has always been attempts to ease it. There are many tools for analytics, and even the popular tech giant Amazon provides an AWS service named Amazon Athena. This Amazon Athena tutorial will guide you through the basics and advance usage of Amazon Athena.

Amazon Athena is an interactive data analysis tool used to process complex queries in relatively less time. It is server-less hence, there is no hassle of setting up and doesn’t require managing the infrastructure. It is not a Database service hence, you just pay for the queries you run. You just point your data in S3, define the schema required and with a standard SQL you are good to go.

The topics covered in this article are as follows:

  • Introduction to Amazon Athena
  • Difference between Microsoft SQL Server and Amazon Athena
  • Use of Amazon Athena
  • Access Amazon Athena
  • Features of Athena
  • Demo — I (Creating Table In Athena)
  • Demo — II (Comparison Between MySQL And Athena)

Introduction To Amazon Athena

On November 20, 2016, Amazon launched Athena as one of its services. As described earlier, Amazon Athena is a serverless query service that makes analysis of data, using standard SQL, stored in Amazon S3 simpler. With few clicks in the AWS Management Console, customers can point Amazon Athena at their data stored in Amazon S3 and run queries using standard SQL to get results in seconds.

With Amazon Athena, there is no infrastructure to set up or manage, and the customer pays only for the queries they run. Amazon Athena scales automatically, executing queries in parallel, which gives fast results, even with a large dataset and complex queries. Now, that you what is Amazon Athena let me take you through the difference it has compared to SQL Server.

Difference Between Microsoft SQL Server And Amazon Athena

Use Of Amazon Athena

If you are a Data Analyst and have an experience of analyzing data stored on S3, you will relate to this,

Data Analysts/Developers: Do you offer Storage?

AWS: Yes. Data Analysts/Developers: Do you have tools for Analytics?

AWS: Not sure.”

Amazon worked on this and came up with Amazon Athena. Now, you have a tool to play with your data. Athena helps you analyze unstructured, semi-structured and structured data that is stored in Amazon S3. Using Athena you can create dynamic queries for your dataset. Athena also works with AWS Glue to give you a better way to store the metadata in S3.

Using AWS CloudFormation and Athena, you can use named queries. Named query allows you to name your query and then call it using the name.

This interactive service from AWS can be used by Data Scientists, developers to take a sneak peak into the table instead of running the complete query. It is also used to fetch data from S3, load it to different data stores using Athena JDBC driver, for log storage/analysis and Data Warehousing events.

Now that you know Athena is an interesting tool, let’s find out in this Amazon Athena tutorial how to get your hands on this amazing service from Amazon.

Accessing Amazon Athena

Accessing Athena is very easy and it can be done by either:

  • AWS Console
  • AWS CLI
  • Athena with your JDBC

These are few of the ways to access Amazon Athena. By now, you pretty much know everything important about Amazon Athena. Let’s me walk you through the different features of Athena.

Features Of Athena

Out of the many services provided by Amazon, Athena is one of the services. It has many features that makes it suitable for Data Analysis. Let’s take a look at the different features one by one.

  1. Easy Implementation: Athena doesn’t require installation. It can be accessed directly from the AWS Console also directly by AWS CLI.
  2. Serverless: It is serverless, so the end-user doesn’t need to worry about infrastructure, configuration, scaling or failure. Athena takes care of everything on its own.
  3. Pay per query: Athena charges you only for the query you run, i.e. the amount of data that is managed per query. You can save a lot if you can compress them and format your dataset accordingly.
  4. Fast: Athena is a very fast analytics tool. It can perform complex queries in less time by breaking the complex queries into simpler ones and run them parallelly, then combine the results to give the desired output.
  5. Secure: With the help of IAM policies and AWS Identity, Athena gives you complete control over the data set. As the data is stored in S3 buckets, IAM policies can help you manage control to users.
  6. Highly available: With the assurance of AWS, Athena is highly available and the user can execute queries round the clock. As AWS is 99.999% available, so is Athena.
  7. Integration: The best feature of Athena is that it can be integrated with AWS Glue. AWS Glue will help the user to create a better-unified data repository. This helps you create better versioning of data, better tables, views, etc.

Great isn’t it? Athena provides many features at the same time, it is cost-efficient.

By now you must be impressed by AWS Athena. Now that you know quite a lot about Athena. Let’s roll our sleeves and understand the working of Athena by performing a small demo. In this Amazon Athena tutorial, we will work on two Demos, let’s find out what are they.

Demo — I (Creating Tables In Athena)

As you know all about Amazon Athena, let’s take a dive on how to query your data stored as .json file in Amazon S3 using Athena.

  1. Create multiple JSON files containing entries
  2. Store the files to S3 bucket
  3. Create an external table for the files stored in S3
  4. Write a Query for accessing the data

Let’s understand how to do the above-said tasks one by one.

  1. Create JSON Files. (Create the data without using newline character)

2. We will access S3 bucket using AWS CLI

a. Configure IAM User

b. Create S3 Bucket

c. Copy files to S3 Bucket

3. Create External Table in Athena. There are two ways of doing this:

a. Using AWS Glue Crawler

b. Manually

4. We will create it manually:

a. Create table.

b. Create a new database if you don’t have one. Give a table name. Give the location of your file.

c. Select the type of file you will be working with. Select the architecture of the data in your file.

d. As the entered data is not that complex, we don’t need a partition. Click on “Create Table”.

e. Athena will auto-generate the Query for creating External Table and run it.

You have your external table ready.

5. We write a query to select all data from the table.

a. select * from testdb;

b. Click on Run Query and you have all the information in your table.

Demo — II (Comparison Between Amazon Athena And MySQL)

In this Amazon Athena tutorial, now we will compare MySQL and Athena and understand as to how even simple queries take less time to execute in Athena.

  1. Loading CSV file to MySQL took around 1 hour but in Athena, it took just 3 mins to upload the CSV file to S3 and 0.42 seconds to create a table for the same.

2. Select query. select * from table.

Select query in Athena.

Select query in MySQL.

3. Selecting a specific column from the table.

Select a specific column in Athena

Select a specific column in MySQL.

4. Getting the count of a specific column.

Count of a specific column in Athena.

Count of a specific column in MySQL.

5. Counting number of records in the table.

Count all records in Athena.

Count all records in MySQL.

6. Select query with a specified range.

Select query within the said range in Athena.

Select query within the said range in MySQL.

That was a brief comparison on basic SQL commands between MySQL and Amazon Athena.

If you wish to check out more articles on the market’s most trending technologies like Artificial Intelligence, DevOps, Ethical Hacking, then you can refer to Edureka’s official site.

Do look out for other articles in this series which will explain the various other aspects of AWS.

1. AWS Tutorial

2. AWS EC2

3. AWS Lambda

4. AWS Elastic Beanstalk

5. AWS S3

6. AWS Console

7. AWS RDS

8. AWS Migration

9. AWS Fargate

10. Amazon Lex

11. Amazon Lightsail

12. AWS Pricing

13. AWS Resume

14. AWS CLI

15. Amazon VPC Tutorial

15. AWS vs Azure

17. On-premise vs Cloud computing

18. Amazon Dynamo DB Tutorial

19. How To Restore EC2 From Snapshot?

20. AWS CodeCommit

21.Top AWS Architect Interview Questions

22. How To Restore EC2 From Snapshot?

23.Create Websites using AWS

24.Amazon Route 53

25. Securing Web Applications With AWS WAF

Originally published at www.edureka.co on January 8, 2019.

--

--