Using AWS Athena? Check out these alternatives…

Ali LeClerc
Ahana Cloud
Published in
3 min read · Jan 24, 2023

If you’re a data engineer using AWS today, you’ve probably heard of AWS Athena and may even be using it. In short, AWS Athena is a serverless Amazon service for querying data with SQL. We see it most commonly used in a data lake or data lakehouse architecture with AWS S3 as the primary data store. Under the hood, Athena uses open source Presto as its compute engine.

It’s *really* easy to get started with Athena because of its serverless nature — there’s no infrastructure to manage. If you have data in S3 already, then you don’t need to move that data to query it.

However, as your data and compute needs grow, you may run into issues with Athena.

Below are some of the most common reasons we see users looking for alternatives to AWS Athena:

  1. Performance consistency: Because Athena is a shared, multi-tenant service, performance can take a hit if too many people are using it at the same time in a region. That means you’ll see queues and latencies.
  2. Cost: Your cloud bill can start to explode with Athena, which charges $5 per TB scanned. If you’re only running a few queries on small amounts of data, then Athena is a good choice. But if your datasets are large and you run hundreds of queries, scanning all of that data will be really expensive.
  3. Visibility & control: You don’t have any management console with Athena, so if you have failures or poor performance you get no visibility into why it’s happening, and you can’t get into your deployment to fix it.
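To make the cost point concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the $5 per TB scanned figure quoted above; the function name and both workloads (queries per day, average TB scanned per query) are hypothetical numbers picked for illustration, and the sketch ignores anything else that affects a real bill, such as compression, partitioning, or per-query minimums.

```python
# Price per TB scanned, as quoted in this article.
ATHENA_USD_PER_TB_SCANNED = 5.0


def athena_monthly_cost(queries_per_day: int,
                        avg_tb_scanned_per_query: float,
                        days: int = 30) -> float:
    """Estimate a monthly scan cost in USD for a steady workload.

    Assumes every query scans the same amount of data, which is a
    simplification made for this illustration.
    """
    tb_scanned = queries_per_day * avg_tb_scanned_per_query * days
    return tb_scanned * ATHENA_USD_PER_TB_SCANNED


# Hypothetical light workload: 10 queries a day, about 10 GB scanned each.
light = athena_monthly_cost(queries_per_day=10, avg_tb_scanned_per_query=0.01)

# Hypothetical heavy workload: 500 queries a day, about 0.5 TB scanned each.
heavy = athena_monthly_cost(queries_per_day=500, avg_tb_scanned_per_query=0.5)

print(f"light workload: ${light:,.2f} per month")
print(f"heavy workload: ${heavy:,.2f} per month")
```

Under these assumed workloads the light case stays in the tens of dollars a month while the heavy case runs to tens of thousands, which is the gap that usually prompts a look at alternatives.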

Taking these issues into account, many people look for alternatives that give them better price performance and more control over their deployment. Here are some of the common alternatives we see people moving to:

  1. Open Source Presto (do it yourself): You can deploy Presto in your own environment and get total control of your deployment. Your team will most likely need the expertise and resources to manage a Presto cluster on its own.
  2. Managed Hadoop and Presto: Cloud providers offer their own managed Hadoop, and in the case of AWS that’s EMR (Elastic MapReduce). You can use EMR to deploy Presto on Hadoop. This option gives you more management and operational support than DIY Presto because EMR will take care of cluster management, node recovery, monitoring, and scaling. You also have more control over cost optimization. Still, your team needs to be very hands-on and understand how to manage big data infrastructure.
  3. Managed Presto Service: A managed service for Presto will give you visibility into query performance, instances, security, and query plans, plus you can manage your infrastructure by just clicking a button in a UI. You can also get a pre-configured cluster to start with, and tweak what’s necessary for your workload. Ahana Cloud is a managed service for Presto, and you can just pay as you go through your AWS bill (similar to Athena and EMR).

Here’s a quick comparison guide looking at EMR, Athena, and a Managed Service.

Comparing Presto services: EMR, Athena, Ahana

If you want to dig a little deeper, check out some of our customer use cases on why they moved from Athena to a managed service for Presto: https://ahana.io/amazon-athena/.

Ali LeClerc
Ahana Cloud

Presto Community Chair, Product Manager at IBM. Chair of the #Presto Foundation Community team. Topics on #bigdata, #dataanalytics, #lakehouse, #opensource