PyCharm & AWS Athena

Published in

Slido developers blog

8 min readFeb 25, 2020

Have you ever tried debugging a really long query with a lot of SUBQUERIES and WITH statements in AWS Athena console? If so, you know how painful it can be. I guess you’ve tried to look around for an alternative. If you did so recently, there is a good chance you found this article (by Gary A. Stafford, an AWS Emerging Partner Solutions Architect). If you tried to follow it, you probably managed to get PyCharm and Athena to work. But you may have run into some hiccups this article will try to resolve.

Once you complete this tutorial you should be able to connect to AWS Athena from PyCharm. Not only that, running queries, sub-queries, and full suggestions support should all work as you would expect them to.

Don’t get me wrong, AWS Athena editor can do the job, run queries, even suggest something, sometimes. But I wouldn’t be able to work in that simple and quite limited console for a longer time. I am always looking for a better experience. We are editing text after all, so using a proper text editor feels quite appropriate.

There is one known issue of this tutorial: the native support of the Presto SQL dialect. Instead, we will use Amazon Redshift, since it is the closest supported one. Its only caveat is that a few functions will be marked as missing or unknown. This can certainly cause some confusion, but it’s a small price to pay for all the other features this combo will bring you.

PyCharm Community vs. PyCharm Professional

First of all, you will need PyCharm Professional. It seems like these DB additions are a way for JetBrains to upsell existing clients from PyCharm Community. If you’re expecting AWS Athena to be one of the supported drivers, then let me lead you out of the truth. AWS Athena driver is not supported. Even though AWS Athena is one of the core data stores of AWS, the biggest cloud service at current times.

To make it works we will have to download Athena JDBC Drivers and import it to PyCharm manually as you’ll see in the next section.

The driver was created by Simba and you can find more about it on their webpage. It will work with Java 8 and above. If you for any reason need to use an earlier version of Java, previous versions of Athena JDBC Drivers do seem to support them, but we only tested this tutorial with the Java 8+ version mentioned above.

Adding AWS Athena Driver

First things first, if you haven’t installed PyCharm Professional do so now. Open View > Tool Windows > Database.

PyCharm > View > Database — *PyCharm > View >* Tool Windows > *Database*

The database sidebar should be visible now. Click on the+ icon and choose Driver.

Feel free to fill Name with the text of your choice. In Driver File section click on the + icon and select Custom JARs. Add Athena JDBC Driver that you downloaded to Driver Files. Then change Classto com.simba.athena.jdbc.Driver.

Switch to Options tab, change Dialect to Amazon Redshift as this dialect is closest to Athena dialect. Otherwise, you will have a problem with syntax highlighting and running queries.

Now we are ready to connect to AWS Athena. Click on the + icon again and hover over Data Source. There are a lot of different drivers, but we are looking for the one you just created in the step before. I named it AWS Athena.

Database > Plus icon > Data Source > AWS Athena

Connection to AWS Athena

You will have a few options on how to access AWS Athena. You can read more about it in the documentation on page 23. You will most likely have AccessKey and SecretAccessKey as the most common way of auth. Later on, we will also see how to connect with auto-generated temporary credentials. Let’s fill in the necessary info and test the connection.

Creating a new Data Source connection with AWS Athena Driver

Connecting via AccessKey and SecretAccessKey
Name your connection, fill User as AccessKey and Password as SecretAccessKey. Change region and s3-output-location in the following line and use it as URL.

jdbc:awsathena://AwsRegion=eu-west-1;S3OutputLocation=s3://some-s3-temp-folder/bucket/

Connecting via temporary credentials file

Instead of access keys, you can have a temporary credentials file. This is usually the case when your access to AWS is managed by an identity provider such as Okta or others. The file contains your temporary access keys for a profile.

[your-profile]
aws_access_key_id = ...
aws_secret_access_key = ...
region = ...
aws_session_token = ...

The credentials file can be found at $HOME/.aws/credentials. Make sure you regenerating the file by identity provider before testing. There are more options that you can read about here.

To set up access with temporary credentials file leave User and Password empty and use the string below as URL. As mentioned before don’t forget to change your region, s3-output-location, profile.

jdbc:awsathena://AwsRegion=eu-west-1;S3OutputLocation=s3://some-s3-temp-folder/bucket/;AwsCredentialsProviderClass=com.simba.athena.amazonaws.auth.profile.ProfileCredentialsProvider;AwsCredentialsProviderArguments=your-profile;

Creating AWS Athena connection by a temporary credentials file

If your temporary access info is valid you should be able to pass the connection test.

After PyCharm says “connection is valid”, go to the Schemas tab from the menu. It can take a while to download the schema from the server. Once it is done, choose the schemas you would like to work with. Otherwise, you will see an active connection with 0 active schemas.

Trust, but verify

Are you ready for the real test? Open PyCharm > View > Database.
Right-click on your connection and choose Jump to Console. In the next window, you will be asked to open New Console or continue with the previous one. This is up to you. You can imagine it as another tab in AWS Athena. Each console can run one query at a time.

Open Database and connection via Console

On the image, there is a simple query to demonstrate the connection. But in the real world, you don’t get paid to run a simple query. Usually, a query is long, a few hundred lines sometimes even more. To debug such a query in AWS Athena is a pain in the ass.

But here it is a brand new level. You can select a part of the SQL and run just that bit. You can easily jump back and forth in history. You can create your params that can be repeatedly used and, of course, full suggestion and syntax highlighting as you would expect to have in the 21st century is also available.

Running with .sql file
First, go to Preferences > Tools > Database and make sure you have ticked Always show console toolbar in editor. Otherwise, you won’t see the bar below and you will have always do right-click and choose Attach Session.

Tick *Always show console toolbar in editor*

Now in the right corner of you should see a dropdown with sessions. It is almost the same as with the console. Each session is like a tab in AWS Athena. In one session, you can run one query. Make sure your dialect is set to Amazon Redshift, otherwise suggestions wouldn’t work well.

You can run a whole .sql file or just a part of it by selecting a particular section. This can vastly improve the development of any team that works with data on a daily basis. A few minutes of implementation will pay off soon.

Tips & tricks

This section is optional and will contain some tips and tricks we encountered while trying this out and can be handy in various cases.

Make connection global
The connection works, we are happy, and this happiness lasts until we open another project also connected to the same AWS Athena and we are missing our connection. Don’t worry, with a few clicks you can set up the connection as global and it will be available in each project.

Go to Properties of the connections. You should see the pop-up below. Right-click on the connection name in the left menu and then click on Make Global.

Dump data to CSV
By default, PyCharm expects the data from the DB to come in TSV format. A more common format is CSV, at least in our circles. This default can be changed by clicking on the settings icon next to the export button.

Export > Paste Format > Comma-separated (CSV)

Add the header to CSV
By default, a CSV dump will miss the header. But you can change this in CSV Format settings. There are a few more interesting options such as “Trim whitespaces” or separator settings. But this is more or less about your preferences.

Export > Paste Format > Configure CSV Format

Conclusion

The world of SQL, AWS, and Athena is often cold and sad. But this small hack of IDE made a big improvement in my relationship with AWS Athena. I hope it will help you too and will make your world a better place.

If you have any tips on how to make this tutorial more useful, leave a comment or just share it with your colleagues.

Btw, here in slido.com we often need help. People who care about their working environment are always a good fit. Check out the career page and who knows, you may end up being a colleague.

Happy querying!