Extract and transform data from AWS Athena’s views and load into AWS S3 as a CSV file using AWS Glue.

Teck Onn
2 min readJun 7, 2020

--

We have AWS Athena reading some data in S3, so that we can perform SQL querying for analytics purposes. I created a view, or so called Presto view in Athena and used that view to transform data, then use Glue job to convert it to CSV format and store it in a targeted location. The reason for all these is to keep only relevant information while housekeeping the raw data.

There is a problem during the implementation whereby the Glue job failed to execute and gave the following error:

“error”:”An error occurred while calling o60.getCatalogSource. No classification or connection in [database_name].view_name

The Presto views (views created in Athena) are currently not accessible outside Athena despite being stored and visible in Glue Data Catalog. In fact, only the definition / metadata of Athena view is stored in the Glue Catalog, while the location and classification information is not stored, and hence the job failed because it could not retrieve the S3 location.

The workaround for accessing Athena view from Glue job, is by reading the View with JDBC options. We can read the view as Apache Spark DataFrame with JDBC option to access the Athena view from a Glue job.

  1. Download the AthenaJDBC42_2.0.7.jar[1] and then upload the file to an Amazon Simple Storage Service (Amazon S3) bucket. For more information, see Using Athena with the JDBC Driver[2]
  2. Create the AWS Glue job.
  3. For the IAM role, choose a role that has permissions for both Athena and AWS Glue.
  4. In the “Security configuration, script libraries, and job parameters (optional) section”, for the Dependent jars path, enter the Amazon S3 location of the .jar(Athena JDBC jar) file.
  5. Choose Save job and edit script and then enter a script similar to the following. This example script accesses the Athena view and loads it into a Spark DataFrame. You don’t have to create an AWS Glue connection for the Athena catalog.

Sample Code:

Save the script and run the Glue job, and it will work.

--

--

Teck Onn

A tech enthusiast. Constantly exploring different ideas and technologies. Here you will find the challenges faced in my every work varied from different area.