Mysql table to aws S3 as parquet file and querying using aws athena

Hariprasad K.B
Geek Culture
Published in
2 min readFeb 18, 2020
Photo by Luke Chesser on Unsplash

For analysing big data sets and making reports mysql is not a natural choice because of its architecture. As data is stored in row-based format in mysql common functions like aggregates and group by’s might not perform very well on big datasets.

Here comes the columnar storage systems like the apache parquet (https://parquet.apache.org/) to the rescue. But then the question would be how can we use a file based system for querying and reporting like sql. Fortunately aws S3 and athena can be of great use for such scenarios.

The architecture

figure 1: Transfer data from mysql to s3 as parquet file and build a querying engine with athena

Here i’m using python, sqlalchemy, pandas and pyarrow to do this task

I already have the query to be run with me because i have build a metabase kind of application already in django rest-framework now after the querying i want to store the data as a parquet file for further processing, I’m not showing the query builder code.

now i have the query to be run in sql_quey in the code below , engine is the sqlalchemy connection.

here organisation_id and report.id are two unique identifiers for the S3 location .

table = pa.Table.from_pandas(df,preserve_index=False)

The line above will create datatypes for creating the athena tables but we still need to do some small regular expression processing on it to fix some issues.

The functions create_athena_database , drop_athena_table , create_athena_table are as follows

so with the datatypes we have derived from the query we will create a table in athena using create_athena_table function so that athena can read the parquet file in location in the variables3_url .

connecting with S3 is done using sqlalchemy. Rather than altering the table if the query changes i’m using drop table and create a new table definition .

--

--

Hariprasad K.B
Geek Culture

Software Engineer with interest in solving problems. Craving to learn more and more!. Currently building www.reviewreels.app