How to get Apache Superset to connect to Athena
This took me a couple of rounds, so I thought I’d share what I learned about getting Apache Superset to connect to Athena. I’m running this on my local Mac. These are NOT “how to set it up in prod” instructions.
I started with the instructions here: https://superset.incubator.apache.org/installation.html#start-with-docker
git clone https://github.com/apache/incubator-superset/
#
# Hidden steps here, to be revealed later in this post
#
cd incubator-superset/contrib/docker
docker-compose run --rm superset ./docker-init.sh
docker-compose up
That stood up the cluster for me. But without the connectors I need, it’s pretty much worthless.
The trick is that you need to customize your Docker image to suit your needs. So, after you git clone, you can edit your Dockerfile to include:
apt install default-jre
pip install "PyAthenaJDBC>1.0.9"
Add these at the place in the Dockerfile where you see similar-looking commands. For example, my Dockerfile has these sections in it:
RUN apt-get install -y build-essential libssl-dev \
    libffi-dev python3-dev libsasl2-dev libldap2-dev \
    libxi-dev default-jre
and
RUN pip install --upgrade setuptools pip \
    && pip install -r requirements.txt -r requirements-dev.txt \
    && pip install "PyAthenaJDBC>1.0.9" \
    && rm -rf /root/.cache/pip
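If you want to confirm that the rebuilt image actually picked up both additions, a quick check like the one below can be run with the Python interpreter inside the container. This is only a minimal sketch: `check_athena_prereqs` is a hypothetical helper name, and it assumes that PyAthenaJDBC installs under the module name `pyathenajdbc` and that `default-jre` puts a `java` executable on the PATH.

```python
import importlib.util
import shutil


def check_athena_prereqs():
    """Report whether the two Athena prerequisites are visible to this Python.

    Assumption: the JRE exposes a `java` binary on PATH, and the
    PyAthenaJDBC package is importable as `pyathenajdbc`.
    """
    return {
        # shutil.which returns None when no `java` executable is on PATH
        "java": shutil.which("java") is not None,
        # find_spec returns None when the module cannot be located
        "pyathenajdbc": importlib.util.find_spec("pyathenajdbc") is not None,
    }


if __name__ == "__main__":
    for name, found in check_athena_prereqs().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If either line prints MISSING, the Dockerfile edit probably did not make it into the image you are running, and a rebuild is the first thing to try.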
And when you’re in the UI, this is the connection string (Superset calls it the SQLAlchemy URI) you want:
awsathena+jdbc://<Your-AWS-key>:<Your-AWS-key-secret>@athena.<AWS-Region>.amazonaws.com/?s3_staging_dir=s3://aws-athena-query-results-XXXX-<AWS-Region>
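One thing that can bite here: AWS secret keys often contain characters like `/` or `+`, which can break URL parsing if pasted in raw. A common workaround is to percent-encode the credentials (and the staging directory) before assembling the string. The snippet below is a minimal sketch of that idea using only the standard library; `athena_uri` and its parameter names are made up for illustration, and the values in the usage example are placeholders, not real credentials.

```python
from urllib.parse import quote_plus


def athena_uri(access_key, secret_key, region, staging_dir):
    """Assemble an awsathena+jdbc URI with percent-encoded parts.

    Hypothetical helper: it follows the URI layout shown above and
    encodes the pieces that may contain URL-special characters.
    """
    return (
        "awsathena+jdbc://{key}:{secret}@athena.{region}.amazonaws.com/"
        "?s3_staging_dir={staging}"
    ).format(
        key=quote_plus(access_key),
        secret=quote_plus(secret_key),
        region=region,
        staging=quote_plus(staging_dir),
    )


if __name__ == "__main__":
    # Placeholder values only; substitute your own key, secret, and bucket.
    print(athena_uri("AKIAEXAMPLE", "abc/def+ghi", "us-east-1",
                     "s3://my-bucket/results/"))
```

Paste the printed string into the Superset connection form. If your secret has no special characters, the encoded and unencoded forms are identical, so encoding it is harmless.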
Good luck, and happy Business Intelligencing!