Published in CodeX

Powerful BI Tool: Apache Superset

A clear explanation of how to build and run Superset, add new datasources such as MSSQL, store metadata in a custom database, and apply local changes to Superset.

I remember being super happy when I saw Apache Superset for the first time, and I was enormously excited to use it. However, it did not take long before I started struggling to run it locally. To put an end to that struggle, I created some scripts that I will share with you in the coming sections of this story. Before diving in, you can access the repo that holds my scripts below.

Photo by Hans-Peter Gauster on Unsplash

In this story, I will share the experience I gained while trying to run Superset.

Running Superset From Source

There are several ways to install and run Superset, all of them described in the official documentation. We can look at them by using the link below.

I tried three of them, and none gave me the result I wanted:

  • The PyPI way worked, but I couldn’t change configurations such as the metadata database.
  • Running from the source code failed; I got errors every time I tried.
  • Docker Compose came closest to what I wanted. I needed to change the metadata database, which requires running Superset in dev mode, but I was only able to run non-dev mode. When I tried dev mode, I got the error npm WARN EBADENGINE Unsupported engine, which I solved by giving sudo privileges to the docker-compose.yml file. Then another problem came up: MSDialect_adodbapi. I was close to freaking out.

Then I read some articles on the web about how to customize Superset and how to run it as a plain Docker container, and I started creating my own scripts to make it production ready.

Preparing a Dockerfile to Make Superset Production Ready

There is a Docker image from Apache for Superset on Docker Hub.

I succeeded in running Superset using that Docker image. But simply running Superset was not the point; I had already managed that in several ways. What I cared about was how to apply my local changes (for instance, setting the metadata database) to Superset.

At this point, I was not able to store my metadata in the database I wanted. By default, Superset stores metadata in a volume using SQLite, and I was nervous about what would happen if that volume were deleted. I was also unable to install the database drivers I wanted to use.

I read the source code to solve my particular problems, such as changing the metadata database. I learned that if we set the SUPERSET_CONFIG_PATH environment variable, CONFIG_PATH_ENV_VAR will be picked up at runtime and we will be able to override the configuration variables. We can see that by visiting the link.
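To make the mechanism concrete, here is a small stand-alone sketch of how such an override works: the module at SUPERSET_CONFIG_PATH is loaded and its upper-case names are copied over the defaults. This is a simplified illustration under my own assumptions, not Superset's actual code, and the default values below are placeholders.

```python
import importlib.util
import os

# Simplified illustration of the override mechanism (not Superset's actual code):
# load the Python module at SUPERSET_CONFIG_PATH and copy its upper-case
# top-level names over the default configuration values.
DEFAULTS = {
    "ROW_LIMIT": 50000,
    "SQLALCHEMY_DATABASE_URI": "sqlite:////app/superset_home/superset.db",
}

def load_config(defaults=DEFAULTS):
    cfg = dict(defaults)
    path = os.environ.get("SUPERSET_CONFIG_PATH")
    if not path or not os.path.exists(path):
        return cfg
    spec = importlib.util.spec_from_file_location("superset_config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Only upper-case names are treated as configuration overrides
    cfg.update({k: v for k, v in vars(module).items() if k.isupper()})
    return cfg
```

With this picture in mind, the Dockerfile below only has to copy the config file into the image and point the environment variable at it.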

Obviously, I needed to make my changes in the superset_config.py file, which I also learned from the link below.

After this research, I created the Dockerfile below.

FROM apache/superset

# Switching to root to install the required packages
USER root

RUN pip install psycopg2
RUN pip install psycopg2-binary

# Switching back to using the `superset` user
USER superset

# If we set the SUPERSET_CONFIG_PATH variable, CONFIG_PATH_ENV_VAR will be created at runtime and we will be able to override the configuration variables [https://github.com/apache/superset/blob/master/superset/config.py#L1351]
COPY superset_config.py /app/
ENV SUPERSET_CONFIG_PATH /app/superset_config.py

I wanted to use PostgreSQL to store my metadata, so I added the two pip install lines (psycopg2 and psycopg2-binary) to my Dockerfile.

Now let’s have a look at the superset_config.py file. We can override Superset’s configuration variables through this file.

SQLALCHEMY_DATABASE_URI = 'postgresql://USERNAME:PASSWORD@HOST:PORT/DB_NAME'

And we told Superset to look at that file for its configuration: /app/superset_config.py.
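A slightly fuller superset_config.py might look like the sketch below. SQLALCHEMY_DATABASE_URI, ROW_LIMIT, and SECRET_KEY are documented Superset settings, but every value here is a placeholder you must replace with your own:

```python
# superset_config.py -- a minimal sketch; every value below is a placeholder
SQLALCHEMY_DATABASE_URI = "postgresql://superset:superset@localhost:5432/superset_meta"
ROW_LIMIT = 5000  # default row limit for queries
SECRET_KEY = "replace-me-with-a-long-random-string"
```

Because the file is plain Python, you can also read credentials from environment variables instead of hard-coding them.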

There is another very important point: if we want to use different database types, we need to install their drivers in the container by adding more RUN pip install <package_name> lines. Superset uses SQLAlchemy to connect to databases.
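For example, since the title mentions MSSQL: to reach Microsoft SQL Server we could install a driver such as pymssql (RUN pip install pymssql) and then use an mssql+pymssql:// SQLAlchemy URI when registering the datasource. Below is a small sketch of how such URIs are assembled; all credentials and hosts are made-up placeholders:

```python
# Sketch: assemble SQLAlchemy connection URIs for different backends.
# Every credential/host below is a placeholder, not a real endpoint.
def sqlalchemy_uri(driver: str, user: str, password: str,
                   host: str, port: int, db: str) -> str:
    return f"{driver}://{user}:{password}@{host}:{port}/{db}"

# PostgreSQL metadata database (psycopg2 driver installed in the Dockerfile)
metadata_uri = sqlalchemy_uri("postgresql", "superset", "superset",
                              "localhost", 5432, "superset_meta")

# Microsoft SQL Server datasource (would require `pip install pymssql`)
mssql_uri = sqlalchemy_uri("mssql+pymssql", "sa", "secret",
                           "localhost", 1433, "sales")
```

The same pattern applies to any backend SQLAlchemy supports; only the driver prefix and the installed package change.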

I created the build-image.sh script to build a Docker image from the Dockerfile above. With this script, I can easily apply my local changes to Superset.

#! /bin/bash

# You should execute this script every time you change something in `superset_config.py` or in `Dockerfile`

IMAGE_TAG='local/superset'

sudo docker image rm -f $IMAGE_TAG

sudo docker image build -t $IMAGE_TAG .

echo -e "\e[1;32m IMAGE BUILT: $IMAGE_TAG \e[0m"

There is also the first-build-superset.sh script. I use it to create and initialize Superset’s schema in my metadata database, and to create a default admin user for accessing the Superset dashboard.

#! /bin/bash

# You should execute this script only if this is the first time you are setting up Superset in your environment

CONTAINER_NAME='superset-live'

APP_URL='http://localhost:8088/'

sudo docker container rm -f $CONTAINER_NAME

echo -e "\e[1;31m CONTAINER REMOVED: $CONTAINER_NAME \e[0m"

sudo docker run --rm -d -p 8080:8088 --net=host --name $CONTAINER_NAME local/superset

echo -e "\e[1;32m CONTAINER CREATED: $CONTAINER_NAME \e[0m"

sudo docker exec -it $CONTAINER_NAME superset fab create-admin \
--username admin \
--firstname Superset \
--lastname Admin \
--email admin@superset.com \
--password admin

echo -e "\e[1;32m ADMIN USER CREATED! \e[0m"

sudo docker exec -it $CONTAINER_NAME superset db upgrade

echo -e "\e[1;32m DATABASE UPGRADED! \e[0m"

sudo docker exec -it $CONTAINER_NAME superset load_examples

echo -e "\e[1;32m EXAMPLES LOADED! \e[0m"

sudo docker exec -it $CONTAINER_NAME superset init

echo -e "\e[1;32m SUPERSET INIT COMPLETED! \e[0m"

echo -e "\e[1;32m YOU CAN VISIT: $APP_URL \e[0m"

Lastly, I have the build-superset.sh file, which I use to re-run Superset on my host machine. I don’t use the previous script again because it overwrites the metadata in the metadata database; once the first run has created my metadata database, I just need to run Superset without executing any database operations. When I run Superset with the script below, it automatically connects to the metadata database. And I can safely delete my Docker containers, because the metadata is already stored in the database I wanted; there is no risk in deleting the containers.

#! /bin/bash

# You should execute this script every time you change something in `superset_config.py` or in `Dockerfile`

IMAGE_TAG='local/superset'

sudo docker image rm -f $IMAGE_TAG

sudo docker image build -t $IMAGE_TAG .

echo -e "\e[1;32m IMAGE BUILT: $IMAGE_TAG \e[0m"

And now we have three scripts to easily run Superset locally or even on a server. If any update is needed, we can simply adjust the superset_config.py file, re-build the image, and re-create the container.

Image by Author

In case you need it again, you can access the repo that holds these scripts via the link below.

Finally

Hopefully, you enjoyed it. I think applying local changes is a crucial part of using Superset. We could get by with the different ways of running Superset, but if we couldn’t store our metadata where we want, we would have to repeat the same setup steps on every restart. I will also publish a video about this topic on my YouTube channel.

https://www.youtube.com/c/BaysanSoft

Kind regards.

Baysan


Lifelong learner & Freelancer. I use technology that helps me. I’m currently working as a Business Intelligence & Backend Developer. mebaysan.com