Install Apache Superset on Ubuntu Server with PostgreSQL and Apache Airflow

Jean-Michel ARNAUD
4 min read · Jul 25, 2024

In this story, we will start with a virtual machine on Ubuntu running Apache Airflow and PostgreSQL, and add Apache Superset. In the end, we will have a complete environment for ETL (extract, transform and load data) and BI (data visualisation) tasks.

Ubuntu (operating system), PostgreSQL (database), Apache Airflow (extract, transform and load tool), Apache Superset (data visualization tool)

Before starting this story, follow the previous story which describes how to install PostgreSQL and Apache Airflow on an Ubuntu virtual machine:

{1} Create a specific user for Apache Superset

Open a shell on your computer and log in to the virtual machine:

ssh ubuntu@123.123.123.123

Create a dedicated system user for Apache Superset:

sudo adduser superset

This guide runs Apache Superset on Python 3.9, so install it together with its venv and dev packages:

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.9
sudo apt install python3.9-venv
sudo apt install python3.9-dev

Install a library required by Superset to communicate with PostgreSQL:

sudo apt-get install libpq-dev

Connect to PostgreSQL:

sudo -u postgres psql

Create a database user for Apache Superset. Here, we grant the user superset_user read-only access to all existing tables in the schema smart_schema of the database airflow_db:

\c airflow_db
CREATE USER superset_user WITH PASSWORD 'my_db_password';
GRANT CONNECT ON DATABASE airflow_db TO superset_user;
GRANT USAGE ON SCHEMA smart_schema TO superset_user;
GRANT SELECT ON ALL TABLES IN SCHEMA smart_schema TO superset_user;

{2} Create a virtual host

Add a virtual host that serves as the entry point for the Superset front end. To do so, open the file that holds the current virtual hosts:

sudo vi /etc/apache2/sites-enabled/000-default-le-ssl.conf

Add a new virtual host (here, Apache Superset will listen on port 8088), then save the file and quit the vi editor:

<VirtualHost *:443>
ProxyPreserveHost On
ProxyPass "/" "http://127.0.0.1:8088/"
ProxyPassReverse "/" "http://127.0.0.1:8088/"

ServerName your_sub_domain_url

ErrorLog ${APACHE_LOG_DIR}/error.log
CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

Add a certificate to your subdomain:

sudo certbot -d your_sub_domain_url --expand

Close the SSH connection of the ubuntu user and open a new one as the superset user:

exit
ssh superset@123.123.123.123

{3} Install Apache Superset

Create a virtual environment in the superset home directory:

cd /home/superset/
python3.9 -m venv superset-env

Activate the virtual environment:

source superset-env/bin/activate

Install Apache Superset, along with the image library Pillow and the driver used to connect to PostgreSQL:

pip3.9 install apache-superset
pip3.9 install Pillow
pip3.9 install psycopg2-binary

When the installation finishes, check that the output contains “Running setup.py install for apache-superset … done”.

screenshot at the end of the installation
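If the pip output has already scrolled away, the installation can also be checked from Python. The following is a minimal sketch that uses only the standard library and must be run inside the activated virtual environment; the names REQUIRED and installed_versions are ours and are not part of Superset.

```python
from importlib import metadata

# Distribution names exactly as they were passed to pip in this step.
REQUIRED = ("apache-superset", "Pillow", "psycopg2-binary")


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any line reporting NOT INSTALLED points to a pip command that should be run again before continuing.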

Generate a secret key:

openssl rand -base64 42
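If openssl is not at hand, a key of the same shape can be produced from Python. This is a minimal sketch based on the standard library only; the function name generate_secret_key is our own choice for illustration.

```python
import base64
import secrets


def generate_secret_key(num_bytes: int = 42) -> str:
    """Return num_bytes of OS-level randomness, base64-encoded like `openssl rand -base64`."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    print(generate_secret_key())
```

With the default of 42 bytes, the result is a 56-character string, the same length as the openssl output.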

Create a configuration file:

cd /home/superset/
vi superset_config.py

Paste in the following content, replacing the placeholder with your own randomly generated secret key:

# Superset specific config
ROW_LIMIT = 5000

# Flask App Builder configuration
# Your App secret key will be used for securely signing the session cookie
# and encrypting sensitive information on the database
# Make sure you are changing this key for your deployment with a strong key.
# Alternatively you can set it with `SUPERSET_SECRET_KEY` environment variable.
# You MUST set this for production environments or the server will refuse
# to start and you will see an error in the logs accordingly.
SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
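As the comments above mention, the key can also come from the `SUPERSET_SECRET_KEY` environment variable instead of being written into the file. The sketch below shows one way superset_config.py could read it; the helper load_secret_key and the 32-character minimum are assumptions made for illustration, not Superset rules, and the variable has to be visible to whichever process starts Superset.

```python
import os


def load_secret_key(env=os.environ) -> str:
    """Return the key from SUPERSET_SECRET_KEY, failing loudly if it is missing or too short."""
    key = env.get("SUPERSET_SECRET_KEY", "")
    if len(key) < 32:  # illustrative threshold, not something Superset enforces
        raise RuntimeError("SUPERSET_SECRET_KEY is unset or shorter than 32 characters")
    return key


# In superset_config.py, the hard-coded assignment would then become:
# SECRET_KEY = load_secret_key()
```

Failing at start-up is a deliberate choice here: a missing key is easier to diagnose as an explicit error than as a server that silently falls back to a default.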

Export the environment variables that the superset command relies on:

export FLASK_APP=superset
export SUPERSET_CONFIG_PATH=/home/superset/superset_config.py

Initialise the database for Superset:

superset db upgrade

Create an admin user:

superset fab create-admin

Create the default roles and permissions:

superset init

{4} Create a service to manage Apache Superset instance

Exit, log back in as the ubuntu user, and create the service file:

exit
ssh ubuntu@123.123.123.123
sudo vi /etc/systemd/system/superset-instance.service

Paste the following lines:

[Unit]
Description=Apache Superset instance daemon
After=network.target postgresql.service
Wants=postgresql.service

[Service]
EnvironmentFile=/etc/environment
Environment="FLASK_APP=superset"
Environment="SUPERSET_CONFIG_PATH=/home/superset/superset_config.py"
User=superset
Group=superset
Type=simple
ExecStart=/bin/bash -c 'source /home/superset/superset-env/bin/activate; superset run -p 8088 --with-threads --reload'
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Reload the systemd configuration:

sudo systemctl daemon-reload

Allow the superset user to start, stop and restart the service without giving it full sudo rights:

sudo vi /etc/sudoers.d/superset_service

And paste the following lines:

Cmnd_Alias SUPERSET_USER_SERVICE = /usr/bin/systemctl start superset-instance, /usr/bin/systemctl stop superset-instance, /usr/bin/systemctl restart superset-instance

superset ALL=(ALL) PASSWD:SUPERSET_USER_SERVICE

{5} Run Apache Superset

Log in as the superset user:

exit
ssh superset@123.123.123.123

Start Apache Superset, using the exact systemctl command allowed in the sudoers file:

sudo /usr/bin/systemctl start superset-instance

Open a web browser and go to “your_sub_domain_url” to check that Apache Superset is running correctly.
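The same check can be scripted. Superset answers "OK" on its /health endpoint, so a small probe is enough to tell whether the instance is up. This is a minimal sketch using only the standard library; the function name superset_is_healthy and the injectable fetch parameter are our own additions, and the URL in the example assumes the local port 8088 used in this story.

```python
import urllib.request


def superset_is_healthy(base_url: str, fetch=urllib.request.urlopen, timeout: float = 5.0) -> bool:
    """Return True when GET <base_url>/health answers 200 with the body 'OK'."""
    url = base_url.rstrip("/") + "/health"
    try:
        with fetch(url, timeout=timeout) as response:
            return response.status == 200 and response.read().strip() == b"OK"
    except OSError:  # covers URLError, HTTPError, timeouts and refused connections
        return False


if __name__ == "__main__":
    # Probes the instance directly; pass https://your_sub_domain_url to test the proxy too.
    print(superset_is_healthy("http://127.0.0.1:8088"))
```

Probing the local port first and the subdomain second helps separate a Superset problem from a virtual host or certificate problem.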

{6} Add a database connection

After logging in to the Superset application, go to the “Settings” menu, select “Database Connections”, click the “+ DATABASE” button, and fill in the required information.
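When the form asks for a SQLAlchemy URI rather than separate fields, the value for the read-only user created in step {1} can be assembled as follows. This is a minimal sketch: the helper postgres_uri is hypothetical, and the host localhost and port 5432 are assumptions (PostgreSQL on the same machine, listening on its default port).

```python
from urllib.parse import quote_plus


def postgres_uri(user: str, password: str, database: str,
                 host: str = "localhost", port: int = 5432) -> str:
    """Build a postgresql:// SQLAlchemy URI, percent-encoding the user and password."""
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


if __name__ == "__main__":
    # Values from step {1}; replace the password with your own.
    print(postgres_uri("superset_user", "my_db_password", "airflow_db"))
```

Encoding the password matters as soon as it contains characters such as @ or /, which would otherwise be read as part of the URI structure.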
