Install Apache Superset on Ubuntu Server with PostgreSQL and Apache Airflow
In this story, we will start with a virtual machine on Ubuntu running Apache Airflow and PostgreSQL, and add Apache Superset. In the end, we will have a complete environment for ETL (extract, transform and load data) and BI (data visualisation) tasks.
Before starting this story, follow the previous story which describes how to install PostgreSQL and Apache Airflow on an Ubuntu virtual machine:
{1} Create a specific user for Apache Superset
Open a shell on your computer and log in to the virtual machine:
ssh ubuntu@123.123.123.123
Create a dedicated user to run Apache Superset:
sudo adduser superset
The current version of Apache Superset requires Python 3.9 and the associated dev package, so we need to install them:
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.9
sudo apt install python3.9-venv
sudo apt install python3.9-dev
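Before moving on, it can help to confirm that the new interpreter is actually on the PATH. The helper below is a minimal sketch; check_cmd is a hypothetical name, not part of any package.

```shell
# check_cmd is a hypothetical helper: it reports whether a command is on PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Report on the interpreter installed above. '|| true' keeps a longer script
# going so that every missing tool is listed instead of stopping at the first.
check_cmd python3.9 || true
```

If the line starts with "missing", re-run the apt commands above and look for errors in their output.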
Install a library required by Superset to communicate with PostgreSQL:
sudo apt-get install libpq-dev
Connect to PostgreSQL:
sudo -u postgres psql
Create a database user for Apache Superset (here, we grant the user superset_user read-only access to all tables in the schema smart_schema of the database airflow_db):
\c airflow_db
CREATE USER superset_user WITH PASSWORD 'my_db_password';
GRANT CONNECT ON DATABASE airflow_db TO superset_user;
GRANT USAGE ON SCHEMA smart_schema TO superset_user;
GRANT SELECT ON ALL TABLES IN SCHEMA smart_schema TO superset_user;
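Note that GRANT SELECT ON ALL TABLES only covers the tables that exist when it runs. If new tables are created in smart_schema later, one option is to add default privileges. The sketch below writes that extra statement to a file and prints the command that would apply it. The file name superset_grants.sql is an assumption, and the statement only affects tables created by the role that runs it unless a FOR ROLE clause is added.

```shell
# Write the extra statement to a file (the file name is an assumption).
cat > superset_grants.sql <<'SQL'
-- Applies to tables created later in smart_schema by the role running this
-- statement; add "FOR ROLE <owner>" if another role creates the tables.
ALTER DEFAULT PRIVILEGES IN SCHEMA smart_schema
  GRANT SELECT ON TABLES TO superset_user;
SQL

# Print the command that would apply the file, without running it here.
echo "sudo -u postgres psql -d airflow_db < superset_grants.sql"
```

Review the file first, then run the printed command from the same directory once you have left the psql prompt.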
{2} Create a virtual host
Add a virtual host to serve as the entry point for the Superset front end. To do so, open the file that contains the current virtual hosts:
sudo vi /etc/apache2/sites-enabled/000-default-le-ssl.conf
Add a new virtual host (here, Apache Superset listens on port 8088), then save and quit vi:
<VirtualHost *:443>
ProxyPreserveHost On
ProxyPass "/" "http://127.0.0.1:8088/"
ProxyPassReverse "/" "http://127.0.0.1:8088/"
ServerName your_sub_domain_url
ErrorLog ${APACHE_LOG_DIR}/error.log
CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>
Add a certificate to your subdomain:
sudo certbot -d your_sub_domain_url --expand
Close the SSH connection of the ubuntu user and open a new one as the superset user:
exit
ssh superset@123.123.123.123
{3} Install Apache Superset
Create a virtual environment in the superset home directory:
cd /home/superset/
python3.9 -m venv superset-env
Activate the virtual environment:
source superset-env/bin/activate
Install Apache Superset (including the image library Pillow and the driver to connect to PostgreSQL):
pip3.9 install apache-superset
pip3.9 install Pillow
pip3.9 install psycopg2-binary
When the installation finishes, check that the output includes “Running setup.py install for apache-superset … done”.
Generate a secret key:
openssl rand -base64 42
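To avoid copy and paste mistakes, the key can be captured in a variable and printed as a ready-to-paste line for superset_config.py. This is only a sketch: gen_key is a hypothetical wrapper, and the fallback branch is an assumption for machines where openssl is missing.

```shell
# gen_key is a hypothetical wrapper around the command above, with a
# fallback for machines where openssl is not installed.
gen_key() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -base64 42
  else
    head -c 42 /dev/urandom | base64
  fi
}

key=$(gen_key)
# 42 random bytes encode to 56 base64 characters.
echo "key length: ${#key}"
# Print the line to paste into superset_config.py.
printf "SECRET_KEY = '%s'\n" "$key"
```

Generate the key once and keep it: changing it later invalidates existing sessions and the sensitive data Superset encrypted with the old key.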
Create a configuration file:
cd /home/superset/
vi superset_config.py
Paste the following content into it, replacing the placeholder with your own randomly generated secret key:
# Superset specific config
ROW_LIMIT = 5000
# Flask App Builder configuration
# Your App secret key will be used for securely signing the session cookie
# and encrypting sensitive information on the database
# Make sure you are changing this key for your deployment with a strong key.
# Alternatively you can set it with `SUPERSET_SECRET_KEY` environment variable.
# You MUST set this for production environments or the server will refuse
# to start and you will see an error in the logs accordingly.
SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
Export the environment variables that Superset needs:
export FLASK_APP=superset
export SUPERSET_CONFIG_PATH=/home/superset/superset_config.py
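These exports only last for the current shell session, so they have to be repeated after reconnecting. A quick sanity check before running the superset commands might look like the sketch below; check_superset_env is a hypothetical helper name.

```shell
# check_superset_env is a hypothetical helper that validates the two variables.
check_superset_env() {
  [ "${FLASK_APP:-}" = "superset" ] || {
    echo "FLASK_APP is not set to superset"
    return 1
  }
  [ -f "${SUPERSET_CONFIG_PATH:-}" ] || {
    echo "config file not found: ${SUPERSET_CONFIG_PATH:-unset}"
    return 1
  }
  echo "environment looks good"
}

# '|| true' lets the shell continue so the message can be read and acted on.
check_superset_env || true
```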
Initialise the database for Superset:
superset db upgrade
Create an admin user:
superset fab create-admin
Create the default roles and permissions:
superset init
{4} Create a service to manage Apache Superset instance
Exit, log back in as the ubuntu user, and create the service file:
exit
ssh ubuntu@123.123.123.123
sudo vi /etc/systemd/system/superset-instance.service
Paste the following lines:
[Unit]
Description=Apache Superset instance daemon
After=network.target postgresql.service
Wants=postgresql.service
[Service]
EnvironmentFile=/etc/environment
Environment="FLASK_APP=superset"
Environment="SUPERSET_CONFIG_PATH=/home/superset/superset_config.py"
User=superset
Group=superset
Type=simple
ExecStart=/bin/bash -c 'source /home/superset/superset-env/bin/activate; superset run -p 8088 --with-threads --reload'
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
Reload the systemd daemon so that it picks up the new unit file:
sudo systemctl daemon-reload
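The [Install] section of the unit only takes effect once the service is enabled. If you want Superset to start automatically at boot, a sketch like the following could be used; the existence check is an added safeguard, not something systemd requires.

```shell
unit=superset-instance
unit_file="/etc/systemd/system/${unit}.service"

if [ -f "$unit_file" ]; then
  # Start the service automatically at boot (uses the [Install] section).
  sudo systemctl enable "$unit"
  # Show whether systemd now considers the unit enabled.
  systemctl is-enabled "$unit"
else
  echo "unit file not found: $unit_file"
fi
```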
Allow the superset user to start, stop and restart the service without granting it full sudo rights:
sudo visudo -f /etc/sudoers.d/superset_service
And paste the following lines:
Cmnd_Alias SUPERSET_USER_SERVICE = /usr/bin/systemctl start superset-instance, /usr/bin/systemctl stop superset-instance, /usr/bin/systemctl restart superset-instance
superset ALL=(ALL) PASSWD:SUPERSET_USER_SERVICE
{5} Run Apache Superset
Log in as the superset user:
exit
ssh superset@123.123.123.123
Run Apache Superset:
sudo systemctl start superset-instance
Open a web browser and go to “your_sub_domain_url” to check that Apache Superset is running correctly.
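If the page does not load, it helps to test the application directly on the server, bypassing the Apache proxy. The sketch below assumes the /health endpoint that recent Superset versions expose and the port 8088 used in this story; check_health is a hypothetical helper name.

```shell
# check_health is a hypothetical helper; it succeeds only if the URL answers
# with an HTTP success status within 10 seconds.
check_health() {
  curl --fail --silent --show-error --max-time 10 "$1" >/dev/null
}

# Query Superset directly on the port it listens on.
if check_health "http://127.0.0.1:8088/health"; then
  echo "Superset is up"
else
  echo "Superset is not reachable yet"
fi
```

If the local check succeeds but the browser still fails, the problem is more likely in the virtual host or the certificate than in Superset itself.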
{6} Add a database connection
After logging in to the Superset application, open the “Settings” menu, select “Database Connections”, click the “+ DATABASE” button, and fill in the required information.
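For PostgreSQL, the connection can also be given as a SQLAlchemy URI. The sketch below assembles one from the values created in step {1}; the host localhost and the port 5432 are assumptions (PostgreSQL on the same machine with its default port).

```shell
db_user=superset_user
db_password=my_db_password
db_host=localhost   # assumption: PostgreSQL runs on the same machine
db_port=5432        # assumption: default PostgreSQL port
db_name=airflow_db

uri="postgresql://${db_user}:${db_password}@${db_host}:${db_port}/${db_name}"
echo "$uri"
```

A password containing characters such as @ or / would need to be URL-encoded before being placed in the URI.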