Apache Superset in detail

Hossein Torabi
5 min read · Oct 28, 2019
Apache Superset logo

Introduction

Superset was started as a project at Airbnb to serve as an open & fully customizable application to visualize and explore massive amounts of data in a fast & intuitive way.

Since its release as an open-source project, hundreds of companies have started to use it, more than 418 developers have contributed to its source code, and it has been battle-tested in environments with hundreds of daily active users.

Superset is now being incubated as a project of the Apache Software Foundation and is often mentioned as an alternative to Tableau, Looker, Power BI, and other business intelligence solutions.

Advantages

1. Create beautiful data visualizations on the fly:
Superset comes with tons of different, rich visualization types included. Explore, slice & dice your data in a simple, visual way.
Apache Superset visualization explorer

2. Share interactive dashboards:
Easily filter the data on the dashboard and drill deeper into the details of any chart. Organize contents in a grid layout and even on different tabs.
Share any chart or dashboard by simply copying and pasting the URL. You can even make dashboards public, show them on a TV screen with auto-refresh, or embed them in your own websites or applications.

Apache Superset dashboard

3. Countless visualization types:
Superset ships with a large number of visualization types; to see them, please visit the Superset visualization gallery.

apache superset visualization gallery

4. SQL editor directly in the browser:
Superset comes with SQL Lab, a modern, feature-rich SQL IDE for deeper analysis of your data. Write queries against your database, view table metadata and visualize the results as charts.

Apache Superset sql_lab

5. Connect to any database:
Superset connects to all common SQL-speaking databases directly out of the box.
For more detail, please check this link.

Apache Superset databases support
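
Superset talks to databases through SQLAlchemy, so each connection is described by a SQLAlchemy URI. The sketch below builds one using only the standard library. The helper name `build_sqlalchemy_uri` and the credentials are hypothetical and chosen purely for illustration; the driver for the chosen dialect still has to be installed separately.

```python
from urllib.parse import quote_plus


def build_sqlalchemy_uri(dialect, user, password, host, port, database):
    """Return a SQLAlchemy-style URI, percent-encoding the credentials."""
    return "{}://{}:{}@{}:{}/{}".format(
        dialect, quote_plus(user), quote_plus(password), host, port, database
    )


# Hypothetical credentials; note the special characters in the password.
uri = build_sqlalchemy_uri(
    "postgresql", "analyst", "p@ss/word", "localhost", 5432, "sales"
)
print(uri)  # postgresql://analyst:p%40ss%2Fword@localhost:5432/sales
```

Encoding the password matters because characters such as `@` and `/` would otherwise be read as URI delimiters.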

Disadvantages

This may be the best part of this article :D, because everywhere people talk only about Superset's features and never about its weaknesses or disadvantages. That is true not only of Superset but of almost every product I know, so it's a good idea to read this section before deciding to install Superset!

  1. Upgrading is horrible:
    Because Superset lacks a sound software architecture in some areas, upgrading can be horrible. This mostly happens when the schema of Superset's metadata database has to be migrated.
  2. sql_lab is not perfect:
    The sql_lab user interface is not attractive or good enough for running ad-hoc queries; personally, I prefer other tools such as DataGrip or DBeaver over sql_lab.
    This matters all the more because most non-developers who use Superset, such as the BI and marketing units, are not interested in sql_lab; they only want dashboards and charts.
  3. User interface design is not perfect:
    The user interface design is not very attractive. Alternatives such as **Metabase** or **Tableau** have a better design.
  4. Lack of true permissions:
    Permissions in Superset are not fine-grained enough, and this really hurts. For example, you can grant a user permission to both view and modify charts, but you cannot grant read-only access.

There are more disadvantages, but as a Superset user and contributor, I still find it a better solution than the alternatives. In addition, Superset is still an Apache incubating project, so these shortcomings are understandable.

Superset Architecture

Apache Superset can run in sequential or distributed mode. In sequential mode, it can only run queries that finish in under 60 seconds.

In distributed mode, Superset spreads queries across its Celery workers (it does this only for sql_lab, not for dashboards or explore_json). Superset can use RabbitMQ or Redis to distribute the tasks, and it uses Redis to cache query results.
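
To see why a time limit matters in sequential mode, here is a minimal sketch of a request that waits on a query for a bounded time. It is an illustration only and not Superset's implementation: `run_synchronously`, `fast_query`, and `slow_query` are hypothetical names, and the timeouts are shortened from 60 seconds so the example finishes quickly.

```python
import concurrent.futures
import time


def run_synchronously(query_fn, timeout_s):
    """Run query_fn in a worker thread and give up after timeout_s seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(query_fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return None  # the caller gets a timeout instead of a result
    finally:
        pool.shutdown(wait=False)  # do not block on the still-running query


def fast_query():
    return [("rows", 3)]


def slow_query():
    time.sleep(0.5)  # stands in for a long-running query
    return [("rows", 1000)]


fast_result = run_synchronously(fast_query, timeout_s=0.2)
slow_result = run_synchronously(slow_query, timeout_s=0.05)
print(fast_result, slow_result)
```

In distributed mode the long query would instead be handed to a Celery worker, so the web request does not have to wait for it.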

Apache superset architecture in distributed mode

Installation

Installing Superset is super easy; just follow these steps:

pip install superset

# Initialize the database
superset db upgrade

# Create an admin user (you will be prompted to set a username, first and last name before setting a password)
export FLASK_APP=superset
flask fab create-admin

# Load some data to play with
superset load_examples

# Create default roles and permissions
superset init

# Start a development web server on port 8080; use -p to bind to another port
superset run -p 8080 --with-threads --reload --debugger

Superset can also be configured to run under Gunicorn with Gevent workers:

/usr/local/bin/gunicorn --workers 4 --worker-class gevent --timeout 60   --bind 0.0.0.0:8080 --pid /home/superset/superset.PIDFile superset:app
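
The same flags can also be kept in a Gunicorn configuration file, which is plain Python. This is a sketch that assumes the file is saved as `gunicorn.conf.py` (a name chosen here for illustration); the values simply mirror the command above.

```python
# gunicorn.conf.py: mirrors the command-line flags shown above
workers = 4                                   # --workers 4
worker_class = "gevent"                       # --worker-class gevent
timeout = 60                                  # --timeout 60 (seconds)
bind = "0.0.0.0:8080"                         # --bind 0.0.0.0:8080
pidfile = "/home/superset/superset.PIDFile"   # --pid ...
```

With that file in place, the server could be started with `gunicorn -c gunicorn.conf.py superset:app`.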

To run a Celery worker with Gevent, run:

/usr/local/bin/celery worker --app=superset.tasks.celery_app:app --pool=gevent -Ofair -c 4

In distributed mode, you need to create superset_config.py in the SUPERSET_HOME path and run both the Celery workers and the Superset web servers.
The config file should look like the one in the Superset repository, but you do not need to override all of it.
In general, this config will be enough:

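import os

# Note: werkzeug.contrib was removed in Werkzeug 1.0; on newer versions
# use `from cachelib.redis import RedisCache` instead.
from werkzeug.contrib.cache import RedisCache

MAPBOX_API_KEY = os.getenv('MAPBOX_API_KEY', '')

# Cache for query results. Redis databases are selected by integer index;
# 2 is an assumption taken from CACHE_REDIS_URL below.
CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_DEFAULT_TIMEOUT': 60,
    'CACHE_KEY_PREFIX': 'superset_',
    'CACHE_REDIS_HOST': 'localhost',
    'CACHE_REDIS_PORT': 6379,
    'CACHE_REDIS_DB': 2,
    'CACHE_REDIS_URL': 'redis://localhost:6379/2',
}

# Superset's own metadata database
SQLALCHEMY_DATABASE_URI = 'mysql://superset:superset@localhost:3306/superset'
SQLALCHEMY_TRACK_MODIFICATIONS = True
SECRET_KEY = 'superset'  # replace with a long random value in production

# Timeouts in seconds
SQLLAB_TIMEOUT = 60
SUPERSET_WEBSERVER_TIMEOUT = 60


class CeleryConfig(object):
    # RabbitMQ as the broker, Redis as the result backend
    BROKER_URL = 'pyamqp://superset:superset@localhost:5672'
    CELERY_IMPORTS = ('superset.sql_lab', 'superset.tasks',)
    # Database index 2 is an assumption; it must be an integer
    CELERY_RESULT_BACKEND = 'redis://localhost:6379/2'
    CELERY_ANNOTATIONS = {'tasks.add': {'rate_limit': '10/s'}}
    CELERYD_LOG_LEVEL = 'DEBUG'
    CELERYD_PREFETCH_MULTIPLIER = 10
    CELERY_ACKS_LATE = True


CELERY_CONFIG = CeleryConfig

# Where sql_lab stores the results of asynchronous queries
RESULTS_BACKEND = RedisCache(
    host='localhost',
    port=6379,
    key_prefix='superset_results',
    db=2,  # integer database index (assumed, see above)
)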
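
Redis selects databases by integer index, so every Redis URL in a config like this needs a numeric path component. Below is a minimal sketch of a sanity check that could be run before starting Superset; `redis_db_index` is a hypothetical helper, not part of Superset or of any Redis client.

```python
from urllib.parse import urlparse


def redis_db_index(url):
    """Return the integer database index from a redis:// URL (default 0)."""
    parsed = urlparse(url)
    if parsed.scheme != "redis":
        raise ValueError("not a redis URL: {!r}".format(url))
    path = parsed.path.lstrip("/")
    if path == "":
        return 0  # no path component: Redis falls back to database 0
    if not path.isdecimal():
        raise ValueError(
            "database index must be an integer, got {!r}".format(path)
        )
    return int(path)


print(redis_db_index("redis://localhost:6379/2"))  # 2
```

A URL such as `redis://localhost:6379/superset` raises `ValueError` here, which catches the mistake before Celery or the cache tries to connect.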

I have also created a GitHub repository for installing Superset with Ansible playbooks. It is not fully complete yet, but it is easy to use and useful.
