Superset as a bridge between devs and domain experts

Juanlu
urbanData Analytics
6 min read · Apr 25, 2019

Sometimes it is difficult to create a product that can be built and maintained by both developers and non-technical people. Why would you want something like that? I’m going to explain a simple problem we ran into here at urbanData Analytics.

For several of our processes, we needed to build and interpret different visualizations. Usually, two particular profiles come into play:

  • A developer: knows how to build a cool Kibana, Prometheus, or hand-crafted dashboard (using Python’s ggplot, for example). We’ll refer to them from now on as Dev.
  • A business expert: a domain expert who knows how to interpret the dashboard’s visualizations and act on them. We’ll refer to them from now on as Biz.

Now, let’s say Biz has realized a plot is incorrect because it is not using the correct data columns and the color combination is awful. What would Dev’s work cycle look like?

  1. Discuss the issue with Biz (hopefully a brief conversation in Slack is enough)
  2. Add the task to Dev’s queue and prioritize it somehow
  3. Find the place in which the change must happen
  4. Modify the code
  5. Modify the tests (no excuses!)
  6. Deploy the project (hopefully you’ve automated it and it won’t be a pain in the ass)

What if the non-technical people could directly address this easy problem and solve it by themselves? Let me introduce you to… Superset!


Superset is an open-source product originally developed by Airbnb that, as stated in its docs:

Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application.

More importantly, it manages to provide tools for both technical and non-technical people. You can easily build a plot using data from a table in your database, a CSV file, or a Druid cluster. But in case you need to customize a plot, or to obtain data from an overly complicated query, you still have ways to do so. Nice!

So… let’s install it?

Disclaimer: I’m going to install it locally (Ubuntu 18.04), as a first approach to working with Superset. I do not recommend following the same steps in a production environment!

  1. First you need to install both Docker and docker-compose
  2. Clone the Superset repo
git clone https://github.com/apache/incubator-superset/

3. Navigate to the folder in which the docker files are placed

cd incubator-superset/contrib/docker

4. Run docker-compose, executing the docker-init.sh initialization script as the container’s command

docker-compose run --rm superset ./docker-init.sh

This is going to download (which may take a long time the first time) and configure three containers: one with Superset, another with a Redis instance (to handle caching), and a last one with Postgres (to handle metadata). You’ll be asked for a username and password during this process.

5. Launch it!

docker-compose up -d

Now you should be able to access your Superset service at http://localhost:8088/login/, congrats!

As specified in the docker-compose.yaml file, the containers will be started automatically when Docker (or the computer) restarts, so you don’t have to worry about rerunning docker-compose.

Quick Note: If the last step fails for you, modify the file incubator-superset/contrib/docker/docker-compose.yaml and set the environment to production:

SUPERSET_ENV: production
#SUPERSET_ENV: development

This is an already documented problem (link to GitHub issue), and hopefully a fix is coming soon.

Ok, that was way too boring, let’s import our first datasource and build some cool graphics!


The data I’ve used includes information on all property sales in England and Wales that were sold for full market value in 2018 (source). I’ve downloaded the CSV and dumped it into a table in Postgres.
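As a sketch of that dump step: the snippet below loads a CSV into a database table with pandas and SQLAlchemy. The table name, column names, and sample rows are made up for illustration, and it targets an in-memory SQLite database so it runs anywhere; for Postgres you would swap in a URI like postgresql://user:password@localhost:5432/yourdb.

```python
# Minimal sketch: dump a CSV of price-paid records into a database table
# that Superset can later read. Names here are illustrative placeholders.
import io

import pandas as pd
from sqlalchemy import create_engine

# A tiny stand-in for the downloaded CSV file.
csv_data = io.StringIO(
    "price,date_of_transfer,town\n"
    "250000,2018-03-01,LONDON\n"
    "180000,2018-07-15,LEEDS\n"
)

# Use your Postgres URI here instead of SQLite for the real thing.
engine = create_engine("sqlite:///:memory:")

df = pd.read_csv(csv_data, parse_dates=["date_of_transfer"])
df.to_sql("price_paid_2018", engine, if_exists="replace", index=False)

# Verify the dump worked by counting rows back from the database.
count = pd.read_sql("SELECT COUNT(*) AS n FROM price_paid_2018", engine)["n"][0]
print(count)  # 2
```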

First of all, we need to create a connection from Superset to our database, which is done in the Sources > Databases section. Once there, the connection is specified via a SQLAlchemy URI, which supports a wide range of databases. Here is how I’ve done it with Postgres:

The Superset connector uses SQLAlchemy
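For reference, Superset accepts any URI that SQLAlchemy understands; the general shape is dialect+driver://user:password@host:port/database. The credentials, hosts, and database names below are placeholders, not the ones from my setup:

```python
# A few common SQLAlchemy URI shapes (all values are placeholders).
uris = {
    "postgres": "postgresql://superset:secret@localhost:5432/property_sales",
    "mysql": "mysql://superset:secret@localhost:3306/property_sales",
    "sqlite": "sqlite:////absolute/path/to/property_sales.db",
}

# SQLAlchemy can parse these into their components.
from sqlalchemy.engine.url import make_url

parsed = make_url(uris["postgres"])
print(parsed.drivername, parsed.host, parsed.database)
# postgresql localhost property_sales
```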

Then, we need to add each of the tables that we want to use in our visualizations. That is done in the Sources > Tables section.

Now we get to play with Superset’s two main objects: dashboards and charts. Each dashboard is composed of several charts that we can arrange as we please. The process of building a basic chart and composing a dashboard is really simple, so I’m not going to explain how to do it. Go and learn!

In just about 15 minutes, I’ve managed to build this dashboard:

A simple dashboard displaying some basic graphs

The magic of it is that, as we play with the different filters, the visualizations are updated without refreshing the page. Don’t believe me? Just watch the following gif, in which I first apply a filter by city and then a temporal one.

Visualizations are re-rendered as soon as a filter is applied

Take into account that each visualization needs to execute at least one query. If your dataset is huge and you change filters very fast, you may overload your database (sometimes patience is the word).

Ok, this is very fancy and good-looking, but it has a huge flaw: the chart creator dialog only lets us choose one table, so we can’t join data from different tables… or can we? Yes, we can! (No Obama pun intended.) The answer is SQL Lab, a section in which we can query our database, save the results, and use them as a data source for visualizations. For example, I can launch the following query:

SQL Lab query composer dialog
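The query itself appears only as a screenshot above, so here is a hypothetical stand-in for the kind of cross-table query SQL Lab enables: average sale price per region, joining the price-paid table with a made-up regions lookup. It runs against an in-memory SQLite database so the sketch is self-contained; in SQL Lab you would type just the SQL.

```python
import sqlite3

# Toy stand-ins for the real tables (names and rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE price_paid_2018 (price INTEGER, town TEXT);
    INSERT INTO price_paid_2018 VALUES (250000, 'LONDON'), (180000, 'LEEDS');
    CREATE TABLE regions (town TEXT, region TEXT);
    INSERT INTO regions VALUES ('LONDON', 'Greater London'), ('LEEDS', 'Yorkshire');
""")

# The cross-table aggregation the chart creator alone can't express:
# average sale price per region, joining two tables.
query = """
    SELECT r.region, AVG(p.price) AS avg_price
    FROM price_paid_2018 AS p
    JOIN regions AS r ON r.town = p.town
    GROUP BY r.region
    ORDER BY avg_price DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Greater London', 250000.0), ('Yorkshire', 180000.0)]
```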

To build a plot based on the results of this query, it is as simple as running the query and then clicking on the Explore button that appears in the bottom left corner. Then we are redirected to the visualization creation view. Nice! This is the part where Dev has to work, because Biz is not supposed to know anything about SQL syntax (I mean, if they do, you should be happy).

So, at the end of the day:

  • Dev’s workload is decreased.
  • Biz is able to create and modify their own visualizations, as they please.
  • There is no need to read a lot of documentation to start producing results.
  • Dashboard documents are stored in JSON files, so they can be added to your version control system.
  • Visualizations look nice 😍 (even though they can sometimes be annoying)

Of course, Superset is not perfect and has some flaws:

  • Deployment in production can be a pain in the ass (we used k8s and kinda wanted to kill the developers at some point).
  • The user roles management tab is nearly impossible to understand.
  • You have to stick with the already implemented plots (they have a good PR policy so we can expect more to come soon).
  • You have to update frequently, as they publish releases at a fast pace (hmm… maybe this is more of an advantage :D)

That’s all! Thank you for reading, hopefully you’ve been able to create your first dashboard with no difficulties. I’ll leave below a short list of resources in case you need more information/help.

To learn more about Superset:



Junior Data Engineer @ urbanDataAnalytics. Always eager to learn new technologies and techniques 🚀. #striveforgreatness