Superset: Airbnb’s data exploration platform

By Maxime Beauchemin


At Airbnb, we love data, and we like to think that analytics belongs everywhere. For us to be data-driven, we need data to be fluid, fast flowing, and crystal clear.

As a vector for data exploration, discovery, and collaborative analytics, we have built, and are now open sourcing, a data exploration and dashboarding platform named Superset. Superset allows data exploration through rich visualizations while performing fast and intuitive “slicing and dicing” against just about any dataset.

Data explorers can easily travel through multi-dimensional datasets while creating and sharing “slices” and assembling them into interactive dashboards.

Data exploration at the speed of thought

“Data visualization is effective because it shifts the balance between perception and cognition to take fuller advantage of the brain’s abilities.” — Stephen Few

It takes very little time, maybe 10 to 30 seconds of delay, to break someone’s cognitive flow. Superset keeps your thinking loop spinning by providing a fluid query interface and enforcing fast query times. Slicing, dicing, drilling down, and pivoting across visualizations allow users to explore multi-dimensional data spaces effectively.

The codeless approach to data navigation brings everyone on board, democratizing access to data. At one end of the spectrum, less technical users find an easy interface to query data. At the other end, advanced users gain velocity and can easily share the content they create.

Data scientists, engineers and other data wizards can still use Tableau, R, Jupyter, Airpal, Excel, and other means to interact with data, but Superset is gaining mind share internally as a frictionless and intuitive vehicle for sharing data and ideas.

Features

  • A rich and extensible set of visualizations including basic charts as well as sunburst, parallel coordinates, heatmap, force directed layouts, world map, pivot table, word cloud, Sankey diagram, and more!
  • Create and share interactive dashboards as collections of visualizations
  • Flexible authentication and authorization, with support for LDAP, OpenID, OAuth, Remote User, and more. Granular permissions and role management allow administrators to define very clearly who gets access to which feature and/or which dataset
  • A thin semantic layer that defines how datasets should be exposed and lets you enrich the content by adding SQL expressions and metrics
  • Connectivity to most SQL-speaking databases, as well as support for querying Druid.io for fast realtime analytics
  • A smooth learning curve: users can be trained in minutes and get value instantaneously
  • Flexible data caching, with cascading timeout parameters by report, table and database to relieve your databases from heavy load and to make important dashboards load quickly
  • Customizable and hackable! You can brand and skin Superset with your own bootstrap theme, create CSS templates for your dashboards and modify the controls for specific visualizations
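
To make the idea of cascading cache timeouts concrete, here is a minimal sketch of how a timeout could be resolved from the most specific level (report) down to the least specific (database). The function name, its parameters, and the one-hour default are assumptions made for illustration; this is not Superset's actual code.

```python
from typing import Optional

# Hypothetical global fallback, used when no level defines a timeout.
DEFAULT_TIMEOUT_SECONDS = 3600


def resolve_cache_timeout(
    report_timeout: Optional[int],
    table_timeout: Optional[int],
    database_timeout: Optional[int],
    default: int = DEFAULT_TIMEOUT_SECONDS,
) -> int:
    """Return the first timeout that is set, from most to least specific."""
    for candidate in (report_timeout, table_timeout, database_timeout):
        # Compare against None so that an explicit 0 is still honored.
        if candidate is not None:
            return candidate
    return default


if __name__ == "__main__":
    # The report has no setting of its own, so the table-level value applies.
    print(resolve_cache_timeout(None, 300, 900))
```

With a scheme like this, an important dashboard can be given a long timeout on its own reports while everything else inherits a more conservative value from its table or database.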

Connectivity

Superset should work just as well in your environment as it does in ours. The query layer was written using SQLAlchemy, a SQL toolkit that allows authoring queries that can be translated to most SQL dialects out there.
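
As a rough illustration of what authoring a query once and translating it to several dialects looks like, the sketch below builds a single SQLAlchemy query and compiles it for three databases. The bookings table and its columns are made up for the example, the code assumes SQLAlchemy 1.4 or later, and it is not taken from Superset's query layer.

```python
from sqlalchemy import column, func, select, table
from sqlalchemy.dialects import mssql, mysql, postgresql

# Hypothetical lightweight table definition; no database connection is needed.
bookings = table("bookings", column("city"), column("nights"))

total_nights = func.sum(bookings.c.nights)

# One query definition, written once against SQLAlchemy's expression API.
query = (
    select(bookings.c.city, total_nights.label("total_nights"))
    .group_by(bookings.c.city)
    .order_by(total_nights.desc())
    .limit(10)
)


def render(statement, dialect) -> str:
    """Compile the statement to a SQL string for the given dialect."""
    compiled = statement.compile(
        dialect=dialect,
        compile_kwargs={"literal_binds": True},  # inline the limit value
    )
    return str(compiled)


if __name__ == "__main__":
    for name, dialect in [
        ("PostgreSQL", postgresql.dialect()),
        ("MySQL", mysql.dialect()),
        ("SQL Server", mssql.dialect()),
    ]:
        print(name)
        print(render(query, dialect))
        print()
```

The row limit is a good place to see the translation at work: PostgreSQL and MySQL express it with a LIMIT clause, while SQL Server uses TOP.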

Beyond the SQL world, Superset is designed to harness the power of Druid.io. Druid is an open source, fast, column-oriented, realtime, distributed data store. Coupling the two together accelerates analysis cycles by taking delays out of the equation.

A thin semantic layer

Superset allows you to manage a thin layer to enrich your datasets’ metadata. This simple layer defines how your dataset is exposed to the user and is composed of:

  • Descriptions, definitions, and verbose names for your dimensions and metrics that provide context while exploring datasets
  • Calculated fields and metrics, for instance ratios, distinct counts, and anything else that can be expressed through SQL
  • Simple parameters that define how fields are exposed in the UI
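
The sketch below shows one way such a layer could be represented and used: a small dictionary holds verbose names, descriptions, and SQL expressions for metrics, and a helper turns a selection into a query. The dictionary layout, the bookings table, and the build_query helper are all hypothetical and only meant to illustrate the idea; Superset stores this metadata in its own models.

```python
import sqlite3

# Hypothetical semantic layer for a single table.
SEMANTIC_LAYER = {
    "table": "bookings",
    "dimensions": {
        "city": {"verbose_name": "City", "description": "City of the listing"},
    },
    "metrics": {
        "total_nights": {
            "verbose_name": "Total nights",
            "expression": "SUM(nights)",
        },
        "distinct_guests": {
            "verbose_name": "Distinct guests",
            "expression": "COUNT(DISTINCT guest_id)",
        },
        "nights_per_booking": {
            "verbose_name": "Nights per booking",
            "expression": "1.0 * SUM(nights) / COUNT(*)",  # a ratio metric
        },
    },
}


def build_query(layer, dimension, metric_names):
    """Build a GROUP BY query from a dimension and a list of metric names."""
    if dimension not in layer["dimensions"]:
        raise KeyError(f"unknown dimension: {dimension}")
    selected = [
        f'{layer["metrics"][name]["expression"]} AS {name}'
        for name in metric_names
    ]
    columns = ", ".join([dimension] + selected)
    return (
        f'SELECT {columns} FROM {layer["table"]} '
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )


def run_example():
    """Run the generated query against a tiny in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bookings (city TEXT, guest_id INTEGER, nights INTEGER)"
    )
    conn.executemany(
        "INSERT INTO bookings VALUES (?, ?, ?)",
        [("Paris", 1, 2), ("Paris", 1, 4), ("Tokyo", 2, 3)],
    )
    sql = build_query(
        SEMANTIC_LAYER,
        "city",
        ["total_nights", "distinct_guests", "nights_per_booking"],
    )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_example():
        print(row)
```

Because each metric is just a named SQL expression, users can pick "Nights per booking" from a list without ever writing the underlying ratio themselves.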

Test Drive

We’ve made taking Superset for a test drive very easy. After the simple installation process, you’ll get Superset loaded with a nice set of dashboards, charts, and datasets that you can explore and interact with. The next logical step is to connect to your local databases and start visualizing them.

A bright future

Superset started as a hackathon project less than a year ago. While the project is already solid, it’s still young and gaining momentum. Look forward to more interactivity in dashboards, support for a growing number of visualizations, a set of training videos, more social features like tags, comments, usage information, chart annotations, and much more!

We’re planning on releasing the data visualizations and controls exposed in Superset as reusable React components. This modular approach will make these building blocks available to application developers. At Airbnb, we have many use cases for rich and interactive visualizations as part of internal applications; for example, our A/B testing framework, anomaly detection framework, and user session explorer. It would be great to share the same components across all of these applications.

Join the community and find pointers to resources on Superset’s GitHub repository!

Note: Superset was originally released with the name Caravel.

Check out all of our open source projects over at airbnb.io and follow us on Twitter: @AirbnbEng + @AirbnbData