
Superset: Scaling Data Access and Visual Insights at Airbnb

Jeff Feng
Feb 9, 2017 · 5 min read

By Alanna Scott, Bogdan Kyryliuk, Eli Brumbaugh, Jeff Feng, Max Beauchemin and Vera Liu

Introduction

At Airbnb, one of our fundamental beliefs is that data access should be democratized, empowering every employee to make data-informed decisions. We believe that grounding decisions in quantitative insights from data, together with qualitative insights (e.g., in-person user research), results in the best possible business decisions. This applies to all parts of the organization, whether we are deciding to launch a new product feature or analyzing how to provide the best possible employee experience.

One of the challenges that accompanies data democracy is providing data access to users with varying levels of data literacy. Many of our users are deeply skilled at writing SQL, particularly those on our Data Science, Engineering and Business Operations teams. The ability to write SQL provides tremendous flexibility in accessing data from our Data Warehouse in Hadoop. However, for many of the users we want to empower, SQL is often too high a barrier. Second, once users have accessed the data, they still face the challenge of exploring it and discovering insights. Our solution to both of these challenges is the development of Superset.


Solution: Superset

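Superset is a data exploration and visualization platform designed to be visual, intuitive and interactive. It consists of two primary interfaces: a data exploration interface for fast, intuitive "slicing and dicing" of any dataset, and SQL Lab, a SQL IDE for composing and running arbitrary queries.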

The combination of these two interfaces enables users to consume data in a variety of ways. Users can directly visualize data from tables stored in a variety of databases, including Presto, Hive, Impala, Spark SQL, MySQL, Postgres, Oracle, Redshift, and SQL Server. Connectivity with Druid extends Superset's capabilities to visualizing billions of rows of data, thanks to Druid's in-memory, column-oriented distributed architecture. With the addition of a SQL IDE, power users can compose SQL queries to restructure their data, reduce its size, or union data across tables. Users can then immediately visualize their query results using Superset's Visualize flow.
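To make "union data across tables" and "reduce the size of the data" concrete, here is a minimal sketch of the kind of query a power user might write before visualizing the result. It assumes Python's built-in sqlite3 as a stand-in for the warehouse databases listed above; the table names, columns and rows are hypothetical.

```python
import sqlite3


def summarize_listings() -> list[tuple[str, int, float]]:
    """Union two same-shaped tables, then aggregate them into one row per city."""
    conn = sqlite3.connect(":memory:")
    try:
        # Two hypothetical tables with identical columns, e.g. listings split by region.
        conn.execute("CREATE TABLE listings_eu (city TEXT, price REAL)")
        conn.execute("CREATE TABLE listings_us (city TEXT, price REAL)")
        conn.executemany(
            "INSERT INTO listings_eu VALUES (?, ?)",
            [("Paris", 120.0), ("Paris", 80.0), ("Rome", 90.0)],
        )
        conn.executemany(
            "INSERT INTO listings_us VALUES (?, ?)",
            [("Austin", 150.0), ("Austin", 110.0)],
        )
        # UNION ALL stacks the two tables; GROUP BY then reduces the combined rows
        # to a small result set (one row per city) that is cheap to chart.
        query = """
            SELECT city, COUNT(*) AS n_listings, AVG(price) AS avg_price
            FROM (
                SELECT city, price FROM listings_eu
                UNION ALL
                SELECT city, price FROM listings_us
            )
            GROUP BY city
            ORDER BY city
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in summarize_listings():
        print(row)
```

Running it prints `('Austin', 2, 130.0)`, `('Paris', 2, 100.0)` and `('Rome', 1, 90.0)`: five raw rows from two tables collapse into three summary rows, which is the shape of result that is convenient to hand to a chart.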

We first launched Superset’s data exploration interface in March 2016 as a way for users to perform fast and intuitive “slicing and dicing” against any dataset. Since then, we have added a number of major new features including:

Superset maps visualization leveraging Mapbox

Introducing Superset SQL Lab

Today we're announcing "SQL Lab", the new SQL IDE in Superset. Integrating SQL Lab into Superset is advantageous because it connects the flow from arbitrary SQL to data visualization, dashboarding and knowledge sharing. Combining the SQL IDE and the data exploration interface in one tool has the additional benefit of managing authentication, roles and permissions in a single place. As part of Superset, SQL Lab provides backend query support for all the databases that Superset supports, with the exception of Druid, which is not SQL-based.

See SQL Lab in action

SQL Lab packs a number of powerful features including (for the full list, see the Superset documentation):

Computationally intensive, long-running queries are common in the "petabyte era" of data, and SQL Lab is designed to support this use case well. For deployments that have an asynchronous backend available, SQL Lab automatically defaults to running queries asynchronously to support large queries. Additionally, with the Create Table As (CTAS) feature, SQL Lab allows users to store query results in a newly created table. Users can then query and visualize data from this summary table.
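The CTAS workflow can be sketched in a few lines: wrap a SELECT in `CREATE TABLE ... AS`, run it in the background while the caller polls for completion, then query the much smaller summary table. This is a minimal illustration, not Superset's implementation. Superset's own asynchronous mode is built on Celery workers; here a `concurrent.futures` thread pool stands in for that backend, sqlite3 stands in for the warehouse, and `build_ctas`, `run_statement` and the `bookings` tables are hypothetical.

```python
import concurrent.futures
import os
import sqlite3
import tempfile
import time


def build_ctas(select_sql: str, target_table: str) -> str:
    """Wrap a SELECT in CREATE TABLE ... AS (hypothetical helper, not Superset's API).

    `target_table` is assumed to be a trusted, already-validated identifier.
    """
    # Drop a trailing semicolon so the SELECT can be embedded in the CTAS statement.
    return f"CREATE TABLE {target_table} AS {select_sql.strip().rstrip(';')}"


def run_statement(db_path: str, sql: str) -> None:
    """Execute one statement on a fresh connection, as a background worker would."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(sql)
        conn.commit()
    finally:
        conn.close()


def demo() -> list[tuple[str, int]]:
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "warehouse.db")

        # Seed a tiny stand-in for a large warehouse table.
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE bookings (city TEXT, nights INTEGER)")
        conn.executemany(
            "INSERT INTO bookings VALUES (?, ?)",
            [("Paris", 3), ("Paris", 2), ("Tokyo", 5)],
        )
        conn.commit()
        conn.close()

        ctas = build_ctas(
            "SELECT city, SUM(nights) AS total_nights FROM bookings GROUP BY city;",
            "bookings_by_city",
        )

        # Submit the CTAS to a worker thread and poll, instead of blocking the caller.
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(run_statement, db_path, ctas)
            while not future.done():
                time.sleep(0.01)
            future.result()  # re-raises any exception raised in the worker

        # Query the small summary table instead of the original table.
        conn = sqlite3.connect(db_path)
        try:
            return conn.execute(
                "SELECT city, total_nights FROM bookings_by_city ORDER BY city"
            ).fetchall()
        finally:
            conn.close()


if __name__ == "__main__":
    print(demo())  # [('Paris', 5), ('Tokyo', 5)]
```

Polling a future mirrors the user-facing behavior described above: the query runs elsewhere, and the client checks back for status rather than holding a connection open for the full duration.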

SQL Lab also makes it easy to grant a set of employees access to any internal database. Administrators can add a new database to Superset through a simple flow and then grant users permissions through roles. Users can be granted access per database connection as well as per table. Where per-table access applies, Superset introspects the query to identify the tables referenced in the SQL.
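The per-table check can be sketched as: extract the table names a query references, then subtract the tables the user's roles allow. The sketch below is illustrative only. It assumes a hypothetical `Role` model (whole-database grants plus per-table grants) and uses a deliberately naive regex where a production implementation would use a real SQL parser such as sqlparse; `referenced_tables` and `unauthorized_tables` are hypothetical names, not Superset functions.

```python
import re
from dataclasses import dataclass, field

# Naive pattern: an identifier (optionally schema-qualified) right after FROM or JOIN.
# A real implementation should use a SQL parser; this regex misses comma-separated
# FROM lists and can misfire on constructs such as EXTRACT(YEAR FROM col) or CTE names.
_TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)?)", re.IGNORECASE
)


def referenced_tables(sql: str) -> set[str]:
    """Return the lower-cased table names that follow FROM or JOIN in `sql`."""
    return {name.lower() for name in _TABLE_REF.findall(sql)}


@dataclass
class Role:
    """Hypothetical role: whole-database grants plus per-table grants."""
    name: str
    databases: set[str] = field(default_factory=set)
    tables: dict[str, set[str]] = field(default_factory=dict)  # database -> table names


def unauthorized_tables(roles: list[Role], database: str, sql: str) -> set[str]:
    """Return referenced tables the roles do not cover; an empty set means allowed."""
    if any(database in role.databases for role in roles):
        return set()  # database-level access covers every table on that connection
    allowed: set[str] = set()
    for role in roles:
        allowed |= {t.lower() for t in role.tables.get(database, set())}
    return referenced_tables(sql) - allowed


if __name__ == "__main__":
    analyst = Role("analyst", tables={"presto": {"core.bookings"}})
    admin = Role("admin", databases={"presto"})
    query = (
        "SELECT b.city FROM core.bookings b "
        "JOIN core.payments p ON b.id = p.booking_id"
    )
    print(unauthorized_tables([analyst], "presto", query))  # {'core.payments'}
    print(unauthorized_tables([admin], "presto", query))    # set()
```

Returning the set of missing tables, rather than a bare yes/no, lets the UI tell the user exactly which permission to request.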

What’s Next?

Superset had humble beginnings as a hackathon data visualization project; now it is a full-fledged open source project. Since we launched Superset in March 2016, it has grown into one of the most popular open source data visualization apps, with over 10,000 stars and 100 contributors on GitHub. Airbnb engineers and community contributors add new features and bug fixes every week. With the addition of SQL Lab, we are confident that data users will find the project even more useful. We're excited to see the project grow and improve over time. Some of the features on the near-term roadmap include:

Airbnb loves open source, and the Superset team does all of its work in the open. Come join our community on GitHub! And if you are excited about our vision and want to join the team, we're hiring software engineers to revolutionize the future of data visualization.

Join our community: https://github.com/airbnb/superset

Written by

Jeff Feng

PM Lead for Data @Airbnb covering Machine Learning Infrastructure, Experimentation, Data Visualization & Data Infra. Powered by coffee, bubble tea and burritos.

Airbnb Engineering & Data Science

Creative engineers and data scientists building a world where you can belong anywhere. http://airbnb.io