Airbnb is pleased to announce the launch of Airpal, a web-based query execution tool that leverages Facebook’s PrestoDB to facilitate data analysis.
People who spend time using SQL for exploration and investigation know that the workflow is not always smooth. Remembering how a query was written, copying and pasting from the command line, and running multiple terminal windows can slow down analysis and be frustrating. Additionally, when diverse teams are using SQL for analytics, the learning curve can be steep for beginners, so good UI tools can help drive adoption and promote knowledge sharing.
At Airbnb, we launched Airpal internally about a year ago and now more than 1/3 of all employees have issued a query through the tool. This is an astounding statistic and it shows how integral Presto is to our company’s data infrastructure.
We currently hold about one and a half petabytes of data as Hive managed tables in HDFS, and the relatively small data size of our important “core_data” tables allows us to use Presto as the default query engine for analysis. When running ad hoc queries and iterating on the steps of an analysis, Presto is much snappier and more responsive than traditional map reduce jobs. The biggest benefit to adding Presto to our infrastructure stack, though, is that we don’t have to add additional complexity to allow “interactive” querying. Because we are querying against our one, central Hive warehouse, we can keep a “single source of truth” with no large scale copies to a separate storage/query layer. Additionally, the fact that we don’t need change data storage type from RC format to see the speed improvements, makes Presto a great choice for our infrastructure.
We are excited to share this tool with the open source community and we hope that it can provide similar utility for others.
Key features of Airpal
- optional access controls for users
- ability to search and find tables
- see metadata, partitions, schemas, and sample rows
- write queries in an easy-to-read editor
- submit queries through a web interface
- track query progress
- get the results back through the browser as a CSV
- create new Hive table based on the results of a query
- save queries once written
- searchable history of all queries run within the tool
Keeping with the spirit of Presto, we have tried to make it simple to install Airpal by providing a local storage option for people who would like to test it out without any overhead or cost. For more detailed information, visit the GitHub page here: https://github.com/airbnb/airpal
A few of the notable technology features in Airpal are:
- uses Dropwizard as an easy way to provide a REST service in Java
- employs SSE (Server Sent Events) to push messages from the server to the client
Finally, we would be remiss if we did not mention the awesome direction that Facebook provided as the original developers of Hive and the pioneers of building UI tools to facilitate easy access to big data. We stood on the shoulders of giants to make this tool and we appreciate the influence and input that the data infrastructure and data tools teams at Facebook were able to provide.
If you’re interested in helping build a world class suite of data tools, check out this open job position: https://www.airbnb.com/jobs/departments/position/48112
Check out all of our open source projects over at airbnb.io and follow us on Twitter: @AirbnbEng + @AirbnbData
Originally published at nerds.airbnb.com on March 5, 2015.