Apache Zeppelin at Twitter

Prasad Wagle
Jun 7, 2016

At Twitter, we use a variety of analytics engines and front-ends, shown in the data pipeline overview below. Apache Zeppelin is a relative newcomer to our data ecosystem and is becoming an important tool for creating, sharing and collaborating on interactive notebooks that can be used as dashboards. You can think of Zeppelin as Google Docs for notebooks.

In Zeppelin terminology, a notebook is a collection of notes. Each note contains paragraphs (queries) that are interpreted by interpreters. At Twitter, we have set up one company-wide Zeppelin server on which 600 users have created or viewed 600 notes with 2600 paragraphs (1000 Vertica, 1000 Presto, 200 MySQL, 300 markdown).
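To make the terminology concrete, here is a minimal Python sketch of how notes, paragraphs and interpreters relate, and how a per-interpreter tally like the one above could be computed. It is not Zeppelin's API: the `Paragraph` and `Note` classes and the `count_by_interpreter` helper are hypothetical names invented for illustration. It assumes each paragraph selects its interpreter with a leading `%name` directive, and the `presto` and `md` names in the sample data are illustrative bindings.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Paragraph:
    """One paragraph of a note (hypothetical model, not a Zeppelin class)."""
    text: str  # raw paragraph text, e.g. "%md # Weekly metrics"

    def interpreter(self, default: str = "default") -> str:
        """Return the name in a leading %directive, or `default` if absent."""
        stripped = self.text.lstrip()
        if stripped.startswith("%"):
            parts = stripped[1:].split(None, 1)
            return parts[0] if parts else default
        return default


@dataclass
class Note:
    """A note: a named, ordered collection of paragraphs."""
    name: str
    paragraphs: List[Paragraph] = field(default_factory=list)


def count_by_interpreter(notes: Iterable[Note]) -> Counter:
    """Tally paragraphs across all notes by the interpreter they select."""
    return Counter(p.interpreter() for n in notes for p in n.paragraphs)


# Illustrative data only; these notes and queries are made up.
notebook = [
    Note("Product metrics", [
        Paragraph("%md # Weekly metrics"),
        Paragraph("%presto SELECT 1"),
    ]),
    Note("Scratch", [
        Paragraph("SELECT 1"),  # no directive: falls back to the default
        Paragraph("%md Notes to self"),
    ]),
]
counts = count_by_interpreter(notebook)
```

Keeping the interpreter choice inside the paragraph text, rather than on the note, is what lets a single note mix narrative markdown with queries against several engines.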

Zeppelin is used by product managers, data scientists, sales analysts and engineers. Since it is web-based and works seamlessly with analytics engines, it is very easy to create and share notebooks. Here are typical use cases:

  • Product managers create dashboards with key product metrics
  • Data scientists create notebooks that tell a story, with narrative text interspersed with queries, results and visualizations, all in one place
  • Sales analysts use it to track revenue trends
  • Engineers use it to debug system anomalies

Here’s the work we did before productionizing Zeppelin at Twitter:

  • Security (integration with Twitter single sign-on system, notebook and data source authorization)
  • Stability (reduced WebSocket communication)
  • Operations (monitoring, standby server, backups)

Here are the projects we are working on:

  • Scalding, Spark and R interpreters
  • Notebook organization for easy discovery
  • Administration (view/stop running jobs, resource usage)

To conclude: Apache Zeppelin is an enterprise-ready, easy-to-use tool for creating and collaborating on interactive notebooks and dashboards. With its extensible architecture and a vibrant open source community, it is advancing the state of the art in data analytics and visualization.

Acknowledgments

Thanks to Rohan Ramakrishna, Jason Sprowl, Srikanth Thiagarajan, Rob Malchow, Gera Shegalov, Ruban Monu, Sriram Krishnan, early adopters and users at Twitter who gave us valuable feedback, and the Zeppelin Open Source community.

More Information
