Apache Zeppelin (incubating): 2015 year in review

Dec 23, 2015

2015 was a very busy year for the Zeppelin project.
A lot of things happened, and today, before the year ends, I’ll try to briefly summarize the major milestones.

History

Here is the project timeline. A slightly longer version was presented by Moon and me at ApacheCon EU in Budapest this year, in our “Data Science Lifecycle with Apache Zeppelin (incubating)” session.

Dec 2012: commercial application at NFLabs
Oct 2013: internal PoC, based on Hive
Aug 2014: initial OSS version with Spark
18 Dec 2014: Apache Software Foundation incubation proposal
23 Dec 2014: accepted to the Incubator
31 Jul 2015: 0.5.0, first release under the ASF
19 Nov 2015: 0.5.5, second release under the ASF

Releases

Throughout its history, and especially over the last year, Zeppelin has gone through multiple release cycles, with lengths ranging from almost a year to a couple of weeks. Releases have mostly been feature-driven, based on the scope of features that developers wanted to build and users requested.

One thing we learned this year, and got great user feedback on, was the need for more disciplined, time-based releases. Users, and especially businesses that rely on open-source software, need at least some time frames to plan their operations around. Other projects, e.g. Apache Spark, have a 3-month release schedule, and that is what we as a community will be trying to achieve for the Zeppelin project in the coming year as well.

Community

Speaking of the community, it grew a lot: from 5 initial contributors from a single country (South Korea) to a diverse set of more than 80 people around the globe.

According to our last report in November, 200+ engineers from Europe, Asia and the US participate on the dev@ mailing list. Over the course of the year, 3 project members became official committers and joined the Podling Project Management Committee.

All participants worked hard but sometimes struggled to adopt the Apache Way. Although the majority of contributors had prior experience developing open-source software, aligning the process with the ideas of meritocracy and “community over code” definitely took a while. Despite the difficulties, it was a very fruitful mindset, and we now strongly consider a consensus-driven approach the best way to collaborate on any kind of project.

In short, @TheASF, with its mission of “open source software for the public good at no cost”, is a very cool place to be part of, with lots of people from all over the world working together in an open fashion to make software.

Our honored mentors, Konstantin @c0sin, Henry @Kingwulf, Roman @rhatr, Ted @ted_dunning and Hyunsik @hyunsik_choi, played a big role in this year’s project success.
Many thanks go to them for the time they spent promoting the project, answering questions and giving advice.

Adoption: Zeppelin and friends

We at NFLabs created the initial version of Zeppelin out of our own needs and experience in data analytics for our clients. It’s great to see that more and more people from different companies are now joining the business community of Zeppelin users and the technical community of developers involved in the project.

Zeppelin on the cloud

Engineers from these companies provide Zeppelin to their users with almost no modifications.

Amazon EMR
https://aws.amazon.com/blogs/aws/amazon-emr-update-apache-spark-1-5-2-ganglia-presto-zeppelin-and-oozie/
Microsoft Azure
https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-zeppelin-notebook-jupyter-spark-sql/
Google Cloud Platform
https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/tree/master/apache-zeppelin

Zeppelin on the product

Engineers from the companies below have built integrations with Zeppelin for their platforms.

Hortonworks
http://hortonworks.com/blog/introduction-to-data-science-with-apache-spark/
Sequenceiq
https://github.com/sequenceiq/zeppelin-stack
Juju
https://jujucharms.com/apache-hadoop-spark-zeppelin

Deep integration

Engineers from the following companies have built custom products on top of Zeppelin.

Conferences

Throughout the year, there were multiple occasions when community members presented the project at major industry conferences in different parts of the world:

Deview, Seoul
Spark Summit, Amsterdam
ApacheCon, Austin
Hadoop Summit, San Jose
Spark Summit, California
ApacheCon EU, Budapest
Flink Forward, Berlin

A more detailed list of conferences and meetups, with links to slides and videos, can be found on the community wiki page. Please let us know @ApacheZeppelin if something is missing!

New Features

This year’s major features contributed by the community are a variety of third-party integrations with other tools in the Big Data ecosystem, which we call interpreters.

Apart from the already existing Apache Spark ones, many interpreters were built for backends like Apache Flink, Elasticsearch, Tajo, Phoenix, Kylin, Hive, Cassandra and Geode. The challenge is not only to implement an interpreter (which is quite simple), but also to keep it up to date. With such moving targets as Apache Flink and Spark, it is great to see the community enthusiastically keeping up.

There are many more contributions, like new pluggable notebook storage backends (S3, Git), the ability to sync notebooks seamlessly between them, and productivity tools like notebook Search/Import/Export, to name but a few.

Roadmap

The Zeppelin project has a very ambitious roadmap for the next year, including major changes with the implementation of the “Helium proposal” for modularity and extensibility, multi-tenant environment use cases, better CI, more high-quality releases, graduation from the incubator and much more!

The project is still quite young, so now is a good time to join if you want to make an impact. Patches are always welcome, so do not miss a great opportunity: please stop by zeppelin.incubator.apache.org/community and let us know how you use Zeppelin, what you think is missing, etc.

2015 was definitely a lot of fun for me personally, and participation in the ASF was both the most fun topic of the year and the biggest challenge.

Handling dozens of email threads a day, participating in conversations and code reviews, answering user questions, all on top of the software development activities, is something that takes time to master. It’s been a great journey indeed, which I hope to continue into the next year and many years after.

Please stay tuned — more features are coming, 2016 will be even more exciting!

I would like to thank everybody with whom I had the pleasure of working on this project.

Alexander, engineer @NFLabs, member of @ApacheZeppelin