Apache Zeppelin (incubating): 2015 year in review
2015 was a very busy year for the Zeppelin project.
A lot of things happened and today, before the year ends, I’ll try to briefly summarize the major milestones.
Here is a project timeline; a slightly longer version was presented by Moon and me at ApacheCon EU in Budapest this year, in our “Data Science Lifecycle with Apache Zeppelin (incubating)” session.
- Dec 2012 — commercial application at NFLabs
- Oct 2013 — internal PoC, based on Hive
- Aug 2014 — initial OSS version w/ Spark
- 18 Dec 2014 — Apache Software Foundation incubation proposal
- 23 Dec 2014 — accepted to Incubator
- 31 Jul 2015 — 0.5.0 first release under Apache
- 19 Nov 2015 — 0.5.5 second release under ASF
Through its history, and especially over the last year, Zeppelin has gone through multiple release cycles, with lengths ranging from almost a year to a couple of weeks. Releases have mostly been feature-driven, based on the scope of features that developers wanted to build and users requested.
One thing that we learned this year, and received a lot of user feedback on, was the need for more disciplined, time-based releases. Users, and especially businesses that rely on open-source software, need at least some time frames to plan their operations around. Other projects, e.g. Apache Spark, have a three-month release schedule, and that is what we, as a community, will be trying to achieve in the coming year for the Zeppelin project as well.
Speaking of the community, it grew a lot: from 5 initial contributors from a single country (South Korea) to a diverse set of more than 80 people around the globe.
According to our last report in November, 200+ engineers from Europe, Asia and the US participate on the dev@ mailing list. Over the course of the year, 3 project members became official committers and joined the Podling Project Management Committee.
All participants worked hard but sometimes struggled to adopt the Apache Way. Although a majority of contributors had prior experience developing open-source software, aligning the process with the ideas of meritocracy and “community over code” definitely took a while. Despite the difficulties, it proved a very fruitful mindset, and we now strongly consider a consensus-driven approach the best way to collaborate on any kind of project.
In short, @TheASF, with its mission of “open source software for the public good at no cost”, is a very cool place to be part of, with lots of people from all over the world working together in an open fashion to make software.
Our honored mentors, Konstantin @c0sin, Henry @Kingwulf, Roman @rhatr, Ted @ted_dunning and Hyunsik @hyunsik_choi, played a big role in this year’s project success.
Many thanks go to them for the time spent helping promote the project, answering questions and giving advice.
Adoption: Zeppelin and friends
We at NFLabs created the initial version of Zeppelin out of our own needs and experience in data analytics for clients. It’s great to see that more and more people from different companies are now joining both the business community of Zeppelin users and the technical community of developers involved in the project.
Zeppelin on the cloud
Engineers from these companies provide Zeppelin to their users almost without any modifications.
- Amazon EMR
- Microsoft Azure
- Google Cloud Platform: https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/tree/master/apache-zeppelin
Zeppelin on the product
Engineers from the companies below have built integrations with Zeppelin for their platforms.
Engineers from the following companies have built custom products on top of Zeppelin.
Throughout the year, there were multiple occasions when community members presented the project at major industry conferences in different parts of the world:
- Deview, Seoul
- Spark Summit, Amsterdam
- ApacheCon, Austin
- Hadoop Summit, San Jose
- Spark Summit, California
- ApacheCon EU, Budapest
- Flink Forward, Berlin
This year’s major features contributed by the community are a variety of third-party integrations with other tools in the Big Data ecosystem, which we call interpreters.
Apart from the already existing Apache Spark ones, many interpreters were built for backends like Apache Flink, Elasticsearch, Tajo, Phoenix, Kylin, Hive, Cassandra and Geode. The challenge is not only to implement an interpreter (which is quite simple), but also to keep it up to date. With such moving targets as Apache Flink and Spark, it is great to see the community enthusiastically keeping up.
There are many more contributions, such as new pluggable notebook storage backends (S3, Git), the ability to sync notebooks seamlessly between them, and productivity tools like Search/Import/Export for notebooks, to name but a few.
The Zeppelin project has a very ambitious roadmap for the next year, including major changes with the implementation of the “Helium proposal” for modularity and extensibility, support for multi-tenant use cases, better CI, more high-quality releases, graduation from the Incubator and many more!
The project is still quite young, so now is a good time to join if you want to make an impact. Patches are always welcome, so do not miss a great opportunity: please stop by zeppelin.incubator.apache.org/community and let us know how you use Zeppelin, what you think is missing, etc.
2015 was definitely a lot of fun for me personally, and participation in the ASF was both the most fun part of the year and the biggest challenge.
Handling dozens of email threads a day, participating in conversations and code reviews, answering user questions, all on top of software development activities, is something that takes time to master. It’s been a great journey indeed, which I hope to continue into the next year and for many years after.
Please stay tuned — more features are coming, 2016 will be even more exciting!
I would like to thank everybody I had the pleasure of working with on this project.