Apache Zeppelin and GSoC — mentor’s perspective

Seoul Engineer
Apache Zeppelin Stories
Nov 25, 2016

This is our second year at Apache Zeppelin participating in Google Summer of Code (GSoC) under the umbrella of the Apache Software Foundation (ASF). Two years may not be a lot, but we hope that by reflecting on our experience, other communities new to the program can learn how to make better use of it.


Apache Zeppelin is a young project. It started in 2012 as an internal product at ZeppelinX, a GUI for big data solutions, and was open-sourced in 2013.

One challenge of building a diverse, meritocratic developer community is attracting new talent to the project. Early on, we agreed that programs like GSoC and Outreachy, where students from all over the world are paid to work on FLOSS, are a great opportunity to do so.

At the beginning, like any other new OSS project, we had a choice:

  1. apply as a new, independent organization
  2. become part of a bigger OSS foundation (Apache Software Foundation, Software Freedom Conservancy, Eclipse, etc.) and apply under its umbrella

Since the sources had been open and public since 2013 under the business-friendly Apache 2.0 license (our goal was always to build a community around the project), the second option seemed more attractive, and the ASF, as a leading organization in the big data industry, was a natural fit.

Once the project matured and more people started using and contributing to it, we proposed moving the codebase to the Apache Software Foundation, where it became Apache Zeppelin. The umbrella organization approach has the additional benefit that most of the formalities are handled for you: things like mentor payments and sponsorship for attending the GSoC Mentor Summit are taken care of through the Foundation.

Everything was set, and by 2015 we had submitted 5 project ideas for GSoC students. Now, after two years in the program, I want to share a few things we have learned by committing time and effort to this initiative.


Lesson 1

“Less is more.” Before you begin, keep in mind that the goal is NOT to get as many students as possible. In many ways, it's better to have fewer but motivated and responsible students/proposals/applications.
You have to spend extra time on EACH proposal and student, so the more you have, the more time you will spend helping, mentoring, etc., all on top of your regular duties, which leaves less time for working on the project itself.

For a first-time participation, you really do not want more than one student.

Lesson 2

Take the program seriously and PLAN it before it starts:

  • Set the right expectations about your time commitment (for colleagues, employees, etc.). There will be time overhead spent solely on this program (communication, evaluation, reviews, etc.), not only during the program period but also for a few weeks before and after it.
  • Put more thought into proposals upfront: treat them as OKRs.
    It's very tempting (and indeed we have done this before) to throw out a rough new idea, without any prior research, as a student proposal. It turns out that students who are not familiar with the project will most likely have a hard time doing both: the research and then the integration of the idea into your project.
    An important part of treating proposals as OKRs is doing some prior research and planning, so that you, as a mentor, can help define a bold Objective and measurable, necessary and sufficient Key Results (KRs). This helps set the right expectations on both sides and avoid conflicts later, in the unfortunate but possible event that expectations are not met.
  • As soon as the proposals are out, even before the community bonding period starts, ADVERTISE them (Twitter, mailing lists, etc.).
    You are competing with all the other organizations for the best students, and since your project is most likely not yet as popular as Nmap, GNOME, and others, it may be hard to persuade candidates to join. Reaching out to universities where graduate students might be interested in making the work part of their thesis is one strategy that might help.
  • Encourage students to do better planning and some upfront research. The best students will do it anyway; some just need a bit of direction. Doing even a little work with those people should give you a much better sense of whether you want to spend the whole summer working with them side by side.

Lesson 3

Make a code contribution a prerequisite for proposal acceptance

Many projects have found this strategy for filtering applicants useful: it helps make sure that an applicant can actually code and take part in the standard build/send patch/review process, instead of spending the first few weeks of the program learning to do so.

Filtering people like that might seem counterintuitive, especially if it's your first year and you have, e.g., just 3 last-minute applications. It feels like you have to be less picky, right? But remember, “less is more” here: reducing the risk of a productivity drain over the summer, and of time spent on an unresponsive student who takes 2 weeks to “build the project and get familiar with the code”, will hopefully start to sound like an actual win.

This does not need to be a hardcore technical contribution; a small bug fix or a meaningful docs update will do, anything that does not require too much upfront effort. The important thing is to communicate this prerequisite explicitly to all students from the start, e.g. in proposal templates, the wiki, issues, etc.

Wrap-up

This year we had the honor of working with 3 talented students who improved a variety of aspects of Apache Zeppelin: from better Python interpreter support, to building example notebooks for public datasets like Common Crawl, to experimenting with distributed P2P notebook storage mechanisms like IPFS and BitTorrent.

We hope the few lessons we learned the hard way over the last 2 years will help you.

And join the community, as GSoC 2017 is coming!
