Highlights from the 2019 SF WiMLDS scikit-learn Open-Source Sprint

The first-ever Bay Area WiMLDS scikit-learn Open-Source Sprint took place this past Saturday, November 2, 2019, at the UCSF Mission Bay Campus. scikit-learn is one of the most popular open-source machine learning packages available and, like most open-source libraries, it is predominantly written by men… men who, incidentally, work really hard to democratize software and would love to involve a diverse team of contributors, but a pretty homogeneously male group nonetheless. WiMLDS (Women in Machine Learning and Data Science) is an organization that aims to change these dreary gender-imbalanced statistics. Its mission is to “support and promote women and gender minorities who are practicing, studying or are interested in the fields of machine learning and data science”. The purpose of the Bay Area scikit-learn sprint was to increase the participation of women and gender minorities in the scikit-learn and Python open-source ecosystems.

WiMLDS scikit-learn sprints have previously been organized in NYC (three times, no less) and Nairobi. The Bay Area, always lagging woefully behind in technological developments, was host to the event for the very first time this year. As an entrenched Bay Area resident, I last visited New York 8 years ago (not long after the very first public release of scikit-learn) and have regrettably never visited Nairobi, so this was also my very first WiMLDS scikit-learn sprint. I will share some of my observations here, and then let some of the excellent humans that I met at the sprint speak for themselves.

The organizers and volunteers showed up bright and early at 8:30 am, an hour before start time. I was lucky (my graduate student budget thanks me) to share a ride with some of them from Berkeley at 7:40 am. Everyone was cheerful and fully awake at this ungodly hour, and absolutely no one was asking themselves “why did we sign up for this again?” (Jokes aside, organizers and participants alike benefited immensely from this incredibly valuable event, not least on a personal level. Just take a look at their own stories, below.)

The fearless organizing and volunteer team consisted of (from the left):

  • Valentina Borghesani (@vborghesani, see Valentina’s awesome blog about the sprint here.)
  • Areez Malik
  • Tom Dupre La Tour
  • Reshama Shaikh (@reshamas)
  • Andreas Mueller
  • Xihe (Jeff) Xie
  • Michael Eickenberg
  • Pablo Damasceno
  • Cheng Wang (not pictured, but a *crucial* contributor to the sprint)

Fortunately, breakfast arrived shortly after us…

Photo Credit: Valentina Borghesani.
Where there is breakfast, there will be hackers. I made several new friends before 9:30 am. Not bad, for a Saturday morning. Photo credit: Valentina Borghesani.

…as did a set of cool stickers from the sponsors who, along with the organizers and volunteers, made this event possible.

Event sponsors:

  • @UCSF
  • @UCSFimaging
  • @neo4j
  • @Microsoft & @MSFTReactor
  • @seanmylaw & @TDAmeritrade
  • @OReillyMedia

The lead organizer, Reshama Shaikh, kicked off the event around 9:30 am by thanking our host, UCSF, and the sponsors. She reminded us that this is a “women’s space”. This does not mean that it is exclusively intended for women: indeed, a number of men participated in the sprint. What it does mean, Reshama explained, is that the sprint will be a harassment-free experience for all attendees. She further emphasized that part of this being a “women’s space” is that we let others speak. She specifically suggested that, if you know yourself to be a person who tends to speak a lot in public, a good rule of thumb is to let two other people speak before you speak a second time. (How I wish these things would go without saying… or be explicitly stated when needed… in all social spaces, not just those that are specifically branded as “women’s spaces”. But here we are.) The event was free to attend, but attendees were encouraged to donate a nominal amount to NumFOCUS to support open-source development (and you can too).

Next, one of the core scikit-learn developers, Andreas Mueller, gave an introduction to setting up and starting to contribute to scikit-learn (and you can too!). He recommended that we try out pair programming which, as its name implies, entails two people programming together. An interesting twist on this concept that I wasn’t aware of before was the specific recommendation to have the *less experienced* person drive (well, code). In other words, the less experienced programmer should be writing the code, and submitting the pull requests, while both participants discuss the problem and decide what to do next.

And off we went… Before long, the PRs (pull requests) were trickling in by the dozens, while the teaching assistants were kept busy helping participants overcome roadblocks, and reviewing submitted PRs. I found that one of the most challenging things was selecting an issue that was both open (and actually up for grabs rather than just officially open), and tractable. One of the great things about being at a sprint like this was that there were several people available to help with technical glitches and, with core scikit-learn developers present at the event, we could get very rapid feedback on whether our PRs could be incorporated.

Morning sprinting.
Ah yes, there was also lunch.
Afternoon sprinting.
Xihe, Areez, and Pablo, troubleshooting an undoubtedly very important bug.

Many attendees courageously kept coding until late in the afternoon, and all of us had our own particular favorite moments. Here is what some of the participants had to say.

scikit-learn hackers of the WiMLDS sprint 2019

“I just love to see women come out and contribute to open source, because this is something that I do full-time, and there are so few of us, that it gets a bit lonely. So, it’s really encouraging to see so many women show up and contribute to open-source.” (Erin LeDell, @ledell)

Superwomen pair-programming.

“It was making an open-source contribution for the first time. I have never done this, so it’s been good: Now I know what the workflow is like.” (Poorna)

“My favorite thing was learning a lot about the source code. I have to read it carefully to understand what is happening within the code, so that I can edit the docstring. The learning experience was the best part. Contributing to open-source is another important goal for me.” (Hailey)

“I love connecting with people, and converging on similar interests and goals. The people is really the main thing for me. I also love contributing to open-source.” (Paula)

“My favorite thing about the sprint is that I discovered something weird: This. I discovered that you can compare an “else” statement with a “for” statement in Python. Unlike in JavaScript. I’m starting to be more OK with this as I tell more people about it. At first I was offended. Now it’s acceptance.” (Anisha)
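
For readers who have not run into it, Anisha is presumably talking about Python's `for … else` construct, where the `else` block runs only if the loop finishes without hitting a `break`. Here is a minimal sketch of how that behaves. The function `first_even` and its inputs are made up for illustration and have nothing to do with the scikit-learn code she was reading.

```python
def first_even(numbers):
    """Return the first even number in `numbers`, or None if there is none."""
    for n in numbers:
        if n % 2 == 0:
            found = n
            break  # leaving the loop early skips the `else` block below
    else:
        # Reached only when the loop ran to completion without a `break`
        # (this includes the case where `numbers` is empty).
        found = None
    return found


print(first_even([1, 3, 4, 5]))  # 4
print(first_even([1, 3, 5]))     # None
```

The `else` here belongs to the `for`, not to the `if`, which is what makes it look so odd at first glance.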

“My favorite thing about the sprint is the excitement people feel when they have submitted their first PR.” (Reshama, @reshamas)

Siblings pair-programming.

“I think it’s really awesome that there are TAs to help the process and the workflow behind doing open-source. It’s really helpful for getting started and to understand how to contribute to open-source.” (Regan)

“That’s a big one for me also: Understanding how to integrate into the community as a new participant, without being cumbersome and creating more problems than solutions. Somebody stopped by and showed me a piece of code logic I hadn’t seen before. Interacting with others gives you opportunities to continue to grow your technical skill set.” (Will)

“A lot of things [were my favorite]. Firstly, I won a book. I have never contributed to open-source code before. I have done other things, like contributed to Stack Overflow, but not to an open-source codebase.” (Shruthi)

“I would echo what Shruthi said. I had never contributed to any sort of open-source project. It was something I had always wanted to do. The overhead for getting started had always seemed insurmountable. Having Andreas and the other volunteers here, really helped me overcome that initial learning curve. If I come to a sticky point, I can move past it. It gave me that little push I needed to get into the open-source community.” (Kevin)

“Getting started on a PR [was my favorite part]: the process of searching for the issue that I want to work on, forking the repo, pulling it down, and going from not knowing what I’m doing to figuring it out: that whole process.” (Tiffany)

“The number of people for which this is the first time… They never felt that they would make it to a sprint, that they would fit in, but they are here, they are doing it, they are pushing their code. That’s definitely the best thing.” (Valentina, @vborghesani)

Laura and Tom hard at work.

“If you’re working on things on your own and you run into an issue, the best you can do is to post it on a forum, and people might yell at you, or be nice to you, but it takes time. The amount of back-and-forth that we could have here today would have taken days in an online forum.” (Laura)

“What was my favorite thing? The brownies! Just kidding. I think the best thing was the help. I felt like I had a lot of stupid questions, but nobody treated me that way.” (Rebecca)

“Everybody was super-engaged. We got a lot of pull requests and it was really great. That was my favorite thing: Everybody being engaged and contributing.” (Andreas)

Pair courage.

“I love pair-programming, so it was great to get to work with someone all day.” (Sallie)

“I feel the same way, because I haven’t pair-programmed in a while. It makes it less intimidating to walk through the source code when you’re not alone.” (Fanny)

“It’s like pair courage.”


The blog for the Bay Area chapter of Women in Machine Learning and Data Science https://meetup.com/Bay-Area-Women-in-Machine-Learning-and-Data-Science/
