Google Summer of Code 2021: A Probabilistic Perspective
Google Summer of Code opened the door to open source projects for me. Now, with the aim of openness, I want to write this Medium post so that my experiences and thoughts are also open to you, in addition to my code.
What is Google Summer of Code?
You can check a lot of websites, especially https://summerofcode.withgoogle.com/ , to find all the formal details about Google Summer of Code. Here I focus on the perspective of a fourth-year female computer engineering student in Turkey.
It is often said that Google Summer of Code makes you familiar with the world of open source projects, gets you acquainted with new frameworks, libraries, and tools, strengthens your CV, etc. All true. But if I said only this much to explain Google Summer of Code, I would be doing it an injustice. For me, I would put it like this:
Google Summer of Code is where you find the perfect repository, your place in the sun given your skills and interests; where you find mentors you idolize, who wholeheartedly help you see your goals clearly; and where you get the chance to work with golden teammates you hold dear.
Want to hear more? Fasten your seat belt. We are about to dive deep into my adventure. You will probably enjoy reading it. Wait and see.
Applying
There are lots of posts about how to prepare your proposal for Google Summer of Code. However, I had to prepare mine on the last day, because my course workload was enormous. I even had a midterm on the day of my application. I examined all the organizations participating in Google Summer of Code. After examining all the projects, one stood out above the rest: creating Python and JAX examples for the pyprobml repo, which hosts the code used by the new textbook Probabilistic Machine Learning by Kevin Murphy at Google Research.
First Contributions
I thought I knew what hard work was before GSoC, but even with all my school projects and exams, I spent the most time on the issues in pyprobml. To be more precise about my effort before Google Summer of Code, these are the pre-GSoC PRs I made.
In short, each PR is a rung of the ladder in the acceptance stage.
Interview 1
After applying, I was invited to a screening interview with Mahmoud Soliman, one of the GSoC mentors. It was early in the morning and I had taken a quiz just before my interview, after sleeping for only 5 hours. I would have been completely helpless, but Mahmoud kept encouraging me, not only during the interview but also throughout the whole summer.
Interview 2
After passing the screening interview with Mahmoud, I had a follow-up interview with Kevin Murphy. I was lost for words in front of him during my second interview, since he is such a gifted researcher. But he gives importance to every thought and opinion, no matter who you are. His questions always make you feel that your ideas are valuable.
After the second interview, I waited impatiently for the news…
Outcome
Let X and Y be events such that
I’m floating on air and have been accepted by Google Summer of Code! You can probably feel my excitement and happiness by now. So, one can infer that p(X=1 | Y=1) is very high.
This email was like a bolt out of the blue.
I like stochastic life!
Flagship Project
I tackled 22 issues over the summer, ranging from writing simple Python plotting scripts to implementing complex ML algorithms. The highlight was my implementation of the Hidden Markov Model.
First I implemented the methods in NumPy, for ease of debugging. Then I rewrote them in JAX, for speed. One issue I encountered was numerical instability when dealing with very small or very large numbers. I handled this problem by clipping the values or working in log space. I then extended the code so that the observation distribution is not limited to a categorical distribution. For this, I used the Distrax library, which is a JAX version of a subset of TensorFlow Probability (TFP). We ended up with a pure JAX implementation of HMMs that provided a more readable alternative to the TF source code. Kevin Murphy asked the Distrax team at DeepMind if they wanted to import it into their repo, and they said yes! So I ended up contributing to two open source projects this summer.
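To make the log-space idea concrete, here is a minimal NumPy sketch of the HMM forward algorithm written two ways. It is an illustration only, not the actual pyprobml or Distrax code: the function names (`forward_naive`, `forward_log`, `logsumexp`) and the toy two-state model are made up for this example, and it assumes all probabilities are strictly positive so that no `log(0)` appears.

```python
import numpy as np

def logsumexp(a, axis):
    """Stable log(sum(exp(a))) along `axis` (assumes finite entries)."""
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.sum(np.exp(a - a_max), axis=axis))

def forward_naive(init, trans, obs_lik):
    """Forward recursion in probability space; returns p(observations)."""
    alpha = init * obs_lik[0]
    for t in range(1, len(obs_lik)):
        # alpha_t[j] = sum_i alpha_{t-1}[i] * trans[i, j] * obs_lik[t, j]
        alpha = (alpha @ trans) * obs_lik[t]
    return alpha.sum()

def forward_log(log_init, log_trans, log_obs_lik):
    """The same recursion in log space; returns log p(observations)."""
    log_alpha = log_init + log_obs_lik[0]
    for t in range(1, len(log_obs_lik)):
        # Products become sums, and the sum over the previous state i
        # becomes a logsumexp over axis 0.
        log_alpha = logsumexp(log_alpha[:, None] + log_trans, axis=0) + log_obs_lik[t]
    return logsumexp(log_alpha, axis=0)

# Hypothetical toy model: 2 hidden states, 3 categorical observation symbols.
init = np.array([0.6, 0.4])
trans = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
emit = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.3, 0.6]])

def obs_likelihoods(obs):
    """Row t holds p(obs[t] | state) for each state; shape (T, 2)."""
    return emit[:, obs].T

short_obs = np.array([0, 2, 1, 1, 0])
long_obs = np.tile(np.array([0, 2, 1]), 1000)  # 3000 time steps

# On a short sequence both versions agree.
short_naive = forward_naive(init, trans, obs_likelihoods(short_obs))
short_log = forward_log(np.log(init), np.log(trans), np.log(obs_likelihoods(short_obs)))

# On a long sequence the naive product underflows to 0.0,
# while the log-space value stays finite.
long_naive = forward_naive(init, trans, obs_likelihoods(long_obs))
long_log = forward_log(np.log(init), np.log(trans), np.log(obs_likelihoods(long_obs)))
```

The only change between the two functions is that multiplications turn into additions and the sum over previous states turns into a `logsumexp`. The same recursion can be ported to JAX by swapping in `jax.numpy` and replacing the Python loop with a scan.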
Summary of Contributions
These are all my contributions this summer:
/probml/probml-notebooks
/probml/pyprobml
- Variational EM
- Quadratic Lower Bounds
- Sensor Fusion Unknown Precision
- Neal’s Funnel
- ICA Demo
- ICA Demo Uniform
- Newcomb’s Plugin Demo
- PageRank Demo
- Hierarchical Linear Regression
- Prostate Comparison Demo
- Noisy Spelling HMM Example
- JAX Version of HMM Library
- HMM General Library
- HMMs Working in Log Space
- HMM Lillypad Demo
- distrax.MixtureSameFamily with Posterior Marginal and Posterior Mode Methods
- Reimplementation of Viterbi Algorithm
- MixtureSameFamily for Multivariate Bernoulli Case
- MixtureSameFamily for Gaussian Case
- PixelCNN and Codebook Sampling
Special thanks to Kevin Murphy and Mahmoud Soliman for their support, suggestions, and feedback throughout the summer.
Last but not least, I really appreciate Kevin Murphy’s help in preparing this post.