Getting into GSoC ’20 with CERN-HSF
Or how I learned from my failure and loved the journey.
Introduction
The list of accepted projects for Google Summer of Code 2020 was announced on 4th May at 23:30 IST, and after soaking in the delight of seeing my name among the 1,199 chosen projects, I knew it was my duty to document my journey of turning a rejection into a success.
Staring at my name, I remembered the same time, but from last year, when I was disheartened upon receiving a mail stating that none of my projects had been selected. I now personally believe that getting into a program like GSoC does not have a logically defined approach, but rather can be considered as a milestone in the journey towards becoming a better software developer.
If I were to decide the format for describing this journey, I would use a present-past-present format. I believe that analysing our failures in some depth and accepting our reality is the only way we can move towards achieving our goals. Hence I have divided this blog post into three parts to capture the effect of storytelling.
I believe that no success story is complete without narrating the failures, but patience is not a virtue of all, so I summarise the three parts below.
- If you wish to know about my failure and learn from my mistakes, start reading from PART 1 to live through my journey.
- If you just want to know how I picked myself up and leveled up after failing, continue reading from PART 2.
- And if you only want to know how I came back in the game, stronger than before, and finally got into GSoC 2020, you may directly skip to PART 3, but remember:
“Teachings that do not speak of pain have no meaning, for humankind cannot gain anything in return.” — Itachi Uchiha
From here I narrate my journey into the world of open source and how it led me towards getting accepted at CERN-HSF for Google Summer of Code 2020. I would like to begin with the year 2019, when open source happened to me.
The year 2019
The year started with me returning from my winter internship at Linux World Informatics Pvt. Ltd. (LW) after completing it successfully. I was now familiar with open-source software and several technologies including Hadoop, Docker, Ansible, and Python, and I had a Red Hat Certified System Administrator certificate in my arsenal. The next step was to apply for a summer internship, and what better to aim for than the renowned Google Summer of Code? So there I began my journey into the world of open-source development and set out to explore this Summer of Code program.
On a side note, I had already applied for the Indian Academy of Sciences’ Summer Research Fellowship Programme 2019 (SRFP). The results had not been released yet, and I had pretty much forgotten about this one. We’ll come to this later.
By this time I had a basic understanding of Git and GitHub and I was well versed in the concepts of forking, branching, committing, pushing, making a pull request, and merging. I would like to thank Harshit Singh for helping me learn all the basics! Being in my second year, I knew Java, Python, and C along with standard knowledge of data structures and algorithms. I had done some courses on Machine Learning, Data Science, and Image Processing. My experience here was backed up by some projects, two of which I completed during my internship at LW.
I knew I had to try for GSoC this year, so I started my research before the organizations were announced. Being a newbie, it was pretty hard for me to find the organizations whose projects matched my skill set. To be honest, CERN-HSF instantly caught my eye there, but after looking at their projects I could hardly understand a word!
They had me in the first half, I’m not gonna lie 😂
I got a bit demotivated by this, but there was still hope that this year one project or another would align with my interests and skills.
Begin, GSoC 2019.
Google announced the organizations on February 26, 2019. My hunt for projects started. I looked only at particular categories, with tags like ‘machine learning’, ‘Docker’, ‘Kubernetes’, ‘data analytics’, ‘cloud’, ‘python’, ‘java’, and ‘big data’, since these were keywords I was familiar with. With just these tags I narrowed down my search drastically to a handful of organizations: CloudCV, Tungsten Fabric, Cloud Native Computing Foundation (CNCF), and CERN-HSF (AWAKE, SWAN, and Rucio).
This process took me around two weeks, considering the workload from college academics and other minor club activities. I joined some Slack channels, Gitter chat, and mailing lists.
Getting into a new community can be overwhelming and can leave you with imposter syndrome. But at the end of the day, remember what you’re there for: to gather experience and exposure, to get out of your comfort zone, and to hustle your way in!
Reaching out to the mentors
Initially, I could not understand the context of the messages and what people were talking about on the IRC! It is quite normal to get a fear of missing out (FOMO) when you’re new to any community. I mustered up my courage and, without giving it a second thought, sent some mails to the mentors from my shortlisted organizations covering:
- A basic introduction including my name, branch, college, and country.
- Which particular project I was interested in, and why.
- My achievements (whatever they were) till then, along with a polite request for an opportunity to discuss the project further.
I was only able to mail Tungsten Fabric, AWAKE, SWAN, and Rucio (CERN-HSF). There was some internal issue with Tungsten Fabric’s mentor allotment, so I did not receive any reply from them for a fairly long time. I received fairly quick replies from Mario (Rucio), Spencer (AWAKE), and Diogo (SWAN), and all of them invited me to complete a set of tasks for assessment.
I started on the tasks right away, but some of them seemed difficult to me at first attempt. It took me some days to understand them. In all my spare time, I tried my best to find whatever resources I could to learn about them and solve the questions. I focused only on these three sets of tasks and dropped all other organizations, devoting all my efforts here.
And here came another challenge — I had never worked in any large scale open source organization, so I did not know the practices they followed. After numerous failed attempts, I successfully completed the tasks from Rucio and SWAN, but was unable to solve all the tasks from AWAKE. Now it was time to create a pull request after making any minor contribution to their repository.
Generally, even a documentation fix counts as a contribution in Open Source Organizations!
I mailed Diogo with the required solution files, but did not get a reply. So SWAN also got eliminated from my list. Now I was left with Rucio only! This was my last chance to get to the proposal stage: since I had already invested a lot of time in the tasks themselves, there was no option to go back and choose another organization. When I tried making a pull request, I messed up a lot of times. Martin, who was one of the mentors, constantly helped me by pointing out my mistakes and correcting them. After some 4–5 tries, I finally got my first pull request merged into the official Rucio code base!
I would say there is nothing such as “official” or “unofficial” contribution with respect to GSoC. Every pull request you make that gets merged in the code base is credited against your name. That’s it. Your involvement in the community matters the most!
With the few weeks left before the deadline, I started a shared Google Doc between my two mentors and me to draft the proposal for a project titled: Collection Following Mechanism in Rucio. It was a pretty basic proposal with just the essential sections, and since it was going to fail later, it’ll be better if I discuss it in PART 3. 😅
Even after starting so late, both my mentors, Mario and Martin, supported me a lot! 💖 They were there for me at almost every step! After submitting the proposal I forgot about it and became a silent observer on Rucio’s Slack workspace. On May 6th, 2019 the results were out and I realized what fate had in store for me. My proposal failed to get accepted for GSoC 2019! 🙁
Lessons learnt from my rejection in 2019
After getting this rejection letter my mind was flooded with thoughts, feelings of dejection and self-loathing, with some confusion to top it off. I conveyed my failure to my mentor Mario, and he kindly appreciated my effort and convinced me that there was no need to get disheartened; my proposal had the necessary things, and some things are beyond your control. He told me, “having your name in the code base of a big organization always helps! 😉” I thanked both my mentors for their effort, but I still needed some answers. I approached Ruturaj Gujar, who got selected that year, congratulated him, and politely asked him to clear up my doubts. He is an awesome guy and he helped me with enthusiasm! Collecting all my thoughts, I wrote a note to self on Journey. I summarize its points below.
- Make connections with people from companies whose work inspires you. Having a good connection never hurts, but making one takes effort.
- Keep watching for updates on the Slack channel or other IRC you’re part of. It is often really informative and you get to know the community better.
- Make an unbeatable proposal by adding professional looking diagrams, mock-ups, and necessary code snippets. Justify the time you’re going to spend during the summer in your proposal.
- Be open to different options and look for opportunities to learn. Don’t stick to a particular skill set just because you think you know it.
- Keep grinding, nothing can beat persistence; especially when contributing to open source.
The journey from 2019 to 2020
During the application period, the results for IAS SRFP 2019 were out and I was selected to work in the Industrial Engineering and Operations Research (IEOR) department at IIT Bombay! So after my rejection, I prepared myself to start the research work for this summer internship. It was a completely different experience!
In August, I got an idea which would help people buy food at even cheaper rates from food ordering apps like Zomato and Swiggy, by pooling multiple orders together. I teamed up with another passionate and talented guy, Hemabh Ravee, and together we started learning about full-stack app development. We learned about web scraping, the Dart language, the Flutter framework to develop native apps, Flask and Python to write our own APIs, using Firebase as a back end, deploying the app on Heroku, and a lot of other things! We brought this idea to life in 36 hours during Hack-a-BIT 2.0 in October 2019.
We called our brainchild Gromnom: The food pooling app. It was the result of many sleepless nights, frustrating brainstorming sessions, and about a hundred cups of coffee! Taking part in a hackathon gives you a big skill boost if you’re serious about it and aim to pick up something you have never done before. For me it was building a native mobile app. It also gives you plenty of laptop stickers, t-shirts, goodies, and free food! 😁
At that time, participating in an open-source hackathon, learning full-stack app development, and building Gromnom from scratch gave me the right momentum which I needed to push myself into open-source software. And then came Hacktoberfest 2019, at the time when I needed it the most!
The main motivation for me to participate was to explore the world of open source and also grab an exclusive Hacktoberfest T-shirt and stickers! During this month-long event we had to make at least 4 pull requests in eligible public repositories on GitHub, so it was a perfect opportunity. I hunted for some repos which had code written in Python, Java, or C, and finally made more than 4 successful pull requests, thus completing the challenge!
In January 2020, I joined a Kaggle competition, Real or Not? NLP with Disaster Tweets, to fulfill my long-term desire of learning Natural Language Processing (NLP). I learned about models like BERT, worked with libraries like NLTK, and trained my model on Google’s AutoML, which was a highlight of the competition. I later teamed up with my uncle, Sampann Nigam, who helped me improve my score. I finally achieved a score of 0.81288 against a benchmark of 0.81023 using AutoML and 0.83026 using BERT.
Working consistently for an entire year gave me good momentum. February was almost here, and it was time; all this grind was gonna be worth it.
Enter 2020
After failing once and rising again, I got my second wind, and it was time for me to give GSoC a second try. Google released the list of mentoring organizations on February 20, 2020. Having been through the procedure once already, I jumped straight onto the website, took a few hours to go through almost all the organizations, and shortlisted the projects I could contribute to.
Shortlisting Organizations and Projects
The approach I followed here was similar to last year: searching for categories, clicking each organization to see its tags, and shortlisting it if the tags matched my interests or skills. Project selection and organization selection go hand in hand. Think of it as refining your interest and finding a niche you want to work in. Take some time, maybe a whole evening, to go through all the organizations and make a spreadsheet or a document listing all the projects that match even 40% of your skill set or interests. With the experience I described above in Part 1 and Part 2, this will be easier.
To shortlist projects, I generally looked at the requirements for the projects — which language, framework, or technology the organization wants the students to work with.
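If you prefer a script over a spreadsheet, the tag matching described above can be sketched in a few lines of Python. This is only an illustrative sketch, not something I actually used: the project names, their tags, and the helper functions `match_ratio` and `shortlist` are all hypothetical, and the 0.4 threshold simply mirrors the 40% rule of thumb mentioned above.

```python
def match_ratio(project_tags, my_skills):
    """Return the fraction of a project's tags found in my skill set."""
    tags = {tag.lower() for tag in project_tags}
    if not tags:
        return 0.0
    return len(tags & my_skills) / len(tags)


def shortlist(projects, my_skills, threshold=0.4):
    """Keep projects whose tag overlap meets the threshold, best match first."""
    skills = {skill.lower() for skill in my_skills}
    scored = [(name, match_ratio(tags, skills)) for name, tags in projects.items()]
    kept = [(name, ratio) for name, ratio in scored if ratio >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical data: placeholder project names and tags, not real GSoC listings.
MY_SKILLS = {"python", "java", "docker", "machine learning"}
PROJECTS = {
    "Project A": ["Python", "Docker", "Kubernetes"],
    "Project B": ["JavaScript", "React", "NodeJS"],
    "Project C": ["Python", "Machine Learning"],
    "Project D": ["C++", "Python", "CUDA", "Redis", "Go"],
}

if __name__ == "__main__":
    for name, ratio in shortlist(PROJECTS, MY_SKILLS):
        print(f"{name}: {ratio:.0%} of tags matched")
```

Tag overlap is a crude signal, of course. It only tells you where to start reading; the project descriptions and the mentors tell you the rest.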
Now that you have shortlisted the projects, the next step is reaching out to the mentors the way I described in PART 1. So I shortlisted some projects, but this time I was open to learning new things which looked challenging at first. Some organizations mention the difficulty level of each project, and I made sure to pick the ones marked ‘hard/challenging’ or ‘medium/intermediate’. This not only means less competition later, since a lot of students aim for the low-hanging “easy” fruit, but also gives you a greater amount of experience.
After some analysis of projects from the past couple of years, I can say that since GSoC aims to ‘introduce’ university students to open-source development, many organizations have projects which mainly require full-stack development (UI + logic + database) skills. There are a lot of organizations with tags like JavaScript, React, Python, C/C++, NodeJS, and some database (SQL/PostgreSQL/MongoDB). Less frequent tags include Java, Docker, Git, Kubernetes, and Redis. So if someone has experience with full-stack website or web app development, it gives them some confidence while going through the project list, and a little head start on the proposal.
Why CERN-HSF (again)?
After I went through all the organizations, I picked a few for myself, but I always knew I had to send one proposal to CERN, specifically to Rucio. There are a lot of reasons for that: first of all, the work environment and ethics were genuinely impressive, and I’m not even overselling here! And second, people at Rucio, especially Mario, helped me grow so much last year, even though I was a naive amateur. He had faith in me, and was with me till the end, and even after that!
So when I messaged him on Slack after a year, he welcomed me with the same warmth and enthusiasm! There were two projects listed there, one related to NLP and the other related to desktop app development. Since I had recently gained experience in NLP through Kaggle and personal projects, I wished to work on the first idea, but finally proceeded with the latter because it was under his direct mentorship.
Drafting a (better) Proposal
Being associated with Rucio since 2019 gave me a head start in drafting the proposal. After a brief discussion with Mario, I sent a mail to all the mentors notifying them which project I wished to work on. I started a proposal doc the very next day. Maybe this is why every generic blog on GSoC says to “start early”.
“In a long-distance relationship, communication is the key”
I started searching the web to learn how to build desktop apps. I had long discussions with another of my mentors, Thomas, who helped me with a lot of technical details regarding their software. After lots of research to find what was best suited to our use case and eliminating the other options, we finally settled on building an Electron app for Rucio, with React for the front end. There were very frequent discussions between me and my mentors; I even called up one of them, Gabriele, one evening to discuss a task in detail.
I started writing my proposal and was getting more involved in the community. In early March, I attended their annual workshop remotely. Even though I wasn’t able to understand all the topics from the talks and presentations, I stayed till the end on all three days, trying to squeeze out any relevant information I could get hold of. Getting involved in the community through these events is a very good opportunity to learn about the organization and show your interest in it.
I was proposing to develop a graphical user interface for command line interface (CLI) based software, so presenting some mock-ups was a must. Here my design skills came into play. After my initial research on how people prefer to use Rucio, I drafted some mock-ups in Adobe XD and sent them for review. After several iterations all of us settled on a design. I even went a little overboard and defined Design Guidelines for Rucio, which developers could reference for future contributions if the project came to completion.
“A picture is worth a thousand words”. Remember this line always while drafting your proposal.
Therefore it is always good to include some UML diagrams to show the flow of the system you aim to develop. Throw in some code from their code base which you wish to change, add some pseudo-code for any algorithm you wish to implement, necessary diagrams to explain the system design etc. This not only makes your proposal look beautiful but adds a touch of professionalism to it, which is highly appreciated!
The project required me to work mainly on improving the FUSE Filesystem for Rucio, and developing a native desktop app using Electron and React JS, along with Redis and Python. Even though I did not know most of the technologies in required depth, I spent days and nights to learn them and improved the proposal iteratively with all the suggestions and inputs from all four of my mentors! It finally turned out to be about a 16-page proposal, as compared to my 6-page one from last year.
By the time I submitted the proposal on 31st March, I’ll say I was already hooked! I got so involved in that community that I couldn’t just ghost it to wait for the results.
So I kept in touch, kept learning the skills required for my project, and kept contributing to their repository. I did not have the slightest idea whether my project would be selected or not, but I still invested my time in the process because I loved doing it. And everything paid off!
Lessons from my selection in 2020
- Keep working, because you want to improve; not just to get selected. GSoC is not the only aim, your passion to be a better developer and a real problem solver will take you places.
- Don’t hesitate to experiment and learn new challenging things. Have a growth mindset, and never hesitate to ask the right questions. Don’t judge the level of your question/doubt yourself, chances are you’ll underestimate it.
- It’s the connections you make along the way which make all the difference. Everyone has something to offer which you can learn from, so never miss an opportunity to connect with people and build a good network.
- The journey is everything! Document it for the world as your survival guide. Something which is obvious to you might be valuable advice to someone else.
If you read through the whole post, you will have realized by now that getting into a program like the Google Summer of Code is nothing like a “mainstream” internship, where you clear the written/coding round, the technical interview, and the personal interview, and then you have an offer letter. There is no preparation strategy or technical eligibility requirement for contributing to open source. It’s more of a party where you introduce yourself to a bunch of strangers who later become a memorable part of your life.
“Nothing in the world can take the place of persistence. Talent will not; nothing is more common than unsuccessful men with talent. Genius will not; unrewarded genius is almost a proverb. Education will not; the world is full of educated derelicts. Persistence and determination alone are omnipotent.” - The Founder (2016)
I tried my best to answer most of the questions people asked me on LinkedIn or other platforms. If you wish to know what GSoC is, what it offers, and what its benefits are, there are numerous generic blogs which describe all that in great detail by compiling the information available across the internet. I thought it would be better to narrate an original journey rather than just repeating existing information here. There are still a lot of myths about GSoC, and people think of it as something unapproachable. I believe, however, that it can be achieved by anyone who is desperate enough to add this feather to their cap through discipline, consistency, and perseverance.
Thanks a TON for making it this far in this blog! I hope it was worth your time, and that my journey and experience help you in some way! I’m always up for healthy discussions, so ping me if you have any ideas or thoughts you would like to share about anything!
Until next time!