Data Science Internship at Gojek
Being an undergraduate student at the Hong Kong University of Science and Technology, whose academic year runs from September to May, means that I get a full three-month summer holiday. So, I decided to do a summer internship. I applied to Gojek, and after a couple of interviews and a case study, I was accepted as a Data Science Intern.
From the start, with the cheerful yet meticulously crafted welcome email, I knew that this internship would be a memorable journey. On the first day at Gojek, we interns were welcomed by the GoAcademy team for the onboarding process. Beyond onboarding, the GoAcademy team was also responsible for our personal development throughout the internship, and organized several activities for this purpose, including workshops and the LAP (Learning Action Plan).
We finally got to meet Gojek’s Data Science team in the afternoon. The team in Jakarta is divided into three streams: Supply, Fraud, and Logistics. After the team introduction, we were each assigned to a mentor. Our project during the internship would correspond to our mentor’s stream.
A fellow intern, Mahendri Dwicahyo, and I would be working in the Supply stream along with our mentors, Gibran Erlangga and Iqbal Tawakal. Led by Ridlo Nur Rahman, the DS Supply stream aims to optimize Gojek’s supply spending.
During the first few weeks of the internship, Mahendri and I familiarized ourselves with the whole project infrastructure. Meanwhile, we were also tasked with writing unit tests in Python and analyzing the model’s performance.
Things got busier around July as the production date for our model approached. I got the chance to build the guardrails that filter out radical changes the model might produce, and I also helped make sure the data pipeline ran seamlessly. I learned a lot about the supply project from end to end: collecting the data from BigQuery, fitting regression models on it, choosing the best combination using simulated annealing optimization with the guardrails integrated into it, and uploading the JSON result to Google Cloud Storage.
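To give a feel for how a guardrail can sit inside a simulated annealing loop, here is a minimal sketch in Python. It is not the code we ran at Gojek: the function names, the 20% change limit, the cooling schedule, and the toy cost function are all assumptions made up for illustration. The idea is simply that any proposed change that strays too far from the current baseline is rejected before it is even scored.

```python
import math
import random


def within_guardrail(candidate, baseline, max_rel_change=0.2):
    """Return True if no value moves more than max_rel_change from its baseline.

    Hypothetical guardrail: the 20% default is an illustrative assumption.
    """
    return all(
        abs(c - b) <= max_rel_change * abs(b)
        for c, b in zip(candidate, baseline)
    )


def simulated_annealing(cost, baseline, steps=2000, temp0=1.0, cooling=0.995,
                        step_size=0.05, max_rel_change=0.2, seed=0):
    """Minimize cost(x), starting from baseline and staying inside the guardrail."""
    rng = random.Random(seed)
    current = list(baseline)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    temp = temp0
    for _ in range(steps):
        # Propose a neighbour by perturbing one randomly chosen coordinate.
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step_size, step_size) * abs(baseline[i])
        # Guardrail: discard proposals that stray too far from the baseline.
        if within_guardrail(candidate, baseline, max_rel_change):
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Metropolis rule: always accept improvements,
            # and accept worse moves with probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = list(current), current_cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost


# Toy example: the unconstrained optimum (12, 30) lies partly outside the guardrail.
def toy_cost(x):
    target = [12.0, 30.0]
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


baseline = [10.0, 20.0]
best, best_cost = simulated_annealing(toy_cost, baseline)
```

In this toy setup the second value can never move past 24, however attractive 30 looks to the optimizer, which is the kind of "no radical changes" behaviour a guardrail is meant to enforce.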
Near the end of my internship, I worked with everyone in the Supply team to build a Bayesian hierarchical model for a new supply function. By building this model, I learned a good deal about feature engineering and model tuning.
In addition to our involvement with the main project, Mahendri and I were also given the task of building an A/B testing library. We successfully built the library, supporting both Bayesian and frequentist methods. It covered the Gaussian, Poisson, and Binomial distributions and was thoroughly documented.
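As an illustration of what the two approaches look like for the Binomial case, here is a small sketch using only the Python standard library. It is not the library we wrote, and the function names, the uniform Beta(1, 1) prior, and the example numbers are assumptions chosen for the example. The frequentist side is a pooled two-proportion z-test; the Bayesian side estimates the probability that variant B has a higher conversion rate than variant A by sampling from Beta posteriors.

```python
import math
import random


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Frequentist check: pooled two-proportion z-test with a two-sided p-value."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def prob_b_beats_a(success_a, n_a, success_b, n_b, draws=20000, seed=0):
    """Bayesian check: Monte Carlo estimate of P(rate_B > rate_A).

    Assumes a uniform Beta(1, 1) prior on each conversion rate, so each
    posterior is Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + success_a, 1 + n_a - success_a)
        rate_b = rng.betavariate(1 + success_b, 1 + n_b - success_b)
        wins += rate_b > rate_a
    return wins / draws


# Made-up example data: 100/1000 conversions for A, 150/1000 for B.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
p_b_better = prob_b_beats_a(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, P(B > A) = {p_b_better:.3f}")
```

The two answers are read differently: the p-value says how surprising the observed gap would be if A and B truly converted at the same rate, while the posterior probability is a direct statement about how likely B is to be the better variant given the data and the prior.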
I am really grateful to have had the chance to be a Data Science Intern at Gojek in the summer of 2019. Through this experience, I gained a lot of knowledge and skills, and I saw firsthand how my project made a substantial impact on the lives of the people around me.