Turn Data into More Meaningful Insight at Data Science Academy Camp 2

Published in

COMPFEST

9 min read · Sep 4, 2021

COMPFEST 13, Jakarta - Camp 2 of the Data Science Academy was held online via Zoom from 23 to 29 August 2021. The participants were the 10 best teams that passed the COMPFEST selection. The camp featured experienced speakers who are experts in their fields. Curious about what happened at Camp 2? Let’s take a look at the activities!

Day 1 - Modeling Overview

The first day of Camp 2 Data Science Academy was themed Modeling Overview. The material was presented by Elisafina Siswanto, Senior Data Scientist at Tiket.com.

Before starting the first session, Fina introduced Tiket.com, along with the background and purpose of its founding. After that, Fina outlined the two sub-topics of the first meeting: Machine Learning Overview and Machine Learning Framework.

First, Fina gave an overview of Machine Learning, covering a brief definition, the difference between humans and Machine Learning, the reasons Machine Learning is needed, and its implementation. Fina explained that a Machine Learning model can solve problems more efficiently and provide solutions that adapt flexibly. Such problems are not easily solved by humans using conventional approaches.

Next, Fina walked through each point of the Machine Learning framework. According to Fina, the first step in building a Machine Learning model is to define the goal and the metric by which success will be measured. Without a success metric, we cannot know whether the model has succeeded. After that, Fina explained the types of Machine Learning along with the two modeling processes, training and predicting. Finally, Fina explained various ways of evaluating a Machine Learning model and continued with a Q&A session that DSA participants enthusiastically welcomed.

It didn’t stop there. The activity continued with a Hands-On session, in which Fina demonstrated the material in practice.

Day 2 - A Classical Learning: Supervised Learning

The material for the second day of Camp 2 Data Science Academy was presented by Yoga Pratama Aliarham, Lead Data Scientist at Tiket.com. Yoga covered Supervised Learning. Yoga explained that although Supervised Learning counts as Classical Learning, it is still widely used in Machine Learning applications.

Before starting the Supervised Learning material, Yoga reviewed the first day’s material, the definition of Machine Learning, and how it differs from traditional programming. According to Yoga, the only difference between Supervised Learning and Unsupervised Learning lies in data labels: Supervised Learning requires labeled data, while Unsupervised Learning does not. Next, Yoga likened Supervised Learning to function mapping and then asked participants to answer some sample questions.

After the participants answered, Yoga explained the two Supervised Learning methods, Regression and Classification. Regression predicts a numeric output for new input data after training on the available data. Classification assigns objects to categories based on the labels given. Yoga also explained the advantages and disadvantages of each method and how they are applied in daily life. After delivering the material, he closed with a summary followed by a Q&A session.

As on the first day, there was a Hands-On session to give participants a better understanding by practicing the material that had been presented. In this session, Yoga himself guided the participants through practicing Supervised Learning.
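To make the two methods concrete, here is a minimal sketch in plain Python. It is not the code from the session: the least-squares line fit stands in for Regression, a 1-nearest-neighbour rule stands in for Classification, and the function names, data, and labels are all made up for illustration.

```python
# Regression: fit y = slope * x + intercept by ordinary least squares.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Classification: return the label of the closest training point (1-NN).
def predict_label(train_points, train_labels, point):
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    nearest = min(range(len(train_points)),
                  key=lambda i: sq_dist(train_points[i], point))
    return train_labels[nearest]


# Hypothetical toy data: the "label" for regression is a number...
distances_km = [100, 200, 300, 400]
prices = [150, 250, 350, 450]
line = fit_line(distances_km, prices)        # training
predicted_price = predict_line(line, 500)    # predicting -> 550.0

# ...while the label for classification is a category.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 7.5)]
labels = ["budget", "budget", "premium", "premium"]
predicted_class = predict_label(points, labels, (8.5, 8.0))  # -> "premium"
```

In both cases the model only works because every training example comes with its answer, which is exactly the data-label requirement Yoga described.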

Day 3 - Unsupervised Learning

The material for the third day of Camp 2 Data Science Academy was presented by Muhammad Adib Imtiyazi, Senior Data Scientist at Tiket.com. Adib covered Unsupervised Learning, a continuation of the previous day’s material on Supervised Learning.

Before presenting the material, Adib described a scenario involving promotions to increase customer retention, in which adequate customer data was not yet available. Adib invited the participants to give their opinions on the case. After the participants answered, Adib explained the difference between Supervised Learning and Unsupervised Learning. Supervised Learning requires data together with its labels. In contrast, Unsupervised Learning is a Machine Learning method used to analyze and extract information from data that has no labels, which makes it suitable for cases like this one. Adib said that even though the data used in Unsupervised Learning has no labels, that does not mean it is used without a purpose. We must know what goals we want to achieve by applying Unsupervised Learning.

Next, Adib explained the three scopes of Unsupervised Learning: Clustering, Pattern Search, and Generalization. Pattern Search is used to find patterns in a sequence of events, for example to recommend products that users are likely to buy after purchasing a related product. Generalization is used when the data has a very large number of columns, commonly referred to as high-dimensional data. The last scope is Clustering, a technique used to group data based on similar characteristics. Clustering is divided into two types, Distance-Based and Probability-Based. According to Adib, we should not focus on clusters that cannot describe our data correctly. Used with that in mind, Clustering can help us understand our data better.

After that, Adib mentioned how Unsupervised Learning is applied at Tiket.com. Examples include the Tiket.com Flight SRP and persona clusters, which serve as a reference for pricing and promotion. Lastly, Adib explained one example of Clustering, K-Means. After presenting the material, Adib put it into practice in the Hands-On session and closed with a Q&A session.
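For readers who have not seen K-Means before, the following is a minimal sketch of the standard algorithm in plain Python, not the code used in the Hands-On session. The function name `k_means`, the choice of starting centroids, and the toy points are assumptions made for illustration.

```python
def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignments = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    return centroids, assignments


# Hypothetical unlabeled data with two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignments = k_means(points, centroids=[(1, 1), (8, 8)])
```

Note that no labels are passed in anywhere: the groups come purely from distances between points, which is what makes this a Distance-Based clustering method.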

Day 4 - Intro to Deep Learning

Unlike the previous days, the fourth day of Data Science Academy Camp 2 opened with a case study session. The case study was led by Vincent Tatan, a Senior Machine Learning Engineer at Google. After the committee assigned the case study groups, Vincent gave all participants directions for the case. The fourth-day case study was a classification problem titled Who’s Quitting Today?. Participants were asked to classify whether or not an employee wants to leave a company.

After the participants completed the case study, Vincent explained the answers. Vincent said the first thing to do before using a data set is to understand the data thoroughly. After that, we have to look at the characteristics of the data. Only once we understand the data and its characteristics can we process it with the appropriate machine learning tools.

The next session was a presentation by Vincent themed Intro to Deep Learning. In this session, Vincent explained in more detail Machine Learning for image recognition, the principles of CNNs (Convolutional Neural Networks), and CNN stacks. “If done manually, image recognition is difficult,” said Vincent. “Therefore, we need a way to determine the features scalably,” he continued. This difficulty can be overcome by using the principles of CNN, namely convolution, ReLU, and Max Pooling. Convolution extracts features from an image, ReLU then makes it clearer which responses count as features and which do not, and Max Pooling keeps the most important parts of each feature.
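The three CNN principles can be tried out on a tiny grayscale "image" without any deep learning library. The sketch below is an illustration written for this article, not Vincent’s material: the function names, the 5x5 image, and the hand-picked edge-detecting kernel are all assumptions. In a real CNN the kernel values are learned during training rather than chosen by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image with no padding (stride 1).
    Like most CNN libraries, this does not flip the kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses, set negative ones to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep only the strongest response in each size x size block."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical image: a bright vertical stripe on a dark background.
image = [[0, 0, 1, 1, 0] for _ in range(5)]
# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

features = convolve2d(image, kernel)   # each row: [0, 2, 0, -2]
activated = relu(features)             # each row: [0, 2, 0, 0]
pooled = max_pool(activated)           # [[2, 0], [2, 0]]
```

The convolution responds positively to one edge of the stripe and negatively to the other, ReLU discards the negative response, and Max Pooling shrinks the 4x4 map to 2x2 while keeping the strong edge signal.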

Day 5 - Hyperparameter Tuning

The fifth day was the last day of the Data Science Academy Camp 2 series. The activity began with a presentation by Louis Owen, an AI Research Engineer at Bukalapak, on the theme of Hyperparameter Tuning. Before starting the material, Louis explained why learning Hyperparameter Tuning matters. Louis said that there are two options for improving the performance of a Machine Learning model that has been built: the model-centric approach, which means adjusting the model, and the data-centric approach, which means reviewing the data you have. Hyperparameter Tuning is a model-centric approach that can improve performance without changing the type of machine learning model that has been built.

The material began with Louis’ explanation of what a Hyperparameter is. A parameter is a variable internal to a model, which means its value can be estimated from the existing data. A Hyperparameter, by contrast, is a variable external to the model and cannot be estimated from the existing data. A common example of a parameter is a coefficient in linear regression, while an example of a hyperparameter is the maximum depth of a decision tree. Louis explained that the main goal of Hyperparameter Tuning is to achieve optimal model performance.

According to Louis’ explanation, there are several methods for performing Hyperparameter Tuning: Grid Search, Random Search, Coarse to Fine Search (which combines Grid Search with Random Search), Bayesian Search, Genetic Algorithm, and Manual Search. Each of these methods has its advantages and disadvantages, so we must know when and under what conditions each one is appropriate.
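As a rough illustration of the first two methods, the sketch below tunes a single hyperparameter with Grid Search and Random Search in plain Python. It is not the case from the session: the one-coefficient ridge regression model, the made-up data, and the names `fit_ridge`, `validation_error`, `grid_search`, and `random_search` are all assumptions chosen to keep the example small. It also shows the parameter versus hyperparameter distinction: the coefficient `w` is estimated from the training data, while `alpha` has to be chosen from outside.

```python
import random

# Hypothetical data: the training labels are noisy, the validation labels follow y = 2x.
train = [(1.0, 2.5), (2.0, 4.4), (3.0, 6.3), (4.0, 8.6)]
valid = [(1.5, 3.0), (2.5, 5.0), (3.5, 7.0)]


def fit_ridge(data, alpha):
    """Estimate the parameter w of y = w * x from data.
    alpha is a hyperparameter: it shrinks w and is not learned from data."""
    return sum(x * y for x, y in data) / (sum(x * x for x, _ in data) + alpha)


def validation_error(alpha):
    """Mean squared error on the held-out validation set."""
    w = fit_ridge(train, alpha)
    return sum((y - w * x) ** 2 for x, y in valid) / len(valid)


def grid_search(candidates):
    """Try every value on a fixed grid and keep the best one."""
    return min(candidates, key=validation_error)


def random_search(low_exp, high_exp, trials, seed=0):
    """Sample alpha = 10 ** e with e drawn uniformly, and keep the best one."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    samples = [10 ** rng.uniform(low_exp, high_exp) for _ in range(trials)]
    return min(samples, key=validation_error)


best_grid = grid_search([0.01, 0.1, 1.0, 10.0])
best_random = random_search(-2, 1, trials=20)
```

Grid Search can only return one of the values it was given, while Random Search can land between grid points. A Coarse to Fine Search would follow up by searching again in a narrower range around the best value found.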

The next activity was a Hands-On session, which was slightly different from those on the previous days. The participants were asked to work with their teams to solve a case given by Louis. The case required participants to improve the performance of the provided model as much as possible. The team that improved the model’s performance the most would get a special prize from Louis.

After the Hands-On session was over, three selected teams presented their results. In the end, the Hands-On session was won by the CascadePEOW team, as announced directly by Louis.

After the Camp 2 series was completed, we had the opportunity to interview Louis Owen, AI Research Engineer at Bukalapak. According to Louis, COMPFEST must continue to exist, especially the Data Science Academy, to bring in new data scientists in the future. “Portfolios are very important, so increase your data science portfolio. Whether it’s on GitHub or having a personal website,” said Louis to data science enthusiasts. “It will be very important to include it in your CV. Even during the interview, there will be a lot of discussion about that,” he continued. According to Louis, instead of just collecting theory certificates, it is more impactful to put the theory into practice through data science projects in a portfolio.

We also had the opportunity to interview one of the participants of the Data Science Academy, Bram, a member of the Brifko team. According to Bram, the Data Science Academy Camp 2 was memorable, especially the case study session on the fourth day. Bram was happy because the case study was a chance to get to know new people. “The activities at the Data Science Academy are very interactive between the participants and the presenters,” Bram said. Bram hopes there will be more Data Science Academy COMPFEST participants in the future, because Bram believes many people are interested in data science but don’t have a place to study it.

There are still a lot of fun events at COMPFEST! So stay tuned for information about COMPFEST through our Twitter account @COMPFEST, our Instagram @COMPFEST, and our site compfest.id (Editorial Marketing/Amira).