About the Budding Data Scientists Hackathon

Hui Xiang Chua
Budding Data Scientists
Mar 9, 2018

22 students. 7 teams. 1 pilot data science hackathon for high school students in Singapore. Made possible with the support of the KDD Impact Program.

Unlike other hackathons, which typically last a few days, this hackathon lasts for months.

Timeline for the “Budding Data Scientists Hackathon”

The “Budding Data Scientists Hackathon” hopes to achieve the following objectives:

  • Enhance data science community engagement;
  • Expand outreach of data science;
  • Increase diversity and participation in data science;
  • Increase societal impact of data science;
  • Influence public policy through data science.

The hackathon aims to motivate upper secondary school students (i.e. grades 9 and 10 of the U.S. high school system) to develop an interest in data science and to use it to help a social cause. They will work in teams to tackle social challenges of their choice using data science, with the possibility of improving data maturity within Voluntary Welfare Organisations (i.e. non-profit organisations that provide welfare services and/or services that benefit the community at large). All teams will present their findings to a panel of judges at a final showdown, and prize monies will be awarded to the top three teams.

As the current secondary school curriculum does not cover data science, all students participating in the hackathon will have to undergo five training sessions on different aspects of data science, such as statistics, programming, the data maturity framework and data pipelines (see “Training week” below).

This is also the first time students in Singapore can gain real-world experience working on data science problems at such an early stage of their education. The inaugural “Budding Data Scientists Hackathon” brought together five teams of students from Hwa Chong Institution and two teams from the affiliated Nanyang Girls’ High School, with 3–4 students per team.

The final showdown will be open to teachers and non-participating students to raise awareness of data science and its applications. The data science projects completed during the hackathon can serve as use cases, and once the “Budding Data Scientists Hackathon” has been implemented successfully, it can be replicated across other high schools.

This blog documents the learning and project outcomes of the students.

Training week

Day 1
Lab 1: Software installation [notes]
Theory 1: Introduction to various Data Science tasks [notes]
Theory 2: Basic statistical concepts [notes]
Lab 2: Basic statistical tests in R [notes]
Homework #1: Share 3 things I learnt today and 1 question I still have.

Day 2
Lab 3: Introduction to R (Basic R functions, Indexing, Sort) [notes]
Lab 4: Data preparation in R (Merging, Recoding, Web Scraping) [notes]
Lab 5: Plotting and Advanced functions in R (IF and FOR) [notes]
Homework #2: Using what you have learnt today, find interesting table(s) on Wikipedia, then use R to extract and plot something. Be sure to include plot title and axis labels.

Day 3
Theory 3: Probability [notes]
Theory 4: k-Nearest Neighbors [notes]
Theory 5: Regression [notes]
Lab 6: k-Nearest Neighbors [notes]
Lab 7: Regression (Simple, Multiple) [notes]
Lab 8: Decision trees [notes]
Homework #3: Share 3 things I learnt today and 1 question I still have.

Day 4
Projects announcement
Lab 9: Data visualization and dashboarding in Tableau [notes]
Homework #4: Build a dashboard containing three charts (at least two different chart types) and post it to Tableau Public. Save your dashboard as an image and create a Medium post that includes the image and your Tableau Public dashboard URL.

Day 5
Lab 10: Web scraping with Python [notes]
Homework #5: Go to IMDb.com > Movies, TV & Showtimes > Most Popular Movies. Scrape the Title, Rank, Rating, Advisory category, Run time, Genres and Date. Write up the kinds of analysis that could be done with this data, and include a screenshot of your code.
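To give a flavour of what a scraping exercise like Homework #5 involves, here is a minimal sketch in Python. It is not the code from the lab notes. It uses only the standard library's html.parser module and runs on a small inline HTML snippet instead of a live page. The markup, the class names (movie, rank, title, rating) and the helper parse_movies are all made up for illustration; a real page will be structured differently, so you would need to inspect it in your browser first and check the site's terms of use before scraping.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only. A real movie listing page
# uses different tags and class names, and they change over time.
SAMPLE_HTML = """
<ul>
  <li class="movie"><span class="rank">1</span><span class="title">Example Movie A</span><span class="rating">8.1</span></li>
  <li class="movie"><span class="rank">2</span><span class="title">Example Movie B</span><span class="rating">7.4</span></li>
</ul>
"""

# The (assumed) class names of the fields we want to extract.
FIELDS = {"rank", "title", "rating"}


class MovieListParser(HTMLParser):
    """Collects one dictionary per <li class="movie"> element."""

    def __init__(self):
        super().__init__()
        self.movies = []
        self._current = None  # the movie currently being filled in
        self._field = None    # the field whose text we are reading

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "movie":
            self._current = {}
        elif tag == "span" and css_class in FIELDS and self._current is not None:
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the wanted <span> fields.
        if self._field is not None:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.movies.append(self._current)
            self._current = None


def parse_movies(html):
    """Return a list of movies with the rank and rating converted to numbers."""
    parser = MovieListParser()
    parser.feed(html)
    parser.close()
    return [
        {
            "rank": int(movie["rank"]),
            "title": movie["title"].strip(),
            "rating": float(movie["rating"]),
        }
        for movie in parser.movies
    ]


movies = parse_movies(SAMPLE_HTML)
for movie in movies:
    print(movie)
```

Once the data is in a list of dictionaries like this, it is straightforward to sort by rating, count movies per genre, or export to a spreadsheet for the write-up.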

This post is contributed by the principal investigator, Hui Xiang Chua, behind the “Budding Data Scientists Hackathon”. For more information regarding the hackathon, reach us at buddingdatascientists@gmail.com.
