Student Reflections: Education Hackathon

My first Data Science hackathon and writing a blog post

Published in Eskwelabs, May 30, 2019

This is Part 1 of a series I am writing to showcase what I am learning in the Eskwelabs Manila Bootcamp, which develops data science talent. Here is how Eskwelabs works: Fellows spend five weekday evenings learning a fundamental topic, and on Saturday they practice what they learned in a hackathon.

For our first week, we learned Python programming and Pandas data frames. The first hackathon we participated in was focused on education outcomes in the Philippines, which I will be sharing here!

(This is my first time writing a blog, so please be forgiving of whatever mistakes I make while you read this ✌)

With my hackathon teammate Elaiza during the hackathon. We look calm but we are freaking out.

The Hackathon Problem

In the Philippines, the most common way to go to school is by enrolling in the public school system. One advantage of public schools is that they offer free education through high school, which opens up employment opportunities. However, part of the population still does not take advantage of this free education. What are some factors that affect enrollment?

In this hackathon, we explored the enrollment dataset for school years 2012 and 2015 from the government’s open source database, which has barangay-level detail. We wanted to describe the basic trends, such as the number of enrollees and their gender.

In the second part of our analysis, we explored Zamboanga del Sur and the possible factors that affected the change in enrollment between 2012 and 2015. We considered health, housing, income, peace, and water availability in our analysis.

Our Findings

There was an overall increase in national enrollment from 2012 to 2015, as shown by a 5.6% increase in the median. That’s a good sign that our education system is improving, but is it enough? To find out more, we looked further into gender and year level.
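Here is a minimal pandas sketch of how a percent change in the median can be computed. It assumes the data has one row per barangay and school year, with hypothetical column names `school_year` and `enrollees`. The numbers are made up for illustration and are not the hackathon data.

```python
import pandas as pd

# Toy barangay-level enrollment counts (made-up numbers, hypothetical columns)
df = pd.DataFrame({
    "school_year": [2012, 2012, 2012, 2015, 2015, 2015],
    "enrollees": [100, 200, 300, 110, 210, 320],
})

# Median number of enrollees per school year
medians = df.groupby("school_year")["enrollees"].median()

# Percent change in the median from 2012 to 2015
pct_change = (medians[2015] - medians[2012]) / medians[2012] * 100
print(f"Median 2012: {medians[2012]}, 2015: {medians[2015]}, change: {pct_change:.1f}%")
```

With the same layout, a boxplot like Fig 1 could be drawn with `df.boxplot(column="enrollees", by="school_year")`. That call needs matplotlib installed.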

Fig 1: Boxplot of the number of enrollments during 2012 and 2015

In 2012, we had the same number of male and female enrollees. In 2015, however, the number of female enrollees had increased more. We attributed this to the country having a larger population of female students. If other factors are at play, though, it would be interesting to find out what they are and what they might reveal about gender discrimination in who is more likely to enroll in public high school.
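A comparison by gender like the one in Fig 2 could be sketched with a pivot table. This version assumes long-format data with hypothetical `school_year`, `gender`, and `enrollees` columns. The numbers are made up for illustration.

```python
import pandas as pd

# Toy long-format data: one row per barangay, year, and gender (made-up numbers)
df = pd.DataFrame({
    "school_year": [2012, 2012, 2012, 2012, 2015, 2015, 2015, 2015],
    "gender": ["Male", "Female"] * 4,
    "enrollees": [50, 50, 70, 70, 52, 58, 71, 80],
})

# Total enrollees with genders as rows and school years as columns
totals = df.pivot_table(index="gender", columns="school_year",
                        values="enrollees", aggfunc="sum")

# Percent change for each gender between the two school years
totals["pct_change"] = (totals[2015] - totals[2012]) / totals[2012] * 100
print(totals)
```

Putting the years side by side as columns makes the per-gender change a single vectorized subtraction.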

Fig 2: Comparison of the number of enrollees by gender

The change in enrollment per year level tells us that senior-year enrollment changed the least. This deserves further investigation, but our hypothesis is that enrollment decreases in later school years as students drop out, which points to another issue in our education system: out-of-school youth.
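One way to find the year level with the smallest change is to group, unstack the years into columns, and take the minimum. This is a rough sketch under some assumptions. The year-level labels ("Year 1" to "Year 4") and the column names are hypothetical, and the counts are made up.

```python
import pandas as pd

# Toy enrollment rows per year level and school year (made-up numbers, hypothetical labels)
df = pd.DataFrame({
    "year_level": ["Year 1", "Year 2", "Year 3", "Year 4"] * 2,
    "school_year": [2012] * 4 + [2015] * 4,
    "enrollees": [400, 380, 350, 320, 440, 410, 370, 325],
})

# Sum enrollees per (year level, school year), then spread school years into columns
by_level = df.groupby(["year_level", "school_year"])["enrollees"].sum().unstack("school_year")

# Absolute and percent change from 2012 to 2015 for each year level
by_level["change"] = by_level[2015] - by_level[2012]
by_level["pct_change"] = by_level["change"] / by_level[2012] * 100

# Year level whose enrollment changed the least in relative terms
least = by_level["pct_change"].abs().idxmin()
print(by_level)
print("Least change:", least)
```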

Fig 3: Comparison of the number of enrollees by year level

Looking Further into Zamboanga Del Sur

We also had an additional dataset that let us look into which factors might affect the number of enrollees in the different municipalities of Zamboanga del Sur.

Our first insight is that three of the five factors we considered may contribute to the number of enrollees: income and livelihood, health and nutrition, and housing.

Our second insight is particularly interesting: the municipalities of Kumalarang and Aurora had an unusual number of enrollees given their levels of the three factors mentioned above.

Fig 4: Aurora and Kumalarang Municipalities (from Wikipedia)

Our assumption going in was that as the magnitude of poverty, malnourishment, and squatters increases, the number of enrollees decreases. But Aurora and Kumalarang still had a large increase in the number of enrollees, even though both municipalities had a high magnitude of all three factors. This means those relationships were weaker than we expected.
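To see how two unusual municipalities can weaken a relationship, we can compare a Pearson correlation computed with and without them. This is a minimal sketch. The municipality labels (A to F), the column names, and all the numbers are made up for illustration, and Pearson correlation is our choice here rather than necessarily the measure used in the hackathon.

```python
import pandas as pd

# Made-up municipality-level data (hypothetical columns): number of poor households
# and the change in enrollees from 2012 to 2015
df = pd.DataFrame({
    "municipality": ["A", "B", "C", "D", "E", "F"],
    "poor_households": [100, 200, 300, 400, 500, 600],
    "enrollee_change": [60, 50, 40, 30, 55, 60],
})

# Pearson correlation across all municipalities
r_all = df["poor_households"].corr(df["enrollee_change"])

# The same correlation with the two unusual municipalities (E and F) excluded
typical = df[~df["municipality"].isin(["E", "F"])]
r_typical = typical["poor_households"].corr(typical["enrollee_change"])

print(f"r (all): {r_all:.2f}, r (without E, F): {r_typical:.2f}")
```

In this toy data, the other four municipalities show a perfect negative relationship. Adding the two outliers pulls the correlation close to zero.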

Our hypothesis is that these two municipalities are outliers for one of two reasons. Either their student population makes up a much larger share of the overall population, or resources are distributed unequally, e.g., more schools are located in these two places.

How Did We Do It?

We carried out this analysis in three steps:

Data Preparation: We cleaned our initial dataset by dropping rows with incomplete data, removing commas (,) from numerical values, and casting each column to the appropriate data type.
Data Exploration: We grouped the dataset by age and year level. We also looked into latitude and longitude, but found no significant relationship. Finally, we grouped the dataset by municipality so it could be used in the final step.
Data Interpretation: We merged in another dataset of household data (the underlying factors we were considering) from the municipalities of Zamboanga del Sur. Then we extracted the relevant columns by creating dummy variables via one-hot encoding. Finally, we compared the magnitude, which is the number of households, to the number of enrollees.
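The three steps above can be sketched as a small pandas pipeline. This is only an illustration under assumptions, not the notebook we used. The column names are hypothetical, as are the municipality names ("Alpha", "Beta"), the factor labels, and the long layout of the household data (one row per municipality and factor). All numbers are made up.

```python
import pandas as pd

# Step 1, data preparation: made-up raw rows with missing values and comma-formatted numbers
raw = pd.DataFrame({
    "municipality": ["Alpha", "Alpha", "Beta", "Beta", None],
    "enrollees": ["1,200", "850", "2,040", None, "300"],
})
enroll = raw.dropna().assign(  # drop incomplete rows
    # strip thousands separators, then cast to integers
    enrollees=lambda d: d["enrollees"].str.replace(",", "", regex=False).astype(int)
)

# Step 2, exploration: total enrollees per municipality
enroll_by_muni = enroll.groupby("municipality", as_index=False)["enrollees"].sum()

# Step 3, interpretation: household data in an assumed long layout
households = pd.DataFrame({
    "municipality": ["Alpha", "Alpha", "Beta", "Beta"],
    "factor": ["Income", "Housing", "Income", "Housing"],
    "households": [300, 120, 500, 80],
})

# One-hot encode the factor column, then weight each dummy by its household count
dummies = pd.get_dummies(households["factor"], dtype=int)
magnitudes = dummies.mul(households["households"], axis=0)
magnitudes["municipality"] = households["municipality"]
factor_by_muni = magnitudes.groupby("municipality", as_index=False).sum()

# Merge so each municipality's enrollees sit next to each factor's magnitude
merged = enroll_by_muni.merge(factor_by_muni, on="municipality", how="inner")
print(merged)
```

Weighting the dummy columns by the household count turns one long table into one column per factor, which can then be compared directly with the enrollee counts.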

If you want to see the notebook I used, you can download my Jupyter Notebook here.

Needless to say, this first hackathon got off to a slow start, since we were still learning how to use Python. Last but not least, here are a few lessons I learned during my first hackathon 😁😊

Lessons Learned

This was my first time doing a hackathon, and it was really nerve-wracking. There are two main lessons I took away from it. First, focus on the problem. As a quote often attributed to Einstein puts it, “If I had an hour to solve a problem, I’d spend 55 minutes thinking about the problem and 5 minutes thinking about solutions.” I got caught up in the technical work I needed to do: cleaning the data, making sure the code was correct, and searching for the right technical terms.

Second, I learned that everyone loves a good story. In our presentation, I probably talked about things without much context (of course, I have added that context in this blog). This left me with the impression that our code is useless if no one can understand the significance of the results we discovered.

That’s it for Week 1, come back next week for our Data Visualization Hackathon.