The UN Datathon: a dynamic challenge.

Melbourne Centre for Data Science
Published in KERNEL-MCDS
4 min read · Jun 6, 2024
Photo by Andrew Stutesman on Unsplash

The United Nations (UN) Datathon (6–9 November 2023) was a dynamic, three-day challenge to devise solutions to real-world problems using data science. The theme was to ‘accelerate progress towards the United Nations Sustainable Development Goals (SDGs)’. The SDGs are a series of 17 goals adopted by the United Nations in 2015, with the aim of ‘peace and prosperity for the people of the planet’. These goals are broad and include no poverty (SDG 1), gender equality (SDG 5), climate action (SDG 13) and life on land (SDG 15). Achieving all these goals is a momentous task, and any help that can be offered is worthwhile, which made for a strong Datathon theme.

Over 1600 participants from all around the world competed in this year’s Datathon, forming more than 400 teams. Our group of four PhD students represented a diverse cohort of data scientists, all working in different fields. I am in the field of climate science, Rena is exploring machine learning applications in natural language processing, Phuong is in actuarial science, and Sunita is in health informatics. Together, we were able to use our distinct skills and experiences to tackle this huge problem.

For this Datathon, any publicly accessible data could be used, and the project had to aim to accelerate one or more of the SDGs. Coming up with a project with almost unlimited scope was quite challenging. We eventually found a project, with enough data, that we were also all passionate about. Our project focused on the Arab world, a large group of countries located in Western Asia and Northern Africa, and was titled: ‘What factors lead to higher primary completion rate of women in the Arab world?’

This aligned with two SDGs: Quality Education (4) and Gender Equality (5). Our project also related to many more of the goals indirectly, as increased education for women is one of the most effective ways of fighting climate change. It has been found that increasing the education rate of women leads to better land management practices, as in many places women are responsible for agricultural activities. Educating women can help make these practices more sustainable by teaching techniques such as agroforestry and crop rotation. Increasing the education rate of women also leads to slower population growth, as there are fewer unplanned pregnancies and unsafe abortions, and lower child morbidity and mortality rates. Additionally, higher education rates amongst women encourage sustainable economic development.

The percentage of females attending school has been increasing in many regions of the Arab world. However, the education rate is still lower: 86.3% compared to the global average of 91.6%. Many of these countries also have a lower primary completion rate for females than for males.

We set out with two goals:

1. Identify the most relevant factors contributing to a greater primary completion rate in the Arab world, with a focus on the gender gap.

2. Based on the key factors, recommend practical solutions to improve the primary completion rate and close the gender gap.

The dataset we used was from the World Bank Group. It contains data from the last 21 years (2000–2020), with more than 80 features, including access to electricity, literacy rates, and gender ratios in different fields. We explored which features correlated with the primary completion rate in women. Due to the nature of the dataset, we decided to try linear regression (with regularisation) to reduce the effect of multicollinearity, and random forest and gradient boosting regression to see whether more advanced techniques could add more value in this case. Before modelling, we split the dataset into training and testing sets using an 80:20 ratio. Cross-validation was also used to find the best hyperparameters where relevant. As expected, Ridge regression (linear regression with regularisation) outperformed the more advanced methods in predicting the primary completion rate.
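As a rough illustration of this workflow, here is a minimal sketch using scikit-learn. It is not our Datathon code: the data are synthetic stand-ins for the World Bank indicators, and the hyperparameter grids, fold count and RMSE metric are assumptions chosen only to keep the example short.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the indicators (NOT the real data):
# 200 rows, 20 strongly correlated features, and a noisy linear target.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
X = latent @ rng.normal(size=(5, 20)) + 0.1 * rng.normal(size=(200, 20))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

# 80:20 split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate models, each paired with a small, hypothetical hyperparameter grid.
candidates = {
    "ridge": (
        make_pipeline(StandardScaler(), Ridge()),
        {"ridge__alpha": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestRegressor(n_estimators=100, random_state=0),
        {"max_depth": [3, None]},
    ),
    "gradient_boosting": (
        GradientBoostingRegressor(random_state=0),
        {"learning_rate": [0.05, 0.1]},
    ),
}

test_rmse = {}
for name, (model, grid) in candidates.items():
    # 5-fold cross-validation on the training set picks the hyperparameters.
    search = GridSearchCV(model, grid, cv=5, scoring="neg_root_mean_squared_error")
    search.fit(X_train, y_train)
    # The held-out 20% is only used for the final comparison.
    pred = search.predict(X_test)
    test_rmse[name] = float(np.sqrt(mean_squared_error(y_test, pred)))

for name, rmse in sorted(test_rmse.items(), key=lambda kv: kv[1]):
    print(f"{name}: test RMSE = {rmse:.3f}")
```

Which model comes out on top depends on the data. The point of the sketch is the structure: one split, cross-validated tuning on the training portion, and a single comparison on the held-out portion.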

We also applied Ridge regression to predict the gender gap in primary completion rates. With both models fitted, we could extract the most influential factors from each, then compare and contrast them to find the common factors associated with both outcomes. The two factors that we found contributed most to the primary completion rate were female unemployment and access to electricity.
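The comparison step can be sketched as follows. This is an illustrative example rather than our actual analysis: the data are synthetic, the feature names are hypothetical placeholders, and ranking by the absolute value of standardised Ridge coefficients is just one assumed way of defining "most influential".

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler


def top_factors(X, y, names, k=3, alpha=1.0):
    """Return the k feature names with the largest absolute Ridge coefficients."""
    # Standardise first so that coefficient magnitudes are comparable.
    X_std = StandardScaler().fit_transform(X)
    coefs = Ridge(alpha=alpha).fit(X_std, y).coef_
    order = np.argsort(-np.abs(coefs))[:k]
    return [names[i] for i in order]


# Synthetic data with hypothetical feature names (NOT the real indicators).
rng = np.random.default_rng(1)
names = ["feature_a", "feature_b", "feature_c",
         "feature_d", "feature_e", "feature_f"]
X = rng.normal(size=(300, 6))

# Two targets that share some drivers: a completion rate and a gender gap.
completion = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=0.1, size=300)
gap = -2.5 * X[:, 0] + 2.0 * X[:, 1] + 1.0 * X[:, 3] + rng.normal(scale=0.1, size=300)

top_completion = top_factors(X, completion, names)
top_gap = top_factors(X, gap, names)

# Factors that rank highly in both models.
common = [name for name in top_completion if name in top_gap]
print("completion model:", top_completion)
print("gender-gap model:", top_gap)
print("common factors:", common)
```

Coefficient rankings like this show association within the fitted model, not causation, and they can shift when features are highly correlated.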

We do note, however, that these results were obtained using a small dataset. We also acknowledge that aggregating over the Arab world, which is a large region with spatially diverse people, may overlook many details that are important in each region.

Nevertheless, the results we obtained in these three days are still of interest. Further research could explore additional factors and apply this analysis spatially. It could also help assess whether regions with low female completion rates coincide with areas of high female unemployment and low internet connection. If this relation holds, it could give policy and decision-makers the knowledge of which regions are most in need, and where money can best be spent, to increase the female primary completion rate. Finally, to understand whether these relationships are meaningful, more work needs to be done to determine why they may exist in the first place.

In the 2023 UN Datathon, we focused principally on which factors relate to the female primary completion rate in the Arab world. Using the machine learning methods of random forest, ridge regression and gradient boosting, we explored the relationship of the female primary completion rate with over 80 different factors. We found the most influential factors to be a lack of access to the internet and high female unemployment.

Attending the UN Datathon was a fascinating experience that allowed me to collaborate with people and data that I usually don’t have the chance to work with. It was also exciting to be part of such a large international data science initiative. Don’t hesitate to attend the 2024 Datathon!

Alexander Borowiak

Melbourne Centre for Data Science
KERNEL-MCDS

Where stories about data science are written by our Researchers, Associates, Investigators and Ph.D. students. Visit us at: https://science.unimelb.edu.au/mcds