PLJ Research Dive 7: Artificial Intelligence and Machine Learning for Estimating Poverty
The Government of Indonesia has made significant progress in reducing poverty over the past few years, recording its lowest income-based poverty rate, ten per cent, in 2017. Yet many citizens remain vulnerable because they live only marginally above the national poverty line. Estimating poverty precisely enough to target programmes well has never been an easy task for governments.
Pulse Lab Jakarta’s research dives bring together academics, public officials and other researchers to dive into big data sets and develop new methods and insights on burning policy questions. Previous events have led to increased collaboration between institutions, more publications by academics on the event’s theme, and the uptake of new data and methods to address policy questions within the Government of Indonesia. With such results, our expectations for our most recent research dive were no less ambitious.
Thanks to sponsorship and support from the Knowledge Sector Initiative (KSI) and the Artificial Intelligence Journal, our most recent research dive brought together a full house of researchers and data analysts from academia, development partners, the United Nations and the Government of Indonesia to investigate how artificial intelligence and machine learning approaches combined with conventional and unconventional data sets can help estimate poverty levels.
One of the Research Dive’s underlying goals is to support the Government’s development agenda, in particular efforts geared towards achieving Sustainable Development Goal 1: No Poverty.
Part of the role of Pulse Lab Jakarta in catalysing the data ecosystem is to introduce new data, tools and methods so that our partners and stakeholders can replicate the research and explore the findings further. As part of our preparations, Pulse Lab Jakarta consulted with the Directorate of Poverty Reduction and Social Welfare in the Ministry of National Development Planning (Bappenas) to identify priority issues, potential actors and domain experts on the topic.
Bringing together nine academics, one representative from the World Food Programme, two personnel from the Central Statistics Agency, one member of the National Team for the Acceleration of Poverty Alleviation, two staff from the Ministry of National Development Planning and five research and data analysts from PLJ, this Research Dive ran for three days and culminated in four closing presentations.
Bappenas kindly hosted the closing presentations, which were attended by Pak Suharmen (the head of Bappenas’ Data and Information Centre), Dr. Gellwynn Daniel Hamzah Jusuf (Head Secretary of Bappenas), Vivi Yulaswati (the Director of Poverty Reduction and Social Welfare at Bappenas) and Ibu Vivi Alatas (the lead economist for the World Bank’s poverty programme in Indonesia).
Here we highlight the main findings from each presentation:
Team 1: Measuring Vulnerability to Poverty Using Satellite Imagery
Analysing nighttime light imagery from satellites as a proxy for poverty, this team found that villages where more than 50 per cent of the population is poor tend to have lower luminosity levels. The team also examined other geo-spatial data sets to assess whether vulnerability to poverty can be predicted from citizens’ access to rivers, roads and public facilities, and from different types of land use.
Data: (i) satellite data (nighttime lights, population density and settlement footprint, land cover, OpenStreetMap data and digital elevation model representations), and (ii) baseline data (2010 Population Census/Badan Pusat Statistik, 2011 Potensi Desa/Badan Pusat Statistik, 2015 Tim Nasional Percepatan Penanggulangan Kemiskinan (TNP2K) data and Batas Wilayah (administrative boundaries)/Badan Pusat Statistik).
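For readers who want to try the basic comparison themselves, the sketch below groups villages by whether more than 50 per cent of residents are poor and compares their average nighttime luminosity. It is a minimal illustration, assuming village-level luminosity has already been extracted from the satellite rasters (for example by averaging radiance over each village boundary). The `Village` record, its field names and the toy values are hypothetical; they are not the team's data or results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Village:
    # Hypothetical village-level record after aggregating the satellite raster
    name: str
    poor_share: float  # fraction of residents below the poverty line, 0 to 1
    luminosity: float  # mean nighttime radiance over the village boundary


def luminosity_by_poverty_group(villages, threshold=0.5):
    """Return (mean luminosity of villages whose poor share exceeds `threshold`,
    mean luminosity of all other villages)."""
    poor = [v.luminosity for v in villages if v.poor_share > threshold]
    other = [v.luminosity for v in villages if v.poor_share <= threshold]
    if not poor or not other:
        raise ValueError("both groups need at least one village")
    return mean(poor), mean(other)


if __name__ == "__main__":
    # Toy values for illustration only
    toy = [
        Village("A", 0.62, 1.2),
        Village("B", 0.55, 0.8),
        Village("C", 0.20, 6.5),
        Village("D", 0.10, 9.1),
    ]
    poor_mean, other_mean = luminosity_by_poverty_group(toy)
    print(f"poor villages: {poor_mean:.2f}, other villages: {other_mean:.2f}")
```

A simple group comparison like this is only a first check; predicting vulnerability from roads, rivers and land use would add those layers as further village-level features.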
Team 2: Estimating City-level Poverty Rates Based on E-commerce Data
The second team tried to estimate city-level poverty rates on the island of Java using e-commerce data. They found that machine learning tools such as Support Vector Regression (SVR) and Artificial Neural Networks (ANN) could predict each city’s poverty rate with low error margins and a high level of accuracy.
Data: (i) 2016 e-commerce data from OLX which was aggregated by city, and (ii) poverty rate data from the Central Bureau of Statistics (Badan Pusat Statistik), in particular the percentage of people living below the poverty line at the city level.
Team 3: Using Twitter Data to Estimate District-Level Poverty in Greater Jakarta
Using natural language processing for content analysis, the team extracted public tweets containing food-related and poverty-sensitive keywords such as “harga naik” (price increase) and “miskin” (poor). The analysis distinguished poor from non-poor districts with relatively high accuracy.
Data: Anonymised 2014 Twitter data for Jabodetabek area. The variables include:
- ID and user ID
- latitude and longitude
- province name and province code
- district name and district code
- date time and timestamp
- content and source
- gender and location
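A minimal sketch of keyword-based district classification follows, assuming each tweet has already been assigned a district code. Only “harga naik” and “miskin” come from the team's description; the threshold rule that turns a district's keyword share into a poor or non-poor label is a hypothetical stand-in for the team's NLP model. Matching is a case-insensitive substring test (keywords are stored in lowercase), so “miskin” also matches words such as “kemiskinan” (poverty).

```python
from collections import Counter

# "harga naik" (price increase) and "miskin" (poor) come from the post;
# a real keyword list would be longer. Keywords must be lowercase.
KEYWORDS = ("harga naik", "miskin")


def keyword_share(tweets, keywords=KEYWORDS):
    """Fraction of each district's tweets mentioning at least one keyword.

    `tweets` is an iterable of (district_code, text) pairs.
    """
    total, hits = Counter(), Counter()
    for district, text in tweets:
        total[district] += 1
        lowered = text.lower()
        if any(k in lowered for k in keywords):
            hits[district] += 1
    return {d: hits[d] / total[d] for d in total}


def classify_districts(shares, threshold):
    """Hypothetical rule: label a district poor when its share exceeds `threshold`."""
    return {d: "poor" if s > threshold else "non-poor" for d, s in shares.items()}


def accuracy(predicted, actual):
    """Share of districts (present in both dicts) whose label is correct."""
    common = set(predicted) & set(actual)
    if not common:
        raise ValueError("no districts in common")
    return sum(predicted[d] == actual[d] for d in common) / len(common)
```

Comparing the predicted labels against district poverty classifications from survey data is what gives the accuracy figure its meaning; in practice the threshold would be tuned on one set of districts and tested on another.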
Team 4: Exploring the Connection Between Social Media Activities and Poverty
The final group set out to explore the relationship between social media activity and poverty, using survey and census data at the village and individual level for the Greater Jakarta area. Preliminary findings suggested that the number of Twitter users with geo-tagging enabled can be used to infer the poverty headcount index. Also of interest, the number of Twitter users in an area was positively correlated with inequality.
Data: (i) SMERU Poverty Map (poverty headcount, Gini ratio) at the individual and household level, as well as the village, sub-district and district levels, (ii) 2014 Potensi Desa (PODES), and (iii) Twitter metadata (number of user IDs, total tweets, and average numbers of tweets, mentions, hashtags, links and locations).
A big thank you to the advisors who were kind enough to share their knowledge as domain experts with the teams and offer recommendations on how to refine their research approaches and analysis throughout the three days.
We were joined by Prof. Arief Anshory Yusuf from the Faculty of Business and Economics at Universitas Padjadjaran (who spoke on the varying degrees of inclusivity in monetary and non-monetary poverty in Indonesia); Prof. Dedi Rosadi from the Department of Statistics at Universitas Gadjah Mada (who explained the benefits of the R programming language and its application to statistical modelling of poverty); and Faizal Thamrin from DM Innovation (who described how poverty data can help aid vulnerable populations during natural disasters).
The lively discussions that followed each team’s presentation raised several points: data accessibility and reliability, data privacy and ethics, and the Government’s need for improved technical architecture and data skills to leverage these unconventional data sets. What stood out most was that despite the many benefits machine learning and artificial intelligence offer (such as speed, lower cost and accuracy), conventional data sets are still needed to train the models and ‘ground-truth’ the findings.
Each team will be producing a more detailed technical paper, which we will share soon. And we look forward to welcoming the next team of Research Dive participants at the Lab in November.
Once again, a huge thank you to all the participants and the advisors, as well as the Knowledge Sector Initiative (KSI) and the Artificial Intelligence Journal for the sponsorship.
Pulse Lab Jakarta is grateful for the generous support of the Government of Australia.