Using Big Data for Statistics to Track the SDGs

(Image by: Ybs, licensed under Creative Commons.)

Measuring the many aspects of development is often fraught with challenges. The Philippine Statistics Authority, which collects data on different sectors in the Philippines, has been advancing new methods to measure progress, particularly against the Sustainable Development Goals.

Our data scientist was recently in Manila attending the 2017 International Conference on Sustainable Development Goals Statistics, hosted by the Philippine Statistics Authority and the United Nations Statistics Division. For his presentation, he put together a neat set of slides that captures some of Global Pulse’s new statistical methods for tracking progress towards the SDGs.

Ahead of the 4th UN Conference on Big Data, which will take place in Colombia next month, we thought we should share the slides and unpack some of the projects.

Early detection of food price anomalies is critical for timely action by governments. Partnering with Bappenas (the Indonesian Ministry of Development Planning) and the World Food Programme, Pulse Lab Jakarta developed a statistical model to extract prices for four food commodities (beef, chicken, onion and chilli) in Indonesia from public discussions on Twitter.

When the modeled prices were compared with the official food prices from the Indonesian Government, which were released later, the figures were closely correlated. These findings demonstrated that near real-time social media signals can function as a proxy for daily food price statistics and as an early warning mechanism for price fluctuations. This approach is particularly useful to citizens and other stakeholders, given that the official figures tend to be released to the public and shared across government with some time lag.
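As a rough illustration of what "closely correlated" means in practice, the sketch below computes a Pearson correlation coefficient between a modeled price series and an official one. The function name and the price values are hypothetical and chosen only for illustration; they are not project data, and the project's own evaluation may have used a different measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-movement of the two series around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily prices (illustrative numbers only, not project data).
modeled_prices = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]
official_prices = [99.0, 101.0, 102.0, 104.0, 111.0, 109.0]

r = pearson(modeled_prices, official_prices)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that the modeled series rises and falls together with the official one, which is the property a proxy indicator needs.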

Building on past studies which show that data from mobile phones (in particular from call detail records and airtime credit purchases) can be used to understand socio-economic conditions, we conducted research into the potential of using mobile phone data to produce a set of proxies for education and household characteristics.

Using anonymised mobile phone data made available by a local carrier in the Pacific island nation of Vanuatu, we extracted proxies for four types of statistical indicators: education, household assets, household expenditure, and household income. The findings confirmed a relatively strong correlation between the indicators derived from the mobile data and the official statistics provided by the National Statistics Office in Vanuatu.

Gender inequality is manifested in many aspects of life. We partnered with the SHIFT Programme of the UN Capital Development Fund (UNCDF) to analyse anonymised financial records from four financial service providers in Cambodia in order to investigate the factors affecting savings and loans mobilisation, with a focus on gender disaggregation.

Our data analysis suggests that although men and women have equal access to credit and savings services in the region, the loan and savings amounts actually mobilised are much lower for most women. The data also enables more powerful insights: beyond a simple breakdown by gender, it supports further gender-disaggregated analysis by age category, marital status and geography (rural, urban, and between provinces).
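To make the idea of gender disaggregation concrete, here is a minimal sketch that averages loan amounts first by gender alone and then by gender and area together. The record fields (`gender`, `area`, `loan_amount`), the helper `disaggregate` and all values are assumptions made for illustration; the real financial records and the analysis applied to them are not described in this post.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical anonymised records (illustrative values only, not project data).
records = [
    {"gender": "F", "area": "rural", "loan_amount": 300.0},
    {"gender": "F", "area": "urban", "loan_amount": 500.0},
    {"gender": "F", "area": "rural", "loan_amount": 200.0},
    {"gender": "M", "area": "rural", "loan_amount": 600.0},
    {"gender": "M", "area": "urban", "loan_amount": 800.0},
]


def disaggregate(rows, keys, value_field):
    """Return the mean of value_field for each combination of the given keys."""
    groups = defaultdict(list)
    for row in rows:
        group = tuple(row[key] for key in keys)
        groups[group].append(row[value_field])
    return {group: mean(values) for group, values in groups.items()}


# Single-level breakdown: gender only.
by_gender = disaggregate(records, ["gender"], "loan_amount")

# Two-level breakdown: gender within each area type.
by_gender_and_area = disaggregate(records, ["gender", "area"], "loan_amount")

for group, average in sorted(by_gender_and_area.items()):
    print(group, round(average, 2))
```

Adding further keys such as an age category or marital status to the `keys` list would give the deeper breakdowns mentioned above, provided each group still contains enough records to be meaningful.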

Gaining an up-to-date snapshot of development progress within countries is useful for tracking the effectiveness of national policies. Pulse Lab Jakarta teamed up with Carlos III University of Madrid and UNICEF to explore the potential of utilising social media to produce a proxy indicator of the human development index. For this research, we analysed tweets coming out of the islands of Java and Sumatra in Indonesia, and employed methods similar to those used for inferring unemployment rates in Spain.

The results from this research revealed that the predicted human development index fitted closely with the actual human development index, pointing to the accuracy of the model used. This encouraged us to expand the approach to other countries including Brazil and Mexico, both of which showed relatively consistent results.

Jakarta is renowned for its heavy traffic, which makes daily commutes lengthy, and tracking the commutes of Greater Jakarta’s 30 million residents is an even more mammoth task. To tackle this data problem, we decided to analyse GPS-stamped tweets.

Along with Sekolah Tinggi Ilmu Statistik (STIS), we initiated a project to test whether location information from social media on mobile devices could reveal commuting patterns. The project calibrated the initial results against the population distribution and the Twitter user distribution. Next, we verified the results against the official commuting statistics produced by the Indonesian Bureau of Statistics. The research confirmed that geo-located tweets have the potential to fill existing information gaps in the official commuting statistics.
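The calibration step can be pictured as a reweighting: areas where few residents tweet need their observed flows scaled up more than areas with many Twitter users. The sketch below shows one simple way this could be done, by multiplying each observed flow by the ratio of population to Twitter users in the origin area. This weighting scheme, the function name `calibrate_flows`, the area names and all numbers are assumptions for illustration, not the project's documented method.

```python
def calibrate_flows(observed_flows, population, twitter_users):
    """Scale tweet-derived commuter counts up to population-level estimates.

    observed_flows maps (origin, destination) to the number of Twitter users
    seen commuting on that route. Each count is multiplied by the number of
    residents that one Twitter user represents in the origin area.
    """
    calibrated = {}
    for (origin, destination), count in observed_flows.items():
        users = twitter_users[origin]
        if users <= 0:
            raise ValueError(f"no Twitter users recorded for {origin}")
        weight = population[origin] / users  # residents per Twitter user
        calibrated[(origin, destination)] = count * weight
    return calibrated


# Hypothetical inputs (illustrative values only, not project data).
observed_flows = {("Area A", "Centre"): 50, ("Area B", "Centre"): 20}
population = {"Area A": 100_000, "Area B": 80_000}
twitter_users = {"Area A": 1_000, "Area B": 200}

estimated = calibrate_flows(observed_flows, population, twitter_users)
for route, commuters in estimated.items():
    print(route, commuters)
```

In this toy example Area B shows fewer commuting Twitter users than Area A, yet its calibrated flow is larger because each of its users stands for more residents. This is why raw tweet counts alone can mislead.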

(Figure: Commuting statistics from 10 cities in Greater Jakarta to 5 cities in Jakarta.)

Our team has been analysing satellite imagery to map locations with climate and rainfall anomalies and to provide early warning alerts to policymakers. Our VAMPIRE (Vulnerability Analysis Monitoring Platform for Impact of Regional Events) platform provides map-based visualisations and features three main layers: a baseline data layer (population data, socio-economic and food security surveys), a climate layer (rainfall anomaly, standardised precipitation index and vegetation health index) and an impact layer (economic vulnerability and exposure to drought).
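For readers unfamiliar with the climate layer, a rainfall anomaly is often expressed as observed rainfall relative to the long-term average for the same period. The sketch below uses that common percent-of-average definition; it is an assumption about how such an indicator can be computed, not a description of VAMPIRE's internals, and the function name and rainfall values are hypothetical.

```python
def rainfall_anomaly_percent(observed_mm, historical_mm):
    """Observed rainfall as a percentage of the long-term average.

    A value of 100 means rainfall matched the historical average for the
    period; values below 100 indicate drier-than-usual conditions.
    """
    if not historical_mm:
        raise ValueError("historical record is empty")
    long_term_mean = sum(historical_mm) / len(historical_mm)
    if long_term_mean == 0:
        raise ValueError("anomaly is undefined when the long-term mean is zero")
    return 100.0 * observed_mm / long_term_mean


# Hypothetical rainfall totals in millimetres for the same month in past years.
historical_mm = [100.0, 120.0, 80.0]
observed_mm = 60.0

anomaly = rainfall_anomaly_percent(observed_mm, historical_mm)
print(f"Rainfall is at {anomaly:.0f}% of the long-term average")
```

Computing this value per map cell and per period is what would allow anomalies to be drawn as a map layer and compared against the baseline and impact layers.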

The platform provides real-time awareness of the evolving nature of slow-onset climate phenomena, which can then be used to better channel assistance to vulnerable populations. The working prototype is available here.

The examples above represent a snapshot of our overall work related to the SDGs. To better address the challenges that we face today as a global community, we recognise that everyone needs to do their part. For this reason, we are keen on fostering partnerships with governments, the private sector and civil society on a range of topics linked to the Global Goals. Furthermore, as new and powerful technologies continue to emerge with the potential to offer new insights, our mission is to continue leveraging these innovative tools in conjunction with traditional data in order to inform action to achieve each of the SDG sub-targets by 2030.


Pulse Lab Jakarta is grateful for the generous support from the Government of Australia.