Accelerating the Adoption of Big Data for Development

Massive amounts of information from digital trails are now being used to improve product design and service delivery in business. The public sector, though, has been slower to harness similar data sets, in part because of uncertainty about how big data can help answer critical public policy questions.

Pulse Lab Jakarta had the honour of presenting a paper authored by our data science team on how big data can be used for development indicators and public policies at two events: the conference on Population and Social Policy in a Disrupted World organised by Universitas Gadjah Mada and the Inter-Ministerial Conference on South-South and Triangular Cooperation hosted by the Government of Indonesia’s National Family Planning Coordinating Board.

The paper, which the team hopes to publish in the coming months, goes into more depth, but this blog highlights a pair of key points on the challenges of accessing new data sources and the importance of accelerating the adoption of big data for development and social research.

Within the public sector, challenges exist regarding government capacity. Even in countries that are making progress, several links are still missing from the enabling environment needed to cultivate and sustain big data adoption and use.

Much big data is not owned by the public sector; instead, it rests with private sector players as a result of citizens’ interactions with their digital services. In Indonesia, while there are currently no comprehensive provisions with specific guidelines for data privacy, some private sector players are taking steps to establish clear terms and agreements regarding data privacy and research partnerships.

It is important for social researchers to be more than legally compliant: ethical research with these data sets must be at the heart of data science for social good. This is why at Pulse Lab Jakarta we collaborate closely with academics to access rigorous ethical reviews, use a risk assessment tool before starting new research projects, and centre our research methods on the protection of privacy.

An effective data partnership requires infrastructure, knowledge, capacity and trust from all parties involved, in addition to a regulatory framework that allows for different kinds of research. Along the way, we’ve learned with our research partners that working towards these goals and reaching an agreement that benefits everyone involved takes time. Citizen-generated data, on the other hand, comes in different forms, such as crowdsourcing, and underlines the value of public participation in research, from data collection and data analysis to prototyping and implementation.

Big data representativeness, coverage errors and biases are very important for researchers to consider. If we look around us at common sources of big data, for instance from mobile communication and e-commerce, it is clear not every cohort of society is represented in the data. Data analysts in the public sector need to be able to determine which cohorts within a population the data set they aim to analyse represents. Methods to calibrate insights to improve their representativeness are improving with each passing month.
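One widely used way to calibrate a skewed sample is post-stratification: each record is reweighted so that the groups in the data match their known shares in the population. The sketch below is only an illustration of that general idea, not the method used in our paper. The function names, the urban/rural split and all of the numbers are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per record so that weighted group shares
    match the (assumed known) population shares.

    sample_groups: list of group labels, one per record in the data set.
    population_shares: dict mapping each group label to its share of the
        population, e.g. taken from a census. Every group present in the
        sample must appear here, otherwise a KeyError is raised.
    """
    counts = Counter(sample_groups)
    n = len(sample_groups)
    weights = []
    for group in sample_groups:
        sample_share = counts[group] / n
        # Under-represented groups get weights above 1, over-represented below 1.
        weights.append(population_shares[group] / sample_share)
    return weights


def weighted_mean(values, weights):
    """Weighted average of a numeric indicator."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical example: urban users make up 80% of the data set
# but only 50% of the population.
groups = ["urban"] * 8 + ["rural"] * 2
indicator = [1.0] * 8 + [0.0] * 2  # e.g. 1.0 = uses mobile payments
shares = {"urban": 0.5, "rural": 0.5}

weights = poststratification_weights(groups, shares)
raw_estimate = sum(indicator) / len(indicator)
calibrated_estimate = weighted_mean(indicator, weights)
```

In this made-up example the raw estimate is pulled towards the over-represented urban group, while the calibrated estimate gives both groups their population share. Reweighting cannot recover cohorts that are entirely absent from the data, so it complements, rather than replaces, the check on who the data set represents.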

While social media is an obvious source of insights, the data may be biased towards more extroverted personality traits. Furthermore, social media metadata has several components, so its usefulness depends on which aspect is of interest in a given analysis. For example, if a team of researchers within the public sector is interested in geolocated social media posts (as opposed to the content of each post) and wishes to narrow the research scope to groups within urban areas, it may be able to uncover insights that help policy makers understand the benefits of the data, such as inferred commuting statistics for the Greater Jakarta area.
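To make the commuting example more concrete, here is a minimal sketch of one simple heuristic: treat the area a user posts from most often at night as "home", the area they post from most often during working hours as "work", and count the home-to-work pairs. This is an assumption for illustration, not a description of our actual method. The function name, the hour windows and the sample records are all hypothetical, and the user identifiers are assumed to be anonymised before any analysis.

```python
from collections import Counter, defaultdict


def infer_commuting_flows(posts, night_hours=range(0, 6), work_hours=range(9, 17)):
    """Count inferred home -> work flows between areas.

    posts: iterable of (anonymised_user_id, hour_of_day, area) tuples.
    The hour windows are illustrative assumptions, not calibrated values.
    """
    night = defaultdict(Counter)  # user -> areas seen at night
    day = defaultdict(Counter)    # user -> areas seen in working hours
    for user, hour, area in posts:
        if hour in night_hours:
            night[user][area] += 1
        elif hour in work_hours:
            day[user][area] += 1

    flows = Counter()
    # Only users observed in both time windows can be assigned a commute.
    for user in night.keys() & day.keys():
        home = night[user].most_common(1)[0][0]
        work = day[user].most_common(1)[0][0]
        if home != work:
            flows[(home, work)] += 1
    return flows


# Hypothetical records: (user, hour, area)
sample_posts = [
    ("a", 2, "Depok"), ("a", 3, "Depok"), ("a", 10, "Jakarta Pusat"),
    ("b", 1, "Bekasi"), ("b", 11, "Jakarta Pusat"),
    ("c", 23, "Bogor"),  # outside both windows, so user c is skipped
]
flows = infer_commuting_flows(sample_posts)
```

Only the location and time of each post are used, never its content. Users who post in just one of the two time windows are dropped, which is itself a source of coverage bias worth reporting alongside any results.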

At Pulse Lab Jakarta, we organised a crowdsourcing initiative called TranslatorGator, which aimed to gather translations of development and disaster-related keywords to build taxonomies that can help authorities analyse and understand citizens’ conversations online. Our experience leading this project demonstrates the need for a good understanding of emerging sources of data, new technological tools, data analytics and crowdsourcing itself within the public sector.

All things considered, big data should not be seen as a silver bullet that can immediately fix all the problems our society faces. But with the proper tools and methods to compensate for coverage errors, and with more willingness to engage in public-private partnerships and multidisciplinary research, big data can provide critical policy insights: to monitor citizens’ perceptions in order to meet their needs; to keep track of food price dynamics; to design and evaluate government programmes; to fill gaps between official statistics; to understand a community’s response during disasters; to design effective public infrastructure; and in many other areas.

PLJ is excited by this potential, and is always looking for new data and research partnerships to improve public policy and humanitarian action through big data and artificial intelligence. Please get in touch with us if this is of interest.


Pulse Lab Jakarta is grateful for the generous support from the Government of Australia.