Big data for understanding mass migration trends
Samuel Heroy, Isabella Loaiza Saa, and Alex Pentland
From Colombia to Turkey and beyond, recent large-scale migrations have far-reaching implications for public policy, ranging from community planning to climate change measures. Recent mobile phone-based technologies provide a novel platform for researchers to study these migrations as they happen.
An explosion of research in this area in recent years demonstrates that data sets extracted from mobile phones can be used to accurately quantify migrations in real time, at levels of aggregation that were previously possible only years after the fact.
Bringing the tools of big data to policy and social impact requires coordination between researchers and a range of stakeholders. Recent collaborations spanning private industry, social entrepreneurship, and policy-making institutions demonstrate that leveraging these insights can have important and far-reaching benefits for migrants and their host countries alike.
Large-scale migrations have been integral to societal changes of the last decade, and will continue to shape our societies in the decades to come. In 2017, there were approximately 257.7 million international migrants, roughly the population of the 6th most populous country in the world (Brazil). By the end of 2018, the number of displaced persons reached a record high of 70.8 million,¹ larger than the population of the UK.
These mass migrations pose political, economic, and social challenges to both migrants and host communities. Interestingly, 21st-century migrations differ from previous ones in several crucial aspects: they occur with unprecedented speed, and migrants have access to technology to aid them on their journey, such as smartphones and community-built maps (e.g. OpenMaps). These technologies help migrants and, on account of the digital traces they leave, researchers as well.
Digital traces provide both geo-location and social information about migrants, allowing for a more nuanced and complete understanding of mass migration events as they happen. For instance, social media data has been used to quantify in real time the number of Venezuelan migrants and their socio-economic distribution at the national, sub-national, and city levels.¹ Moreover, research at the MIT Media Lab has examined the integration of Syrian refugees in Turkey by studying their social networks and mobility patterns as reflected in the distribution of their calls. Related work concerning the social integration of Venezuelan migrants in Colombia is underway.
One of the great concerns in using big data is protecting the privacy of refugees and shielding them from potential dangers such as unethical government actors and criminal organizations. Consequently, analysis efforts should be restricted to data that does not allow individuals to be identified. In practice this means using anonymous data aggregated at different levels, either temporally or spatially, and screening to protect outliers with methods such as differential privacy and k-anonymity.
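The aggregation and screening described above can be illustrated with a minimal sketch. It is not the method of any study cited here: the record layout (user_id, region, day), the function names, and the values of k and epsilon are all assumptions chosen for illustration. It counts distinct users per region and day, drops cells with fewer than k users (a simple small-cell suppression in the spirit of k-anonymity), and adds Laplace noise of the kind used in differential privacy. A formal privacy guarantee would need more care than this, for example in how suppression and noise are combined.

```python
import random
from collections import defaultdict


def count_users_per_cell(records):
    """Count distinct users per (region, day) cell.

    `records` is an iterable of (user_id, region, day) tuples; this
    layout is an assumption made for illustration.
    """
    users = defaultdict(set)
    for user_id, region, day in records:
        users[(region, day)].add(user_id)
    return {cell: len(ids) for cell, ids in users.items()}


def suppress_small_cells(counts, k):
    """Drop every cell that contains fewer than k distinct users."""
    return {cell: n for cell, n in counts.items() if n >= k}


def add_laplace_noise(counts, epsilon, rng=None):
    """Add Laplace(0, 1/epsilon) noise to each count.

    Assumes each user contributes to at most one cell, so that one
    person changes any count by at most 1.
    """
    rng = rng or random.Random()
    scale = 1.0 / epsilon
    noisy = {}
    for cell, n in counts.items():
        # The difference of two exponentials with mean `scale`
        # follows a Laplace(0, scale) distribution.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy[cell] = n + noise
    return noisy


# Synthetic records, for illustration only.
records = [
    ("u1", "A", "2019-08-01"),
    ("u2", "A", "2019-08-01"),
    ("u3", "A", "2019-08-01"),
    ("u1", "A", "2019-08-01"),  # repeated event from the same user
    ("u4", "B", "2019-08-01"),
]
counts = count_users_per_cell(records)
safe_counts = suppress_small_cells(counts, k=3)
noisy_counts = add_laplace_noise(safe_counts, epsilon=1.0, rng=random.Random(0))
```

Here region B holds a single user and is removed before anything is released, while region A survives with its count of three perturbed by noise.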
In conjunction with the availability of big data, new tools and frameworks for analysis are being developed to handle this highly resolved, highly detailed information and help us to better understand the effects of large-scale migration events on society. In order for these research efforts to translate into action, communication between academic researchers and policymakers as well as the private/non-profit sectors is vital.
Left: Raw estimates (using Facebook data) of the numbers of Venezuelan migrants in regions of Brazil (upper) as well as in 1 km tracts of Boa Vista, Brazil (lower) (reproduced with permission from Palotti et al. 2019). Right: New sources of data, such as passive smartphone data, provide enhanced opportunities to reveal motivations, sentiments, and possibly even impacts of migration.
Big data for “backing up claims with the numbers” and informing decisions
In a recent panel discussion in Cartagena*, preeminent mobility scholar Professor Marta Gonzalez (University of California) highlighted the need to confront beliefs about migrations with quantitative accounts of the dynamics of migration events. She compares concerns about these large-scale movements to concerns about pandemics. As with epidemics, accurate, real-time, and trustworthy information is essential not only to quell potential migrant apprehension, but also to design effective policies in response to plausible pressures. Consequently, the nascent research field of using new data sources to study migration is in high demand.
Conventional estimates of migration often rely on census or survey data. These sources have the advantage of being reasonably reliable and trustworthy, because their limitations (and the strategies for mitigating them) are well known. However, governments typically administer censuses only every 5–10 years on account of the costs and labour they require (especially in developing countries), making real-time estimates of migration highly difficult and often subject to political speculation.
Reliance on census and survey data alone is therefore often insufficient for understanding fast-paced migrations as they happen, let alone for making policy decisions, explains Dr. Marisol Rodriguez Chatruc, economics specialist at the Inter-American Development Bank (IDB). Big data is useful “not just because it sounds cool” but because it captures movements of people at a spatio-temporal resolution that conventional approaches cannot. Development banks like the IDB, along with numerous other institutions, are consequently enthusiastic about using CDRs (call detail records, which record the times and area-level locations of users’ phone calls) and other big data sources, both for research (in the IDB’s case, through its migration initiative) and for designing loans and initiatives.
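To make concrete how call detail records can yield movement estimates, here is a minimal sketch. It is not the IDB's or any cited study's method: the simplified record layout (user_id, period, region) and the function names are assumptions, and the region labels and data are synthetic. It uses one common simplification, taking each user's most frequently observed region in a period as their location, then counts users whose location changes between two periods.

```python
from collections import Counter, defaultdict


def modal_region_by_period(cdrs):
    """Map (user_id, period) to that user's most frequently seen region.

    `cdrs` is an iterable of (user_id, period, region) tuples, a
    simplified stand-in for real call detail records.
    """
    tallies = defaultdict(Counter)
    for user_id, period, region in cdrs:
        tallies[(user_id, period)][region] += 1
    return {key: tally.most_common(1)[0][0] for key, tally in tallies.items()}


def flows_between(modal, period_a, period_b):
    """Count users whose modal region differs between two periods."""
    flows = Counter()
    users_in_a = {user for (user, period) in modal if period == period_a}
    for user in users_in_a:
        destination = modal.get((user, period_b))
        if destination is None:
            continue  # user not observed in the second period
        origin = modal[(user, period_a)]
        if origin != destination:
            flows[(origin, destination)] += 1
    return flows


# Synthetic records, for illustration only.
cdrs = [
    ("u1", "2019-01", "R1"), ("u1", "2019-01", "R1"), ("u1", "2019-01", "R2"),
    ("u1", "2019-02", "R2"), ("u1", "2019-02", "R2"),
    ("u2", "2019-01", "R1"), ("u2", "2019-02", "R1"),
    ("u3", "2019-01", "R1"), ("u3", "2019-02", "R2"),
]
modal = modal_region_by_period(cdrs)
flows = flows_between(modal, "2019-01", "2019-02")
```

In this toy data two users move from R1 to R2 and one stays put. Real analyses would also have to handle users who appear in only one period, ties between regions, and the privacy screening discussed earlier.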
Information solutions powered by technology
“At the beginning of a crisis, we usually see state and NGO actors at the scene taking action,” notes social entrepreneur Berat Kjamili. However, entrepreneurs can complement these actions by providing sustainable solutions to problems that refugees face. An example of this is the award-winning smartphone app Migport, which Kjamili designed for Syrian refugees in Turkey to communicate with volunteers who help provide solutions concerning the educational, financial, and bureaucratic barriers they face.
Moreover, Kjamili asserts that entrepreneurs and academics need one another. For example, researchers support entrepreneurs with technical expertise and creativity, while researchers rely on entrepreneurs to provide data and a realistic perspective on important social problems. Examples of these collaborations include the Data4Refugees challenge, which recently gave groups of academic researchers the chance to work with high-quality data sets and interact with refugees as well as government ministries, with the aim of improving living conditions for refugees in Turkey.
Dr. Gonzalez claims “the dream” of data for social good is to design systems that not only collect data in real time but also provide feedback to users. For example, satellite navigation systems are able to propose new routes to users in real time when traffic incidents block the usual route. Responsive migration-related technologies (like Migport) can analogously help direct users to valuable information regarding housing, education, etc.
The future landscape of using big data for migration
One challenge facing any research that uses big data is access: obtaining the means to study such data is complex on account of ownership and privacy concerns. “Companies have monopoly power over big data,” explains Dr. Rodriguez Chatruc. For example, telecom companies have control over users’ CDRs as well as XDRs (information regarding app usage) and passive data (constant data on users’ GPS locations). Growing public concern over privacy scandals, in addition to new developments in privacy legislation (e.g. the GDPR in Europe), has further complicated researchers’ ability to work with big data, as any missteps by researchers will inevitably be linked to their data providers.
While defining an equitable and secure framework is subject to several difficulties, there is hope that stakeholders from various arenas can together work to develop an ecosystem that encourages the use of big data for social good. Successful collaborations include the Data4Refugees collaboration between academic researchers, Migport and Türk Telekom, as well as several organizations that aim to turn the insights of big data into public goods while simultaneously heeding privacy concerns. For instance, OPAL (Open Algorithms for Better Decisions) looks to bring researchers access to the data without the need for them to actually have it on their hard drives. That is, OPAL provides a “trusted enabler” platform wherein researchers can input queries on granular big data and receive aggregate data sets in return, circumventing privacy issues connected with total access.
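The idea of querying granular data and receiving only aggregates can be sketched as follows. This is a hypothetical illustration and not OPAL's actual interface: the class name, method name, row layout, and minimum group size are all assumptions. In a real deployment the row-level data would stay on the data provider's infrastructure, whereas here it is simply kept inside an object that exposes nothing but an aggregate query.

```python
from collections import defaultdict


class AggregateOnlyStore:
    """Holds row-level records and answers only aggregate queries.

    Hypothetical sketch: rows are dicts with a "user_id" key plus
    arbitrary grouping fields.
    """

    def __init__(self, rows, min_group_size):
        self._rows = list(rows)
        self._min_group_size = min_group_size

    def count_users_by(self, field):
        """Return distinct-user counts per value of `field`.

        Groups smaller than the minimum size are withheld, so the
        result never singles out a very small set of people.
        """
        groups = defaultdict(set)
        for row in self._rows:
            groups[row[field]].add(row["user_id"])
        return {
            value: len(users)
            for value, users in groups.items()
            if len(users) >= self._min_group_size
        }


# Synthetic rows, for illustration only.
rows = [
    {"user_id": "u1", "region": "R1"},
    {"user_id": "u2", "region": "R1"},
    {"user_id": "u3", "region": "R2"},
]
store = AggregateOnlyStore(rows, min_group_size=2)
result = store.count_users_by("region")
```

The researcher sees a count for R1 but nothing for R2, whose single user would otherwise be exposed.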
The next step, according to Rodriguez Chatruc, is for governments to create systematic legal frameworks for access to big data, as use is ultimately limited by private ownership. The UN-affiliated International Organization for Migration (IOM) echoes these concerns, calling for the “establishment of an adequate regulatory and legislative framework for the collection, analysis, and sharing of big data”. As research in this area develops alongside technology, stakeholders from the public, private, and research sectors are hopeful that policy design and social entrepreneurship will further embrace the incredible information potential that big data offers.
Note: This article is published under a CC BY license.
Dr. Samuel Heroy is a postdoctoral research associate at the Mathematical Institute, University of Oxford.
Isabella Loaiza Saa is a PhD student within the Human Dynamics group at the MIT Media Lab.
Alex Pentland is the Toshiba Professor of Media Arts and Sciences at the MIT Media Lab, where he directs the Human Dynamics Lab and MIT Connection Science.
¹ Palotti, J, N Adler, AJ Morales, J Villaveces, V Sekara, M Garcia Herranz, M Al-Asad, & I Weber. Real-time monitoring of the Venezuelan Exodus through Facebook’s advertising platform. Technical report. March 2019.
*The quotes in this article come from a panel discussion on the computational study of mass migrations that took place at the Latin American Conference on Complex Networks in Cartagena, Colombia (August 2019). The discussion, between Professor Gonzalez, Mr. Kjamili, and Dr. Rodriguez Chatruc, can be viewed in its entirety at https://twitter.com/SamuelHeroy/status/1159196884097011716