I recently made the not-so-quick flight from Yangon to Istanbul for the Data for Refugees (D4R) Challenge final workshop, to explore how relationships between private companies, researchers, and domain experts can create real change. Data in digital form is a relatively new concept for Myanmar. Outside of the country, researchers have been grappling with the onslaught of digital information and thinking about how to democratize and leverage it for societal good. The D4R Challenge, a partnership between Turk Telekom, the Scientific and Technological Research Council of Turkey (TUBITAK), and Bogazici University, is a prime example of researchers and scientists working to harness the power of digital data for society’s benefit. D4R participants were asked to develop novel applications for cell phone data to improve the living conditions of more than 3.5 million Syrian refugees in Turkey. I made the trip from my office at Phandeeyar to see the challenge results first-hand because I believe we can adapt D4R innovations to the Myanmar context for the benefit of its people.
Last year, academics from Turkey, under the auspices of D4R, put out a call to researchers for abstracts and ideas using Call Detail Record (CDR) datasets that provided pertinent, real information on a world issue: displacement. The response brought together experts in mobile data analysis, refugee aid, digital security, data ethics, and public policy in a way that seemed ready to set an example. “Providing better living conditions” for millions of people is a herculean task, and many will doubt, maybe fairly, that data researchers can do just this. However, there is a lot to be learned by analyzing something as granular as telecommunications data, and partnerships like this one bring together experts from different disciplines to critique clever ideas and push them to be more robust.
Let’s return to CDR data. Before thinking about what the data can provide, I think it’s important (and fascinating) to learn what this data contains. What types of questions are even feasible to investigate, and where does the data leave us wanting?
CDR data, at its most basic level, stores when, where, and to whom a call was made. From this information, at the scale of an entire country over several months, more abstract concepts like migration patterns, commute distance, and even poverty can be inferred with relatively high probability. It’s powerful.
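To make the "when, where, and to whom" idea concrete, here is a minimal Python sketch of how a single inference might work. It is an illustration only: the `CallRecord` field names, the night-time window, and the "most-used night-time tower" rule for guessing a home area are my assumptions, not the D4R dataset schema or any team's actual method.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    # Hypothetical fields; real CDR schemas differ between operators.
    caller_id: str       # de-identified caller ("who")
    callee_id: str       # de-identified recipient ("to whom")
    tower_id: str        # cell tower that handled the call ("where")
    timestamp: datetime  # start of the call ("when")


def is_night(ts: datetime) -> bool:
    # Assumed window: 20:00 to 05:59, when people are likely at home.
    return ts.hour >= 20 or ts.hour < 6


def estimate_home_towers(records):
    """Guess each caller's home tower as the one they use most at night."""
    night_counts = defaultdict(Counter)
    for rec in records:
        if is_night(rec.timestamp):
            night_counts[rec.caller_id][rec.tower_id] += 1
    return {
        caller: counts.most_common(1)[0][0]
        for caller, counts in night_counts.items()
    }
```

Comparing a caller's estimated home tower from one month to the next is one simple way a signal like migration could begin to emerge from records that were only ever collected for billing.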
To be clear, the data used here was not the raw CDR data, but a de-identified version. Levels of “anonymization” vary, and how well they hold up is itself a subject of study; there are precautions that can be taken, but that is a discussion in itself. For the purposes of D4R, this CDR data was also differentiated to show whether phone records were affiliated with refugees or non-refugees. Because D4R handled a vulnerable population’s data, the Challenge needed to go beyond the standard data ethics and privacy recommendations and guarantee that data ethicists played a major role in the event.
D4R shared three distinct datasets at varying levels of granularity to avoid exposing information that might reveal any specific phone user’s identity while still providing indicative information. Read more about it here.
Now, what can we actually do with this data? In October of 2013, the UN Global Pulse Lab released a report looking at CDR data’s early development potential. The report states that “while at first glance it is difficult to assess the value of this rather rudimentary data, remarkably useful information on human behavior may be derived from large sets of de-identified CDRs.” The Global Pulse Lab report went on to specifically highlight uses of CDR data for analyzing mobility, social interaction, and economic activity. These are not the only areas for which CDR data can provide useful insight, but they are a good place to start. As I saw at the D4R workshop, this data becomes even more powerful when crossed with other available datasets.
The D4R Challenge awarded teams in five focus categories: Safety and Security, Healthcare, Education, Labor, and Social Integration. Check out the website to see the winners for each category. The complete (523-page) proceedings document from the event is also available if you want to see the work in more detail.
The D4R Winners:
The very first winner was a paper entitled Data Analytics without Borders: Multi-Layered Insights for Syrian Refugee Crisis, which swept over many concepts that were reinforced later in the day. The paper presented an insight regarding housing price analysis that particularly caught my attention. With the cellular data, the team already had a rough idea of where some refugees live. Coupling this inference with information on a neighborhood’s mean education level lets researchers roughly estimate relative levels of income (this tactic is based on previous studies and relies on the correlation between income and education levels). Housing prices scraped from a real estate website then further validated the income heuristic, showing that the neighborhoods with the highest density of refugees also face difficult financial circumstances. Initially, this may not sound like much. A neighborhood notorious for having a high density of refugees living in less-than-ideal conditions doesn’t need a dataset to be identified, but at the scale of an entire country attempting to make policy decisions on how to implement housing initiatives, this information becomes a lot more valuable.
Another team, from Universitat Oberta de Catalunya, examined just this. They viewed the problem as one of optimizing integration. They looked at the communication patterns between refugees and non-refugees and identified which groups exhibited behavior signaling that they were less ‘integrated’ than the population of another district. They then examined ways to perform “social mixing” such that districts reached a level of social heterogeneity. Using real estate data, they then showed how a housing voucher system (or similar program) could be implemented to actually make that population shift happen, including cost estimates. I think it’s worth acknowledging that the definition of “social integration” and the merit of “social mixing” still need much debate, and the authors of this paper do note ongoing conversations concerning these issues. The point is that with robust telecommunication data, relevant policy experts, humanitarian workers, and community members can look at a question from a slightly different angle.
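As a rough illustration of how communication patterns could be turned into a comparable number per district, the Python sketch below computes the share of calls in each district that cross the refugee/non-refugee line. This is a stand-in I made up for illustration, not the metric the Universitat Oberta de Catalunya team used, and the input format (district name plus two refugee flags per call) is an assumption.

```python
from collections import defaultdict


def cross_group_share(calls):
    """Return, per district, the fraction of calls between a refugee
    and a non-refugee.

    `calls` is an iterable of (district, caller_is_refugee,
    callee_is_refugee) tuples. This layout is hypothetical.
    """
    totals = defaultdict(int)
    mixed = defaultdict(int)
    for district, caller_is_refugee, callee_is_refugee in calls:
        totals[district] += 1
        # A call counts as "mixed" when exactly one party is a refugee.
        if caller_is_refugee != callee_is_refugee:
            mixed[district] += 1
    return {district: mixed[district] / totals[district] for district in totals}
```

A district with a low share would be a candidate for closer attention, although, as noted above, whether such a number really captures “integration” is exactly the kind of question that needs domain experts in the room.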
Myanmar isn’t in an identical situation, but that doesn’t mean there isn’t a lot to be learned from the D4R Challenge. In Myanmar, the first order of business is establishing a high standard for data ethics. As initiatives like this one would be new for Myanmar, it is important to set a standard that allows the public to see data as a tool with potential for good and to trust that their data, and in effect their identity and privacy, are safe. As a researcher, and for anyone involved with similar projects, it’s important to think about the trust of those whose data is being studied.
When applying D4R Challenge concepts to Myanmar, we don’t need to limit ourselves to CDRs and issues like refugee integration. CDR data is just an example of an extremely rich dataset that provides a new perspective from which we can infer properties of a group even though it wasn’t originally collected for that purpose. Other examples of this concept are Twitter data or even Google search data.
Myanmar’s massive mobile penetration rate of 105% for mobile broadband users makes mobile data an alluring first target, but it’s just one segment. There are examples from the U.S. where even rudimentary data analysis techniques can help children at risk of abuse receive attention from social workers before an incident occurs. This is something that local government is already attempting, but the data can help them target their resources more efficiently. In the aid world, data partnerships can help produce risk maps to allow medical aid workers to view disease outbreaks over the entire country and plan interventions more strategically.
If we look back at D4R, there were some pretty huge names in the world of social data science involved. It’s fair to posit that not every organization that can benefit from sound data analysis has its own team of data scientists, but that may not be necessary.
Phandeeyar wants to guide these initiatives for Myanmar. Since 2014, we have been with Myanmar as technology has developed, changed, and influenced the lives of the people. We run Open Development Myanmar, a site dedicated to providing open data and relevant information to researchers and the general public in Myanmar. We also want to guide larger partnerships with change-creating potential. Myanmar is already creating and storing information at much greater rates in the private, public, and non-profit sectors as a result of user-created data and digital collection methods, and that data is (slowly) becoming more and more accessible. This data will be used, but we now have the ability to shape the country’s present through “data-driven” decisions. Myanmar should look to successes and failures around the world and build upon them.
We’re working to make partnerships for data-driven solutions possible, but our efforts will be most successful with community engagement. We want to connect data providers, change agents, local and international media, and our own data researchers to create change for Myanmar. If this sounds exciting or interesting to you and you’d like to be a part of our matching program, please reach out to me or anyone from the Tech for Change team at Phandeeyar to see how you can get involved.