Mining WeChat to Understand the Chinese Diaspora

Keith Ross, Leonard J. Shustek Chair Professor in Computer Science, develops new automated, scalable methodology for diaspora research

NYU Center for Data Science
Apr 17, 2018


Diaspora research traditionally relies on data gathered through censuses and sample surveys, cumbersome methods hindered by bias, limited geographic scope, low response rates, and difficulty obtaining current information. The next wave of diaspora research, however, will rely on data from location-based apps. At the forefront of this wave is Keith Ross, who, with a team of researchers from NYU Shanghai, has developed a new methodology for collecting and analyzing diaspora data.

Ross’s method looks to country-centric mobile apps (primarily used by residents of a specific country) for precise spatial and temporal data. Ross and collaborators present a case study for their method that examines the ethnic Chinese diaspora through WeChat (a messaging app with over 1.1 billion users), but they argue that the method can be applied to any ethnic diaspora using various country-centric apps.

For their case study, the researchers developed a scalable automated data collection technique that used GPS-hacking to leverage WeChat’s People Nearby feature. The automated method simultaneously captured user information from thirty-two cities on four consecutive Saturday afternoons in summer 2016. The resulting dataset covered 6,308 distinct users, each located within 2,000 meters of the town hall of the city in which they were sampled.
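The radius filter and the count of distinct users can be illustrated with a short sketch. This is not the researchers' code: it assumes each sighting carries a user id and approximate coordinates (the article does not say what People Nearby returns), and the names `haversine_m` and `distinct_users_near` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters
RADIUS_M = 2_000            # sampling radius mentioned in the article


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distinct_users_near(center, sightings, radius_m=RADIUS_M):
    """Return the set of user ids seen at least once within radius_m of center.

    center: (lat, lon) of the town hall.
    sightings: iterable of (user_id, lat, lon) tuples; the same user may
    appear several times across snapshots (assumed record layout).
    """
    lat0, lon0 = center
    return {
        user_id
        for user_id, lat, lon in sightings
        if haversine_m(lat0, lon0, lat, lon) <= radius_m
    }


# Placeholder coordinates for illustration only, not data from the study.
snapshots = [("u1", 0.0, 0.01), ("u1", 0.0, 0.01), ("u2", 0.0, 0.02)]
nearby = distinct_users_near((0.0, 0.0), snapshots)
```

Collecting ids into a set means a user who shows up on more than one of the four Saturdays is counted once, which is what "distinct users" implies.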

To verify users as ethnic Chinese, three human annotators manually examined each user’s language, posts, photos, and bios. Ross and collaborators note that users of WeChat’s People Nearby service are typically 18 to 30 years old, rendering People Nearby a particularly valuable data source for gaining insight into immigration activity of younger generations.
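The article does not say how the three annotators' judgments were combined. One common option, shown here purely as an assumption, is a strict majority vote; the function name `majority_label` is hypothetical.

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by a strict majority of annotators.

    Returns None when the list is empty or no label has more than
    half of the votes.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


# Three annotators judging whether one profile appears to be ethnic Chinese
# (illustrative votes, not data from the study).
decision = majority_label([True, True, False])
```

With three annotators and a yes/no question, a strict majority always exists, so the `None` case only matters if an annotator abstains or more label values are allowed.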

Along with their analysis of individual user data through WeChat, the researchers used the Google Places API to count Chinese business establishments in the same thirty-two cities. Data on established businesses reflects the immigration flows of older generations and allows a useful comparison with the more recent immigration flows measured through WeChat.

The case study revealed that of the thirty-two cities sampled, the highest number of ethnic-Chinese WeChat users reside in Prato, Italy (an expected result given that Prato has the largest concentration of ethnic Chinese in Europe), and the lowest number reside in Anchorage, Alaska.

Interestingly, the WeChat sample yielded a relatively low number of ethnic Chinese users in San Francisco, a city where 21% of the population is of Chinese descent according to the 2010 U.S. census. The researchers attribute this discrepancy to several possibilities: many ethnic-Chinese people in San Francisco may be older than 40, live more than 2,000 meters from the town hall, or not use WeChat.

Despite these limitations, the WeChat case study shows the value of mining country-centric apps to understand emerging diasporic populations. As diaspora research continues, country-centric apps will continue to offer data that is more current, comprehensive, and accessible than traditional sample surveys or censuses.

By Paul Oliver

