We are currently attending the 13th International Conference on Web and Social Media (ICWSM) in Munich. We are here to present our latest work titled “The Language of Dialogue Is Complex” (a brief description is here) and to catch up with friends and colleagues. On that note, let me say that the legendary Science Slam which took place the night before the conference — drinks, short talks, drinks — was a success! Kiran (@gvrkiran) will organize next year’s science slam, and I bet he will be looking for help ;)
To get a quick idea of what kind of papers were presented, please have a look at the proceedings of the conference.
The first keynote of the conference was by @ladamic and was divided into three parts:
What does the geographic distribution of your Facebook friends suggest? The socio-economic conditions of the area you live in! According to a study done by Facebook folks, the geographic distribution of your friends varies with the extent to which your area is socio-economically deprived:
- Marcus (@mstrohm) gave another keynote (slides are here). His talk showed how homophily influences the social networks of minorities (for more, here is a paper he co-authored with @clauwa and others). He presented one work that cleverly matched survey data with online data — check this work out.
In addition to keynotes, there were a few talks of interest:
- Social media platforms are generally studied during “normal times”. The paper Participation of New Editors after Times of Shock on Wikipedia set out to fix that (@cerenbudak @danielmromero). They studied what happens to Wikipedia contributions in times of “shocks”. To identify shocks (i.e., “events that may trigger a rapid increase in attention to Wikipedia articles”), the authors focused on Wikipedia articles about “people since these articles are susceptible to changes when individuals are involved in events such as deaths, scandals, or electoral victories.” Google Trends (GT), “a tool that provides a time series of Google search volume for a query relative to the maximum search volume on a given time interval”, was used to spot temporal anomalies. Statistically, the authors performed Seasonal-Trend decomposition using Loess (STL), a technique that decomposes an observed time series into several components (e.g., time trend and seasonality).
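To give a feel for the anomaly-spotting idea, here is a minimal stdlib-only sketch that flags days whose search volume deviates sharply from a trailing baseline. It is a crude stand-in for the authors' STL-based pipeline, not their actual method; the series, window, and threshold are illustrative.

```python
# Flag points that sit far above a trailing rolling baseline -- a
# simplified stand-in for STL-based anomaly detection on search volume.
from statistics import mean, stdev

def find_shocks(series, window=7, z_threshold=3.0):
    """Return indices whose value is more than z_threshold standard
    deviations above the mean of the previous `window` observations."""
    shocks = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and (series[i] - mu) / sigma > z_threshold:
            shocks.append(i)
    return shocks

# Flat attention with a sudden spike at day 10 (e.g., a scandal breaks):
volume = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 95, 40, 20]
print(find_shocks(volume))  # → [10]
```

STL goes further by removing trend and seasonality before looking at the residual, which avoids flagging, say, a weekly peak as a shock.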
- Teens are bullied and may well turn to avatars to voice emotions that are hard to express. A work titled “Self-Disclosure of Bullying Experiences and Social Support in Avatar Communication: Analysis of Verbal and Nonverbal Communications” by two Japanese researchers showed just that. From a mobile app, they collected the following dataset:
They found that bullied children expressed their negative emotions in private rooms and, as a consequence, these children ended up expressing gratitude for the support received.
- Say that you would like to contribute to Wikipedia articles but you don’t know where to start. That’s mainly because Wikipedia does not have a recommender system — it does not offer a service that recommends pages you might like to contribute to. EPFL researchers (@cervisiarius et al.) built such a system, which also works for new users (individuals for whom no historical preferences are available). Their system will show you multiple pairs of lists and, for each pair, will ask you: “Between lists A and B, which one contains more articles that you would be interested in editing?” Based on your answers, it will recommend articles you might like. The paper is titled “Eliciting New Wikipedia Users’ Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start”.
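The elicitation loop above can be sketched in a few lines. In this toy version each list bundles articles from one topic, every “A or B?” answer bumps the chosen topic’s score, and recommendations come from the highest-scoring topics. The topic names, scoring rule, and articles are illustrative, not the paper’s actual model.

```python
# Toy pairwise-elicitation sketch: answers to "list A or list B?"
# questions accumulate per-topic scores, which then drive recommendations.
from collections import Counter

def elicit(answers):
    """answers: list of (topic_a, topic_b, choice), choice in {'A', 'B'}."""
    scores = Counter()
    for topic_a, topic_b, choice in answers:
        scores[topic_a if choice == "A" else topic_b] += 1
    return scores

def recommend(scores, articles_by_topic, k=2):
    """Return up to k articles drawn from the user's top-scoring topics."""
    recs = []
    for topic, _ in scores.most_common():
        recs.extend(articles_by_topic.get(topic, []))
    return recs[:k]

answers = [("history", "physics", "A"), ("history", "sports", "A"),
           ("physics", "sports", "B")]
scores = elicit(answers)
articles = {"history": ["Hanseatic League"], "sports": ["Fosbury flop"]}
print(recommend(scores, articles))  # → ['Hanseatic League', 'Fosbury flop']
```

The point of the pairwise design is that a brand-new user answers a handful of comparisons instead of rating items one by one, which is what sidesteps the cold-start problem.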
- At ICWSM, researchers can share their datasets, and that’s what a paper titled “Sharing Emotions at Scale: The Vent Dataset” does: “33 millions of posts [from an app called Vent] by nearly a million users together with their social connections. Each post has an associated emotion. There are 705 different emotions, organized in 63 emotion categories, forming a two-level taxonomy of affects.”
- Take a phrase talking about abortion. Say that it contains both positive and negative stances, which are not easily extracted by text mining algorithms. Researchers have designed a method that is able to extract both positive stances and negative stances from phrases. The paper is titled “PhAITV: A Phrase Author Interaction Topic Viewpoint Model for the Summarization of Reasons Expressed by Polarized Stances” (@ozaiane).
- It’s hard to extract conflicts among communities on social media. A couple of researchers did so for Reddit communities (subreddits) in “Extracting Inter-Community Conflicts in Reddit” (@eytanadar). The idea is to: 1) identify the users whose content tends to be up-voted (users engaging in compliant behaviour) and those whose content tends to be down-voted (users engaging in norm-violating behaviour); 2) label a user who reliably produces enough measurable norm-compliant behaviour (e.g., many upvoted messages) as having a social home in that community, and label a user who produces a substantial amount of measurable norm-violating behaviour (i.e., many downvotes) as having an anti-social home in that community; and 3) place a conflict edge between two communities A and B “if there are many controversial authors that have a social home in subreddit A and anti-social home in subreddit B” (“the sum of all these edges, after some additional filtering, captures the conflict graph”). That’s a simple and smart way of identifying conflicts among communities. The network datasets are publicly available at https://github.com/srayandatta/Reddit.
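The three steps above can be sketched as follows, assuming per-user net karma per subreddit is already available. The threshold, the karma numbers, and the absence of the paper’s “additional filtering” make this a toy illustration of the idea, not the authors’ implementation.

```python
# Sketch of the conflict-graph construction: a user with a 'social home'
# in one subreddit and an 'anti-social home' in another contributes a
# conflict edge between the two (threshold and data are illustrative).
from collections import Counter
from itertools import product

def conflict_graph(user_votes, home_threshold=10):
    """user_votes: {user: {subreddit: net_karma}}.
    Social home: net karma >= home_threshold; anti-social home:
    net karma <= -home_threshold. Returns weighted conflict edges."""
    edges = Counter()
    for votes in user_votes.values():
        social = [s for s, k in votes.items() if k >= home_threshold]
        anti = [s for s, k in votes.items() if k <= -home_threshold]
        for a, b in product(social, anti):
            edges[(a, b)] += 1
    return edges

users = {
    "u1": {"r/cats": 50, "r/dogs": -30},
    "u2": {"r/cats": 25, "r/dogs": -12},
    "u3": {"r/dogs": 40},
}
print(conflict_graph(users))  # one edge r/cats -> r/dogs, weight 2
```

Note that the edges are directed: a community can be the social home of users who misbehave elsewhere without the reverse being true.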
- The paper “Smart, Responsible, and Upper Caste Only” (@r_ashwin and @david__jurgens) won the Best Paper award. It measures “caste attitudes through large-scale analysis of matrimonial profiles”. The researchers found that individuals open to intercaste marriage are younger, and are “more individualistic in the qualities they desire, rather than favoring family-related qualities”. These results came from mining the preference statements found in profiles using a dictionary of words the researchers built.
- Mingyang Li presented his work comparing the use of emojis in the East with that in the West. He mentioned a very interesting website in case you are interested in collaborating with him. In parallel, they have done a nice analysis of emotions comparing China with the US.
- @sharathguntuku analyzed images to predict depression and anxiety. The paper is titled “What Twitter Profile and Posted Images Reveal about Depression and Anxiety”. It turns out that users suffering from depression tend to post grayscale and low-arousal images.
- The paper “Understanding and Measuring Psychological Stress Using Social Media” (@sharathguntuku @anibuff @jeichstaedt) “deployed a survey on Qualtrics (a platform similar to Amazon Mechanical Turk), comprising several demographic questions (age, gender, race, education, and income) and the Cohen’s 10-item Stress scale”. “Out of all users who took the survey, 601 users completed the survey and had active accounts with more than 900 words on both Facebook and Twitter. We collected their Facebook posts by using the Facebook Graph API and downloaded their Twitter posts using the Twitter API.” What do those suffering from stress, and those who don’t, talk about on social media?
- The researchers then applied the language models (built on the 601 user profiles) to tweets across the whole US. The goal was to measure county-level stress trends. For that analysis, they excluded counties with fewer than 100K words or tweets, I’m not sure which :)
- Another approach is to train a classifier on data coming from Reddit communities discussing a disease (positive training examples) and data coming from communities that have nothing to do with health (negative training examples). That’s the approach taken by the paper “A Social Media Study on the Effects of Psychiatric Medication Use” (@kous2v @munmun10 @emrek). This is an award-winning paper.
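The training setup above can be illustrated with a minimal bag-of-words Naive Bayes classifier: posts from health-related subreddits serve as positive examples and posts from unrelated subreddits as negatives. All data here is toy, and the paper’s actual model is more sophisticated; this only shows the weak-supervision idea.

```python
# Minimal bag-of-words Naive Bayes sketch of the weak-supervision setup:
# health-subreddit posts are positives, unrelated posts are negatives.
import math
from collections import Counter

def train(positives, negatives):
    """Count word occurrences in each class."""
    pos, neg = Counter(), Counter()
    for text in positives:
        pos.update(text.lower().split())
    for text in negatives:
        neg.update(text.lower().split())
    return pos, neg

def predict(text, pos, neg):
    """True if the post looks health-related (add-one smoothing)."""
    vocab = set(pos) | set(neg)
    lp = ln = 0.0
    for w in text.lower().split():
        lp += math.log((pos[w] + 1) / (sum(pos.values()) + len(vocab)))
        ln += math.log((neg[w] + 1) / (sum(neg.values()) + len(vocab)))
    return lp > ln

pos_c, neg_c = train(
    ["started a new medication today", "my doctor changed my dose"],
    ["great goal in the match", "new movie trailer dropped"],
)
print(predict("doctor adjusted my medication", pos_c, neg_c))  # → True
```

The trick, as in the paper’s setup, is that the subreddit a post comes from acts as a free (if noisy) label, so no manual annotation is needed to build the training set.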
In addition to talks, there were tutorials, and one was about how to mine public groups on WhatsApp (@gvrkiran kindly shared his slides here).