Going Viral: Designing a machine learning model for social activist engagement.
This post is part of a series I have been writing during my research and study of International User Experiences and Machine Learning pipelines, led by Dr. Saiph Savage at the University of Washington. We study the design of intelligent interfaces that employ machine learning models, and how distributed development teams interact across different cultures and locations.
In this post, I design and discuss strategies for social networks and machine learning pipelines that can be used to evaluate whether an online campaign has the potential to go viral. For context, the model will target unionized Latino workers organizing campaigns against unfair labor conditions and abusive bosses or employees. A labor union is an organized association of workers formed to protect and further their rights and interests in dealings with their employers. A union organizing campaign is essentially a drive or movement to defend rights and/or protest an unfair situation. In the social-media-driven world, the term “viral” refers to the tendency of an image, video, or piece of information to circulate rapidly and widely from one Internet user to another.
We live in a digital society, and smartphones and apps have been widely adopted, including among Latino populations in developed and developing countries. This creates an opportunity to use social networks such as Facebook and Twitter, and sites such as coworker.org, to advocate for better workplace conditions through video, text, and audio, and to gather data from supporters. Blogs run by union organizations can be useful for delivering more extended information on campaigns to target audiences.
Motivations and Strategies
Labor organizing campaigns can follow the lead of successful commercial and political campaigns, which employ data-mining tools and machine learning to identify, engage, and influence customers and voters. Unions could use social networks, ML, and big data to analyze and spread content that stirs emotion in the users who see it, prompting them to share it in turn. Usually, it is content that evokes high-activation emotions, whether positive or negative, that has the potential to become viral.
Key aspects that help a social activism campaign go viral include:
- The campaign is personal: it is about an individual who represents other people in the same circumstances. People are more likely to support an individual tied to a cause than a cause that affects many individuals.
- The campaign is socially interactive: participants can see how the challenge was performed and how other people were challenged to do it.
- The campaign is driven by a credible organization.
- The information is kept simple, so that large numbers of people are willing to take part.
Organizing people online would be particularly effective with Millennials, who make up a significant share of the labor market and are also supportive of unions. Young people view labor unions more favorably than business corporations, according to research on pewresearch.org.
Mapping out the Machine Learning Pipeline and Social Network Platforms Integration
The outcome of the machine learning pipeline and model will be a recommendation engine that evaluates similar campaigns and assesses which information elements of a campaign are circulating rapidly and widely, and which elements may need to change or are simply not working.
Data will be compiled from campaigns with similar purposes on different social networks to identify common headlines, keywords in text, images, collected signatures, and engagement metrics such as likes and retweets. Social network APIs can be used to extract raw data such as:
- Facebook pages with posts and associated likes and dislikes
- Trending topics on Twitter with votes, likes, and dislikes
- Twitter retweets
- Conversation text in general.
Data Preparation and Cleaning
During this phase, the raw data, which can be collected in text-based formats such as CSV, JSON, and plain text, will need to be parsed and normalized so that it can be stored in a data warehouse flexible enough to support searches and make sense of the data. All text from social media posts needs to be normalized by removing stop words, hashtags, emojis, and similar noise. Hashtags related to a given activism campaign will still be stored, along with their corresponding trending metrics such as like and dislike counts. Subjective expressions, including beliefs, opinions, and views, are retained.
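As a rough illustration of the cleaning step, here is a minimal Python sketch that strips hashtags, links, emojis, and stop words from a single post while keeping the hashtags in a separate list. The function name normalize_post, the tiny stop-word list, and the sample post are all assumptions made for this example; a real pipeline would use a fuller, language-aware stop-word list (including Spanish).

```python
import re

# Illustrative stop-word list (an assumption); a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "and", "or", "for", "of", "to", "in", "is", "are", "our", "we"}

HASHTAG_RE = re.compile(r"#(\w+)")
URL_RE = re.compile(r"https?://\S+")
# Anything that is not a word character, whitespace, or apostrophe.
# Emojis and punctuation fall into this class and are dropped.
NON_WORD_RE = re.compile(r"[^\w\s']")


def normalize_post(text):
    """Return (tokens, hashtags) for one raw social media post."""
    # Collect hashtags first so they can be stored next to their metrics.
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(text)]
    text = HASHTAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return tokens, hashtags


# Made-up sample post, for illustration only.
tokens, hashtags = normalize_post("We demand fair pay for the workers! 💪 #FairPay #UnionStrong")
print(tokens)    # ['demand', 'fair', 'pay', 'workers']
print(hashtags)  # ['fairpay', 'unionstrong']
```

Because Python's \w is Unicode-aware for text strings, accented letters in Spanish-language posts survive this cleaning, which matters for the audience discussed here.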
The cleaned data stored in the data warehouse will be analyzed for sentiment. People’s feelings and actions cannot be reduced to a binary category, positive or negative, such as a like or dislike on a tweet or Facebook post. We need more context, so sentiment analysis is applied to the collected social media data to better understand how people feel about a piece of content. Each campaign can then be represented as a vector denoting geographical locations, sentiment scores, retweet counts, and social media sharing counts.
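To make the vector representation more concrete, the sketch below turns one campaign into a list of numbers. Everything in it is an assumption for illustration: the Campaign class, the tiny polarity lexicon, the choice to encode geography as a count of distinct locations, and the log scaling of counts. It takes already-normalized tokens as input and scores sentiment on a continuous scale from -1 to 1 rather than as a binary label.

```python
from dataclasses import dataclass
from math import log1p

# Tiny illustrative polarity lexicon (an assumption); real lexicons are far larger.
POLARITY = {"fair": 1.0, "support": 1.0, "proud": 1.0,
            "unfair": -1.0, "abusive": -1.0, "unsafe": -1.0}


def sentiment_score(tokens):
    """Mean polarity of the tokens found in the lexicon, in [-1, 1]; 0.0 if none match."""
    hits = [POLARITY[t] for t in tokens if t in POLARITY]
    return sum(hits) / len(hits) if hits else 0.0


@dataclass
class Campaign:
    name: str
    num_locations: int   # distinct geographic locations where the campaign was shared
    post_tokens: list    # one list of normalized tokens per collected post
    retweets: int
    shares: int

    def to_vector(self):
        scores = [sentiment_score(tokens) for tokens in self.post_tokens]
        mean_sentiment = sum(scores) / len(scores) if scores else 0.0
        # log1p compresses heavy-tailed counts so very large campaigns do not dominate.
        return [float(self.num_locations), mean_sentiment,
                log1p(self.retweets), log1p(self.shares)]


# Made-up campaign, for illustration only.
example = Campaign(
    name="example-campaign",
    num_locations=3,
    post_tokens=[["unfair", "abusive", "boss"], ["fair", "unsafe"]],
    retweets=120,
    shares=45,
)
vector = example.to_vector()
print(vector)
```

A lexicon lookup is only one way to score sentiment; a trained sentiment model could replace sentiment_score without changing the shape of the vector.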
Trained Machine Learning Model
The machine learning model will predict whether a campaign has the potential to become viral based on the sentiment scores of its data points, the number of people, and the geographic distribution. The model will employ supervised learning, with the task framed as a classification problem. Sentiment scores will be determined by the polarity of sentiments learned from labeled training data sets.
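The post does not commit to a particular algorithm, so as one possible sketch of the supervised classification step, here is a small logistic regression trained with batch gradient descent in plain Python. The feature layout (emotional activation, reach, geographic spread, each scaled to the range 0 to 1), the function names, and the six training rows are all made up for illustration and are not real campaign data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.5, epochs=2000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in zip(X, y):
            z = sum(w * f for w, f in zip(weights, features)) + bias
            error = sigmoid(z) - label
            for j, f in enumerate(features):
                grad_w[j] += error * f
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict_viral_probability(weights, bias, features):
    """Estimated probability that a campaign with these features goes viral."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


# Synthetic toy data: [activation, reach, spread]; label 1 = went viral, 0 = did not.
X_train = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.7, 0.7, 0.9],
           [0.2, 0.1, 0.2], [0.1, 0.3, 0.1], [0.3, 0.2, 0.1]]
y_train = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic_regression(X_train, y_train)
probability = predict_viral_probability(weights, bias, [0.85, 0.75, 0.8])
print("viral" if probability >= 0.5 else "not viral")
```

In practice a library implementation and far more labeled campaigns would be needed; the point here is only the shape of the problem: labeled feature vectors in, a viral or not-viral prediction out.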
The model succeeds when it correctly predicts that a campaign will become viral (a true positive) and when it correctly predicts that a campaign won’t become viral (a true negative).
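A simple way to measure this is to count the four possible outcomes and derive summary metrics from them. The sketch below assumes labels of 1 for viral and 0 for not viral; the function names and the sample labels are invented for the example.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = viral)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(counts):
    """Share of all predictions that were true positives or true negatives."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# Made-up labels, for illustration only.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]

counts = confusion_counts(actual, predicted)
print(counts)            # {'tp': 3, 'tn': 3, 'fp': 1, 'fn': 1}
print(accuracy(counts))  # 0.75
```

Since viral campaigns are likely to be rare, accuracy alone can look good for a model that never predicts "viral", so it is worth reading the false positive and false negative counts alongside it.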