How can AI help you listen to the open web?

Some interesting (and recent) Machine Learning based studies on Social Media and other Internet websites

Karna

Published in

KARNA AI ( Market Research Division of ParallelDots )

8 min read · Mar 5, 2017

The Internet: a powerful medium to check the world’s pulse

The Internet has become an essential tool for expressing ideas, marketing and connecting with people. It is a market, a town hall and a post office packed into the language of 0's and 1's that computers understand. But social media and the Internet have another (rather underappreciated) use: taking the world’s pulse. This tweet by Jack Dorsey after the 2017 joint-session speech is an excellent example of using Twitter as a listening tool.

In this blog post, we summarize some studies on social media and other websites that demonstrate how the Internet can be a powerful tool when looking for meaningful insights about the real world. There is a vast ocean of data out there, and we need the recent advances in AI to distill the signal into a form humans can consume. Please note that many of these studies are pre-prints and might not be accepted in their current form; the idea here is to talk about the possibilities rather than to write a review.

Analyzing mood of the masses

Twitter sentiment can help predict referendums. The balance of Twitter sentiment (the difference between positive and negative sentiment) could have predicted that the UK would vote to leave the EU, while the other predictors (opinion polls and political pundits) were pointing to a different conclusion. At Karna-AI, we gauge trends in the mood of the masses through plots that visualize sentiment distributions on different topics, determined using deep learning algorithms. We can even go a level deeper by breaking sentiment into emotions (sadness and anger both count as negative sentiment but evoke different feelings), as we did in our latest study of Oscars 2017. In that study, we noticed a clear spike in ‘sad’ tweets during the in-memoriam segment and a noticeable jump in ‘angry’ tweets when Best Picture was incorrectly announced.

Peak in sad tweets around the in-memoriam segment and in angry tweets when Best Picture was incorrectly announced.

Similarly, an analysis of Twitter activity during the Irish marriage referendum shows some very interesting results. The authors use an imperfect sentiment scoring (not machine learning or deep learning, just the occurrence of positive/negative words), so they work at the level of the bigger picture: they divide the Twitter users discussing the referendum into multiple communities and look at the aggregate sentiment, which strongly points to “Yes” as the likely outcome of the referendum.
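To make the idea of word-count sentiment and a per-community sentiment balance concrete, here is a minimal sketch. The word lists, the function names and the normalization by tweet count are all assumptions made for illustration; they are not the lexicon or the exact formula used in either study.

```python
# Tiny illustrative word lists (assumptions, not the lexicon used in the study).
POSITIVE = {"love", "equal", "proud", "happy", "yes"}
NEGATIVE = {"hate", "wrong", "against", "sad", "no"}


def lexicon_score(text):
    """Return (number of positive words - number of negative words) for one tweet."""
    words = [w.strip(".,!?#@\"'").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_balance(tweets_by_community):
    """Map each community to (positive tweets - negative tweets) / all tweets."""
    balance = {}
    for community, tweets in tweets_by_community.items():
        scores = [lexicon_score(t) for t in tweets]
        positive = sum(s > 0 for s in scores)
        negative = sum(s < 0 for s in scores)
        balance[community] = (positive - negative) / len(tweets) if tweets else 0.0
    return balance


# Hypothetical toy data, only to show the call.
example = {
    "community_a": ["Proud to vote yes!", "Love wins", "This is wrong"],
    "community_b": ["I am against this", "So happy today"],
}
print(sentiment_balance(example))
```

A balance above zero means a community posts more positive than negative tweets. Scoring this crude is noisy on single tweets, which is why aggregating over whole communities matters.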

Identifying communities connected by common ideologies

Other interesting results show very little interaction between politically different communities: people form ‘ghettos’ on Twitter according to their political ideologies. Karna-AI has tools to segment online users and find the influencers in a network, so as to figure out the important topics that define that network.

Political ideologies tend to form large sub-networks within the Twitter network

In this study, the authors show that just by modelling what a user’s Twitter connections are talking about, the user’s stance towards an issue can be determined, even if the user is dormant. Studies suggest that, for some tasks, computers can profile people more accurately than other humans do. So AI algorithms can profile either the users themselves or their neighbourhood into categories, and we can deduce their stance from these categories. We are very soon launching a product at Karna-AI to model social media users’ profiles, with the ability to group and filter users.

Profiling a user and/or their network is a good way to gauge their stance
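As a rough illustration of inferring a dormant user's stance from their neighbourhood, the sketch below takes a plain majority vote over the known stances of the accounts a user follows. This is a deliberately simple stand-in, not the model from the study, and the names `infer_stance`, `follows` and `known_stance` are hypothetical.

```python
from collections import Counter


def infer_stance(user, follows, known_stance):
    """Guess a user's stance as the most common stance among their connections.

    Returns None when no connection has a known stance, or when the top
    two stances are tied.
    """
    votes = Counter(
        known_stance[n] for n in follows.get(user, ()) if n in known_stance
    )
    ranked = votes.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical toy network: "dormant" has never tweeted but follows three accounts.
follows = {"dormant": ["a", "b", "c"]}
known_stance = {"a": "for", "b": "for", "c": "against"}
print(infer_stance("dormant", follows, known_stance))
```

A real system would weight connections by how much the user interacts with them and would model what those connections actually write, rather than relying on pre-labelled stances.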

Finding those needles in a haystack

While these studies look at the big picture, there is another important aspect we might want to listen to: minority opinions, rare events or anomalies. These are all effectively text classification problems in machine learning. They require special text classifiers that can separate rare data points from common ones, as simply searching for a “needle in a haystack” returns a lot of noise (irrelevant data). These researchers discover positivity on Twitter (which they find to be very rare, <5%) using cascaded machine learning algorithms. Cascaded algorithms are a series of algorithms, each trained to correct the errors of the preceding ones. This is not limited to positivity: the same approach can be used to listen to people talking about their pregnancies, or to look for depression symptoms in a population.

Mental health forums can be analyzed by AI to warn users of potential episodes

Combining network information and text information to train AI can even help generate automatic contextual responses, like the ones described in this YikYak-based study. Emojis, another component of social media, can also be predicted using AI on text interactions.
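The cascade idea can be sketched as a chain of filters in which each stage only sees what the previous stage let through, and exists to remove that stage's false positives. In the sketch below, hand-written rules stand in for trained classifiers purely to keep the example self-contained; the cue words and function names are assumptions, not taken from the paper.

```python
def stage_one(text):
    """High-recall filter: keep anything that mentions a positive cue word."""
    cues = ("great", "love", "thank", "happy")
    return any(cue in text.lower() for cue in cues)


def stage_two(text):
    """Stricter check that rejects common false positives of stage one."""
    lowered = text.lower()
    negations = ("not happy", "not great", "no thanks", "don't love")
    return not any(neg in lowered for neg in negations)


def cascade(texts, stages):
    """Pass texts through each stage in order, keeping only the survivors."""
    survivors = list(texts)
    for stage in stages:
        survivors = [t for t in survivors if stage(t)]
    return survivors


# Hypothetical toy posts, only to show the call.
posts = [
    "I love this community",
    "Not happy with the service",
    "Train delayed again",
    "Thank you all!",
]
print(cascade(posts, [stage_one, stage_two]))
```

In a trained cascade, each later stage would be fitted on the examples the earlier stages got wrong, which is what lets it pick out a class that makes up only a few percent of the data.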

Identify how propaganda originates and spreads

Another interesting area of research is the spread of propaganda and fake news. This analysis of ISIS accounts on Twitter before they were suspended gives some very interesting insights. It seems that small sub-networks of 100–1000 accounts follow each other and tweet/retweet common generated content to increase their influence. The use of bots for retweets is another possible conclusion. This paper also suggests a fast algorithm to block the people who are major nodes of such propaganda networks, to stop the spread of what it calls “cyber-epidemics”. This paper gives a simple method to dig out communities that are not as obvious.

While we are talking about social media, fake news (and biased non-fake news) is also a problem on the open web in general. This paper suggests a method to group all coverage of a news story by different sources and let social activists compare the different versions and share whenever they see bias. The authors used Google to get different coverage of the same news story, but this process could also be automated with Karna-AI’s machine-learning-based news analytics tools. Talking of social unrest, there is work on automatically listening to agitations being planned and discussed on Twitter. It works in a similar fashion to the “needle in a haystack” problems above.
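One simple way to look for sub-networks of accounts that follow each other is to keep only reciprocal follow edges and take the connected components of what remains. The sketch below does that, and ranks accounts by follower count as a crude proxy for "major nodes". It is an illustration under those assumptions, not the blocking algorithm proposed in the paper, and all names in it are hypothetical.

```python
from collections import defaultdict


def mutual_follow_clusters(follow_edges):
    """Group accounts into clusters connected by reciprocal follows.

    follow_edges is an iterable of (follower, followed) pairs.
    """
    edges = set(follow_edges)
    graph = defaultdict(set)
    for a, b in edges:
        if a != b and (b, a) in edges:
            graph[a].add(b)
            graph[b].add(a)

    clusters, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first search over reciprocal edges only.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def major_nodes(follow_edges, top_n=1):
    """Rank accounts by follower count, a crude proxy for influence."""
    followers = defaultdict(int)
    for _, followed in follow_edges:
        followers[followed] += 1
    return sorted(followers, key=lambda a: (-followers[a], a))[:top_n]


# Hypothetical toy follow graph.
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"),
         ("d", "e"), ("e", "d"), ("f", "a")]
print(mutual_follow_clusters(edges))
print(major_nodes(edges, top_n=2))
```

On real data, one would also compare the content these clusters post, since the pattern described above combines mutual following with retweeting of common content.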

Twitter network of #occupywallstreet movement

Analyzing and enriching discussions

So far I have only talked about what Twitter can tell us about the real world, but it is not the only Internet medium that gives us signals. Another important medium is discussion forums. The first question is which discussion forums to consider. A study on Facebook discussions gives us some signals for this. It turns out that high-quality comments attract other high-quality comments. While this conclusion looks pretty obvious, the study shows that displaying comments ranked by social feedback (likes, in the case of Facebook) instead of most recent first attracts better-quality comments to the discussion, gradually improving its quality and crowding out spam, trolling and hateful comments. Also, the dynamics for posts with too many comments are different, as spam or hurtful comments do not stop people from coming in and expressing their opinion. Such studies need extensive use of machine learning, since comments must be graded for quality constantly and automatically. At Karna-AI, we have our own intent classifier, which sorts user comments into opinion, feedback, marketing, etc., to grade comments and filter out irrelevant ones.

You can make your forum better just by showing highly voted comments

Machine learning algorithms can also be used as a triage for mental health forums, raising alerts about the deteriorating condition of users of these forums. Citation networks of scientific publications can be analyzed using machine learning to reveal inherent bias and help counter the resulting disadvantages for minorities and women. Data mining on the same networks can help us trace citation cartels, which cause trivial publications to come out citing other trivial publications.
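The difference between the two display policies discussed above is easy to see in code. This is a minimal sketch: the `Comment` fields and the tie-breaking rule are assumptions chosen for illustration, not details from the Facebook study.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    likes: int
    posted_at: int  # hypothetical timestamp, e.g. seconds since epoch


def most_recent_first(comments):
    """Default chronological display: newest comment on top."""
    return sorted(comments, key=lambda c: c.posted_at, reverse=True)


def most_liked_first(comments):
    """Order by likes, breaking ties in favour of newer comments."""
    return sorted(comments, key=lambda c: (c.likes, c.posted_at), reverse=True)


# Hypothetical toy thread.
thread = [
    Comment("spam link", likes=0, posted_at=300),
    Comment("thoughtful reply", likes=12, posted_at=100),
    Comment("good point", likes=5, posted_at=200),
]
print([c.text for c in most_recent_first(thread)])
print([c.text for c in most_liked_first(thread)])
```

With recency ordering the newest comment leads the thread even if it is spam, whereas ordering by likes puts the comments other readers valued in front of newcomers.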

Paper citation networks can be analyzed for some great insights

Not just text, even images yield insights

With the advances in deep learning methods for images, image-based Internet media like Pinterest, Instagram and e-commerce websites can also be mined for some brilliant insights. For example, this study from Fei-Fei Li’s group uses image AI to read Google Street View images and perform a survey of cars in the United States. It can count which models of cars it sees and produce a report without any human work. Any type of survey based on observations can hence be automated, such as the race distribution of a locality, the amount of traffic on a street, the ratio of kids to adults and similar things that until now required a lot of manual effort. There is another study on social media images that can help identify events like festivals and concerts directly from the images shared on social media. Our team has done a similar study that identifies broad trends in how people interact with #gopro on Instagram.

This very interesting study takes food photos from Instagram, uses AI to classify these images and then compares the food items seen in images from a county with common food-related health problems in that same county (like alcohol overdose and diabetes).

This study similarly uses image classification to sort clothes shown on e-commerce websites into categories and then mine fashion trends over time.

Clothing trends can be analyzed using AI
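Once an image classifier has assigned a category to each product photo, mining a trend reduces to counting. The sketch below assumes the classification step has already happened and that each item arrives as a hypothetical `(year, predicted_category)` pair; it then computes each category's share of the items seen in each year.

```python
from collections import Counter, defaultdict


def category_share_by_year(items):
    """items: iterable of (year, predicted_category) pairs.

    Returns {year: {category: share of that year's items}}.
    """
    counts = defaultdict(Counter)
    for year, category in items:
        counts[year][category] += 1
    return {
        year: {cat: n / sum(per_year.values()) for cat, n in per_year.items()}
        for year, per_year in sorted(counts.items())
    }


# Hypothetical classifier output, only to show the call.
predictions = [(2015, "denim"), (2015, "denim"), (2015, "floral"), (2016, "floral")]
print(category_share_by_year(predictions))
```

Comparing a category's share from one year to the next gives a first, rough picture of whether it is rising or falling.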

Join us in pushing the boundaries further

In a nutshell, the open Internet gives us extensive opportunity to feel the pulse of the world around us. AI algorithms have matured enough to act as an assistant that makes sense of the enormous amount of unstructured data generated on the Internet every day.

Karna-AI is a product that helps you derive such insights from sources like news, blogs, forums and social media, which can be used for market research, reputation management or competitive benchmarking. It generates automated AI-based reports that can be used to get quick feedback on your social media strategy. Is there something you would like Karna-AI to track for you? Get in touch with us at contact@paralleldots.com