sumup.ai, June 2019
More than 80% of the world’s data is in the form of unstructured text, growing at 50% per year. This presents both an opportunity and a challenge for professionals. On the one hand, more data means more information; on the other, more data also means more work. Professionals in information-focused roles such as investors, intelligence analysts, lawyers, and researchers spend at least a third of their day reading. Yet, they also wear many other hats including providing recommendations, making decisions, and working with clients. An overwhelming majority of professionals express frustration towards the financial and opportunity costs, bias, and incompleteness of analyzing large volumes of data.
The field of text analytics, a blend of Natural Language Processing (NLP), Computational Linguistics, and statistical machine learning, is a rapidly evolving area within Artificial Intelligence. Its goal is to help information-focused professionals process and leverage unstructured data. Advances in text analytics, particularly in algorithms, computing capability, and storage infrastructure, now enable professionals to overcome many of the hurdles associated with unstructured data and to turn it into useful insights and information.
The text analytics platform developed by sumup.ai, Nucleus, enables professionals to engage with large-scale, unstructured data in real time. As an example, we analyzed Trump’s Twitter data since the beginning of his presidential term (approximately 8,000 tweets since January 2017) and share some of our findings.
The data is available here, and the interested reader can reproduce our results on our platform. Including the time to load the dataset into the Nucleus platform, all the analyses can be reproduced in less than 20 seconds.
1. Extract most discussed topics & sentiment: Among the top 5 most prevalent topics, Trump dedicated a whopping 36% of his tweets to fake news, while 18% of his tweets relate to the White House. He also wrote a significant amount about Hillary Clinton and the southern border, both in a negative tone, and wrote very positively about tax cuts.
2. Identify & isolate contributing sources: Using our platform, we can also quickly isolate the tweets that contribute to the topic and sentiment scores above.
A fake news tweet from President Trump: “Can you believe that with all of the made up unsourced stories I get from the Fake News Media together with the $10,000,000 Russian Witch Hunt (there is no Collusion) I now have my best Poll Numbers in a year. Much of the Media may be corrupt but the People truly get it!”
3. Analyze historical topic trends: Historically, the topic of fake news has been consistently prominent among his tweets, above 25% on average (left: graph 1). The topic of border security has trended up in prevalence over the past year, from about 10% to 30% of mentions (left: graph 2). The topic of North Korea spiked from less than 10% to more than 25% between February and April 2019, around the North Korea-United States Hanoi Summit, which took place at the end of February (left: graph 3).
4. Measure consensus: Consensus measures the degree of agreement among the authors on a topic; in this case, how consistent Trump’s views were on a given topic. We queried the keyword “illegal” and found 15 tweets with 62% consensus, related to the topics of:
[Trump-campaign] [Trump-collusion] [campaign-Russia] [intelligence-committee]
We were also interested in his views on “unemployment” and found 3 tweets with 100% consensus.
Tweet 1 on keyword “unemployment”: Jobs are up unemployment is at record lows and wages are still rising.
Tweet 2 on keyword “unemployment”: Our country is doing better than ever before with unemployment setting record lows.
Tweet 3 on keyword “unemployment”: RT @GOPChairwoman: We’ve added 6 MILLION jobs since @realDonaldTrump’s election unemployment is at record lows wages are on the rise
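To make the idea of a consensus score concrete, here is a minimal sketch of one way such a measure could be computed: label each text with a sentiment, then report the share of texts that carry the most common label. This is an illustrative assumption, not Nucleus's actual method. The tiny word lists and the function names `sentiment_label` and `consensus` are hypothetical and exist only for this example.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; not Nucleus's sentiment model.
POSITIVE = {"up", "rising", "rise", "better", "added"}
NEGATIVE = {"corrupt", "fake", "hunt"}


def sentiment_label(text: str) -> str:
    """Label a text positive, negative, or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def consensus(texts: list[str]) -> float:
    """Share of texts that carry the most common sentiment label."""
    if not texts:
        raise ValueError("consensus is undefined for an empty collection")
    labels = Counter(sentiment_label(t) for t in texts)
    return labels.most_common(1)[0][1] / len(texts)


# The three "unemployment" tweets quoted above.
unemployment_tweets = [
    "Jobs are up unemployment is at record lows and wages are still rising.",
    "Our country is doing better than ever before with unemployment setting record lows.",
    "RT @GOPChairwoman: We’ve added 6 MILLION jobs since @realDonaldTrump’s election "
    "unemployment is at record lows wages are on the rise",
]

print(f"consensus: {consensus(unemployment_tweets):.0%}")
```

With this toy lexicon all three tweets are labeled positive, so the script prints a consensus of 100%. That agreement follows from the hand-picked word lists and says nothing about how the platform arrives at its own figure; a production system would use a trained sentiment model rather than word counting.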
The use cases of text analytics are vast and growing: understanding trending topics in Trump tweets, analyzing the public’s sentiment around presidential candidates, finding investment signals from text, streamlining content monitoring processes, and connecting medical and legal research. More interestingly, recent advancements in research and technology enable businesses to use text analytics in a flexible and transparent manner, at large scale, and in real time. Businesses and researchers alike can now analyze the persistence or ephemerality of specific themes in large, complex bodies of unstructured text.
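As a rough illustration of what analyzing the persistence of a theme might involve, the sketch below computes a theme's monthly prevalence (the share of documents in each month that mention it) and then the share of months in which that prevalence reaches a threshold. Simple keyword matching stands in for a real topic model here. The function names, the threshold, and the sample documents are all hypothetical placeholders, not data or code from Nucleus.

```python
from collections import defaultdict
from datetime import date


def monthly_prevalence(docs, keywords):
    """Return {(year, month): share of that month's docs mentioning any keyword}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, text in docs:
        key = (day.year, day.month)
        totals[key] += 1
        lowered = text.lower()
        if any(k in lowered for k in keywords):
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in sorted(totals)}


def persistence(prevalence, threshold):
    """Share of months in which the theme's prevalence reaches the threshold."""
    if not prevalence:
        raise ValueError("persistence is undefined without any months")
    return sum(p >= threshold for p in prevalence.values()) / len(prevalence)


# Made-up placeholder documents, not real tweets.
docs = [
    (date(2019, 1, 5), "placeholder text about the border"),
    (date(2019, 1, 20), "placeholder text about taxes"),
    (date(2019, 2, 3), "placeholder text about the border wall"),
    (date(2019, 2, 14), "placeholder text about border security"),
    (date(2019, 3, 9), "placeholder text about jobs"),
]

prevalence = monthly_prevalence(docs, keywords=("border",))
for (year, month), share in prevalence.items():
    print(f"{year}-{month:02d}: {share:.0%}")
print(f"persistence: {persistence(prevalence, threshold=0.5):.2f}")
```

A theme that clears the threshold in most months would count as persistent under this definition, while one that appears in a single spike would score low. Both the threshold and the monthly granularity are arbitrary choices made for the example.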
Learn more about what Nucleus’ text-analytics platform can do for your business. Visit our website at www.sumup.ai.