Machine Learning and the VP Debate

Using an approach similar to my earlier Twitter analysis, I analyzed tweets from last night’s VP debate with the Cloud Natural Language API, BigQuery, and Exploratory (for visualization). This time, in addition to running syntax analysis on every tweet, I also used the NL API’s sentiment analysis feature.

Twitter sentiment during the debate

The NL API returns two values for sentiment: polarity and magnitude. Polarity is a number from -1 to 1 that indicates how positive or negative the text is. Magnitude indicates the overall strength of the statement, regardless of whether it is positive or negative, and ranges from 0 to infinity. A good way to gauge sentiment is to multiply the two values, so that statements with stronger sentiment (higher magnitude) are weighted accordingly.
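To make the weighting concrete, here is a minimal JavaScript sketch of the score. It assumes each tweet is an object with hypothetical `polarity` and `magnitude` fields, mirroring the columns the BigQuery queries in this post multiply together; the function names are illustrative only.

```javascript
// Weighted sentiment: polarity (-1 to 1) times magnitude (0 to infinity),
// so statements with more emotional strength count for more.
// Field names polarity/magnitude are assumptions mirroring the BigQuery columns.
function weightedSentiment(tweet) {
  return tweet.polarity * tweet.magnitude;
}

// Average weighted score over a set of tweets, like AVG(polarity * magnitude).
// Returns null for an empty set, as SQL's AVG does.
function averageSentiment(tweets) {
  if (tweets.length === 0) return null;
  const total = tweets.reduce((sum, t) => sum + weightedSentiment(t), 0);
  return total / tweets.length;
}
```

With the same polarity of 0.8, a tweet with magnitude 2 scores 1.6 while one with magnitude 0.5 scores only 0.4, which is exactly the "stronger statements weigh more" effect described above.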

I filtered tweets using these search terms:

```javascript
var searchTerms = '#debates,#debates2016,#debatenight,#vpdebate,Mike Pence,Tim Kaine';
```
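For a rough local check of which tweets a list like this would catch, here is a small sketch. It assumes, hypothetically, that the terms were used as a streaming filter where commas separate alternatives and spaces inside a term mean all of its words must appear. `parseSearchTerms` and `matchesSearchTerms` are illustrative names, and this is an approximation rather than Twitter's exact matching rules.

```javascript
// Hypothetical helper: split the comma-separated list into trimmed,
// lower-cased terms.
function parseSearchTerms(searchTerms) {
  return searchTerms
    .split(',')
    .map((term) => term.trim().toLowerCase())
    .filter((term) => term.length > 0);
}

// Rough approximation: a tweet matches if it contains every word of at least
// one term (comma = OR, space = AND), case-insensitively, by substring.
function matchesSearchTerms(text, terms) {
  const lower = text.toLowerCase();
  return terms.some((term) =>
    term.split(/\s+/).every((word) => lower.includes(word))
  );
}

const searchTerms = '#debates,#debates2016,#debatenight,#vpdebate,Mike Pence,Tim Kaine';
const terms = parseSearchTerms(searchTerms);
```

Under these assumptions, "Watching the #VPDebate tonight" matches, while "Pence spoke first" does not, because the term "Mike Pence" needs both words.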
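To see how sentiment changed over the course of the debate, this query averages the weighted score for each minute:

```sql
SELECT
  LEFT(STRING(SEC_TO_TIMESTAMP(INTEGER(created_at) / 1000)), 16) as minute,
  AVG(FLOAT(polarity) * FLOAT(magnitude)) as sentiment
FROM [sara-bigquery:syntax.vpdebate]
GROUP BY 1
ORDER BY 1
```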
Twitter sentiment during the October 4th VP debate
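The same per-minute bucketing can be sketched in plain JavaScript, which may help if you want to reproduce it outside BigQuery. This assumes `created_at` holds epoch milliseconds (the query divides it by 1000 before `SEC_TO_TIMESTAMP`) and hypothetical `polarity` and `magnitude` fields; one difference is that the minute keys use an ISO "T" separator instead of a space.

```javascript
// Group tweets by minute and average polarity * magnitude in each bucket.
function sentimentByMinute(tweets) {
  const buckets = new Map(); // minute string -> { sum, count }
  for (const t of tweets) {
    // toISOString() returns "YYYY-MM-DDTHH:MM:SS.sssZ" in UTC; the first 16
    // characters identify the minute, like LEFT(STRING(...), 16) in the query.
    const minute = new Date(Number(t.created_at)).toISOString().slice(0, 16);
    const bucket = buckets.get(minute) || { sum: 0, count: 0 };
    bucket.sum += t.polarity * t.magnitude;
    bucket.count += 1;
    buckets.set(minute, bucket);
  }
  // Sort by minute, like GROUP BY 1 ORDER BY 1.
  return [...buckets.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([minute, { sum, count }]) => ({ minute, sentiment: sum / count }));
}
```

`Number()` is used so that timestamps stored as strings are also accepted.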
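To get the overall sentiment across all tweets, along with the number of tweets analyzed:

```sql
SELECT
  ROUND(AVG(FLOAT(polarity) * FLOAT(magnitude)), 2) as overall_sentiment,
  COUNT(*) as num_tweets
FROM
  [sara-bigquery:syntax.vpdebate]
```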
Overall sentiment for tweets during the VP debate
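To look at sentiment on a specific topic, add a WHERE clause to the overall sentiment query. For example, to include only tweets that mention taxes:

```sql
WHERE LOWER(text) CONTAINS 'tax'
```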
Sentiment for tweets containing ‘tax’
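Here is a hedged JavaScript sketch of the overall-sentiment query with its optional keyword filter, assuming tweet objects with hypothetical `text`, `polarity`, and `magnitude` fields. `overallSentiment` is an illustrative name, not part of any library.

```javascript
// Overall weighted sentiment and tweet count, optionally filtered by keyword.
function overallSentiment(tweets, keyword) {
  // Case-insensitive substring filter, like WHERE LOWER(text) CONTAINS 'tax'.
  const selected = keyword
    ? tweets.filter((t) => t.text.toLowerCase().includes(keyword.toLowerCase()))
    : tweets;
  if (selected.length === 0) {
    return { overall_sentiment: null, num_tweets: 0 };
  }
  const total = selected.reduce((sum, t) => sum + t.polarity * t.magnitude, 0);
  // Round to two decimal places, roughly like ROUND(..., 2).
  const overall = Math.round((total / selected.length) * 100) / 100;
  return { overall_sentiment: overall, num_tweets: selected.length };
}
```

Like `CONTAINS`, the substring filter also catches words such as "taxes" (and "syntax"), so a keyword filter is a coarse topic selector. `Math.round` may round exact negative halves differently from SQL's `ROUND`.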

Syntactic analysis

Using the NL API’s text annotation method, we can break a tweet down into parts of speech and use BigQuery to find linguistic trends. For each sentence, the NL API’s dependency parse tells us which word is the subject (labeled NSUBJ). Since I saved the JSON response from the NL API in BigQuery, I can write a user-defined function to find the top subjects in tweets about the VP debate:

```sql
SELECT
  COUNT(*) as subject_count, subject
FROM
  JS(
    (SELECT tokens FROM [sara-bigquery:syntax.vpdebate]),
    tokens,
    "[{ name: 'subject', type: 'string' }]",
    "function(row, emit) {
      try {
        // tokens holds the JSON-encoded token array from the NL API
        var x = JSON.parse(row.tokens);
        x.forEach(function(token) {
          // Emit the lemma of every token labeled as a nominal subject
          if (token.dependencyEdge.label === 'NSUBJ') {
            emit({ subject: token.lemma.toLowerCase() });
          }
        });
      } catch (e) {
        // Skip rows whose tokens field is missing or not valid JSON
      }
    }"
  )
GROUP BY subject
ORDER BY subject_count DESC
LIMIT 100
```
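If you want to test the UDF logic locally before running it in BigQuery, here is a sketch of the same subject counting in plain JavaScript, including the GROUP BY, ORDER BY, and LIMIT steps. It assumes each row has a `tokens` string containing the NL API token array, with the `dependencyEdge.label` and `lemma` fields the UDF reads; `topSubjects` is a hypothetical name.

```javascript
// Count NSUBJ lemmas across rows and return the most frequent subjects.
function topSubjects(rows, limit = 100) {
  const counts = new Map(); // subject -> count
  for (const row of rows) {
    let tokens;
    try {
      tokens = JSON.parse(row.tokens);
    } catch (e) {
      continue; // skip invalid JSON, like the UDF's try/catch
    }
    if (!Array.isArray(tokens)) continue;
    for (const token of tokens) {
      if (token.dependencyEdge && token.dependencyEdge.label === 'NSUBJ') {
        const subject = token.lemma.toLowerCase();
        counts.set(subject, (counts.get(subject) || 0) + 1);
      }
    }
  }
  return [...counts.entries()]
    .map(([subject, subject_count]) => ({ subject, subject_count }))
    .sort((a, b) => b.subject_count - a.subject_count) // ORDER BY subject_count DESC
    .slice(0, limit); // LIMIT
}
```

Lower-casing the lemma means "Pence" and "pence" are counted as the same subject, matching the UDF.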

Top debate emojis

Last but not least, how did people express their feelings about the debate in emojis? Here are the results in an emoji tag cloud:

Top emojis used in tweets about the VP debate on Oct 4th
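The post doesn’t show how the emojis were tallied, so here is one possible approach as a minimal sketch, not necessarily how the cloud above was built. It uses the Unicode `Extended_Pictographic` property in a JavaScript regex. As a simplification, it counts single code points: skin-tone modifiers are ignored, multi-part (ZWJ) emojis are counted piece by piece, and a few non-emoji symbols such as © also match.

```javascript
// Count emoji characters across tweet texts, most frequent first.
function countEmojis(texts) {
  const counts = new Map(); // emoji -> count
  const emojiPattern = /\p{Extended_Pictographic}/gu;
  for (const text of texts) {
    for (const match of text.matchAll(emojiPattern)) {
      const emoji = match[0];
      counts.set(emoji, (counts.get(emoji) || 0) + 1);
    }
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```

The resulting `[emoji, count]` pairs are the kind of input a tag cloud needs, with font size scaled by count.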

What’s next

Have questions or more ideas for natural language processing? Find me on Twitter @SRobTweets or let me know what you think in the comments. Here are the tools I used: the Cloud Natural Language API, BigQuery, and Exploratory.

Connoisseur of code, country music, and homemade ice cream. Helping developers build awesome apps @googlecloud. Opinions = my own, not that of my company.