Twitter Sentiment Analysis with NodeXL Pro

Matt
Oct 11, 2016

Let’s dive into the pit of despair. Let’s see what Twitter thinks about Donald Trump (#topical). Are you ready? I’m ready.

Fire up NodeXL Pro. You might want to read my prior piece about mapping to get a handle on the basics of ingesting data and its limitations, but for our purposes this is pretty easy.

For this, I’m more interested in Tweets that include the term “realdonaldtrump” than in him and his follower network, so we’re going to go to Import:

And select “From Twitter Search Network,” which will get you this:

Lord help us

That’ll chug for a while depending on what search term you’ve decided to use. Remember that you can use Twitter’s advanced search operators here, so you can pop in something like q=realdonaldtrump%20near%3A"Washington%2C%20DC"%20within%3A15mi&src=typd&lang=en (the URL-encoded form of the search realdonaldtrump near:"Washington, DC" within:15mi, limited to English) to pull Tweets from within 15 miles of Washington, DC that include the string “realdonaldtrump.”
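If you copied a query like this out of Twitter's search page URL, Python's standard urllib.parse module will show you what it decodes to before you paste it into NodeXL. A minimal sketch, assuming the quotation marks around the place name were plain double quotes in the original URL:

```python
from urllib.parse import parse_qs

# The query string from the text. Assumption: the quotes around the place
# name were literal straight double quotes, not %22.
raw = 'q=realdonaldtrump%20near%3A"Washington%2C%20DC"%20within%3A15mi&src=typd&lang=en'

# parse_qs splits on "&" and percent-decodes each value into a list.
params = parse_qs(raw)
query = params["q"][0]

print(query)              # realdonaldtrump near:"Washington, DC" within:15mi
print(params["lang"][0])  # en
```

The decoded q value is the part that carries the search operators; src and lang are extra URL parameters from the search page.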

This is as good a point as any to call your attention to Twitter’s built-in sentiment search operators: :) for positive, :( for negative, and ? for questions. They do basically what they say on the tin, but they’re also kind of a black box. Doing this in NodeXL gives you finer control.

In any case, you’ll end up with something that looks like the following:

38000 rows, are we big data yet

Once the data is imported, head to Graph Metrics, like so:

Then scroll down and check the box next to “Words and Word Pairs” in the dialog that pops up.

Click Options, select the radio button next to “On the Edges worksheet,” and choose “Tweet” as the column. NodeXL will analyze the text in this column for word sentiment and word pairs. By default, it uses the Opinion Lexicon, so if you want a particular word or phrase counted as positive or negative, or you’re tracking a colloquialism that means something unique, you’ll have to add it to the list manually.
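To make the lexicon idea concrete, here is a minimal sketch of word-level lexicon scoring in Python. It is not NodeXL's implementation: the two word sets are tiny hypothetical stand-ins for the Opinion Lexicon, and the tokenizer is an assumed simple regular expression.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the Opinion Lexicon, which has thousands of
# entries. "trump" is included because it turns up on the positive list.
POSITIVE = {"great", "win", "love", "trump"}
NEGATIVE = {"rigged", "bad", "sad", "lies"}


def tokenize(text):
    """Lowercase a Tweet and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score_words(tweets, positive=POSITIVE, negative=NEGATIVE):
    """Count positive and negative lexicon hits across all Tweets.

    Counts are per word occurrence, not per Tweet, so a single Tweet
    can contribute several hits.
    """
    counts = Counter()
    for tweet in tweets:
        for word in tokenize(tweet):
            if word in positive:
                counts["positive"] += 1
            elif word in negative:
                counts["negative"] += 1
    return counts


# Customizing the list is plain set arithmetic, e.g. dropping "trump":
# score_words(tweets, positive=POSITIVE - {"trump"})
```

Adding a colloquialism works the same way with a set union, such as POSITIVE | {"some_new_word"}.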

A brief math interlude, with very little math: “salience” basically measures how “important” a word is, based on how often it shows up across the entire column, and “mutual information” looks at word pairs and measures how related the two words in a pair are. Larger numbers mean the two words show up together more often than you’d expect from how often each appears on its own; smaller numbers mean they rarely pair up.
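For the curious, here is a sketch of both ideas in Python. Salience is approximated as a word's share of all tokens, which is a simplified reading of the description above rather than NodeXL's documented formula; mutual information is computed as standard pointwise mutual information (PMI) over adjacent word pairs, which may differ in detail from what NodeXL reports.

```python
import math
from collections import Counter


def word_stats(tokens):
    """Return (salience, pmi) dicts for a list of tokens.

    salience: each word's share of all tokens (a simplified stand-in;
              NodeXL's exact formula may differ).
    pmi:      pointwise mutual information of each adjacent word pair, in bits.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    salience = {word: count / n_uni for word, count in unigrams.items()}

    pmi = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi          # how often the pair occurs
        p_a = unigrams[a] / n_uni    # how often each word occurs alone
        p_b = unigrams[b] / n_uni
        # Positive: the pair co-occurs more than chance would predict.
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return salience, pmi
```

One caveat: pairs that appear only once or twice can get very high PMI scores, so it's common to ignore pairs below some minimum count.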

An even briefer linguistics interlude (fair warning: I know even less about this than I do about math): the word pairs we’re looking at here are called “bigrams.”

Take a sentence like “The McCourt School of Public Policy is the ninth and newest school at Georgetown University.” Subtract out the filler words (“the,” “of,” “and,” “at”), and put the rest of those words in a bag.

In that bag, you have 10 unigrams. “McCourt,” “School,” “Public,” “Policy,” “is,” “ninth,” “newest,” “school,” “Georgetown,” “University.”

You have 9 bigrams: “McCourt School,” “School Public,” “Public Policy,” “Policy is,” “is ninth,” “ninth newest,” “newest school,” “school Georgetown,” “Georgetown University.”

You have 8 trigrams: “McCourt School Public,” and so on. Grouping words into larger sets gives you more information, but you get to the point where you’re getting into unique phrases very quickly, and unique phrases are hard to analyze.
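The unigram, bigram, and trigram counts above can be reproduced with a few lines of Python. This sketch assumes simple whitespace tokenization and uses only the four filler words from the example as its stop-word list.

```python
# The filler words removed in the example above (a tiny stop-word list).
STOP_WORDS = {"the", "of", "and", "at"}


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = ("The McCourt School of Public Policy is the ninth and newest "
            "school at Georgetown University.")

# Drop the trailing period, split on whitespace, remove filler words.
tokens = [w for w in sentence.rstrip(".").split() if w.lower() not in STOP_WORDS]

for n in (1, 2, 3):
    grams = ngrams(tokens, n)
    print(n, len(grams), grams[:2])
# 1 10 ['McCourt', 'School']
# 2 9 ['McCourt School', 'School Public']
# 3 8 ['McCourt School Public', 'School Public Policy']
```

Each step up in n loses one group, because a sentence of k words has k - n + 1 runs of length n.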

There’s a ton to explore in computational linguistics and natural language processing, but we’re sticking to the basics. Once your metrics have finished calculating, you’ll get two new sheets: Words and Word Pairs.

The Words sheet, shown below, is where you can see the per-word sentiment analysis.

For “realdonaldtrump,” the words flagged as positive and negative are roughly equal in this sample of Tweets, 10,310 to 10,558, except that the word “trump” on its own is mentioned 1,533 times and is on the positive sentiment list. Whoops. Subtracting 1,533 from 10,310 gives us 8,777 to 10,558, for a final positive-to-negative sentiment ratio of 0.83.
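The adjustment is simple arithmetic; here it is spelled out in Python using the counts reported above.

```python
# Word-level sentiment counts from the "realdonaldtrump" sample above.
positive, negative = 10_310, 10_558
trump_mentions = 1_533  # standalone "trump", counted as positive by the lexicon

adjusted_positive = positive - trump_mentions
ratio = adjusted_positive / negative

print(adjusted_positive, round(ratio, 2))  # 8777 0.83
```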

Now if you’ll excuse me, I’m gonna go home and put some water in Buc Nasty’s momma’s dish.

The Word Pairs sheet shows us the most common bigrams in this sample of Tweets:

“rt” and “realdonaldtrump” is the most common word pair, thanks to folks who are still manually retweeting, bless their hearts. It might be worthwhile to trim out any pair that includes obvious usernames. With 28 days left until the election, “28” shows up a couple of times. “Rigged” and “system,” and “donate” and “big,” each show up several hundred times, which gives you a rough sense of where the people mentioning “realdonaldtrump” have their heads, here on October 10, 2016.
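One way to do that trimming outside NodeXL, sketched in Python under the assumption that you've copied the Word Pairs rows out as (word 1, word 2, count) tuples. The function name and the noise list are hypothetical.

```python
# Hypothetical noise tokens to drop alongside usernames.
NOISE = {"rt"}


def drop_noise_pairs(pairs, usernames, noise=NOISE):
    """Keep only (word1, word2, count) rows where neither word is a
    known username or a noise token. Matching is case-insensitive."""
    skip = {u.lower() for u in usernames} | noise
    return [
        (w1, w2, count)
        for w1, w2, count in pairs
        if w1.lower() not in skip and w2.lower() not in skip
    ]


# Usage: drop_noise_pairs(rows, usernames=["realDonaldTrump"])
```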

That’s how you do sentiment analysis using NodeXL. It’s pretty straightforward, and the insights are a little more digestible than eigenvectors and measures of centrality when you’re trying to explain mapping. The other nice part is that this data is much friendlier to export to Tableau, Power BI, or whatever your data visualization tool of choice is, and it can offer a pretty rich picture of what people actually think about your search phrase of interest.
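If you'd rather script the export than copy cells by hand, Python's built-in csv module is enough. A minimal sketch, assuming you've already pulled the Words sheet into (word, count, sentiment) tuples; the column names are illustrative, not NodeXL's.

```python
import csv
import io


def words_to_csv(rows):
    """Serialize (word, count, sentiment) rows as CSV text with a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["word", "count", "sentiment"])  # illustrative column names
    writer.writerows(rows)
    return buffer.getvalue()


# Write it somewhere Tableau or Power BI can pick it up:
# with open("words.csv", "w", encoding="utf-8", newline="") as f:
#     f.write(words_to_csv(rows))
```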
