Natural Language Processing is hard. There’s no easy way to compile a large database of words and assign sentiments to them accurately, because we all interpret the meaning of words slightly differently (thus, a reliable sentiment score has to be based on the average of many different individuals’ sentiment scores). Moreover, these datasets struggle to keep pace with the constantly evolving use of the English language, which adds another layer of difficulty because slang, idioms, and phrases also contribute significant sentiment to a sentence. For the sake of simplicity here, I’m only considering two pre-built Python packages that can perform sentiment analysis: TextBlob and VADER.
The New York Times developer mode is not new, but its comments section API is. Today, we perform a start to (not quite) finish exploratory analysis of the API in order to create a list of trending New York Times articles ranked by how controversial their reception is.
To start, we’ll need to create a New York Times developer account and then sign up for an API key that we can use to send GET requests. This is fairly straightforward based on the instructions on their website, so I’ll gloss over it for now. To ensure that the comments are as recent as possible, I’ve opted to use the Most Popular API to look at the most viewed articles in the last 24 hours.
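As a rough sketch of this step, the request URL for the most viewed articles can be built and fetched with the standard library alone. The helper names most_viewed_url and fetch_json are my own, chosen for illustration, and the endpoint path reflects version 2 of the Most Popular API as I understand it, so check the current developer documentation before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Assumed base path for the "most viewed" endpoint (Most Popular API v2).
BASE = "https://api.nytimes.com/svc/mostpopular/v2/viewed"


def most_viewed_url(api_key, period=1):
    """Build the request URL; period is the look-back window in days."""
    if period not in (1, 7, 30):
        raise ValueError("period must be 1, 7, or 30 days")
    query = urllib.parse.urlencode({"api-key": api_key})
    return f"{BASE}/{period}.json?{query}"


def fetch_json(url, timeout=10):
    """Send a GET request and parse the JSON body into Python objects."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With a real key, fetch_json(most_viewed_url(my_key)) should return a dictionary whose "results" entry holds the list of articles.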
From here, we’ll convert the results into a Pandas dataframe using the built-in function pd.read_json(). The JSON data gives us lots of extraneous information we don’t really need for this, so I’ll drop all columns except for the url, date, keywords, title, and abstract. Finally, we’ll write our dataframe to a CSV file that gives us a list of the top 20 most viewed articles per day.
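The trimming step might look something like the sketch below. Two caveats: it builds the dataframe directly from the already-parsed list of results rather than calling pd.read_json(), and the column names (published_date and adx_keywords in particular) are my assumption about how the date and keywords fields are labelled in the response. The sample payload is made up purely to show the shape of the data.

```python
import pandas as pd

# Assumed field names for url, date, keywords, title, and abstract.
KEEP = ["url", "published_date", "adx_keywords", "title", "abstract"]


def articles_frame(payload):
    """Turn a parsed API response into a dataframe with only the kept columns."""
    frame = pd.DataFrame(payload["results"])
    return frame[KEEP]


# Made-up sample payload; a real response carries many more fields per article.
sample = {
    "results": [
        {"url": "https://www.nytimes.com/a.html", "published_date": "2020-06-01",
         "adx_keywords": "Politics", "title": "A", "abstract": "First.", "id": 1},
        {"url": "https://www.nytimes.com/b.html", "published_date": "2020-06-01",
         "adx_keywords": "Health", "title": "B", "abstract": "Second.", "id": 2},
    ]
}

articles = articles_frame(sample)
# With no path argument, to_csv returns the CSV as a string;
# pass a filename instead to write the file to disk.
csv_text = articles.to_csv(index=False)
```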
Now that we have our list of articles that we want to pull comments for, we can use the Community API. In order to do this, we’ll iterate through the rows in the dataframe we created, pulling the top 25 comments from each url. The path to actually get the comments is slightly more complex for this API, but by iteration we can easily pull the lists of comments and then attach them to our new dataframe.
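A minimal sketch of that iteration is below. The endpoint path and the nesting of the response ("results", then "comments", then "commentBody") are assumptions based on my reading of the Community API, and the function names are hypothetical. The fetch argument is any function that takes a URL and returns parsed JSON, which keeps the sketch testable without a network connection.

```python
import urllib.parse

# Assumed endpoint for looking up comments by article URL.
COMMUNITY_BASE = "https://api.nytimes.com/svc/community/v3/user-content/url.json"


def comments_url(article_url, api_key, offset=0):
    """Build the comments request URL for one article."""
    query = urllib.parse.urlencode(
        {"api-key": api_key, "offset": offset, "url": article_url}
    )
    return f"{COMMUNITY_BASE}?{query}"


def comment_bodies(payload):
    """Pull the comment texts out of a parsed response (assumed nesting)."""
    comments = payload.get("results", {}).get("comments", [])
    return [c["commentBody"] for c in comments if c.get("commentBody")]


def collect_comments(article_urls, api_key, fetch):
    """Map each article URL to the list of comment texts returned for it."""
    return {
        url: comment_bodies(fetch(comments_url(url, api_key)))
        for url in article_urls
    }
```

The resulting dictionary can then be attached to the dataframe, for example by mapping the url column through it.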
Finally, we’re ready to start doing sentiment analysis. We can start by importing both TextBlob and VADER (yes, they both sound like pop culture villains) and then constructing a new analyzer object.
Now, we can run both of our sentiment analysis tools against our comments in our dataframe. Notably, the naivebayespolarity sentiment score and the vaderpolarity compound sentiment score are often slightly different. Why?
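For reference, scoring each comment with both tools could be sketched as follows. This is an illustration under stated assumptions, not the exact code behind the numbers in this article: it uses TextBlob's default analyzer, whose polarity runs from -1 to 1 (TextBlob's NaiveBayesAnalyzer instead returns a classification with p_pos and p_neg probabilities), and the names textblob_polarity and vader_compound are hypothetical. The package imports live inside build_scorers so that the scoring loop itself can be tried with any stand-in function.

```python
def build_scorers():
    """Create one scoring function per tool (requires textblob and vaderSentiment)."""
    from textblob import TextBlob
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    vader = SentimentIntensityAnalyzer()
    return {
        # Default TextBlob analyzer: polarity in [-1, 1].
        "textblob_polarity": lambda text: TextBlob(text).sentiment.polarity,
        # VADER returns neg/neu/pos/compound; compound is the overall score.
        "vader_compound": lambda text: vader.polarity_scores(text)["compound"],
    }


def score_comments(comments, scorers):
    """Return one dict per comment holding the text and each scorer's result."""
    return [
        {"comment": text, **{name: fn(text) for name, fn in scorers.items()}}
        for text in comments
    ]
```

With both packages installed, score_comments(comments, build_scorers()) yields rows that can be passed straight to pd.DataFrame for a side-by-side comparison.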
All sentiment analyzers for NLP have to be trained on, or built around, a dataset of words that have preassigned sentiment scores (hundreds of these datasets are available online). The TextBlob sentiment analyzer is trained on a canonical corpus of text called the Movie Reviews corpus, and it assigns sentiment scores from -1 (most negative) to 1 (most positive) based on keywords such as “good” and “bad”. The VADER sentiment analyzer is slightly more nuanced: it boosts sentiment scores for emphasis signals such as “!” and assigns a stronger sentiment to “GREAT” than to “great”. So, how did they perform?
Well, the simple answer is that they work okay for extracting general sentiment. Anything beyond that, such as sentiment strength, is (in my opinion) highly unreliable. This failure to recognize and properly assign sentiment scores is partially attributable to sentiments hinging on words that are political colloquialisms. Below is a particularly noteworthy comment from a recent NYT article.
“We have a deranged lunatic for a President who is unrepentant over the nation-wide police suppression of peaceful protests, the horror, he, himself, caused with his defiance of Constitutional law and over-the-top desire to squash all oppositions to his tyrannical governing. As a confirmed bigot, he freely admitted his admiration of members of a white supremacy group. His desire to rule by force, duplicity, and defiance of Constitutional law is proven by his daily tweets, the past 3 years of gutting the government from within, his attempted friendships with fellow tyrants, Vladimir Putin and Kim jong-un, his firing of all those employed to provide oversight, and his claim to be above the law. He said so and he committed three years proving it. We, the People want clear air and water, our right to exist, the right to have a credible President who adheres to Constitutional law — not a crook/bigot/sleaze/cheat/scammer who disregards the working class and those who need a helping hand. Want a 6 pm curfew, no job, a dim future, and a President who wants your vote and then your disappearance? That’s Trump’s platform.”
The VADER sentiment analysis score for this clearly negative comment was completely neutral (“ ‘neu’: 1.0 ”), while the TextBlob Naive-Bayes sentiment analysis score was even more inaccurate, reporting a slightly positive score of 0.06. This result was, well… disappointing. To me, it seems the results were poor not because the sentiment analysis wasn’t working, but because the sentiment analyzers were analyzing the wrong words. Many New York Times readers are writing, and responding to, political content. In this sphere, words and phrases such as “Putin” and “voting for” (or its negation “not voting for”) are far more important to analyze than “good” and “bad”. Here’s an even more clear-cut example: someone who calls a policeman a “pig” is expressing an obviously negative sentiment, but both analyzers would skip that word and count it as neutral based on the farm-animal definition listed by Merriam-Webster.
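To make the "wrong words" point concrete, here is a toy word-list scorer. It is deliberately simplistic and is not how TextBlob or VADER are implemented; the lexicon and its values are made up. It only shows that a word missing from the list contributes nothing, so a comment whose sentiment hinges on that word comes out neutral until the list is extended.

```python
import re

# Tiny made-up lexicon; real ones hold thousands of rated entries.
TOY_LEXICON = {"good": 0.6, "great": 0.8, "bad": -0.6, "terrible": -0.8}


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Average the scores of the words found in the lexicon; 0.0 if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


# "pig" is not in the toy lexicon, so the insult reads as neutral.
neutral = lexicon_score("That cop is a pig")

# Adding a (hypothetical) domain-specific entry changes the result.
political = {**TOY_LEXICON, "pig": -0.7}
negative = lexicon_score("That cop is a pig", political)
```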
At the end of the day, these results clearly leave much to be desired. It’s important to note that there are hundreds of different sentiment analysis training data sets available online, and moving forward, this idea can hopefully be refined and improved upon (perhaps with a database of political phrases that have recently come to light in the Age of Twitter Politics). Moreover, as an exploratory analysis, this project demonstrated both the abilities and the shortcomings of the Python packages outlined above. Overall, the New York Times API and the TextBlob and VADER sentiment analysis tools are incredibly powerful; hopefully my next article will document more of their successes than their failures.