Visualizing Sentiment in the News

Sam Couch
Sep 14, 2018

A great way to quantify what people are saying and keep a pulse on general opinion is to monitor sentiment. For a developer, however, there are two major pain points. First, how do you scrape and follow every outlet and blog online? Second, how do you then analyze the text? In this post, I’ll walk through how to accomplish both with minimal overhead.

Identifying Stories

In general, this is a tricky problem. First of all, you have to identify each of the outlets you want to cover. Of course, there are the large outlets, but how about the small and medium-sized ones? To get the best sample, you need to maximize the number of different sources you track. You could use something like a news radar that adds to an RSS feed whenever a topic is mentioned and scrape those entries, or you could monitor social outlets like Reddit and Twitter to identify when a story on a particular topic is shared. Both approaches add overhead to your system.

A way to drastically simplify all of this is to use a tool like Watson Discovery News. Instead of having to manually identify outlets and stories and then scrape them (and parse the text), Discovery News does all of this for you. Every day Watson adds around 300,000 new documents from a multitude of different sources and topics. Not only does Watson scrape the stories, but it also parses the documents and gives you access to enriched data on each. Using natural language processing (NLP), the service extracts metadata such as concepts, entities, keywords, sentiment, emotion, and much more for each document.

Since Watson Discovery News is already aggregating the data, all you have to do is write a query to retrieve the documents that you want! A query can be as specific or as broad as you like, and you simply analyze the documents the search returns.
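To make the idea of a query concrete, here is a minimal sketch of how the search parameters might be assembled. The parameter names (`query`, `count`, `return`) and the field paths (`enriched_text.entities.text`, `enriched_text.sentiment.document`, `publication_date`) are my assumptions about how Discovery News structures its queries and documents, and `build_news_query` is a hypothetical helper, so check everything against the service documentation before relying on it.

```python
def build_news_query(topic, count=50):
    """Build search parameters for a news query about `topic`.

    Hypothetical helper: the parameter names and field paths below are
    assumptions, not a confirmed description of the Discovery News API.
    """
    topic = topic.strip()
    if not topic:
        raise ValueError("topic must not be empty")
    return {
        # Assumed query syntax: `field:"value"` matches documents whose
        # extracted entities contain the topic.
        "query": f'enriched_text.entities.text:"{topic}"',
        # How many documents to ask for in one request.
        "count": count,
        # Only request the fields needed for a sentiment chart.
        "return": "title,url,publication_date,enriched_text.sentiment.document",
    }


params = build_news_query("Bitcoin", count=25)
print(params["query"])
```

A narrower query just means a more specific `query` string. The rest of the pipeline stays the same.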

Using the Data

Once you’ve identified a query that retrieves the most useful documents for your analysis, all you have to do now is work with the metadata that you want. In this case, iterate over each document and retrieve the sentiment score. You can make pretty visualizations with the data that give great insights into the topic that you’re investigating, and with minimal effort!
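As a rough sketch of that iteration, the snippet below walks over a response and collects one sentiment score per document. It assumes the response is a dictionary with a `results` list and that each document stores its score under `enriched_text.sentiment.document.score`. That layout and the sample data are illustrative assumptions, not real output from the service.

```python
from statistics import mean


def sentiment_scores(response):
    """Return (title, score) pairs for every document that has a score."""
    scores = []
    for doc in response.get("results", []):
        # Assumed field path: enriched_text -> sentiment -> document -> score
        sentiment = (
            doc.get("enriched_text", {})
            .get("sentiment", {})
            .get("document", {})
        )
        score = sentiment.get("score")
        if score is not None:  # skip documents without sentiment data
            scores.append((doc.get("title", ""), float(score)))
    return scores


# Made-up sample data in the assumed response shape.
sample_response = {
    "results": [
        {"title": "Story A",
         "enriched_text": {"sentiment": {"document": {"score": 0.5}}}},
        {"title": "Story B",
         "enriched_text": {"sentiment": {"document": {"score": -0.25}}}},
        {"title": "Story C", "enriched_text": {}},  # no sentiment available
    ]
}

pairs = sentiment_scores(sample_response)
average = mean(score for _, score in pairs)
print(pairs, average)
```

From here, the list of pairs can be handed to whatever plotting library you prefer.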

Visualizing the sentiment of Bitcoin news

I’ve recently published a walkthrough of how to do this on your own, and I’d love for you to take a look! One limitation of Watson Discovery News, however, is that it only allows you to query documents from the last 60 days. A great workaround is to use something like cron to run your query once per day and store the metadata yourself for longevity.
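Here is one way the storage half of that workaround could look, using SQLite from the Python standard library. The table layout, the `save_scores` helper, and the script path in the crontab comment are all hypothetical choices for illustration. The rows would come from your own daily query.

```python
import sqlite3

# Example crontab entry (hypothetical path) to run a script daily at 06:00:
#   0 6 * * * /usr/bin/python3 /path/to/fetch_sentiment.py


def save_scores(conn, day, rows):
    """Store (url, title, score) rows for `day`; return the total row count.

    The URL is the primary key, so a story that appears in several daily
    runs is only stored once.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sentiment ("
        "url TEXT PRIMARY KEY, day TEXT, title TEXT, score REAL)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO sentiment (url, day, title, score) "
        "VALUES (?, ?, ?, ?)",
        [(url, day, title, score) for url, title, score in rows],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM sentiment").fetchone()[0]


# In-memory database for demonstration; use a file path for real storage.
conn = sqlite3.connect(":memory:")
rows = [
    ("https://example.com/a", "Story A", 0.5),
    ("https://example.com/b", "Story B", -0.25),
]
total = save_scores(conn, "2018-09-14", rows)
print(total)
```

Because the data lives in your own database, the 60-day query window no longer limits how far back your charts can go.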

Make sure to tap the 👏 button and share your thoughts with me on Twitter (@samuelcouch)!
