Sentiment analysis of the news

Published in Strise · 5 min read · Sep 6, 2018

My name is Stein-Erik, and I’m currently studying computer science at NTNU, specializing in artificial intelligence. Like Karen, I have been working at Mito.ai as a summer intern. It has been a wonderful experience and a very welcome one, seeing as I had been working at the same non-tech job for the previous six years. Socially, there was never a noticeable transition period between the two jobs, as I immediately felt welcome and part of the team.

My first introduction to the system was a tour of the main project given by my tech lead, Patrick. We went through each of the top-level modules while he explained their use cases, and I was of course sitting there with my hands in my lap trying to tie it all together. To give you the basics, the system consists of modules with the collective task of processing news articles. An article is processed using several methods, such as machine learning and natural language processing, to extract and link entities in the text, categorize the article, and cluster it with similar ones. One module that was missing was sentiment analysis, and that is where I came in.

The initial vision was to create a service which would take the text from a news object and simply respond with the perceived sentiment (negative, neutral, or positive). Before putting too much effort into developing our own sentiment analysis, we decided to survey existing third-party APIs. This let us test how a service like this could work without having to tag a dataset and train our own model. It meant that I had to make a service to communicate with the external API, test it, and make an endpoint for our own API.

I started by making a utility object that would create a JSON request object from an article and handle the response (i.e. translate the result to negative, neutral, or positive). Among others, we tested Google’s natural language API, TheySay, ParallelDots, and MeaningCloud. Each API presented its results in a different way, but they all had some form of positivity/negativity score and a magnitude or confidence rating. When this was implemented and thoroughly tested, I moved on to create a service to communicate with the APIs and to test the sentiment service on some real data. The service was a real cakewalk, as methods for communicating with external APIs were already implemented. All I had to do was provide different authorization tokens and similar details.
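To make the translation step concrete, here is a minimal sketch in Scala of how a score and a magnitude could be turned into one of the three labels. It is not the code we actually ran: the names `Sentiment`, `RawSentiment`, and `SentimentMapping`, as well as both threshold values, are assumptions made up for illustration, and each provider's response would first have to be converted into this common shape.

```scala
// Hypothetical types: none of these names come from a real provider's API.
sealed trait Sentiment
case object Negative extends Sentiment
case object Neutral extends Sentiment
case object Positive extends Sentiment

// A provider-independent view of a response: a polarity score, assumed to be
// in [-1, 1], and a non-negative magnitude or confidence value.
final case class RawSentiment(score: Double, magnitude: Double)

object SentimentMapping {
  // Illustrative thresholds only; real values would need tuning per provider.
  val ScoreThreshold = 0.25
  val MinMagnitude = 0.5

  def toLabel(raw: RawSentiment): Sentiment =
    if (raw.magnitude < MinMagnitude) Neutral // signal too weak to trust
    else if (raw.score >= ScoreThreshold) Positive
    else if (raw.score <= -ScoreThreshold) Negative
    else Neutral
}
```

With these made-up thresholds, `SentimentMapping.toLabel(RawSentiment(0.8, 1.2))` gives `Positive`, while a strong score with a very low magnitude falls back to `Neutral`.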

With a service in place for tagging an article as positive, negative, or neutral, I now needed to test it on a set of articles in the news stream to see if the results were desirable. A quick survey of the tagged articles showed that most of them were tagged as neutral when they clearly were either positive or negative for the company in question.

Take a look at the picture above. A deal most likely falling through and a business resuming operation completely should be negative and positive, respectively. The results do not come as a huge surprise as these models had been trained to detect the emotions in a text, rather than the possible impact of it. For example, if one were to analyze the phrase “the movie was bad” the result most likely would be negative, as there is an indication of negative emotion towards the movie. If one instead tried to analyze something like: “On September 16th the stock market crashed” the result would most likely be neutral since it’s just a statement. However, if we add the phrase “…and that makes me sad”, to this last statement, yielding: “On September 16th the stock market crashed and that makes me sad”, it would probably be tagged as negative. As we can see, and as mentioned before, the models are trained to recognize the emotions of the writer in the text and not the implications of what the text is conveying.

We concluded that we had to train our own model to recognize the impact of a news article in a given category and, preferably, use entity-aware sentiment analysis. The reasoning behind running sentiment analysis in the context of an entity is that a company could be mentioned in an article with an overall negative sentiment, while the context in which the company is mentioned could be completely different. Let’s say we have an article concerning the bankruptcy of company A. The article also mentions the opportunity this creates for company B, A’s competitor. The overall sentiment of this article could turn out negative, as there is a lot of talk about bankruptcy. However, it’s only the impact for company A that is negative, and the article actually implies a positive impact for company B.
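The bankruptcy example can be written down as data. The sketch below only illustrates what an entity-aware result could look like; it is not a model or an implementation we built, and the names `Impact`, `ArticleSentiment`, and `BankruptcyExample` are hypothetical.

```scala
// Hypothetical model of an entity-aware result; all names are illustrative.
sealed trait Impact
case object NegativeImpact extends Impact
case object NeutralImpact extends Impact
case object PositiveImpact extends Impact

// One overall label for the article plus one label per mentioned entity.
final case class ArticleSentiment(overall: Impact, perEntity: Map[String, Impact]) {
  // Entities without their own label fall back to NeutralImpact rather than
  // the overall label, so a gloomy article does not taint every company in it.
  def impactFor(entity: String): Impact =
    perEntity.getOrElse(entity, NeutralImpact)
}

object BankruptcyExample {
  val article: ArticleSentiment = ArticleSentiment(
    overall = NegativeImpact,
    perEntity = Map(
      "Company A" -> NegativeImpact, // the company going bankrupt
      "Company B" -> PositiveImpact  // the competitor that gains an opening
    )
  )
}
```

Here the article as a whole is negative, yet `BankruptcyExample.article.impactFor("Company B")` is `PositiveImpact`, which is exactly the distinction a single article-level label cannot express.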

All things considered, this has been a very fun summer where I have learned a lot, and I have thoroughly enjoyed my stay at Mito.ai. It presented some difficulties along the way, for example learning the programming language. Everything was to be made in Scala, a functional programming language I had only heard faint whispers about earlier. Seeing as I, up to that point, had only written code in a procedural fashion, this presented some challenges: getting used to dropping the good old reliable for-loop in favor of map or flatMap, working with types and pattern matching, and using optional values that propagate error handling to the outermost layer. With thorough guidance from the team, and some time to hack along myself, I got through it pretty well (I think), and now Scala would probably be my language of choice if I were to start a new project.
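For readers who have not met these ideas, here is a small sketch of the shift in style. The `NewsItem` type and the functions in `FunctionalStyle` are invented for this example and have nothing to do with the real codebase; only `map`, `flatMap`, `Option`, and pattern matching are standard Scala.

```scala
// Hypothetical article type, used only for illustration.
final case class NewsItem(title: String, body: Option[String])

object FunctionalStyle {
  // Procedural style: a mutable buffer filled inside a for-loop.
  def wordCountsLoop(items: List[NewsItem]): List[Int] = {
    val counts = scala.collection.mutable.ListBuffer.empty[Int]
    for (item <- items) {
      if (item.body.isDefined) counts += item.body.get.split("\\s+").length
    }
    counts.toList
  }

  // Functional style: flatMap drops items without a body, map transforms the rest.
  def wordCounts(items: List[NewsItem]): List[Int] =
    items.flatMap(_.body).map(_.split("\\s+").length)

  // Pattern matching on an Option: the missing-body case is handled here,
  // at the outer layer, instead of being checked at every step along the way.
  def describe(item: NewsItem): String = item.body match {
    case Some(text) if text.nonEmpty => s"${item.title}: ${text.length} characters"
    case _                           => s"${item.title}: no body"
  }
}
```

Both word-count functions return the same list; the second one simply has no mutable state and no explicit check for a missing body.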

As for my thoughts about Mito.ai, I think that this is a wonderful place to work. The employees are talented, encouraging, and caring. This has turned the transition between jobs, which should have brought me out of my comfort zone, into a short trip between the old comfort zone and a new one.

Written by Stein-Erik Bjørnnes