Deep Sentiment Analysis

Exciting new research from Intellogo

Neil Balthaser
Intellogo
Feb 22, 2017

We’ve been working on some exciting new research at Intellogo: sentiment analysis on very short form content. By very short form we mean a paragraph of text (around 500 characters). This is a sweet spot because lots of reviews and social media posts are around this size. Intellogo was designed to excel at deep sentiment analysis on longer form content; in fact, it is one of the few text analytics platforms capable of analyzing whole books. But a paragraph has far fewer features than an entire book. How will Intellogo perform on it?

To find out, we took the following paragraph from an article in the Telegraph and ran it past both Intellogo and IBM Watson. Our goal is to see how we compare against one of the leaders in text analytics for short form content. Here is the snippet we ran past both systems:

Nasa will hold a press conference on Wednesday to present a “discovery beyond our solar system,” leading to speculation that the announcement will involve planets capable of sustaining life. The agency has offered no details on the upcoming presentation other than that it will involve “exoplanets”. Astronomers have been studying such planets, which orbit stars other than the sun, for clues as to whether, and where, life could exist beyond the earth. Nasa has analysed dozens of planets that orbit suns.

What do both A.I. systems think the snippet is about?

Both Intellogo and IBM Watson performed well here. Both correctly identified the snippet as being about “planet” or “exoplanet” and “Nasa”. IBM Watson also identified “solar system”, which is great, but “Jupiter” is a head scratcher. Intellogo goes a lot deeper and sees that the snippet is also about “general science” and “space”. What’s intriguing is that it goes deeper still: it tags “history’s mysteries” because there is a search for answers, and “philosophy” and “god and spirituality” because the snippet talks about “whether, and where” life could exist beyond the earth. Intellogo also sees “dynamic creative problem solving” because the snippet talks about Nasa having spent time analyzing dozens of planets.

Overall, Intellogo is more nuanced and demonstrates a deeper understanding of what the snippet is actually about.

“About” tag cloud: On the left in red are the concepts which Intellogo has tagged. On the right in blue are the concepts and entities which IBM Watson has tagged. The size of each word indicates the relative weight of that tag in the text. For example, Intellogo rates “Space” as the most important “about” concept while IBM Watson rates “Planet” as the most important concept. With IBM Watson, concepts and entities may overlap; in these cases we always kept the entity and dropped the duplicate concept.
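For readers who want to build a similar cloud, the two rules in the caption (word size follows relative weight, and an entity wins over a duplicate concept) can be sketched in a few lines of Python. This is only an illustration, not the code behind our images: the function names `merge_tags` and `font_sizes`, the point-size range, and every weight below are assumptions made up for the example.

```python
def merge_tags(concepts, entities):
    """Combine concept and entity tags, keeping the entity when names collide."""
    merged = {name.lower(): weight for name, weight in concepts.items()}
    # Entities are applied second, so they overwrite any duplicate concept.
    merged.update({name.lower(): weight for name, weight in entities.items()})
    return merged


def font_sizes(tags, min_pt=10, max_pt=40):
    """Map each tag's weight linearly onto a font-size range (in points)."""
    if not tags:
        return {}
    lo, hi = min(tags.values()), max(tags.values())
    span = hi - lo
    sizes = {}
    for name, weight in tags.items():
        # If every tag has the same weight, draw them all at the maximum size.
        frac = (weight - lo) / span if span else 1.0
        sizes[name] = round(min_pt + frac * (max_pt - min_pt))
    return sizes


# Hypothetical weights for illustration only; not real output from either system.
concepts = {"Planet": 0.9, "Solar System": 0.6}
entities = {"planet": 0.95, "Nasa": 0.7}

tags = merge_tags(concepts, entities)
sizes = font_sizes(tags)
print(sizes)
```

Matching names case-insensitively is a design choice of this sketch; a real pipeline might need fuzzier matching to catch near-duplicates such as “planet” and “planets”.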

What sentiments do the two A.I. systems identify?

IBM Watson only identifies “joy”, “anger”, “disgust”, “fear” and “sadness” in text. Only “joy” rated above 50%, which we assume is the threshold for a sentiment being relevant. Intellogo, on the other hand, also identified “happy” as a prevailing sentiment but went further and tagged the snippet as “amazing”. “Progressive”, as in “moving forward and advancing; continuing steadily by increments”, certainly makes sense. We’re not sure why “Unifying” is called out, although it somehow makes sense given the nature of the announcement. We’ll look into that and report back.
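The 50% cut-off we assumed for the Watson scores amounts to a simple filter. Here is a minimal sketch of it; the helper name `relevant_emotions` and all of the scores are hypothetical, chosen only so that “joy” is the one emotion above the line, and the 0.5 threshold is our assumption rather than anything IBM documents.

```python
def relevant_emotions(scores, threshold=0.5):
    """Return the emotions scoring above the threshold, strongest first."""
    kept = [name for name, score in scores.items() if score > threshold]
    return sorted(kept, key=lambda name: scores[name], reverse=True)


# Hypothetical scores for illustration only; not the values Watson returned.
example_scores = {
    "joy": 0.62,
    "anger": 0.08,
    "disgust": 0.05,
    "fear": 0.11,
    "sadness": 0.21,
}

print(relevant_emotions(example_scores))
```

Lowering the threshold would let more emotions through, so any comparison between systems depends on where that line is drawn.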

“Sentiment” tag cloud: On the left in red are the sentiments Intellogo has tagged. On the right in blue is the sentiment IBM Watson has tagged.

What speaking tones do the systems pick up?

Speaking tones are important because they allow us to read between the lines of what is being said. In other words, it’s not just what is being said, it’s how it’s being said. In this case, Intellogo finds the speaking tone to be “intelligent”, “interesting” and “thought-provoking”. Certainly all those apply. The “intimate” one is a head scratcher. We’re investigating why Intellogo finds the snippet intimate. It could be that the snippet talks about “holding” and “life”… or it could be that we need to retrain it on the concept. That’s why we call this stuff research. IBM Watson does not do any tone analysis.

“Tone” tag cloud: On the left in red shades are the speaking tones Intellogo has tagged. IBM Watson doesn’t recognize tone so there are no tags that it can generate.

What points of view do the systems see?

Like tone, points of view give us deeper insight into what is being said. In this case, Intellogo flags “scientific” as the main point of view, but it also picks up the ethical, cultural and philosophical points of view that come with talking about life beyond our own planet. “Psychological” is another head scratcher. It could be that Intellogo is keying off “analysed” and “studying”, but that shouldn’t be enough to make it think the point of view is “psychological”. We’ll dig into that one and report back. IBM Watson unfortunately doesn’t analyze points of view.

“Point of view” tag cloud: On the left in red shades are the points of view Intellogo has tagged. IBM Watson doesn’t recognize points of view so there are no tags that it is able to generate.

All in all, we’re quite happy with our first foray into shorter form content. The example snippet is pretty short, yet Intellogo mined quite a bit of accurate and useful data from it. The fact that so many insights work on short form content suggests that we did a good job training the system. Still, there is more work to do, as we want to discover how short the content can get before Intellogo no longer has enough features to classify it well.

Neil Balthaser
Intellogo

As a kid, I loved to build robots. Robots in kits and robots out of stuff in my bedroom. Today, I’m fortunate enough to build them for a living.