A Beginner’s Guide to Sentiment Analysis: Measuring Jane Eyre

Jen Chik
Published in Analytics Vidhya
6 min read · Nov 4, 2019

The story begins with me studying Shakespeare. As a literature student, I simply thought it might be interesting to study his language. My hypothesis was that the more deaths there were in a Shakespearean tragedy, the more conventionally “negative” words there would be. At that point, I had yet to hear of sentiment analysis, so I started out with an Excel file containing all the lines of Shakespeare’s plays. After filtering for the tragedies, I created a visualisation of the most used words in Shakespearean tragedy:

The problem is obvious: the visualisation doesn’t say much. “And”, “I”, “you”, and “that” are words commonly used in everyday life, not just in Shakespeare. In other words, my visualisation proved nothing.

Given this, I started looking into ways to mine textual data better, which eventually led to me finding my way into sentiment analysis and tidytext.

What is Sentiment Analysis? Or Tidytext?

Sentiment analysis, as defined thanks to Google:

Essentially, it is a computational way of measuring the general sentiment of a text.

As for tidytext, it is an R package that serves as a means of performing sentiment analysis. It arranges text into a tidy structure where each variable is a column and each observation (here, a single word) forms a row.

Here’s an example of how it appears in RStudio, after filtering for stopwords (the most common words used in language, such as the, is, which, and at):

Tidytext thus let me avoid the situation I encountered when analysing Shakespeare’s plays, where I had no way to filter out words that are ubiquitous but meaningless in this context.
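For anyone who wants to see this in code, here is a minimal sketch of the one-word-per-row structure and the stopword filter. It is not the code from any particular tutorial: the two sample sentences (the opening of Jane Eyre) and the names `sample_text`, `tidy_words`, and `content_words` are made up for illustration, and it assumes the dplyr, tibble, and tidytext packages are installed.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Two sample lines standing in for a longer text (illustrative data only).
sample_text <- tibble(
  line = 1:2,
  text = c("There was no possibility of taking a walk that day.",
           "I was glad of it: I never liked long walks.")
)

# One word per row. By default unnest_tokens() lowercases the words
# and strips punctuation.
tidy_words <- sample_text %>%
  unnest_tokens(word, text)

# Drop common stopwords using the stop_words table bundled with tidytext.
content_words <- tidy_words %>%
  anti_join(stop_words, by = "word")

print(content_words)
```

Each row of `content_words` still remembers which line the word came from, which is what makes later counting and grouping straightforward.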

My Journey with Sentiment Analysis

The only problem was that sentiment analysis with tidytext is performed in R. Being a literature major, I had zero experience in coding (save for the HTML I mucked around with when coding blog skins was a fad), so attempting to parse the code felt like an impossible task: the words (and code) were all Greek to me. But I found it honestly fascinating that I would be able to quantify the sentiment of a text, and so I buckled down to do whatever I could to understand the code.

For an entire week, I waded through tutorial after tutorial on sentiment analysis. I tried everything from Datacamp tutorials to asking everyone I knew to just trial and error. Eventually, I found a tutorial that made sense to me. Just as well, the tutorial was based on Shakespeare’s plays, and I copied the code over into RStudio to create my very first sentiment analysis plots.

comparison of words with positive and negative sentiment in Shakespeare’s sonnets
count of various sentiments in Shakespeare’s sonnets

Even just by following the tutorial, I felt so accomplished—it felt like I had conquered the impossible just by managing to use R as a humanities student.

Still, the code was static to me: I could not yet get it to do exactly what I wanted. Even though I had managed to do sentiment analysis on Shakespeare, I could not apply it to the other texts I wanted to analyse.

Fortunately, with enough sense (but mostly by trial and error), I figured out how to do it. The code relied on the gutenberg_works() function from the gutenbergr package, so just by replacing “Shakespeare, William” with “Brontë, Charlotte”, I was able to access Brontë’s works. Replacing it with the name of any other author on Project Gutenberg likewise gives you access to that author’s works!

the lines that define which author’s texts are used in the code
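Since the original screenshot is not reproduced here, the following is a rough sketch of what such an author lookup can look like, assuming the gutenbergr package is installed. The variable names are illustrative, and the download step is left as a comment because it needs an internet connection and a work ID taken from the lookup table.

```r
library(gutenbergr)

# Look up works by author. The strings follow Project Gutenberg's
# "Surname, Forename" format, as quoted in the text above.
shakespeare_works <- gutenberg_works(author == "Shakespeare, William")
bronte_works <- gutenberg_works(author == "Brontë, Charlotte")

# Each row is one work, with its Project Gutenberg ID and title.
print(bronte_works[, c("gutenberg_id", "title")])

# To fetch the text itself, pass the gutenberg_id of the title you want:
# jane_eyre <- gutenberg_download(<id from the table above>)
```

Swapping the author string is the only change needed to point the rest of an analysis at a different writer.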

I studied Jane Eyre for the ‘A’ Level English Paper 2 years ago. It was the longest book my teacher had set for us at over 350 pages, and I dreaded reading it. But I buckled down to read it one weekend, and instantly Lowood School and Thornfield Hall came alive. In short, I fell in love with Jane Eyre. So, what better text to do sentiment analysis on, right? Here are the graphs I churned out, using the very same code:

a similar comparison of positive and negative sentiment in Jane Eyre
a similar bar graph of the various sentiments present in Jane Eyre
the progression of sentiment through Jane Eyre

For me, what was interesting was seeing how accurate the sentiment analysis was. Jane’s bleak childhood at Gateshead and Lowood School, which takes up the first 10 chapters of Jane Eyre, is actually reflected at the beginning of the graph, with almost no positive sentiment at all.
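To make the idea behind these graphs concrete, here is a minimal sketch of how positive and negative words can be counted and then tracked across a text. It assumes the Bing lexicon that ships with tidytext, which may not be the lexicon behind the graphs above, and it uses four invented lines in place of the novel. The section size of 2 lines is chosen only to suit the tiny sample; a full book would use far larger sections.

```r
library(dplyr)
library(tibble)
library(tidyr)
library(tidytext)

# Four invented "lines" standing in for a whole novel (illustrative data only).
book <- tibble(
  line = 1:4,
  text = c("a cold and miserable morning",
           "she wept with grief and fear",
           "a kind friend brought comfort",
           "joy and love filled the happy house")
)

# Keep only the words found in the Bing lexicon, each tagged
# "positive" or "negative".
scored <- book %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word")

# Overall comparison of positive and negative words.
print(count(scored, sentiment))

# Progression: group lines into sections, count each sentiment per
# section, and take net = positive - negative.
progression <- scored %>%
  mutate(section = (line - 1) %/% 2) %>%
  count(section, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(net = positive - negative)

print(progression)
```

Plotting `net` against `section` gives the kind of rise-and-fall chart shown above, where a long run of negative sections corresponds to a bleak stretch of the story.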

You can see how the positive and negative sentiments are identified in these samples from Chapters 1 and 7 respectively (where positive sentiment is indicated with pink highlights and negative sentiment is indicated with purple):

Obviously, there are certain issues with the identification of sentiment. As seen in Chapter 1, “liked” appears in the phrase “never liked”, so counting it as positive sentiment is questionable. There are other issues as well:

the top contributors to each sentiment in Jane Eyre

This graph makes clear a flaw in the sentiment analysis—the package counts “miss” as a negative word. While that may well be the case in its usual usage, the word “miss” in Jane Eyre is often used in its noun form, as a title prefix: Miss Abbot, Miss Eyre.

Miss Abbot, Miss Eyre: the usage of “miss” in Jane Eyre

Upon further research, I found that there was a way to filter certain words—for instance, in this case the word “miss” can be removed. However, it does not solve the issue that certain words can be mislabelled as positive or negative in certain contexts, such as when sarcasm is used, or a negative word modifies the meaning of a positive one (I’m referring here to the example of “never liked”, as mentioned earlier).
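Both points can be sketched in a few lines, again assuming the Bing lexicon bundled with tidytext. The first step drops “miss” from the lexicon before scoring. The second does not fix negation, but it at least flags lexicon words that directly follow a negating word so they can be inspected. The sample sentences and the short `negations` list are assumptions made for illustration.

```r
library(dplyr)
library(tibble)
library(tidyr)
library(tidytext)

# Two sample lines (illustrative data only).
sample_lines <- tibble(
  line = 1:2,
  text = c("Miss Abbot turned to Miss Eyre.",
           "I never liked long walks.")
)

bing <- get_sentiments("bing")

# 1. Remove "miss" from the lexicon so the title is not scored as negative.
bing_no_miss <- bing %>% filter(word != "miss")

scored <- sample_lines %>%
  unnest_tokens(word, text) %>%
  inner_join(bing_no_miss, by = "word")

# 2. Split the text into word pairs and keep the pairs where a lexicon
#    word is directly preceded by a negation.
negations <- c("not", "no", "never", "without")  # assumed short list

negated <- sample_lines %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, into = c("first", "second"), sep = " ") %>%
  filter(first %in% negations) %>%
  inner_join(bing, by = c("second" = "word"))

print(scored)
print(negated)
```

Sarcasm and longer-range negation (“I did not, on the whole, like it”) slip straight past a check like this, which is why the results are best read in aggregate.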

My main point is that sentiment analysis is not without its faults. Still, this does not negate its usability as a tool for an aggregate understanding of the sentiment of a text. Used with caution, it proves both helpful and interesting.

Personally, this journey through sentiment analysis and RStudio has been a journey of growth in my technical skills, and has increased my confidence in using coding software. If I do encounter a situation where using a package in R may be useful, I will definitely be more open to looking into it. If anyone has any advice, suggestions, or even questions about sentiment analysis, do feel free to let me know and I will be glad to discuss it with you!
