[Week 5 — Emotion Detection]

Published in bbm406f18, Dec 30, 2018 (2 min read)

Last week we finished our progress report and decided to extend our dataset, because it was too small (1000 test samples and 500 training samples). We are now working with the ISEAR dataset, which contains 7667 different texts.

However, the ISEAR dataset uses somewhat different emotion labels, so we adjusted our data a little. The ISEAR dataset also has columns such as ID and Country, but we are only interested in the text and emotion label columns, so we take data from only these two columns.

Some texts contain placeholder expressions such as “[No response]”. This kind of text has no value for our task, so we removed these lines too. Finally, the dataset contains 7 different emotions, but we work with only 6 of them, so we also deleted the expressions carrying the label we do not use. With that, our dataset is ready to use.
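The cleaning steps described above can be sketched roughly as follows. This is only an illustration, not the code from our notebook: the column names `text` and `emotion`, the helper `clean_rows`, the bracket check used to spot placeholder answers, and the choice of `guilt` as the dropped label in the demo are all assumptions made for the example.

```python
# Hypothetical column names; the real ISEAR file may name them differently.
TEXT_COL = "text"
LABEL_COL = "emotion"


def clean_rows(rows, unused_label, text_col=TEXT_COL, label_col=LABEL_COL):
    """Return (text, label) pairs, keeping only rows usable for training."""
    cleaned = []
    for row in rows:
        text = row[text_col].strip()
        label = row[label_col].strip().lower()
        # Assumption: placeholder answers are wrapped in square brackets,
        # e.g. "[No response]". Empty texts are skipped as well.
        if not text or (text.startswith("[") and text.endswith("]")):
            continue
        # Drop the one emotion label that is not used in the task.
        if label == unused_label:
            continue
        # Every other column (ID, Country, ...) is ignored.
        cleaned.append((text, label))
    return cleaned


# Toy rows invented for the demo; they are not taken from the ISEAR dataset.
rows = [
    {"ID": "1", "Country": "A", "text": "I passed my exam.", "emotion": "joy"},
    {"ID": "2", "Country": "B", "text": "[No response]", "emotion": "fear"},
    {"ID": "3", "Country": "C", "text": "I broke a promise.", "emotion": "guilt"},
]

# Assumption: "guilt" stands in for the unused label, which is not named here.
dataset = clean_rows(rows, unused_label="guilt")
print(dataset)
```

With a real CSV file, the rows could come from `csv.DictReader`, which yields one dictionary per line in the same shape as the toy rows.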

This week we tried to increase our prediction rate with some preprocessing steps: stemming and some string operations.

Stemming is a natural language processing operation that reduces a word to its root form.

>>> from nltk.stem.lancaster import LancasterStemmer
>>> stemmer = LancasterStemmer()
>>> stemmer.stem("running")
'run'

If you want more detailed information, you can visit our 3rd-week blog post.
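To give an idea of how stemming and string operations can be combined into one preprocessing step, here is a minimal sketch. The exact string operations we used are in the notebook; lowercasing and replacing non-letter characters with spaces are assumptions chosen for this example, and `preprocess` is a hypothetical helper name.

```python
import re

from nltk.stem.lancaster import LancasterStemmer

stemmer = LancasterStemmer()


def preprocess(text):
    """Lowercase the text, strip non-letters, and stem every token."""
    # Assumed string operations: lowercase, then replace anything that is
    # not a letter or whitespace with a space.
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    # Split on whitespace and reduce each token to its stem.
    return [stemmer.stem(token) for token in text.split()]


tokens = preprocess("Running!!")
print(tokens)
```

The resulting token lists can then be joined back into strings or passed directly to whatever feature extraction step comes next.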

With stemming, our prediction rates increased by about 1–2%.

Here is a Jupyter notebook where you can take a look at what we have done this week.

[Week 5 — Emotion Detection]

Written by Ali Baran Tasdemir