Maltese Sentiment Analysis

Dawson Camilleri
Published in Nerd For Tech
Feb 8, 2022

Natural Language Processing for a Low-Resource Language

In 2021, I explored the idea of building a sentiment analysis system that works for Maltese. In this article, I describe the challenges I encountered and the advantages of such a system.

What is sentiment analysis?

Sentiment analysis is the process by which a machine automatically detects whether a text expresses a positive or a negative opinion. The labels are not limited to positive and negative, however; other labelling schemes use categories such as subjective, objective, sarcastic, neutral, and mixed. Sentiment analysis is usually done in English, because finding labelled data sets in other languages can be difficult and tedious.

Why is it important?

Sentiment analysis provides a powerful way to capture how customers feel about a company’s products, so that the company can tailor its business to what people want. It also saves time: comments can number in the thousands, and an overall summary is far more practical than reading every review manually. Sectors such as healthcare, finance, and hospitality use sentiment analysis in their day-to-day operations.

What are the challenges?

Filtering is very important when building a sentiment analysis system: some words carry no emotion and should be excluded from training. Words that should be removed include localities, country names, pronouns, and numbers. Some punctuation marks can be removed, but in certain cases it makes sense to retain them; for example, an exclamation mark can convey strong sentiment. Intentional misspellings can also convey strong sentiment, as in “I looove it”.
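The filtering rules above can be sketched as a small preprocessing function. This is a minimal illustration under stated assumptions, not the system described in this article: the word lists are hypothetical and far from complete, and the <EXCL> and <ELONG> marker tokens are an assumed way of keeping exclamation marks and elongated words as features instead of discarding them.

```python
import re

# Hypothetical, illustrative word lists; a real Maltese system would need
# much larger, curated lists of localities, countries and pronouns.
LOCALITIES = {"valletta", "sliema", "birkirkara", "mosta"}
COUNTRIES = {"malta", "italja", "ingilterra"}
PRONOUNS = {"jien", "int", "hu", "huwa", "hi", "hija", "aħna", "intom", "huma"}
STOP_WORDS = LOCALITIES | COUNTRIES | PRONOUNS

# A token is a run of word characters (Unicode-aware, so Maltese letters
# such as ħ, ż, ċ and ġ are included) or a single exclamation mark.
TOKEN_RE = re.compile(r"\w+|!")
# Any character repeated three or more times in a row, e.g. the "ooo" in "looove".
ELONG_RE = re.compile(r"(.)\1{2,}")


def preprocess(text: str) -> list[str]:
    """Lower-case, tokenise and filter a comment before training."""
    tokens = []
    for tok in TOKEN_RE.findall(text.lower()):
        if tok.isdigit() or tok in STOP_WORDS:
            continue  # drop numbers and words assumed to carry no sentiment
        if tok == "!":
            tokens.append("<EXCL>")  # keep exclamation marks as a feature
            continue
        collapsed = ELONG_RE.sub(r"\1\1", tok)  # "looove" -> "loove"
        if collapsed != tok:
            tokens.append("<ELONG>")  # record the emphasis before collapsing
        tokens.append(collapsed)
    return tokens
```

For example, “Valletta hija sabiħa ħafna!!” (“Valletta is very beautiful!!”) becomes ['sabiħa', 'ħafna', '<EXCL>', '<EXCL>']: the locality and the pronoun are dropped while the emphasis survives. Collapsing elongations to two letters groups “looove” and “loooooove” together without merging them with the neutral spelling.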

Collecting a suitable data set can be tricky. Some approaches use crowdsourcing, in which a group of participants work together to provide data, but finding volunteers can be very difficult since people are busy with their day-to-day lives. News articles and their comments, however, can be a great source of data, as new ones are published daily and provide a constant flow. Furthermore, since many different people write comments, the text often contains a mix of spelling mistakes, punctuation marks, and abbreviations, which is great for testing.

Once the data has been processed, a feature set must be chosen. A feature set determines how each word is weighted (‘ranked’), so the word ‘best’ might rank highly in one feature set but low in another. Examples of commonly used feature sets are TF-IDF and Doc2vec. Finally, a suitable algorithm, such as Random Forest or an SVM, must be chosen to predict the sentiment.
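To make the idea of a feature set ‘ranking’ words concrete, here is a minimal pure-Python sketch that compares two feature sets on a hypothetical, already-tokenised toy corpus: raw word counts (bag of words) and TF-IDF, using the common formula tf × log(N / df). It is an illustration, not the author’s implementation. In practice, libraries such as scikit-learn provide CountVectorizer and TfidfVectorizer (the latter uses a smoothed IDF and normalisation by default, so its numbers differ) and classifiers such as RandomForestClassifier and LinearSVC for the final step.

```python
import math
from collections import Counter


def term_counts(doc: list[str]) -> dict[str, int]:
    """Bag-of-words feature set: the raw count of each word in one document."""
    return dict(Counter(doc))


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF feature set with tf = count / document length and
    idf = log(N / df), where df is the number of documents containing the word."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each word at most once per document
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return vectors


# Hypothetical toy corpus; 'best' appears in every document.
docs = [
    ["best", "service", "best"],
    ["best", "food", "awful", "service"],
    ["best", "price"],
]
counts = term_counts(docs[0])
weights = tf_idf(docs)
```

Because ‘best’ appears in all three documents, its IDF is log(3/3) = 0. It is therefore the top feature of the first document by raw count (2) but receives a weight of 0.0 under TF-IDF, while rarer words such as ‘awful’ are weighted more heavily.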

Several other challenges arose specifically when building a system for Maltese, such as identifying Maltese idioms, handling dialectal differences between localities, and finding a data set large enough to train the model. I plan to explore possible solutions to these challenges in the future by looking at previous work done for other low-resource languages.

In conclusion, Maltese sentiment analysis is a new research area that is still open to many exciting projects. For anyone interested in building a sentiment analysis system, services such as MonkeyLearn provide a platform for creating one without any coding knowledge.

Dawson Camilleri

I am a master’s student in artificial intelligence at the University of Malta. I also work as a technology consultant at EY Malta.