Week 1 — Hate Speech Detection on Social Media
Hi everyone. We are Ege Çınar, Gökhan Özeloğlu, and Yiğit Barkın Ünal. This is an introduction to our BBM406 Machine Learning Project. We’re going to have a look at hate speech on social media.
What is trolling?
Trolling is deceptive and disruptive behavior that usually involves posting inflammatory and malicious comments to provoke and upset people. Trolls are widespread on social media sites. While trolling is certainly unhelpful, it doesn’t count as hate speech on its own.
What is hate speech?
Hate speech is defined as hateful comments directed at specific groups or targets. It can be discriminatory towards a person's race, gender, religion, or sexual orientation. Hate speech can lead to an exchange of insults in online threads, resulting in emotional distress, and it can drive away quality discussion. Finally, hate speech can lead to the radicalization of individuals and increase polarization between different groups.
While hate speech is not something most social media platforms want, there can be a very fine line between preventing hate speech and censoring legitimate opinions. Hate speech can also be hard to detect, because trolls usually use sarcastic language.
Our project
Our project aims to detect hate speech on social media using machine learning techniques. We will compare and contrast different approaches, such as support vector machines (SVMs).
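To give a rough idea of what an SVM-based text classifier does, here is a minimal sketch in plain Python. It is not our actual implementation: the training sentences are made-up placeholders (not taken from any of the datasets), the function names and hyperparameters are our own assumptions for illustration, and in practice one would most likely use a library such as scikit-learn with richer features. The sketch turns each sentence into bag-of-words counts and fits a linear SVM by stochastic subgradient descent on the hinge loss.

```python
import random
from collections import Counter

# Hypothetical toy data: placeholder sentences, NOT drawn from any real dataset.
# Label +1 = hateful, -1 = not hateful.
TRAIN = [
    ("i hate those people they are vermin", 1),
    ("those people are vermin and should leave", 1),
    ("people like them are disgusting vermin", 1),
    ("i hate waiting for the bus", -1),
    ("what a lovely day for a walk", -1),
    ("the people at the match were lovely", -1),
]


def featurize(text):
    """Bag-of-words counts for a lowercased, whitespace-tokenized string."""
    return Counter(text.lower().split())


def score(weights, bias, features):
    """Linear decision value w.x + b."""
    return sum(weights.get(tok, 0.0) * cnt for tok, cnt in features.items()) + bias


def train_linear_svm(examples, epochs=200, lr=0.01, reg=0.01, seed=0):
    """Fit a linear SVM by stochastic subgradient descent on the hinge loss.

    The hyperparameter defaults are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    data = [(featurize(text), label) for text, label in examples]
    weights, bias = {}, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for features, label in data:
            margin = label * score(weights, bias, features)
            # L2 regularization shrinks every weight a little on each step.
            for tok in weights:
                weights[tok] *= 1.0 - lr * reg
            # The hinge loss is active only when the example is inside the margin.
            if margin < 1.0:
                for tok, cnt in features.items():
                    weights[tok] = weights.get(tok, 0.0) + lr * label * cnt
                bias += lr * label
    return weights, bias


def predict(weights, bias, text):
    """Return +1 (hateful) or -1 (not hateful) for a sentence."""
    return 1 if score(weights, bias, featurize(text)) >= 0 else -1


weights, bias = train_linear_svm(TRAIN)
predictions = [predict(weights, bias, text) for text, _ in TRAIN]
print(predictions)
```

Note that the word "hate" appears in both a hateful and a harmless sentence here, so the classifier cannot rely on that single keyword. This is a small-scale version of the difficulty described above.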
Datasets
We have four different datasets at hand. They contain hate speech content from Twitter, Wikipedia discussions, and other forums.
https://data.world/crowdflower/hate-speech-identification
https://github.com/leondz/hatespeechdata
https://github.com/pinkeshbadjatiya/twitter-hatespeech
Related Works