Week 1 — Hate Speech Detection on Social Media

Published in bbm406f19

Nov 30, 2019

Hi everyone. We are Ege Çınar, Gökhan Özeloğlu, and Yiğit Barkın Ünal. This is an introduction to our BBM406 Machine Learning Project. We’re going to have a look at hate speech on social media.

What is trolling?

Trolling is defined as deceptive and disruptive behavior that usually involves posting inflammatory and malicious comments to provoke and upset people. Trolls are very widespread on social media sites. While trolling is certainly unhelpful, it does not count as hate speech.

What is hate speech?

Hate speech is defined as hateful comments directed at specific groups or targets. It can target a person's race, gender, religion, or sexual orientation. Hate speech can lead to exchanges of insults in online threads, causing emotional distress, and it can drive away quality discussion. Finally, hate speech can lead to the radicalization of individuals and increase polarization between groups.

While most social media platforms do not want hate speech, there can be a very fine line between preventing hate speech and censoring legitimate opinions. Hate speech can also be hard to detect, as trolls usually use sarcastic language.

Our project

Our project aims to detect hate speech on social media using machine learning techniques. We will compare and contrast different approaches, such as SVMs.
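We have not settled on features or a training setup yet, so the following is only a minimal sketch, under our own assumptions, of what a linear SVM baseline for this task could look like. It trains a bag-of-words linear SVM with the Pegasos method (stochastic sub-gradient descent on the L2-regularized hinge loss, without a bias term) in plain Python. The tokenizer, the function names (featurize, train_linear_svm, predict), the hyperparameters, and the toy corpus are all hypothetical illustrations, not our project's actual pipeline; in practice a library implementation such as scikit-learn's LinearSVC on TF-IDF features would be the more usual choice.

```python
import random
import re
from collections import Counter


def featurize(text):
    """Hypothetical featurizer: bag-of-words counts as a sparse dict {token: count}."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def score(w, x):
    """Linear decision value w . x over sparse features."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(texts, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.
    Labels must be +1 (hate speech) or -1 (not hate speech). No bias term."""
    rng = random.Random(seed)
    data = [(featurize(t), y) for t, y in zip(texts, labels)]
    w = {}
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            step += 1
            eta = 1.0 / (lam * step)  # Pegasos step size 1 / (lambda * t)
            violated = y * score(w, x) < 1  # hinge loss is active at the current w
            # Regularizer part of the sub-gradient: shrink every weight.
            for f in w:
                w[f] *= 1.0 - eta * lam
            # Hinge-loss part: move toward y * x only on a margin violation.
            if violated:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Return +1 (hate speech) if the decision value is positive, else -1."""
    return 1 if score(w, featurize(text)) > 0 else -1


# Hypothetical toy corpus; "hateword" is a placeholder for an offensive term.
texts = [
    "those hateword people should leave",
    "go home hateword",
    "hateword group ruins everything",
    "great game last night",
    "thanks for sharing this article",
    "i disagree with your point but respect it",
]
labels = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(texts, labels)
print(predict(w, "that hateword again"))  # prints 1
```

This toy corpus is trivially separable, since the two classes share no tokens; on real data the model would have to be tuned and evaluated on a held-out split, and sarcasm of the kind mentioned above is exactly what a bag-of-words model tends to miss.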

Datasets

We have four different datasets at hand. They contain hate speech content from Twitter, Wikipedia discussions, and other forums.

https://data.world/crowdflower/hate-speech-identification

https://github.com/leondz/hatespeechdata

https://github.com/pinkeshbadjatiya/twitter-hatespeech

https://github.com/aitor-garcia-p/hate-speech-dataset

Related Works


Written by Ege Çınar