Sentiment Analysis on Fake News Detection

Srujanakoripalli
5 min read · Dec 3, 2021


Nowadays, people spend a great deal of time on their smartphones, reading news articles and other information on social media platforms such as Facebook, Twitter, YouTube, Instagram, WhatsApp, LinkedIn, and TikTok. But do all of these platforms provide a source of truth?

For more than a decade, the amount of fake news on social media has been increasing. Most of it is created with the intention of deceiving or attracting readers. Such inaccurate information is often misleading, and this kind of news affects people's social well-being.

Fake news writers use different tricks to promote their stories, one of which is exciting the sentiment of readers. This has led to the use of sentiment analysis in fake news detection: it determines the sentiment expressed in article headlines, that is, the emotion or attitude of the writer.

Below is an example of sentiment analysis for fake news detection using Python in Google Colab.

Install the required libraries and models, such as spaCy, en_core_web_lg, and en_core_web_sm. The main tools and terms are:

  • spaCy is an open-source Python library for advanced natural language processing. It can be used to build information extraction and natural language understanding systems, and to pre-process text for deep learning.
  • en_core_web_lg is spaCy's large English model, with a size of 788 MB. Smaller English models and models for other languages are also available.
  • en_core_web_sm is a small English pipeline trained on written web text (blogs, news, comments) that includes vocabulary, syntax, and entities.
  • NLTK is a suite of libraries and programs for symbolic and statistical natural language processing for English, written in Python.
  • Stop words are words that provide no useful information for deciding which category a text belongs to.
  • Beautiful Soup is a Python library for pulling data out of HTML and XML files.
  • averaged_perceptron_tagger is the NLTK resource used for tagging words with their parts of speech (POS).
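Before running the notebook, it can help to check which packages are already importable. The snippet below is only a sketch: the REQUIRED mapping (import name to pip name) is an assumption based on the libraries mentioned in this article, not a list taken from the original notebook.

```python
import importlib.util

# Assumed mapping of import name -> pip package name for the libraries
# mentioned in this article (the pip name differs for some of them).
REQUIRED = {
    "spacy": "spacy",
    "nltk": "nltk",
    "bs4": "beautifulsoup4",
    "pandas": "pandas",
    "afinn": "afinn",
    "textblob": "textblob",
    "seaborn": "seaborn",
    "sklearn": "scikit-learn",
}


def missing_packages(required):
    """Return the pip names of packages that cannot be imported."""
    return sorted(
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    )


# Anything printed here still needs a "pip install <name>" in Colab.
print(missing_packages(REQUIRED))
```

The spaCy models (en_core_web_sm, en_core_web_lg) and the NLTK resources are downloaded separately from the Python packages themselves.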

The data was collected from the Kaggle Fake News dataset and read into a pandas DataFrame.
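A minimal sketch of the loading step is shown here. The column names (title, text, label) and the in-memory sample rows are assumptions made so the snippet runs on its own; with the real dataset you would pass the path of the downloaded CSV file instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Kaggle CSV; the column names are assumed.
SAMPLE_CSV = """title,text,label
"Shocking cure doctors hate","Miracle pill ends all disease overnight",1
"City council approves budget","The council voted on the annual budget",0
"""


def load_news(source):
    """Read the news data and drop rows that have no headline."""
    df = pd.read_csv(source)
    return df.dropna(subset=["title"]).reset_index(drop=True)


# With the real data, replace the StringIO object with the CSV file path.
news_df = load_news(io.StringIO(SAMPLE_CSV))
print(news_df.shape)
print(news_df.head())
```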

Data preprocessing

  • Keep the negative stop words (such as "not" and "no"), because removing them would change the sentiment of a sentence.
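The idea can be sketched as follows. To keep the snippet self-contained it uses a small hand-written stop word set, which is an assumption for illustration; in the notebook the full list would typically come from NLTK's English stop words.

```python
# Small illustrative stop word set (an assumption for this sketch). In the
# notebook this would usually be nltk.corpus.stopwords.words("english").
SAMPLE_STOP_WORDS = {
    "the", "is", "a", "an", "this", "and", "of", "to", "no", "not", "nor",
}

# Negation words that carry sentiment and should survive stop word removal.
NEGATION_WORDS = {"no", "not", "nor"}


def build_stop_words(stop_words, keep):
    """Return the stop word set without the words we want to keep."""
    return set(stop_words) - set(keep)


def remove_stop_words(text, stop_words):
    """Drop stop words from whitespace-separated text."""
    return " ".join(t for t in text.split() if t.lower() not in stop_words)


stop_words = build_stop_words(SAMPLE_STOP_WORDS, NEGATION_WORDS)
print(remove_stop_words("This is not a true story", stop_words))
# prints: not true story
```

Without the exclusion, "not" would be removed as well and the headline would read as "true story", which reverses its meaning.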

Data cleaning: write functions for removing HTML tags, accented characters, and special characters.
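One possible shape for these cleaning functions is sketched below. The function names and the exact regular expression are assumptions, not the original notebook's code; Beautiful Soup handles the HTML and the standard library handles the rest.

```python
import re
import unicodedata

from bs4 import BeautifulSoup


def strip_html_tags(text):
    """Remove HTML tags and keep only the visible text."""
    return BeautifulSoup(text, "html.parser").get_text(separator=" ")


def remove_accented_chars(text):
    """Convert accented characters to their closest ASCII form."""
    normalized = unicodedata.normalize("NFKD", text)
    return normalized.encode("ascii", "ignore").decode("ascii")


def remove_special_chars(text):
    """Keep only letters, digits, and whitespace."""
    return re.sub(r"[^a-zA-Z0-9\s]", "", text)


def clean_text(text):
    """Apply all cleaning steps and collapse repeated whitespace."""
    text = strip_html_tags(text)
    text = remove_accented_chars(text)
    text = remove_special_chars(text)
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("<p>Café owners are <b>furious</b>!!!</p>"))
# prints: Cafe owners are furious
```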

  • Lemmatization uses a vocabulary and morphological analysis of words to remove inflectional endings only and return the base or dictionary form of a word, which is known as the lemma.

Normalize the text by applying all of the cleaning functions above in sequence.

For sentiment analysis, we use the AFINN library to compute a sentiment score and a sentiment category.

  • AFINN is one of the simplest yet most popular lexicons used for sentiment analysis. It contains 3300+ words, each with an associated polarity score. In Python, the afinn package provides this lexicon.
  • The sentiment category can be Neutral, Negative, or Positive.
  • The category is determined from the score by choosing threshold values, for example treating scores above zero as Positive and scores below zero as Negative.

We can plot the results using seaborn.

  • Seaborn is an open-source Python library built on top of matplotlib. It is used for data visualization and exploratory data analysis. Seaborn works easily with data frames and the Pandas library. The graphs created can also be customized easily.
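A count plot of the sentiment categories might look like the sketch below. The column name sentiment_category and the sample rows are assumptions; the original plot may have shown a different view of the data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

CATEGORY_ORDER = ["negative", "neutral", "positive"]


def plot_sentiment_counts(df, column="sentiment_category"):
    """Draw one bar per sentiment category and return the Axes."""
    fig, ax = plt.subplots(figsize=(6, 4))
    sns.countplot(data=df, x=column, order=CATEGORY_ORDER, ax=ax)
    ax.set_xlabel("Sentiment category")
    ax.set_ylabel("Number of headlines")
    return ax


# Made-up categories standing in for the scored headlines.
sample_df = pd.DataFrame(
    {"sentiment_category": ["negative", "negative", "positive", "neutral", "negative"]}
)
ax = plot_sentiment_counts(sample_df)
```

In Colab the figure is displayed automatically at the end of the cell.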

We get the following plot

  • TextBlob is a Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
  • The sentiment property of a TextBlob object returns two values: polarity, which ranges from -1.0 to 1.0, and subjectivity, which ranges from 0.0 to 1.0.

Now we can compute a confusion matrix that compares the AFINN and TextBlob categories, using confusion_matrix from sklearn.metrics.
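The comparison can be sketched as follows. The two category lists are made-up values for illustration, not results from the dataset.

```python
from sklearn.metrics import confusion_matrix

LABELS = ["negative", "neutral", "positive"]

# Made-up categories for six headlines, one list per method.
afinn_categories = ["positive", "negative", "neutral", "negative", "positive", "neutral"]
textblob_categories = ["positive", "negative", "positive", "neutral", "positive", "neutral"]

# Rows correspond to AFINN, columns to TextBlob, both in LABELS order.
cm = confusion_matrix(afinn_categories, textblob_categories, labels=LABELS)
print(cm)

# Share of headlines on which both methods agree (the diagonal).
agreement = cm.trace() / cm.sum()
print(round(agreement, 2))
```

Neither method is a ground truth here, so the matrix shows how often the two lexicon-based approaches agree rather than how accurate either one is.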

Hence, detecting the sentiment of news is crucial for people's well-being, because it helps them avoid being drawn in by content designed to provoke extreme reactions.
