Fake News Detection using Machine Learning

Ponshriharini, published in featurepreneur, Mar 12, 2022

First, we’ll import all the necessary libraries.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

Now, we’ll read the dataset (news.csv, which needs at least a text column and a label column) into a pandas DataFrame.

df=pd.read_csv('news.csv')
df.head()

Next, we’ll extract the target variable. Here, that is the label column, whose values are either FAKE or REAL.

labels=df.label
labels.head()

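Next, we’ll split the data into training and test sets. test_size=0.2 holds out 20% of the rows for testing, and random_state=7 fixes the shuffle so the split is reproducible.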

x_train,x_test,y_train,y_test=train_test_split(df['text'], labels, test_size=0.2, random_state=7)
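Conceptually, train_test_split shuffles the rows and holds out a fraction of them for testing. The stdlib-only sketch below illustrates that idea under simplifying assumptions: simple_split is a hypothetical helper, the toy data is made up, and it will not pick the same rows as scikit-learn for the same seed, since scikit-learn uses its own NumPy-based shuffling.

```python
import math
import random


def simple_split(texts, labels, test_size=0.2, seed=7):
    """Hypothetical helper: shuffle indices, then hold out a test fraction."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_test = math.ceil(test_size * len(indices))  # round the test count up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Each text keeps its own label because both lists use the same indices.
    x_train = [texts[i] for i in train_idx]
    x_test = [texts[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Toy data, made up for illustration: 10 documents with alternating labels.
texts = [f"article {i}" for i in range(10)]
labels = ["FAKE" if i % 2 else "REAL" for i in range(10)]

x_train, x_test, y_train, y_test = simple_split(texts, labels)
print(len(x_train), len(x_test))  # 8 2
```

The important property is that texts and labels are shuffled together, so every article stays paired with its own label in both halves.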

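Next, we’ll initialize a TfidfVectorizer. If you don’t know how a TfidfVectorizer works, refer to this article. stop_words='english' removes common English words, and max_df=0.7 ignores terms that appear in more than 70% of the documents. We then call fit_transform on the training set, which learns the vocabulary and IDF weights, and only transform on the test set, so no information from the test data leaks into the features.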

tfidf_vectorizer=TfidfVectorizer(stop_words='english', max_df=0.7)

tfidf_train=tfidf_vectorizer.fit_transform(x_train)
tfidf_test=tfidf_vectorizer.transform(x_test)
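To see what fit_transform and transform compute, here is a stdlib-only sketch of TF-IDF that follows the formula scikit-learn documents for TfidfVectorizer's defaults: raw term counts, smoothed IDF idf(t) = ln((1 + n) / (1 + df(t))) + 1, then L2-normalising each row. TinyTfidf and tokenize are hypothetical names, and the tiny STOP_WORDS set is an assumption standing in for scikit-learn's much larger built-in English list. max_df is applied the same way as in the article: a term whose document frequency exceeds max_df * n is dropped.

```python
import math
import re

# Tiny stand-in for scikit-learn's built-in English stop-word list (assumption).
STOP_WORDS = {"the", "is", "a", "of", "and", "in", "on"}


def tokenize(text):
    # Lowercase, keep tokens of 2+ word characters, drop stop words.
    return [t for t in re.findall(r"\b\w\w+\b", text.lower()) if t not in STOP_WORDS]


class TinyTfidf:
    """Hypothetical minimal TF-IDF vectorizer (smooth IDF, L2-normalised rows)."""

    def __init__(self, max_df=0.7):
        self.max_df = max_df

    def fit(self, docs):
        tokenized = [tokenize(d) for d in docs]
        n = len(tokenized)
        df = {}
        for toks in tokenized:
            for term in set(toks):  # count each term once per document
                df[term] = df.get(term, 0) + 1
        # Drop terms that appear in more than max_df * n documents.
        kept = sorted(t for t, c in df.items() if c <= self.max_df * n)
        self.vocab = {t: i for i, t in enumerate(kept)}
        self.idf = [math.log((1 + n) / (1 + df[t])) + 1 for t in kept]
        return self

    def transform(self, docs):
        rows = []
        for doc in docs:
            counts = [0.0] * len(self.vocab)
            for term in tokenize(doc):
                if term in self.vocab:  # words unseen during fit are ignored
                    counts[self.vocab[term]] += 1.0
            weighted = [c * w for c, w in zip(counts, self.idf)]
            norm = math.sqrt(sum(v * v for v in weighted))
            rows.append([v / norm for v in weighted] if norm else weighted)
        return rows

    def fit_transform(self, docs):
        return self.fit(docs).transform(docs)


train_docs = [
    "Breaking news: aliens land in the city",
    "The city council approved the budget",
    "Aliens endorse the council budget",
    "News of the budget vote",
]
test_docs = ["Aliens vote on the budget", "Totally unseen words"]

vectorizer = TinyTfidf(max_df=0.7)
tfidf_train = vectorizer.fit_transform(train_docs)  # learn vocabulary + IDF from train only
tfidf_test = vectorizer.transform(test_docs)  # reuse them unchanged on the test set
# 'budget' is missing: it occurs in 3 of 4 training docs, more than 0.7 * 4.
print(sorted(vectorizer.vocab))
```

Rarer terms get larger IDF weights, so in the first test document "vote" (seen in one training document) outweighs "aliens" (seen in two), while a document made only of unseen words becomes an all-zero row.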

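Now, we’ll initialize a PassiveAggressiveClassifier, fit it on the training set, predict on the test set, and calculate the accuracy. PassiveAggressiveClassifier is an online learner: if a prediction is correct (with enough margin), it keeps the model as it is (passive); if not, it updates the weights just enough to correct the mistake (aggressive). max_iter=50 caps the number of passes over the training data.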

pac=PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train,y_train)

y_pred=pac.predict(tfidf_test)
score=accuracy_score(y_test,y_pred)
print(f'Accuracy: {round(score*100,2)}%')
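The passive/aggressive behaviour described above boils down to a short update rule. The sketch below implements the PA-I update for binary labels y in {+1, -1}: compute the hinge loss max(0, 1 - y(w·x)); if it is zero, the weights stay as they are (passive), otherwise they move by a step tau = min(C, loss / ‖x‖²) toward the correct side (aggressive). scikit-learn's documentation describes its default loss='hinge' as the PA-I variant, but this sketch is a simplification: it has no intercept, no shuffling and no stopping criterion, and dot, pa_update, fit_pa and predict are hypothetical names. scikit-learn maps the string labels FAKE/REAL internally; here they are encoded by hand as +1/-1.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def pa_update(w, x, y, C=1.0):
    """One PA-I step (hypothetical helper). y must be +1 or -1."""
    loss = max(0.0, 1.0 - y * dot(w, x))  # hinge loss
    if loss == 0.0:
        return w  # passive: correct with margin >= 1, keep the model as it is
    sq_norm = dot(x, x)
    if sq_norm == 0.0:
        return w  # an all-zero feature vector carries no information
    tau = min(C, loss / sq_norm)  # aggressive: step size, capped by C
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


def fit_pa(X, Y, epochs=50, C=1.0):
    # epochs plays the role of max_iter: full passes over the training data.
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, y in zip(X, Y):
            w = pa_update(w, x, y, C)
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


# Toy, linearly separable data (made up): +1 = FAKE, -1 = REAL.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
Y = [1, 1, -1, -1]
w = fit_pa(X, Y)
print([predict(w, x) for x in X])  # [1, 1, -1, -1]
```

When tau is not capped by C, a single aggressive step moves the weights exactly far enough that the example reaches a margin of 1; C limits how far one noisy example can drag the model.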

Next, we’ll build a confusion matrix to see how the correct and incorrect predictions are distributed across the two classes.

confusion_matrix(y_test,y_pred, labels=['FAKE','REAL'])

As you can see, the true positives and true negatives (the diagonal of the matrix) are much larger than the false positives and false negatives (the off-diagonal cells). This means the model classifies most of the articles in the test set correctly.
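For reference, confusion_matrix(y_test, y_pred, labels=['FAKE','REAL']) puts the true labels on the rows and the predicted labels on the columns, in the order given by labels. Treating FAKE as the positive class, the cells are therefore [[TP, FN], [FP, TN]], and accuracy is the diagonal sum divided by the total. The stdlib-only sketch below rebuilds that layout on made-up predictions; confusion is a hypothetical helper, not the scikit-learn function.

```python
def confusion(y_true, y_pred, labels):
    """Hypothetical helper: rows = true label, columns = predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Made-up labels, purely for illustration.
y_true = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL", "REAL", "FAKE"]
y_pred = ["FAKE", "FAKE", "REAL", "REAL", "REAL", "FAKE", "REAL", "FAKE"]

m = confusion(y_true, y_pred, labels=["FAKE", "REAL"])
(tp, fn), (fp, tn) = m  # FAKE treated as the positive class
accuracy = (tp + tn) / len(y_true)
print(m)  # [[3, 1], [1, 3]]
print(accuracy)  # 0.75
```

Swapping the order of labels swaps both the rows and the columns, so the diagonal still holds the correct predictions.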

Happy coding!
