Reel vs Real: Model training

May 30, 2023

Model training for fake news detection means teaching a machine learning model to recognize the patterns that separate authentic news articles from fabricated ones. The first step is to acquire a labeled dataset of news articles, where each article carries a binary label indicating whether it is real or fake. The data is then preprocessed: the text is cleaned and normalized, irrelevant information is removed, and the result is converted into numerical features. Finally, the dataset is split into a training set, used to fit the model, and a testing set, used to evaluate its performance.
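To make the cleaning and normalizing step concrete, here is a minimal sketch of one possible text cleaner. The function name `clean_text` and the specific rules (lowercasing, dropping URLs, keeping only letters) are assumptions chosen for illustration, not the exact preprocessing used in this project.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase the text, drop URLs and non-letter characters, collapse spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only letters and whitespace
    return re.sub(r"\s+", " ", text).strip()   # normalize runs of whitespace


print(clean_text("BREAKING!!! Visit http://example.com NOW, 100% true..."))
# prints: breaking visit now true
```

Whether digits and punctuation should be removed depends on the dataset; exaggerated punctuation can itself be a signal of fake news, so treat these rules as a starting point.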

Next, a suitable machine learning algorithm is selected, such as Random Forest, Naive Bayes, Support Vector Machines, or deep learning models like Convolutional Neural Networks or Recurrent Neural Networks. The model is trained using the training set, where it learns the patterns and relationships between the features and labels in the data. After training, the model’s performance is evaluated using metrics like accuracy, precision, recall, and F1-score, to assess its ability to classify fake and real news articles accurately.
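The four metrics mentioned above all derive from the counts of true/false positives and negatives. The sketch below computes them by hand so the definitions are visible; the helper name `classification_metrics` is hypothetical, and label 1 (fake) is assumed to be the positive class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from raw label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted fakes, how many were fake
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of actual fakes, how many were caught
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only: 3 TP, 1 FN, 1 FP, 3 TN
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

In practice scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` do the same job; the manual version is only meant to show what those numbers mean.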

To improve the model’s performance, hyperparameter tuning can be performed by adjusting parameters such as learning rate, regularization strength, or the number of trees in a Random Forest. Fine-tuning these hyperparameters can help optimize the model’s performance on the specific task of fake news detection. Once the model achieves satisfactory performance, it can be deployed to predict the authenticity of new, unseen news articles, providing a valuable tool for combating the spread of fake news.
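One common way to tune hyperparameters is a grid search with cross-validation. The sketch below uses scikit-learn's `GridSearchCV` on a tiny made-up corpus; the example texts, the grid values, and the choice of `cv=2` are all assumptions for illustration and are far too small to produce meaningful results.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Tiny invented corpus, for illustration only (1 = fake, 0 = real)
texts = [
    "shocking miracle cure doctors hate",
    "city council approves new budget",
    "aliens secretly control the weather",
    "central bank keeps interest rates unchanged",
    "celebrity reveals shocking immortality secret",
    "local school opens new science lab",
    "miracle pill melts fat overnight",
    "government publishes annual employment report",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Putting the vectorizer in the pipeline refits it inside every CV fold
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", RandomForestClassifier(random_state=42)),
])

# Step parameters are addressed as <step name>__<parameter name>
param_grid = {
    "clf__n_estimators": [10, 50],
    "clf__max_depth": [None, 5],
}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```

On a real dataset you would search a wider grid, use more folds, and fit the search on the training set only, so the test set stays untouched for the final evaluation.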

Continuous evaluation and refinement of the model are essential to adapt to evolving patterns and tactics used in spreading fake news. Monitoring the model’s performance and incorporating new techniques or data sources can help maintain its accuracy and effectiveness over time. Additionally, it’s crucial to stay updated with the latest research and advancements in the field of fake news detection to improve the model’s capabilities and address new challenges that may arise.
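As a rough sketch of what such monitoring could look like, the hypothetical helper below compares accuracy on a batch of recently labeled articles with the accuracy measured at deployment time. The function name `needs_retraining` and the 5-point tolerance are assumptions, not part of the original project.

```python
def needs_retraining(baseline_accuracy, recent_true, recent_pred, tolerance=0.05):
    """Return True when accuracy on recent articles falls more than
    `tolerance` below the accuracy measured at deployment time."""
    correct = sum(1 for t, p in zip(recent_true, recent_pred) if t == p)
    recent_accuracy = correct / len(recent_true)
    return recent_accuracy < baseline_accuracy - tolerance


# Made-up labels: 3 of 5 recent predictions are correct (accuracy 0.6)
print(needs_retraining(0.90, [1, 0, 1, 1, 0], [1, 0, 0, 0, 0]))
# prints: True
```

A drop like this could mean the style or topics of fake news have shifted, which is a cue to collect fresh labeled data and retrain.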

Train-Test Split: Split the preprocessed dataset into training and testing sets. Typically, around 70–80% of the data is used for training, and the remaining portion is used for testing the model’s performance.

The following sample code performs the train-test split and trains a model for fake news detection:

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load the dataset (assuming the dataset is in a CSV file)
data = pd.read_csv('fake_news_dataset.csv')

# Split the dataset into features (X) and labels (y)
X = data['text']   # Textual content of the news articles
y = data['label']  # Binary labels: 0 for real news, 1 for fake news

# Perform train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an instance of TfidfVectorizer
vectorizer = TfidfVectorizer()

# Fit and transform the training set
X_train_tfidf = vectorizer.fit_transform(X_train)

# Transform the test set
X_test_tfidf = vectorizer.transform(X_test)

# Create an instance of RandomForestClassifier
random_forest = RandomForestClassifier()

# Train the model on the training set
random_forest.fit(X_train_tfidf, y_train)

# Make predictions on the test set
y_pred = random_forest.predict(X_test_tfidf)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print the evaluation metrics
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)

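The code first loads the dataset from a CSV file, separates the article text (X) from the labels (y), and holds out 20% of the data for testing. Next, a TfidfVectorizer instance is created and fitted on the training set to transform the textual data into TF-IDF feature vectors. The same fitted vectorizer is then used to transform the test set.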
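Fitting the vectorizer on the training set only matters because the vocabulary and IDF weights should not be influenced by the test data. The small sketch below, using invented sentences, shows that words appearing only in the test document are simply ignored while the feature columns stay aligned.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented sentences, for illustration only
train_docs = ["the president signed the bill", "aliens signed the bill"]
test_docs = ["aliens met the president on mars"]

vectorizer = TfidfVectorizer()
X_train_demo = vectorizer.fit_transform(train_docs)  # learns vocabulary and IDF weights
X_test_demo = vectorizer.transform(test_docs)        # reuses them, learns nothing new

print(sorted(vectorizer.vocabulary_))          # vocabulary comes from the training docs only
print(X_train_demo.shape, X_test_demo.shape)   # both matrices have the same number of columns
```

Words such as "mars" never enter the vocabulary, so the test matrix has exactly the columns the model was trained on.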

After that, an instance of RandomForestClassifier is created, and the model is trained on the training set using the fit() function. Predictions are made on the test set using the predict() function.

Finally, the model’s performance is evaluated using accuracy, precision, recall, and F1-score metrics calculated with the help of functions from scikit-learn’s metrics module.

The code above is a simple example of fake news detection using a Random Forest classifier. In the same way, we computed the accuracy, precision, recall, and F1-score for Random Forest, logistic regression, and gradient boosting, and found that Random Forest gave the most accurate results on our dataset.
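A comparison like this can be sketched as a loop over the three classifiers. The helper name `compare_models` is hypothetical, and the settings shown (default hyperparameters, `max_iter=1000`, `random_state=42`) are assumptions rather than the exact configuration we used.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def compare_models(X_train, X_test, y_train, y_test):
    """Train three classifiers on the same TF-IDF features and collect their metrics."""
    vectorizer = TfidfVectorizer()
    X_train_tfidf = vectorizer.fit_transform(X_train)
    X_test_tfidf = vectorizer.transform(X_test)

    models = {
        "Random Forest": RandomForestClassifier(random_state=42),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    }

    results = {}
    for name, model in models.items():
        model.fit(X_train_tfidf, y_train)
        y_pred = model.predict(X_test_tfidf)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            # zero_division=0 avoids warnings when a model predicts no fake articles
            "precision": precision_score(y_test, y_pred, zero_division=0),
            "recall": recall_score(y_test, y_pred, zero_division=0),
            "f1": f1_score(y_test, y_pred, zero_division=0),
        }
    return results
```

Calling `compare_models(X_train, X_test, y_train, y_test)` with the raw text splits from the sample code returns one dictionary of metrics per model, which makes the three algorithms easy to compare side by side.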

That’s all for today, folks. Let’s discuss the next part in the next blog. Till then, stay tuned!

Written by Manasi Deshpande