Detecting Fake News with Python

Do you trust all the news you read and listen to?

Afolabioluwafemi
Hamoye Blog
Jun 3, 2021


By BGT-02 Group, Junior Data Analyst Interns at Hamoye.

This article shows how to use computational thinking with Python to identify fake news, as an application of the first learning module of the Hamoye internship program.

What is Computational Thinking?

Computational thinking is a problem-solving process that includes formulating problems so that a computer and other tools can help solve them, logically organizing and analyzing data, representing data through abstractions such as models and simulations, and automating solutions.

Using the concepts of computational thinking, we will break down the problem of fake news: observing patterns in what fake news spreads and how it spreads, finding out what causes these patterns, and building a set of instructions for identifying fake news.

What is Fake News?

Fake news refers to news stories that may be deceptive and that are generally spread through social media and other online outlets. It is often created to push certain ideas, frequently in service of a political agenda.

The problem of fake news is not only hackers breaking into accounts and sending false information. Fake news also covers stories that are themselves false: the story is fabricated, with no verifiable facts, sources, or quotes.

When a person or a bot impersonates someone else or a reliable source to spread information, that can also be considered fake news. In most cases, the people creating this false information have an agenda, which can be political or economic, or aimed at changing people's behaviour or opinions on a topic.

Steps to detect fake news using Python

1. Make the necessary imports:

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt  # needed for the plots in step 6

from sklearn.utils import shuffle

2. Load the data into Python:

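fake = pd.read_csv(r"C:\path\filename.csv")  # path to the fake-news CSV

true = pd.read_csv(r"C:\path\filename.csv")  # path to the real-news CSV; lowercase because True is a Python keyword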
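Steps 3 to 6 only use the `text` and `subject` columns, so it can help to check that both files contain them right after loading. The sketch below is a minimal example, assuming those two column names; `io.StringIO` stands in for the real CSV files, and the sample rows, `REQUIRED_COLUMNS`, and the `check_columns` helper are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two CSV files; real files would be
# loaded with pd.read_csv(path) instead.
fake_csv = io.StringIO("title,text,subject\nA,Some made-up story,politics\n")
true_csv = io.StringIO("title,text,subject\nB,A verified report,worldnews\n")

fake = pd.read_csv(fake_csv)
true = pd.read_csv(true_csv)

# Columns the later steps rely on (assumed from the code in steps 3 to 6).
REQUIRED_COLUMNS = {"text", "subject"}


def check_columns(df, name):
    """Raise a ValueError if df lacks any of the required columns."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"{name} is missing columns: {sorted(missing)}")


check_columns(fake, "fake")
check_columns(true, "true")
print(fake.shape, true.shape)
```

Failing early with a clear message is usually easier to debug than a `KeyError` deep inside the `groupby` calls.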

3. Label the fake and real news:

fake['target'] = 'fake'

true['target'] = 'true'

4. Concatenate the data frames:

data = pd.concat([fake, true]).reset_index(drop=True)
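Why `reset_index(drop=True)` matters: each CSV gets its own 0-based index, so after concatenation the same index labels appear twice. A small sketch with two hypothetical toy DataFrames shows the repeated labels, and that `pd.concat(..., ignore_index=True)` gives the same result in one call:

```python
import pandas as pd

# Hypothetical toy frames standing in for the labelled fake and true data.
fake = pd.DataFrame({"text": ["f1", "f2"], "target": "fake"})
true = pd.DataFrame({"text": ["t1", "t2", "t3"], "target": "true"})

stacked = pd.concat([fake, true])
print(stacked.index.tolist())  # [0, 1, 0, 1, 2]: labels repeat

# Dropping the old index gives a clean 0..n-1 index.
data = pd.concat([fake, true]).reset_index(drop=True)

# ignore_index=True produces the same frame without the extra call.
data_alt = pd.concat([fake, true], ignore_index=True)

print(data.index.tolist())  # [0, 1, 2, 3, 4]
print(data.equals(data_alt))  # True
```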

5. Shuffle the data to prevent ordering bias (after concatenation, all fake articles come before all real ones):

data = shuffle(data)

data = data.reset_index(drop=True)
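`sklearn.utils.shuffle` returns a reordered copy and accepts a `random_state` argument, which makes the shuffle reproducible across runs. A minimal sketch on a hypothetical toy frame (the seed 42 is arbitrary), with pandas' own `DataFrame.sample(frac=1)` shown as an alternative that needs no scikit-learn:

```python
import pandas as pd
from sklearn.utils import shuffle

# Hypothetical toy data: all fake rows first, then all true rows,
# which is the ordering that shuffling is meant to break up.
data = pd.DataFrame({
    "text": [f"article {i}" for i in range(10)],
    "target": ["fake"] * 5 + ["true"] * 5,
})

# Same seed, same order on every run.
first = shuffle(data, random_state=42).reset_index(drop=True)
second = shuffle(data, random_state=42).reset_index(drop=True)
print(first.equals(second))  # True

# pandas-only alternative: sample every row (frac=1) in random order.
alt = data.sample(frac=1, random_state=42).reset_index(drop=True)

# Shuffling only reorders rows; the label counts are unchanged.
print(first["target"].value_counts().to_dict())
```

Fixing the seed means anyone rerunning the notebook sees the same row order, which makes later results easier to compare.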

6. Data exploration:

Counting the articles per subject:

print(data.groupby(['subject'])['text'].count())

data.groupby(['subject'])['text'].count().plot(kind="bar")

plt.show()

Counting the fake and real articles:

print(data.groupby(['target'])['text'].count())

data.groupby(['target'])['text'].count().plot(kind="bar")

plt.show()
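The two bar charts count subjects and labels separately. A cross-tabulation with `pd.crosstab` shows both at once, that is, how many fake and real articles fall under each subject. A sketch on hypothetical toy rows (the subject names are made up for illustration); with the real data, the shuffled `data` frame would be passed instead:

```python
import pandas as pd

# Hypothetical toy rows; only the subject and target columns matter here.
data = pd.DataFrame({
    "subject": ["politics", "politics", "world", "world", "world", "sports"],
    "target": ["fake", "true", "true", "true", "fake", "fake"],
    "text": ["a", "b", "c", "d", "e", "f"],
})

# Rows: subjects; columns: labels; cells: number of articles.
counts = pd.crosstab(data["subject"], data["target"])
print(counts)

# Share of fake articles within each subject.
fake_share = counts["fake"] / counts.sum(axis=1)
print(fake_share.round(2))
```

Calling `counts.plot(kind="bar", stacked=True)` followed by `plt.show()` draws the same table as stacked bars, one per subject.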
