More test scenarios for fake news detection.

Continued from Combating Fake NEWS with the help of Machine Learning.

Vishwani Gupta
Mar 9, 2018

To garner more confidence in the algorithm, I ran it on a variety of cases.

Problem of fake news

Note: In the following plots, the X-axis lists the reliable sources and the Y-axis shows the distance between each reliable source's article and the query article. The more closely related the articles are, the lower this distance should be, and vice versa.
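The distances in these plots come from the method described in the previous post. As a rough, self-contained illustration only, the sketch below assumes a plain TF-IDF bag-of-words representation with cosine distance. That may differ from the representation the algorithm actually uses, and the function names and the one-article-per-source input are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep only alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    # One sparse TF-IDF vector (term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append({
            # Smoothed IDF keeps every weight strictly positive.
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_distance(a: dict[str, float], b: dict[str, float]) -> float:
    # 1 - cosine similarity: 0 for identical term profiles, 1 when no terms are shared.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def distances_to_sources(query: str, source_articles: dict[str, str]) -> dict[str, float]:
    # Hypothetical input: each reliable source name maps to its closest article on the story.
    names = list(source_articles)
    vectors = tfidf_vectors([query] + [source_articles[name] for name in names])
    return {name: cosine_distance(vectors[0], vec) for name, vec in zip(names, vectors[1:])}


if __name__ == "__main__":
    query = "Trump turns the spotlight on violent video games"
    sources = {
        "reuters": "Trump turns spotlight on violent video games after school shooting",
        "bbc": "Stock markets rally as tech shares climb",
    }
    # A lower value means the source reports something closer to the query article.
    print(distances_to_sources(query, sources))
```

In this toy run the unrelated article shares no terms with the query and lands at the maximum distance of 1.0, while the matching report sits well below it, which is the pattern the plots look for.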

As we saw in the previous post, if the distance is below 0.65, we can define a probability distribution of trust in the article; if the distance is above 0.65, we can instead define a probability distribution of mistrust in the article.
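The exact shape of these trust and mistrust distributions is not spelled out here. The sketch below assumes simple linear ramps on either side of the 0.65 threshold, plus a hypothetical majority vote across sources to reach a verdict; both are illustrative choices, not the author's method.

```python
THRESHOLD = 0.65  # distance cut-off quoted in the previous post


def trust_score(distance: float, threshold: float = THRESHOLD) -> tuple[str, float]:
    """Return ("trust" or "mistrust", strength in [0, 1]) for one distance.

    The linear ramps are a hypothetical stand-in for the post's distributions.
    """
    d = min(max(distance, 0.0), 1.0)  # assumes distances are normalised to [0, 1]
    if d < threshold:
        # Trust grows linearly as the distance falls from the threshold to 0.
        return "trust", (threshold - d) / threshold
    # Mistrust grows linearly as the distance rises from the threshold to 1.
    return "mistrust", (d - threshold) / (1.0 - threshold)


def article_verdict(distances: dict[str, float], threshold: float = THRESHOLD) -> str:
    # Hypothetical aggregation: genuine if most reliable sources fall below the threshold.
    trusted = sum(1 for d in distances.values() if trust_score(d, threshold)[0] == "trust")
    return "genuine" if trusted > len(distances) / 2 else "fake"
```

For example, distances of 0.2, 0.3 and 0.9 from three sources give two "trust" votes out of three, so this sketch would call the article genuine.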

1. Right vs Left reports on the same news story

Most news sources are biased to some degree. Some try to neutralize their bias, whereas others make money out of it. Such sources usually write a genuine story but give it an infuriating or misleading headline.

One such example is Political mayhem news. Sources like this report genuine (although opinionated) news in the article body but trick us with the headline. Because the headlines are infuriating, such articles get shared widely on social media. Here is one of their articles:

Although the title is very misleading, the content is genuine. The source claims that the article has been shared 1.1K times on social media. When we come across such articles on social media, we rarely have time to read the whole article and rely mostly on the headline. Since the body here is genuine but the title is not, I decided to label the title and the body separately.

Result: Genuine news but check the title.

The story above is genuine and comes from right-wing media; we can also see that Fox News has many articles similar to it.
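Labeling the title and the body separately could look roughly like the sketch below. It assumes the headline and the body have each already been scored against every reliable source, and it uses a hypothetical majority-vote rule; the function names and the fourth label are assumptions for illustration.

```python
THRESHOLD = 0.65  # distance cut-off quoted in the previous post


def is_supported(distances: dict[str, float], threshold: float = THRESHOLD) -> bool:
    # Hypothetical rule: supported when most reliable sources are closer than the threshold.
    close = sum(1 for d in distances.values() if d < threshold)
    return close > len(distances) / 2


def label_article(title_distances: dict[str, float], body_distances: dict[str, float]) -> str:
    # Score the headline and the body independently, then combine the two verdicts.
    body_ok = is_supported(body_distances)
    title_ok = is_supported(title_distances)
    if body_ok and title_ok:
        return "Genuine news and correct title"
    if body_ok:
        return "Genuine news but check the title"
    if title_ok:
        # Not one of the cases in this post; included so every combination has a label.
        return "Fake news but plausible title"
    return "Fake news and incorrect title"
```

A clickbait article whose headline is far from every reliable source but whose body is close to them ends up in the second branch, matching the result above.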

2. When dicey sources publish real news that is unorthodox

When we stumble upon unorthodox news articles that are not visible in mainstream media, we usually ignore them, assuming they are fake or a hoax. But sometimes these stories are genuine. Even when mainstream media publishes them, they stay out of focus because Donald Trump makes breaking news every day!

Result: Genuine news and correct title

Such articles should not be assumed to be fake news. The algorithm helps us avoid this mistake because it tags them as genuine news.

3. Articles from the new players in the industry

When new players like Vice News (launched in December 2013) publish an article, we are not always sure whether it is genuine. For example:

But when we run the algorithm on this article, we find that it is indeed genuine.

Result: Genuine news and correct title

Over time, this can help users build trust in such news sources.

4. Very recent news story

For a very recent news event, the algorithm gathers more proof as time passes. For example, at 21:12 CET on 8th March 2018, the news that Trump was turning the spotlight on violent video games had been reported by CBS and Reuters, but not by most of the other reliable sources.

When we tested the same article again a few hours later, nearly all the other sources had reported similar stories in the meantime.

Result: Genuine news and correct title but need more proof (8th March, 21:12 CET)
Result: Genuine news and correct title (9th March, 0:17 CET)
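The "need more proof" state can be thought of as a minimum-corroboration rule that is re-evaluated as more sources publish. The sketch below assumes a hypothetical cut-off of three agreeing sources and marks sources that have not covered the story yet with None; neither detail comes from the post, and the distances in the demo are made up.

```python
from typing import Optional

THRESHOLD = 0.65  # distance cut-off quoted in the previous post


def recency_label(distances: dict[str, Optional[float]],
                  min_agreeing: int = 3,
                  threshold: float = THRESHOLD) -> str:
    # None means the source has not published anything on the story yet.
    agreeing = [src for src, d in distances.items() if d is not None and d < threshold]
    if len(agreeing) >= min_agreeing:
        return "Genuine news and correct title"
    if agreeing:
        # Corroborated, but by too few sources so far: re-run the check later.
        return "Genuine news and correct title but need more proof"
    # Hypothetical label for a story no reliable source has matched yet.
    return "Not corroborated yet"


if __name__ == "__main__":
    # Made-up distances for illustration only, not the post's measurements.
    early = {"cbs": 0.3, "reuters": 0.35, "bbc": None, "nytimes": None}
    later = {"cbs": 0.3, "reuters": 0.35, "bbc": 0.4, "nytimes": 0.45}
    print(recency_label(early))
    print(recency_label(later))
```

Running the same check again later, once more sources have covered the story, is what moves the label from the first result above to the second.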

5. Most hyped news

One of the most famous fake news articles was published by WTOE 5 News. The article claimed that Pope Francis had broken with tradition and unequivocally endorsed Donald Trump for President of the United States. This turned out to be FAKE.

So when the Daily Beast published that Pope Francis said Trump is not a Christian, it was natural to be skeptical of the article.

Result: Genuine news and correct title

Again, this algorithm helps us avoid judging news articles by our prior bias.

6. News stories from the tech section

The technology sector is not exempt from fake news either. Recently, an article took real facts and drew an ominous conclusion from them, creating a hoax. It reported that concerned artificial intelligence researchers hurriedly abandoned an experimental chatbot program after realizing that the bots were inventing their own language.

For technology news, I rely on tech news sources such as:

techcrunch, verge, cnet, mashable, wired, thenextweb, engadget, techradar

Using these sources in my algorithm, I got the following result for the article:

Result: Fake news and incorrect title

Even with general news sources, the article was tagged as fake:

Result: Fake news and incorrect title
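Switching between the tech-specific reference set and general news sources, as done for this article, amounts to choosing which sources' articles the query is compared against. A minimal sketch, assuming a hypothetical corpus format that maps each source name to its recent articles; only the tech source names come from the post, and the general list is left to the caller.

```python
from collections.abc import Iterable

# The tech sources listed in this post.
TECH_SOURCES = frozenset({
    "techcrunch", "verge", "cnet", "mashable",
    "wired", "thenextweb", "engadget", "techradar",
})


def reference_corpus(corpus: dict[str, list[str]], section: str,
                     general_sources: Iterable[str]) -> dict[str, list[str]]:
    # Hypothetical input: corpus maps a source name to its recent articles.
    allowed = TECH_SOURCES if section == "tech" else frozenset(general_sources)
    # Only these sources' articles are compared against the query article.
    return {src: articles for src, articles in corpus.items() if src in allowed}
```

Keeping the source lists as data rather than code makes it easy to run the same article against both reference sets, as in the two results above.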

Conclusion:

Using this algorithm, we can generate a similarity score between a query article and articles from reliable sources. This protects us from fake news articles that novice news sources publish on social media, without making us miss the genuine articles from those same sources. This is especially useful when a new news source comes into the picture.

We can also calculate how often, and in what ratio, a new source publishes genuine versus deceptive news. Using this ratio as a reliability score, we can eventually treat the source itself as a reliable one. This could break the monopoly of brands like CNN.
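One way to keep such a running reliability score is sketched below. The genuine-to-total ratio follows the idea above, while the class name and the promotion rule (at least 50 checked articles and a 90% genuine ratio) are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class SourceRecord:
    genuine: int = 0
    deceptive: int = 0

    def record(self, is_genuine: bool) -> None:
        # Update the counts with the algorithm's verdict for one article.
        if is_genuine:
            self.genuine += 1
        else:
            self.deceptive += 1

    @property
    def reliability(self) -> float:
        # Fraction of checked articles judged genuine; 0.0 when nothing has been checked.
        total = self.genuine + self.deceptive
        return self.genuine / total if total else 0.0

    def is_reliable(self, min_articles: int = 50, min_ratio: float = 0.9) -> bool:
        # Hypothetical promotion rule: enough history and a high enough genuine ratio.
        total = self.genuine + self.deceptive
        return total >= min_articles and self.reliability >= min_ratio
```

Requiring a minimum history guards against promoting a brand-new source after only a handful of genuine articles.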

Facebook’s measures to fight fake news, which prioritize feeds from reliable sources only, have affected the business of a new generation of digital media companies such as Vice and Vox.

“The news feed changes will have the most negative impact on publishers that rely primarily on Facebook for referral traffic and those companies that specialise in producing and distributing sponsored videos for Facebook,” says Christopher Vollmer, global entertainment and media advisory leader at PwC.

An approach like this could avoid that damage while also keeping readers informed about whether a story is genuine.

If you have an interesting test case, please add it in the comments!


Vishwani Gupta

Applied machine learning enthusiast. Trying to make a difference in this society.