More test scenarios for fake news detection.
Continued from Combating Fake NEWS with the help of Machine Learning.
To garner more confidence in the algorithm, I ran it on a variety of cases.
Note: In the following plots, the X-axis shows the reliable sources and the Y-axis shows the distance between each reliable source's articles and the query article. The more closely related the articles are, the lower this distance should be, and vice versa.
Also, as we saw in the previous post, if the distance is below 0.65, we can define a probability distribution of trust in the article, whereas if the distance is greater than 0.65, we can define a probability distribution of mistrust in the article.
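To make the 0.65 cut-off concrete, here is a minimal sketch in Python. The function names are my own for illustration, and treating a distance of exactly 0.65 as mistrust is an assumption, since only "below" and "greater than" are defined above. It only assigns a label per source; it does not reproduce the probability distributions from the previous post.

```python
TRUST_THRESHOLD = 0.65  # cut-off on the distance, as quoted in this post


def label_distance(distance, threshold=TRUST_THRESHOLD):
    """Label one distance: below the threshold means trust, otherwise mistrust.

    Assumption: a distance exactly equal to the threshold counts as mistrust.
    """
    return "trust" if distance < threshold else "mistrust"


def label_sources(distances, threshold=TRUST_THRESHOLD):
    """Label every reliable source given a {source name: distance} mapping."""
    return {
        source: label_distance(distance, threshold)
        for source, distance in distances.items()
    }
```

Each bar in the plots could be read this way: a reliable source whose bar stays below the threshold counts as support for the query article.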
1. Right vs Left reports on the same news story
Most news sources are biased. Some try to neutralize their bias, whereas some make money out of it. Usually, such sources write a genuine story but give it an infuriating or misleading headline.
A case in point is political mayhem news. These sources report genuine (although opinionated) news in the body of the article but trick us with the headline. Because of their infuriating headlines, such articles get shared in abundance on social media. One such example is:
Although the title is very misleading, the content is genuine. The source claims that the article was shared 1.1K times on social media. As we all know, when we come across such articles on social media, we rarely have time to read the whole article and rely mostly on the headline. In this scenario, since the body is genuine but the title is not, I decided to label the title and the body individually.
The above story is genuine and comes from right-wing media. We can also see that Fox News has many articles similar to this story.
2. When dicey sources publish real news that is unorthodox
When we stumble on news articles that are unorthodox and not visible in mainstream media, we usually ignore them, taking them to be fake or a hoax. But sometimes these stories are genuine news. Such stories, though published by the mainstream media, are not in focus because Donald Trump makes breaking news every day!
Such articles should not be assumed to be fake news. This algorithm avoids that problem because it tags them as genuine news.
3. Articles from the new players in the industry
When a new player like Vice News (launched in December 2013) publishes a news article, we are not very sure of the article's genuineness. For example,
But when we run the algorithm on this article, we find that it is indeed true.
Hence, over time, this can enable users to build trust in such news sources.
4. Very recent news story
In the case of a very recent news event, the algorithm gathers more proof as time passes. For example, on 8th March 2018 at 21:12 CET, the story that Trump turns the spotlight on violent video games had been reported by CBS and Reuters, but most of the other reliable sources had reported nothing on it.
In such cases, when the article is tested again after some time, we can see that nearly all the other sources have reported similar stories in the interval.
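The re-testing idea can be sketched as counting, at each point in time, how many reliable sources have a distance below the threshold. This is only an illustration: the function names and the snapshot format (a label plus a mapping of source name to distance) are assumptions of mine, not part of the original algorithm.

```python
THRESHOLD = 0.65  # cut-off on the distance, as quoted in this post


def corroborating_sources(distances, threshold=THRESHOLD):
    """Return the sorted names of sources whose distance is below the threshold."""
    return sorted(
        source for source, distance in distances.items() if distance < threshold
    )


def corroboration_timeline(snapshots, threshold=THRESHOLD):
    """Count corroborating sources for each (label, distances) snapshot.

    `snapshots` is a list of (label, {source name: distance}) pairs, one per
    run of the algorithm on the same query article.
    """
    return [
        (label, len(corroborating_sources(distances, threshold)))
        for label, distances in snapshots
    ]
```

For a genuine story, the count should rise from one snapshot to the next as more sources pick the story up. For a hoax, it would be expected to stay low.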
5. Most hyped news
One of the most famous fake news articles was published by WTOE 5 News. The article claimed that Pope Francis had broken with tradition and unequivocally endorsed Donald Trump for President of the United States. This turned out to be FAKE.
So when The Daily Beast published that Pope Francis had said that Trump is not a Christian, it was only natural to be skeptical about the article.
Again, using this algorithm we can avoid prior bias for news articles.
6. News stories from tech section
The technology sector is not exempt from the problem of fake news either. Recently, an article took real facts and drew an ominous conclusion from them, thus creating a hoax. It reported that concerned artificial intelligence researchers had hurriedly abandoned an experimental chatbot program after they realized that the bots were inventing their own language.
For technical news articles, I rely on tech news sources such as:
techcrunch, verge, cnet, mashable, wired, thenextweb, engadget, techradar
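One simple way to keep such section-specific lists is a lookup that falls back to the general sources. The sketch below is an assumption about how this could be organized, not the code behind the plots. The function name and the `general_sources` parameter are hypothetical, and only the tech list comes from this post.

```python
# Tech sources exactly as listed in this post.
TECH_SOURCES = [
    "techcrunch", "verge", "cnet", "mashable",
    "wired", "thenextweb", "engadget", "techradar",
]


def reference_sources(section, general_sources):
    """Pick the reliable sources to compare a query article against.

    `section` is a hypothetical label for the article's section, and
    `general_sources` is whatever general list the caller already uses.
    """
    if section == "tech":
        return list(TECH_SOURCES)
    return list(general_sources)
```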
Using these sources in my algorithm, I got the following result for the article:
Even with general news sources, the article was tagged as fake:
Conclusion:
Using this algorithm, we can generate a similarity score between the query article and articles from reliable sources. We can protect ourselves from fake news articles published on social media by novice news sources, yet not miss the genuine articles from the same sources. This is especially beneficial when a new news source comes into the picture.
We can also calculate how frequently, and in what ratio, a source publishes genuine and deceptive news. Keeping this ratio as a reliability score, we can start treating that source as a reliable source itself. Hence it can break the monopoly of brands like CNN.
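To give a feel for what a distance between a query article and a source's articles can look like, here is a small stand-in in Python. It uses plain word-count cosine distance, which is my assumption for illustration only, since this post does not restate the actual measure from the previous post. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_distance(text_a, text_b):
    """Return 1 minus the cosine similarity of the two word-count vectors."""
    a, b = word_counts(text_a), word_counts(text_b)
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # nothing to compare: treat as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def closest_distance_per_source(query, articles_by_source):
    """For each source, keep the smallest distance to any of its articles."""
    return {
        source: min(cosine_distance(query, article) for article in articles)
        for source, articles in articles_by_source.items()
        if articles  # skip sources with no articles to compare
    }
```

With this stand-in, identical texts get a distance close to 0 and texts with no words in common get a distance of 1, so lower values mean more closely related articles, as in the plots.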
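As a rough sketch of such a reliability score, one option is the fraction of a source's tested articles that were tagged genuine. The exact formula is not defined in this post, so this definition, the function name and the label strings are all assumptions.

```python
def reliability_score(labels):
    """Return the share of a source's articles that were tagged genuine.

    `labels` holds one entry per tested article, either "genuine" or
    "deceptive" (hypothetical label names). Returns None when the source
    has no tested articles yet.
    """
    if not labels:
        return None
    genuine = sum(1 for label in labels if label == "genuine")
    return genuine / len(labels)
```

For instance, a source with three genuine articles out of four tested would score 0.75 under this definition. How high the score must be before a source is treated as reliable is left open here.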
Facebook’s measures to fight fake news by prioritizing feeds only from reliable sources have affected the business of a new generation of digital media companies like Vice and Vox.
“The news feed changes will have the most negative impact on publishers that rely primarily on Facebook for referral traffic and those companies that specialise in producing and distributing sponsored videos for Facebook,” says Christopher Vollmer, global entertainment and media advisory leader at PwC.
With an approach like this one, that impact can be avoided while still keeping the reader informed about the genuineness of a story.
If you have an interesting scenario that could be used as a test case, please add it in the comments!