UI + AI + News: Beyond #fakenews

Jarno M. Koponen
Feb 18, 2019

Background: I’m currently working on a new UI+AI initiative at Yle (The Finnish Broadcasting Company). It aims to augment the user-citizen, the journalist and the newsroom so that they can better understand the power and effects of algorithms in our everyday lives.

Below I’ve collected some benchmark articles, papers and tools related to the fake news phenomenon (examples, effects and detection solutions).

Starting with some historical context…

Fake news? That’s a very old story.
“Our own fake news purveyor, Paul Horner, suggests that Americans today are “definitely dumber” than they used to be. Perhaps. But we are not the only ones who fell for hoaxes, and American leaders — even ones we revere as Founding Fathers — were not above embracing such fabrications to shape opinion.”
https://www.washingtonpost.com/opinions/fake-news-thats-a-very-old-story/2016/11/25/c8b1f3d4-b330-11e6-8616-52b15787add0_story.html?noredirect=on&utm_term=.6a8c764d5c22

The real consequences of fake news
“A crucial part of that strategy should involve media literacy training and equipping news consumers with tools that will allow them to gauge the legitimacy of the news source, but also become aware of their own cognitive biases.”
http://theconversation.com/the-real-consequences-of-fake-news-81179

Why Fears of Fake News Are Overhyped
“Any conversation about fake news has to start with hard data on the extent of the problem. Unfortunately, these data are lacking. Most discussions of fake news exposure rely on simple counts of views or readers that lack essential context on who was exposed to the content and how frequently and what other information they also consume.”

“These findings do not alleviate every concern about fake news, of course.
1) First, even if relatively few people consume fake news, those consumers may be especially politically active and thus disproportionately influential in our politics.
2) Second, fake news is likely to have negative effects that extend beyond election outcomes.
3) Third, more needs to be learned about how to most effectively counter fake news. Providing online fact-checks reduces belief in headlines from these sites, but the scale of Facebook and other platforms outstrips the capacity of fact-checkers to keep up.
4) Fourth, relatively little is known about the effects of video. Amplification of extremism and false content is especially worrisome on YouTube given the amount of time some audiences spend on the platform and the way its algorithms may amplify misinformation (though YouTube now says it is trying to limit the reach of dubious content)”
https://medium.com/s/reasonable-doubt/why-fears-of-fake-news-are-overhyped-2ed9ca0a52c9

It will take more than NewsGuard’s team of journalists to stop the spread of fake news
“Fake news — difficult to spot, even among educated people — is one of the biggest problems of our time. And, like most big problems, it has proven difficult to solve.

But numerous organizations are popping up to try to combat it. Their attempts generally fall into two buckets: 1) making people more news-literate, or 2) making news more trustworthy, by either weeding out fake news or providing information about a story or site’s reliability.”

https://www.recode.net/platform/amp/2019/2/13/18220746/real-journalists-fake-news-newsguard

Just Trust Us
“If ranking stories based on clicks or likes was disastrous, ranking online publishers’ credibility brings its own set of problems.”

“A world in which Facebook, Google, and Twitter give their users better and easier ways to evaluate news stories’ credibility seems like an improvement over the current free-for-all… It’s also possible to imagine a nightmare scenario in which the ratings authorities become too powerful, their subjective decisions baked into every algorithm and profoundly shaping what people read. Media companies would try to game the green shields the same way they gamed Facebook’s algorithm — or worse, curry favor or influence behind the scenes.”
https://slate.com/technology/2019/01/newsguard-nuzzelrank-media-ratings-fake-news.html

No, A.I. Won’t Solve the Fake News Problem
“Existing A.I. systems that have been built to comprehend news accounts are extremely limited…”

“A.I. cannot fundamentally tell what’s true or false — this is a skill much better suited to humans.”
https://www.nytimes.com/2018/10/20/opinion/sunday/ai-fake-news-disinformation-campaigns.html

Facebook fake-news writer: ‘I think Donald Trump is in the White House because of me’
“I thought they’d fact-check it, and it’d make them look worse. I mean that’s how this always works: Someone posts something I write, then they find out it’s false, then they look like idiots. But Trump supporters — they just keep running with it! They never fact-check anything!”
https://www.washingtonpost.com/news/the-intersect/wp/2016/11/17/facebook-fake-news-writer-i-think-donald-trump-is-in-the-white-house-because-of-me/?utm_term=.c07f3f238106

Most Students Don’t Know When News Is Fake, Stanford Study Finds
“Teens absorb social media news without considering the source; parents can teach research skills and skepticism”
https://www.wsj.com/articles/most-students-dont-know-when-news-is-fake-stanford-study-finds-1479752576

I trained fake news detection AI with >95% accuracy, and almost went crazy
“I very quickly discovered that there are many different categories misinformation can fall into. There are articles that are blatantly false, articles that provide a truthful event but then make some false interpretations, articles that are pseudoscientific, articles that are really just opinion pieces disguised as news, articles that are satirical, and articles that are comprised of mostly tweets and quotes from other people.”

IMPORTANT FINDING! “So I really thought about what the problem was I was trying to solve. It then hit me; maybe the answer isn’t detecting fake news, but detecting real news. Real news is much easier to categorize. It’s factual and to the point, and has little to no interpretation. And there were plenty of reputable sources to get it from.”
https://towardsdatascience.com/i-trained-fake-news-detection-ai-with-95-accuracy-and-almost-went-crazy-d10589aa57c

Tool: https://machinebox.io/docs/fakebox?utm_source=medium&utm_medium=post&utm_campaign=fakenewspost

Combating Fake News: A Survey on Identification and Mitigation Techniques
“We discuss existing methods and techniques applicable to both identification and mitigation, with a focus on the significant advances in each method and their advantages and limitations. In addition, research has often been limited by the quality of existing datasets and their specific application contexts. To alleviate this problem, we comprehensively compile and summarize characteristic features of available datasets. Furthermore, we outline new directions of research to facilitate future development of effective and interdisciplinary solutions.”
https://arxiv.org/abs/1901.06437

Detecting fake news at its source
“Baly says the system needs only about 150 articles to reliably detect if a news source can be trusted — meaning that an approach like theirs could be used to help stamp out new fake-news outlets before the stories spread too widely.”
http://news.mit.edu/2018/mit-csail-machine-learning-system-detects-fake-news-from-source-1004

Predicting Factuality of Reporting and Bias of News Media Sources
“We have presented a study on predicting factuality of reporting and bias of news media, focusing on characterizing them as a whole. These are under-studied, but arguably important research problems, both in their own right and as a prior for fact-checking systems”

“We have created a new dataset of news media sources that has annotations for both tasks and is 1–2 orders of magnitude larger than what was used in previous work. We are releasing the dataset and our code, which should facilitate future research.”
https://arxiv.org/pdf/1810.01765.pdf

Automatic Detection of Fake News
“First, computational linguistics can aid in the process of identifying fake news in an automated manner well above the chance level. The proposed linguistics-driven approach suggests that to differentiate between fake and genuine content it is worthwhile to look at the lexical, syntactic and semantic level of a news item in question. The developed system’s performance is comparable to that of humans in this task, with an accuracy up to 76%… Second, we showed that it is possible to build resources for the fake news detection task by combining manual and crowdsourced annotation approaches.”
http://aclweb.org/anthology/C18-1287
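
To make the "lexical level" mentioned in the quote above a bit more concrete, here is a minimal Python sketch of the kind of surface features one could compute from an article's text. This is not the paper's feature set or code: the function name `lexical_features`, the word list `FIRST_PERSON` and the specific features are my own illustrative assumptions, and a real system would feed many more such signals (plus syntactic and semantic ones) into a trained classifier.

```python
import re

# Hypothetical word list, chosen only for illustration.
FIRST_PERSON = {"i", "me", "my", "we", "us", "our"}


def lexical_features(text):
    """Return a few simple surface-level (lexical) features of a text."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lowered = [t.lower() for t in tokens]
    n = len(tokens) or 1  # avoid division by zero on empty input
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "num_tokens": len(tokens),
        "avg_word_length": sum(len(t) for t in tokens) / n,
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len(set(lowered)) / n,
        # Share of words written in ALL CAPS (single letters ignored).
        "all_caps_ratio": sum(1 for t in tokens if len(t) > 1 and t.isupper()) / n,
        "exclamations_per_sentence": text.count("!") / (len(sentences) or 1),
        "first_person_ratio": sum(1 for t in lowered if t in FIRST_PERSON) / n,
    }
```

Calling `lexical_features("SHOCKING news! We saw it. It is TRUE!")` returns a dictionary of numbers that could serve as one input row for a classifier; on their own, these numbers say nothing about whether a story is true.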

Prior Exposure Increases Perceived Accuracy of Fake News
“Interestingly, however, we also find that prior exposure does not impact entirely implausible statements (e.g., “The Earth is a perfect square”). These observations indicate that although extreme implausibility is a boundary condition of the illusory truth effect, only a small degree of potential plausibility is sufficient for repetition to increase perceived accuracy. As a consequence, the scope and impact of repetition on beliefs is greater than previously assumed.”
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2958246

Fake News Detection on Social Media: A Data Mining Perspective
“In this article, we explored the fake news problem by reviewing existing literature in two phases: characterization and detection. In the characterization phase, we introduced the basic concepts and principles of fake news in both traditional media and social media. In the detection phase, we reviewed existing fake news detection approaches from a data mining perspective, including feature extraction and model construction. We also further discussed the datasets, evaluation metrics, and promising future directions in fake news detection research and expand the field to other applications.”
https://www.kdd.org/exploration_files/19-1-Article2.pdf

Understanding User Profiles on Social Media for Fake News Detection
“…we construct real-world datasets measuring users’ trust level on fake news and select representative groups of both “experienced” users who are able to recognize fake news items as false and “naïve” users who are more likely to believe fake news. We perform a comparative analysis over explicit and implicit profile features between these user groups, which reveals their potential to differentiate fake news. The findings of this paper lay the foundation for future automatic fake news detection research.”
http://www.public.asu.edu/~skai2/papers/fake_news_user.pdf

Unsupervised Fake News Detection on Social Media: A Generative Approach
“We investigate the problem of unsupervised fake news detection on social media by exploiting the users’ unreliable social engagement information.”

“We extract the social media users’ opinions from their hierarchy social engagement information. By treating the truths of news and the credibility of users as latent random variables, a probabilistic graphical model is built to capture the complete generative spectrum. An efficient Gibbs sampling approach is proposed to estimate the news authenticity and the users’ credibility simultaneously. We evaluate the proposed method on two real-world datasets, and the experiment results show that our proposed algorithm outperforms the unsupervised benchmarks.”
http://www.public.asu.edu/~skai2/files/aaai_2019_unsupervised.pdf

“Liar, Liar Pants on Fire”: A New Benchmark Dataset for Fake News Detection
“We collected a decade-long, 12.8K manually labeled short statements in various contexts from PolitiFact.com, which provides detailed analysis report and links to source documents for each case. This dataset can be used for fact-checking research as well. Notably, this new dataset is an order of magnitude larger than previously largest public fake news datasets of similar type.”
https://arxiv.org/abs/1705.00648

4 Tips for Spotting a Fake News Story
“1. Vet the publisher’s credibility. 2. Pay attention to quality and timeliness. 3. Check the sources and citations. 4. Ask the pros.”
https://www.summer.harvard.edu/inside-summer/4-tips-spotting-fake-news-story

Related visualisation: https://infograph.venngage.com/p/179944/how-to-spot-fake-news

NewsGuard
“Our team of trained journalists and experienced editors is rating and reviewing thousands of news and information websites based on nine journalistic criteria — such as whether the site regularly publishes false content, reveals conflicts of interest, discloses financing, or publicly corrects reporting errors.”
https://www.newsguardtech.com/how-it-works/

News Literacy Project
“The News Literacy Project is a national education nonprofit offering nonpartisan, independent programs that teach students how to know what to believe in the digital age.”
https://newslit.org/

Valheenpaljastaja: Who would weed fake news out of the web? (in Finnish)
“There is plenty of talk in the world about fake news and the harm it does to public debate and decision-making, but concrete specifics are often missing. Facebook and Google have long been urged to take practical action to root out the problem, but they have been reluctant to commit to much of anything.”
https://yle.fi/aihe/artikkeli/2016/11/25/valheenpaljastaja-kuka-perkaisi-valeuutiset-verkosta

Disinformation/Fake news mapping
https://www.invid-project.eu/tools-and-services/invid-verification-plugin/
https://www.disinfo2018.com
https://securingdemocracy.gmfus.org/toolbox/authoritarian-interference-tracker/
https://botsentinel.com
https://hoaxy.iuni.iu.edu
https://fakey.iuni.iu.edu/
https://huhumylly.info/ (closed in 2016)
https://dashboard.securingdemocracy.org (v1.0 closed in 12/2018)

Fake News Generators
https://towardsdatascience.com/using-a-markov-chain-sentence-generator-in-python-to-generate-real-fake-news-e9c904e967e
http://tbsdaily.com/
http://noob.co.in/
https://breakyourownnews.com/
https://www.thefakenewsgenerator.com/
https://newspaper.jaguarpaw.co.uk/
https://www.fodey.com/generators/newspaper/snippet.asp
https://www.classtools.net/breakingnews/
http://www.addletters.com/newspaper-generator.htm#.XFl-Gs8vOb8
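
The first link in this list is about generating fake headlines with a Markov chain sentence generator in Python. As a rough illustration of the general idea (not the linked article's code), here is a minimal word-level Markov chain sketch. The function names `build_chain` and `generate` and their parameters are my own assumptions.

```python
import random
from collections import defaultdict


def build_chain(headlines, order=1):
    """Map each run of `order` consecutive words to the words seen right after it."""
    chain = defaultdict(list)
    for headline in headlines:
        words = headline.split()
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return chain


def generate(chain, start, max_words=12, rng=None):
    """Random-walk the chain from `start` (a list of words) to build a headline."""
    rng = rng or random.Random()
    state = tuple(start)
    output = list(state)
    # Stop at the length limit or when the current state was never followed by anything.
    while len(output) < max_words and state in chain:
        output.append(rng.choice(chain[state]))
        state = tuple(output[-len(state):])
    return " ".join(output)
```

Given a handful of real headlines, `build_chain(headlines)` followed by `generate(chain, ["Scientists"])` stitches together a new headline from word transitions that occurred in the input. The output tends to look plausible locally while meaning nothing as a whole, which is a useful reminder of how cheap fake content is to produce.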

Jarno M. Koponen

Head of AI and Personalization. Yle, The Finnish Broadcasting Company. Interested in creating meaningful products with great people by combining UX and AI.