How OpenAI’s Fake News Warnings Triggered Actual Fake News

Published in PC Magazine · Apr 3, 2019

Which is the bigger threat: fake news generated by AI or fake news about AI? At least for the moment, we should be more worried about the latter.

By Ben Dickson

Nonprofit AI research lab OpenAI caused a wave of AI apocalypse panic last month when it introduced a state-of-the-art text-generating AI called GPT-2. But while it celebrated the achievements of GPT-2, OpenAI declared it would not release the full model to the public, fearing that in the wrong hands, GPT-2 could be used for malicious purposes such as generating misleading news articles, impersonating others online, and automating the production of fake content on social media.

Predictably, OpenAI’s announcement created a flood of sensational news stories. But while any advanced technology can be weaponized, AI still has a long way to go before it masters text generation. And even then, it takes more than text-generating AI to create a fake-news crisis. In this light, OpenAI’s warnings were overblown.

AI and Human Language

Computers have historically struggled to handle human language. There are so many complexities and nuances in written text that converting all of them to classical software rules is virtually impossible. But recent advances in deep learning and neural networks have paved the way for a different approach to creating software that can handle language-related tasks.

Deep learning has brought great improvements to fields such as machine translation, text summarization, question answering, and natural language generation. It lets software engineers create algorithms that develop their own behavior by analyzing many examples. For language-related tasks, engineers feed neural networks digitized content such as news stories, Wikipedia pages, and social media posts. The neural nets carefully compare the data and take note of how certain words follow others in recurring sequences. They then turn these patterns into complex mathematical equations that help them solve language-related tasks such as predicting missing words in a text sequence. In general, the more quality training data you provide to a deep-learning model, the better it becomes at performing its task.
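To make “predicting missing words” concrete, here is a minimal sketch of that kind of next-word prediction, using the small GPT-2 model that OpenAI did release publicly, loaded through the Hugging Face transformers package. The library, prompt, and model size are illustrative assumptions, not details from the article:

```python
# A minimal sketch of "predict the next word," using the publicly released
# small GPT-2 model via the Hugging Face transformers package. The library
# choice and prompt are assumptions for illustration only.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The president said the new policy would"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocabulary_size)

# Probabilities the model assigns to each possible next word after the prompt.
next_word_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_word_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()])!r}  {prob.item():.3f}")
```

Everything the model does, from autocomplete-style suggestions to whole paragraphs of generated text, is built from this single trick of scoring which word is statistically likely to come next.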

According to OpenAI, GPT-2 was trained on 8 million web pages containing billions of words, far more data than similar earlier models. It also relies on a more advanced model architecture that is better at capturing patterns in text. Sample output from GPT-2 shows that the model maintains coherence over longer stretches of text than its predecessors.

But while GPT-2 is a step forward in the field of natural-language generation, it is not a technological breakthrough toward creating AI that can understand the meaning and context of written text. GPT-2 is still employing algorithms to create sequences of words that are statistically similar to the billions of text excerpts it has previously seen — it has absolutely no understanding of what it’s generating.
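As a rough illustration of that point, the sketch below samples a continuation from the same small, publicly released GPT-2 model: each word is chosen because it is statistically likely to follow the words before it, and nothing in the process checks whether the resulting claims are true. Again, the library, prompt, and sampling settings are assumptions for illustration:

```python
# Sampling a continuation from the publicly released small GPT-2 model.
# Library, prompt, and sampling settings are illustrative assumptions.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Scientists announced today that"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Each new token is drawn from the model's distribution over likely next
# words; there is no check for factual accuracy or logical consistency.
output = model.generate(
    input_ids,
    max_length=60,
    do_sample=True,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The output usually reads like plausible English, but the “facts” it asserts are whatever happened to be statistically convenient.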

In an in-depth analysis, ZDNet’s Tiernan Ray points to several instances where GPT-2’s output samples betray their artificial nature with well-known artifacts such as duplication of terms and lack of logic and consistency in facts. “When GPT-2 moves on to tackle writing that requires more development of ideas and of logic, the cracks break open fairly wide,” Ray notes.

Statistical learning can help computers generate text that is grammatically correct, but a deeper conceptual understanding is required to maintain logical and factual consistency. Unfortunately, that is a challenge current approaches to AI have not overcome. That’s why GPT-2 can generate nice paragraphs of text but would probably be hard-pressed to produce a convincing longform article or to impersonate someone credibly over an extended period of time.

Why AI Fake-News Panic Is Overblown

Another problem with OpenAI’s reasoning: It assumes that the ability to generate coherent text is all it takes to create a fake-news crisis.

In 2016, a group of Macedonian teens spread fake news stories about the US presidential election to millions of people. Ironically, they didn’t even have proper English skills; they were finding their stories on the web and stitching disparate content together. They were successful because they created websites that looked authentic enough to convince visitors to trust them as reliable news sources. Sensational headlines, negligent social-media users, and trending algorithms did the rest.

Then in 2017, malicious actors triggered a diplomatic crisis in the Persian Gulf region by hacking Qatari state-run news websites and government social media accounts and publishing fake remarks on behalf of Sheikh Tamim bin Hamad Al Thani, the Emir of Qatar.

As these stories show, the success of fake-news campaigns hinges on establishing (and betraying) trust, not on generating large amounts of coherent English text.

OpenAI’s warnings about automating the production of fake content to post on social media are more warranted, though, because scale and volume play a more important role in social networks than they do in traditional media outlets. The assumption is that an AI such as GPT-2 will be able to flood social media with millions of unique posts about a specific topic, influencing trending algorithms and public discussions.

But even these warnings overstate the threat. In the past few years, social media companies have been steadily developing capabilities to detect and block automated behavior, so a malicious actor armed with a text-generating AI would have to overcome a number of challenges beyond creating unique content.

For instance, they would need thousands of fake social media accounts from which to post their AI-generated content. Even tougher, to make sure there’s no way to connect the fake accounts to one another, they would need a unique device and IP address for each account.

It gets worse: The accounts would have to be created at different times, possibly over a year or longer, to reduce similarities. Last year, a New York Times investigation showed that account creation dates alone could help discover bot accounts. Then to further hide their automated nature from other users and policing algorithms, the accounts would have to engage in human-like behavior, such as interacting with other users and setting a unique tone in their posts.

None of these challenges are impossible to overcome, but they show that content is only one part of the effort needed to conduct a social media fake-news campaign. And again, trust plays an important role: a few trusted social media influencers putting up a few fake news posts will have a greater impact than a bunch of unknown accounts generating large volumes of content.

In Defense of OpenAI’s Warnings

OpenAI’s exaggerated warnings triggered a cycle of media hype and panic that, ironically, bordered on fake news itself, prompting criticism from renowned AI experts.

Zachary Lipton, AI researcher and the editor of Approximately Correct, pointed to OpenAI’s history of “using their blog and outsize attention to catapult immature work into the public view, and often playing up the human safety aspects of work that doesn’t yet have intellectual legs to stand on.”

Although OpenAI deserves the criticism and heat it received for its misleading remarks, it is also right to be genuinely worried about the possible malicious uses of its technology, even if the company chose an irresponsible way to educate the public about them.

In recent years, we’ve seen how AI technologies released without enough thought and reflection can be weaponized for malicious ends. One example was FakeApp, an AI application that can swap faces in videos. Soon after FakeApp was released, it was used to create fake pornographic videos featuring celebrities and politicians, raising concerns over the threat of AI-powered forgery.

OpenAI’s decision shows that we need to pause and think about the possible ramifications of publicly releasing technology. And we need to have more active discussions about the risks of AI technologies.

“One organization pausing one particular project isn’t really going to change anything long term. But OpenAI gets a lot of attention for anything they do … and I think they should be applauded for turning a spotlight on this issue,” David Bau, a researcher at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), told Slate.

Originally published at www.pcmag.com.
