Generating Today’s News with GPT-2 Trained on Past News Articles

Isabelle Lee
8 min read · Jun 11, 2020

The political bias in the news has become an issue garnering ravenous bipartisan interest, although the current state of affairs and the rhetoric around it would have us believe that journalism itself may seek to divide. It’s one of the few things we might be able to agree on: it’s important to get the facts straight, and we all think we’re right. Welp, at least I do.

So I’ve been curious. Who in fact has the “best” information out there, and how is that categorized? Also, do liberals and conservatives talk about totally different things? Is there a difference in how they talk? Can an AI learn and mimic how Fox News talks vs. how the New York Times talks? Like, say, generate an article?

My first question seems quick enough to answer with a chart.

Ad Fontes Media publishes a political-bias and overall-quality chart, which positions publications in boxes of green, yellow, orange, and red. To be completely honest, I’m uncertain how this chart was created and how they ensured that their own judgment was not inserted into its creation. Nonetheless, it was a good way to segment the dataset I found on Kaggle.

Figure 1. Political Bias in News Media vs. Overall Quality of Reporting. (Ad Fontes: https://www.adfontesmedia.com/interactive-media-bias-chart/?v=402f03a963ba)
A quick breakdown of the number of articles by publication.

I was able to find a dataset of news articles on Kaggle. The publications included in this dataset and the number of articles from each are listed here. From these publications, I selected the New York Times, NPR, and Washington Post for the “green” dataset, as they fall in the green box on the bias-quality chart. Similarly, the “red” dataset was made from Fox News, Breitbart, and the New York Post. Ideally, I would have also made a separate corpus for the most extreme portion of the chart, but those publications were not as easy to find as the ones listed here.
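The segmentation itself is a one-liner once the articles are in a DataFrame. Here is a minimal sketch, using a toy stand-in for the Kaggle dataset (in practice you would `pd.read_csv(...)` the real file; the column names here are assumptions):

```python
import pandas as pd

# Toy stand-in for the Kaggle news dataset; real column names may differ.
df = pd.DataFrame({
    "publication": ["New York Times", "Fox News", "NPR", "Breitbart"],
    "content": ["article text 1", "article text 2",
                "article text 3", "article text 4"],
})

# Publications grouped by their box on the bias-quality chart
GREEN = {"New York Times", "NPR", "Washington Post"}
RED = {"Fox News", "Breitbart", "New York Post"}

green_df = df[df["publication"].isin(GREEN)]
red_df = df[df["publication"].isin(RED)]
```

From here, `green_df["content"]` and `red_df["content"]` become the two training corpora.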

Now a tougher question: Does bias inform which topics they talk about? And do they talk about them in a different way?

The first quick check I could think of was sentiment analysis, so I ran a quick sentiment analysis on a few of these publications from different boxes and found that the sentiment was actually not measurably different: a few samples from the New York Times and Breitbart both sounded “neutral” according to the VADER sentiment analyzer from NLTK.

But there were some differences in what they chose to focus on. I trained gensim’s LDA topic model on all publications (the notebooks are available on my github) to extract 20 grouped topics and the words associated with each. The full output is available here. Quickly skimming through these topics, it seemed that the Fox News corpus was much more focused on US politics. About 4 of its topics centered on foreign policy, all of them mentioning adversarial terms such as “nuclear” or “terrorism” (topics 3, 9, 12, and 19). Given that it was an election cycle, there were unsurprisingly many topics focused on Clinton and Trump. Similarly, the Washington Post also had many topics focused on the US election, but you can also see that there was more variety, with topics like cooking (topic 3), medical science (topic 10), climate science (topic 13), and football (topic 7). However, we should keep in mind that this could be due to how the original Kaggle dataset was created and which articles were included in the corpus.

I’m sure there are more ways to compare these two corpora, but for now, we’re moving on to the coolest part! Let’s teach an AI how they speak and generate a couple of articles!

Training GPT-2 and Generating Articles

To generate a few articles, I took a pre-trained GPT-2 model (with 345M parameters) and fine-tuned it separately on the Red corpus and the Green corpus using Google Colab. Then I fed the first few sentences of a real article as input for my conditional GPT-2 model to generate the rest of the article.
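I won’t reproduce the full notebook here, but the fine-tune-and-generate loop can be sketched with the gpt-2-simple package, a common wrapper for exactly this Colab workflow (the package choice, step count, and generation length below are assumptions, not my exact settings):

```python
def finetune_and_generate(corpus_path, prefix, run_name="red"):
    """Fine-tune GPT-2 345M on a plain-text corpus, then generate a
    conditional continuation of `prefix`. Needs a GPU runtime (e.g.
    Google Colab) and `pip install gpt-2-simple` to actually run."""
    import gpt_2_simple as gpt2

    gpt2.download_gpt2(model_name="345M")  # fetch pretrained weights
    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, corpus_path, model_name="345M",
                  run_name=run_name, steps=1000)  # steps: illustrative
    # Feed the article's opening sentences as the conditioning prefix
    return gpt2.generate(sess, run_name=run_name, prefix=prefix,
                         length=800, return_as_list=True)[0]
```

Calling it once per corpus (e.g. `run_name="red"` and `run_name="green"`) keeps the two fine-tuned checkpoints separate on disk.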

Below is an output from my Red-corpus-trained model. The input sentences (bold and italicized) were the opening sentences from this current Fox News article on Joe Biden’s VP pick.

Arguably the most important decision apparent Democratic presidential nominee Joe Biden will make before Election Day is his choice of vice president. He’s already said that he intends to nominate a woman and will name a special panel to advise him on the choice by May 1. But signs suggest he’s ignoring the most important consideration. The names that keep coming up include three who ran for president this year – Sens. Kamala Harris of California, Amy Klobuchar of Minnesota and Elizabeth Warren of Massachusetts, as well as Michigan Gov. Gretchen Whitmer. Other names include former Georgia gubernatorial candidate Stacey Abrams, Florida Congresswoman Val Demings and Nevada Sen. Catherine Cortez Masto. In a recent interview with Fox News,   Biden hinted as much in response to Trump’s suggestion that he change the      in favor of women. “I could say one more time I want to say yes,” Biden said as he added that he would “never stop” talking about women, especially as Trump’s attacks on women have escalated even in the past. “But it does feel like the way we have to play our election, especially to women, the way we have to put our priorities in front of the campaign, that’s what I try to talk about in the campaign.” But Biden’s name, while not the only possibility, might be far from the worst option.
“What I would do is look at the field and figure out which, actually, has been the most impressive part of your campaign and the least impressive part of your campaign has been the things you have done,” Biden told Fox News. “And then let me make the hard choice and make the selection of the best person,” he said. “It’s always going to be one of the most important choices of my life and I will do everything I can to make it a success.” Biden also told Fox News this week that he was “a little confused” about a    debate in which Sanders challenged Clinton on a series of   attacks about Wall Street and her emails during the primary. “I think what we have now is a   moment,” Biden told Fox News. Biden has had one of his greatest strengths as a politician in the past. It was said that he was always the one who would get the most out of the political debates because he was always the one who could get the most out of them and never backed down from any issue. However, while some of the more    Clinton has been successful at making the point that she’s the one to blame for      scandals, it would be Sanders who actually hurt Clinton’s prospects for election, much like the Iraq War controversy did to Kerry last year. Sanders has been in the spotlight all year for his     record, but none have played so strong an   in the news cycle for the Democratic contest, at least in the polls.

Alright, there’s some gibberish in there, and the topic of the VP pick drifts into a conversation about the Democratic primary (understandable, since our GPT-2 was trained on the last election season). But the first half of this article, which talks about Biden’s position on women and his commitment to select the “best person,” seems somewhat relevant. It definitely diverges from the rest of that Fox article, and it seems equally perplexed about who Biden might pick as the rest of us.

For the Green-corpus-trained GPT-2, I wanted to give less input to see what it could generate. A fragment of the opening sentence from this Washington Post article was fed to the model, and the generated output is below.

Does President Trump want to expose his strongest supporters to a deadly disease? Does he think Oklahoma is in play in 2020? And, finally, did he say that he thought the judge who blocked the Trump executive order was doing a good job. All of these questions have to be answered. And in the meantime, it would be prudent to consider these questions:       This is the worst executive order on record. But at least Trump was careful to state explicitly what he would be doing to put a stop to it. This is not all that the president intended, or could have meant! Trump and his associates say they think this is a "working analysis." That's how I interpreted his comments to NBC News. I also got his comments on CNN: He said he hoped they would be taken     that he understood    that he would take action based in the executive order and that ultimately    ”we’RE in charge now.” But even this statement has some issues. In his speech announcing the order, he said, ”I’ll leave it to my lawyers.”   . This is an odd phrase     but the    is also a tricky one to interpret. Trump also said it will be ”temporarily lifted.” This could mean temporarily   the temporary. The real estate mogul said this was a temporary order. That may seem like a good idea for his supporters and it is a great deal for the national economy. But it does not seem a wise move for Trump. It sounds almost like he and chief White House adviser Steve Bannon are calling the shots on the issue. It makes the news! Trump is so eager to bring back loyalty and his brand by putting a stop to any illegal immigration that he may be in danger of undermining his own campaign promises to keep the country together if he becomes the next president of the United States. This is just the latest example of how his comments about illegal immigration will make his administration harder to resist. This is Trump’s new reality. 
And it is his own fault       This is Trump’s dilemma         The president is on the brink of breaking his promise he will be the ”greatest jobs president that God ever created.” How will he stop illegal immigration?

As with the previous model’s generation, this one also has quite a lot of gibberish inserted into it, but the grammatical formulation was glaringly worse. For example, the sentence “it would be prudent to consider these questions:” wasn’t followed by any questions. There are many more run-on and incomplete sentences as well. The content, though, seems more uniform: as a whole, it focuses on Trump’s policies and an imaginary WaPo editor’s (largely critical) opinions and commentary on them. In retrospect, I probably should have let the green model train longer, as that corpus was significantly larger, with longer sentences.

I was hoping that we could try GPT-3, but alas, the model was a bit too big and there were no pre-trained models available yet! Maybe next time…


Isabelle Lee

Working in NLP, part time grad student, ex-physicist, bad writer of essays and short stories, learning how to box