Which Presidential Candidates are in the news?

Who among the hopefuls is making the most noise and news?


Barack Obama took the oath of office for his second term as the 44th President of the United States on January 20th, 2013. We are almost one and a half years into that term, and the bells have already started ringing for the next electoral battle in 2016. No major candidate has declared his or her ambitions in the race yet. Still, the names are already doing the rounds, and prospective candidates are busy doing their vital homework. FiveThirtyEight has a nice story today about when Hillary Clinton might announce her candidacy.

It is important for the candidates to be in the news in these crucial early years, when the stories that they want to tell will slowly shape up. Who is faring well, and who needs to put in more effort?

Let us take a closer look at how much coverage the candidates are getting in the newspapers, in particular in the New York Times. The reason for choosing the New York Times is primarily convenience: it has a really nice Article Search API that comes like a blessing to data nerds like us! Of course, the NYT will have its own bias, and it would be interesting to compare its coverage with that of a different newspaper (does anyone know of another newspaper with an API?). But keeping that aside for the moment, let us concentrate on what the NYT has to say about “Who is in the news?”

I started off by writing a small Python wrapper around the NYT API to collect data on the number of articles published about each politician in a given month. I used R to plot the data for the candidates. Please scroll down to the “GeekSpeak” section to learn more about the data collection. You can see the code on my GitHub.

Who are the candidates?

That’s a tough question to answer, given that the field is wide open at this point. Almost everyone is certain that Hillary will run (but politics has a way of throwing surprises, so keep your fingers crossed). Luckily, Wikipedia maintains nice pages on the prospective candidates here and here. I chose 10 of them, simply from my political instincts (if this were 8 years ago, maybe I would have missed an insignificant freshman senator from Illinois!), but I will go with that for now and try to rate them in terms of how much coverage each of them got this year from the NYT.

Democrats

An almost foregone conclusion, even before I plot anything, is that Hillary Rodham Clinton is going to top the charts on this one. This indeed turned out to be true, but not by the margin I had expected.

There are a few factors to consider, however. On the speculated list there is Joe Biden, who gets a considerable amount of coverage simply for being the veep. Then there is Andrew Cuomo, who gets an unusually large amount of coverage because he is involved in New York’s local politics, which the NYT covers disproportionately. But see how far behind everyone else is after the first three candidates! The fourth and fifth spots are closely contested between two women, Kathleen Sebelius and Elizabeth Warren. This has been foreseen to be the year of the women candidates. At this point, it seems only Biden and Cuomo can measure up to Hillary’s firepower.

Republicans

This is where things get interesting. In contrast to the Democrats, the Republican race is wide open. It is hard to pinpoint even the top ten. I therefore resorted to surveying the top twenty candidates, based on the wiki page and a few other news reports (the RCP page is good, in particular). First, let us discuss those who made it to the top ten. Although the Republican pool looks crowded, there is, very much as with the Democrats, a huge gap between the top candidates and the rest. Chris Christie’s coverage is largely negative, though, because of Bridgegate. I suspect that has caused a sudden spike, but we will explore that in a later post. Also, since he is from New Jersey, the NYT probably covers him more in its local section. Paul Ryan hasn’t had anything remarkable going on, so it is interesting to see him holding a steady lead. It is a little surprising to see Jeb Bush barely making it into the top ten, although he is considered one of the strongest contenders. If you look closely, the Republican list has a fatter tail, indicating that the candidates at the bottom of the list have greater relative importance in the race than their Democratic counterparts. So, the race is more wide open for the second-rung candidates.

How about the also-rans? How are they faring? In the volatile Republican race, we cannot write them off in these early stages. We take a look below at how the bottom ten in the list of twenty stack up. The first surprise here is Rick Santorum, who came so close to the nomination last time and yet has dropped out of the top 10 newsmakers list this year. The second, of course, is Sarah Palin, who seems to be out of the news this year. But the most interesting thing to see here is how evenly all these candidates are covered, showing again how closely matched the second-rung Republican candidates are.

How have the top two candidates stacked up against each other over time? Is everyone here a consistent newsmaker, or has their coverage spiked so sharply in some months that an average hides the real picture? I will cover these aspects in the next post.


GeekSpeak

I first wrote a Python wrapper around the NYT API. There are a number of them on the internet, but I just wrote the parts I needed for this post. The code is on GitHub. The wrapper has an article_search_api class that derives from a base api class.

    def __init__(self):
        self.baseurl = "http://api.nytimes.com/svc/search/v2/articlesearch.json?"
        self.input_params = {}  # dict of input parameters

I can then assign and reassign each additional input parameter I have to specify. For example, this is how the begin_date of the search is set:

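    # Signature inferred from the set_begin_date(month, day, year) call in the loop further down.
    def set_begin_date(self, month, day, year):
        # The API expects YYYYMMDD, so zero-pad the month and day.
        stryear, strmonth, strday = str(year), str(month).zfill(2), str(day).zfill(2)
        self.begin_date_str = "&begin_date=" + stryear + strmonth + strday
        self.input_params["begin_date_str"] = self.begin_date_str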
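
As an aside, the pieces above can also be assembled into a full request URL with the standard library instead of hand-built "&key=value" strings. The following is only a sketch, assuming Python 3; build_call_url is a hypothetical helper that is not part of my wrapper, and YOUR_KEY stands in for a real API key.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.nytimes.com/svc/search/v2/articlesearch.json?"


def build_call_url(query, begin_date, end_date, api_key):
    """Return a full Article Search URL (hypothetical helper, not the wrapper's own code)."""
    params = {
        "q": query,
        "begin_date": begin_date,  # YYYYMMDD
        "end_date": end_date,      # YYYYMMDD
        "api-key": api_key,
    }
    # urlencode escapes spaces and joins the pairs with "&".
    return BASE_URL + urlencode(params)


print(build_call_url("Hillary Clinton", "20140102", "20140201", "YOUR_KEY"))
```

Letting urlencode do the escaping means a candidate name with spaces or punctuation cannot break the query string.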

Finally, the query returns a JSON document, which I parse into a nested dict.

        self.return_articles = json.loads(urlopen(self.call_url).read())
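
For readers on Python 3, a standalone version of that fetch-and-parse step might look like the sketch below. It assumes Python 3's urllib.request, where read() returns bytes that need decoding; fetch_json and its opener parameter are hypothetical names I am using for illustration, and the opener is injectable only so the function can be exercised without hitting the network.

```python
import json
from urllib.request import urlopen


def fetch_json(url, opener=urlopen):
    """Fetch url and parse the response body as JSON.

    opener defaults to urllib's urlopen; pass a stand-in to test offline.
    """
    with opener(url) as response:
        # In Python 3 the body arrives as bytes, so decode before parsing.
        return json.loads(response.read().decode("utf-8"))
```

A real call would simply be fetch_json(call_url), where call_url is the full request URL including the API key.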

One problem with the API is that it only returns 10 articles per query, so you need to issue another query to get the next 10 articles, and so on. This is particularly troublesome, and it is also kind of painful to extract months from the publication dates in the JSON and assign them to the appropriate data points. Instead, I just use the metadata, which contains the number of hits in a one-month period, to find the number of articles. In a separate class, I process the JSON to get the hit count:

    def get_hitcount(self):
        hitcount = self.result["response"]["meta"]["hits"]
        return hitcount
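
To make the saving concrete, here is a small sketch that reads the hit count from a response and works out how many 10-article pages it would take to walk through the same results. The SAMPLE string is a made-up, trimmed-down response used purely for illustration (not real NYT data), and pages_needed is a hypothetical helper.

```python
import json
import math

# Made-up, trimmed-down response for illustration only; not real NYT data.
SAMPLE = '{"response": {"meta": {"hits": 137, "offset": 0}, "docs": []}}'


def get_hitcount(result):
    """Return the total number of matching articles reported in the metadata."""
    return result["response"]["meta"]["hits"]


def pages_needed(hits, page_size=10):
    """Number of paged queries required to retrieve every article."""
    return math.ceil(hits / page_size)


result = json.loads(SAMPLE)
hits = get_hitcount(result)
print(hits, pages_needed(hits))  # 137 14
```

With the metadata approach, one query per candidate per month is enough, no matter how many articles match.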

That was easy! Finally, I obtain each data point by looping over the beginning and end dates.

    for month in months:
        time.sleep(0.5)  # pause between calls, so as not to get throttled
        article_search.set_begin_date(month, 2, 2014)
        article_search.set_end_date(month + 1, 1, 2014)
        # call api
        article_search.call_api()
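
One thing to watch in a loop like this is December, where month + 1 would give a thirteenth month. A possible way to generate the date windows safely is sketched below; month_windows is a hypothetical helper, not part of my wrapper, and it keeps the same convention as the loop (day 2 of a month through day 1 of the next).

```python
import datetime


def month_windows(year, months):
    """Yield (begin, end) YYYYMMDD strings for each month in months.

    Each window runs from day 2 of the month to day 1 of the following month.
    """
    for month in months:
        begin = datetime.date(year, month, 2)
        # month // 12 is 1 only for December, which rolls the year over;
        # month % 12 + 1 maps 12 -> 1 and every other month to month + 1.
        end = datetime.date(year + month // 12, month % 12 + 1, 1)
        yield begin.strftime("%Y%m%d"), end.strftime("%Y%m%d")


for begin, end in month_windows(2014, [1, 12]):
    print(begin, end)
```

The resulting strings are already in the YYYYMMDD form that the begin_date and end_date parameters expect.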

The following R script lets me visualize the data (I can probably make this prettier.):

repub = read.table('rep_freq14.txt', header=FALSE, col.names=c("name", "num_stories"))
barplot(repub$num_stories[1:10],
        legend=repub$name[1:10],
        names.arg=c(1:10),
        xlab="Candidate",
        ylab="Number of News Stories",
        col=rainbow(10))
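
If you would rather stay in Python for a quick look before plotting, a rough text-only equivalent might be the sketch below. It assumes the same two-column "name count" layout that the R script reads; the SAMPLE rows are placeholders with invented numbers, not my actual data, and both function names are hypothetical.

```python
def parse_counts(text):
    """Parse 'name count' lines into (name, count) pairs, highest count first."""
    rows = []
    for line in text.strip().splitlines():
        # Split on the last run of whitespace so the count is always the final field.
        name, count = line.rsplit(None, 1)
        rows.append((name.strip(), int(count)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def text_barplot(rows, width=40):
    """Return one line per row, with a bar scaled so the largest count spans width."""
    top = max(count for _, count in rows)
    lines = []
    for name, count in rows:
        bar = "#" * round(width * count / top)
        lines.append(f"{name:<12} {bar} {count}")
    return lines


# Placeholder rows with invented numbers, for illustration only.
SAMPLE = """
candidate_a 120
candidate_b 45
candidate_c 80
"""

for line in text_barplot(parse_counts(SAMPLE)):
    print(line)
```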


If you have any questions or suggestions, feel free to contact me.

P.S.: If you loved this post, go ahead and hit the recommend button below. Also, you can follow me on Twitter and GitHub.