Using Twitter Data to Predict Partisanship of Newspapers

Published in The Startup, Oct 4, 2020

Written By John Graves and Kshitij Sachan

As America’s largest news sources become increasingly polarized, it’s important to be aware of their political biases in order to responsibly inform ourselves. Since I take a lot of journalism courses, I’m particularly interested in how we can identify biases in news media. Some people claim that Fox News is a far-right publication, and others say the New York Times is fake news.

All of this boils down to one question: How can we predict the political leanings of different news organizations?

In Spring 2020, John Graves, Kiran Merchant, Kshitij Sachan, and Jeffrey Zhu tried to answer this question for our final project in the data science course at Brown University. There’s obviously no easy answer to such a nuanced question. We began by considering the current approaches to label media biases.

Organizations such as AllSides publish their own ratings of media bias (shown below). These ratings are usually based on surveys and editorial insight. AllSides, for example, asks readers across the political spectrum to rate the bias of an article without being shown its source. Its editorial staff of news experts then assesses the ratings across a variety of categories.

Figure 1: How AllSides categorizes the partisan bias of different news sources.

The advantage of this approach is that news experts are involved to verify that the results make sense. On the flip side, the process is neither objective nor scalable. It is prone to bias because it relies on human judgment, and with more than 1,000 daily newspapers in the U.S., there’s no way it could be used to determine the bias of every local paper.

Our project aims to model and predict the partisanship of newspapers using the political leanings of their Twitter followers, inspired by Pablo Barbera’s lovely research paper Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data.

By learning the relationship between a newspaper’s bias and the leaning of its readership, we can provide objective scores to large news sources, as well as scale to smaller sources and provide transparency in local newspapers.

Data Collection

We collected Twitter data on 180 news organizations, including national outlets such as Fox News and the New York Times and local outlets such as the Providence Journal and the Brown Daily Herald.

This is our full Twitter data collection pipeline:

Collected the 900,000 most recent followers of each paper. Filtered to remove companies and bots. This filtering removed an account if it followed fewer than 25 people, or if its name included one of the top 1,000 most common words, a number, or a special character, or was all capitalized. The filtering wasn’t perfect, but analysis of the results showed very few false positives and a lot of junk accounts removed.
Randomly selected 200 followers per paper and collected the 5,000 accounts each of them followed most recently. While this data was public, we made sure to show results only in aggregate, without usernames, to honor privacy. We would have loved to get more data but were limited by the Twitter API. We could only get information on one follower per minute, and with 180 news organizations and 200 followers each (36,000 followers, or about 25 days of continuous requests) this added up quickly.
Gave each follower a political score based on the politicians they followed. We had three methods of doing this, each using a different political scoring metric, outlined below. The political distribution of a news source was then just the distribution of political scores for its 200 followers.
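A minimal sketch of what the account filter in the first step might look like. The function name, the tiny `COMMON_WORDS` set, and the exact regular expression are assumptions made for illustration; the article does not say which word list or matching rules were used.

```python
import re

# Placeholder for the top-1,000 common-word list (assumption: the real list is not given).
COMMON_WORDS = {"news", "the", "daily", "official", "shop"}


def looks_like_person(name: str, following_count: int, min_following: int = 25) -> bool:
    """Return True if the account passes every heuristic described above."""
    if following_count < min_following:
        return False                      # follows too few accounts
    if name.isupper():
        return False                      # display name is all capitals
    # Simplification: anything other than ASCII letters and whitespace counts as
    # "a number or special character", so names like O'Brien are also rejected.
    if re.search(r"[^A-Za-z\s]", name):
        return False
    return not any(word in COMMON_WORDS for word in name.lower().split())


keep = looks_like_person("Jane Doe", following_count=120)
drop = looks_like_person("Daily Deals", following_count=500)
```

A filter this blunt will also discard some real people, which matches the observation that the filtering was not perfect.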

Scoring 1: 600 political accounts with scores from Pablo Barbera’s paper mentioned above. These scores are continuous: loosely, a score of -2 is very liberal and 2 is very conservative. The dataset is from 2016, so the scores are slightly dated (e.g. Trump has a score very close to 0).
Scoring 2: Every U.S. politician tracked by ProPublica, with a score of -1 for Democrats and +1 for Republicans.
Scoring 3: The 200 Twitter accounts most followed by liberals and the 200 accounts most followed by Republicans (also from Barbera’s paper). Many of these accounts are not politicians.
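To make the scoring step concrete, here is a small sketch that assumes a follower’s score is the plain average of the scores of the scored accounts they follow. The averaging rule is an assumption (the article only says the score is "based on" the politicians followed), and the handles and the `follower_score` helper are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, Optional


def follower_score(following: Iterable[str], account_scores: Dict[str, float]) -> Optional[float]:
    """Average the scores of the scored political accounts a user follows.

    Returns None when the user follows no scored account.
    """
    matched = [account_scores[handle] for handle in following if handle in account_scores]
    return mean(matched) if matched else None


# Hypothetical handles, scored in the style of scoring method 2 (-1 Democrat, +1 Republican)
party_scores = {"dem_senator": -1.0, "dem_rep": -1.0, "gop_senator": 1.0}
user_following = ["dem_senator", "gop_senator", "dem_rep", "some_musician"]
score = follower_score(user_following, party_scores)  # two Democrats, one Republican
```

The same function works for any of the three scoring tables, since each is just a mapping from account to score.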

There are a few biases in the data to take note of. As Figures 2 and 3 show below, the political scores of Twitter users in our sample skewed strongly to the left, and a large number of users followed only one or two political accounts.

Figure 2: The distribution of political leaning of all Twitter followers in our database. You can see that the Twitter sphere is skewed left.

Figure 3: The number of political accounts users follow under each of the three scoring methods. A large number of people follow only one or two accounts.

Using scoring method 1 explained above, we estimated a score for each follower of each paper. We then smoothed these scores to extract a distribution for each paper, shown below in Figure 4. The distributions themselves were quite revealing and seemed to do a decent job of capturing the political leaning of a paper. You can see that AlexJonesWins (the account for InfoWars) is far to the right, whereas the Atlantic is on the left.

Figure 4: The distribution of political scores for 10 different news sources. The order seems to pass the eye test.
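The article does not say which smoothing method was used. One common choice is a Gaussian kernel density estimate, sketched below under that assumption. The bandwidth of 0.3, the grid, and the follower scores are all made up for illustration.

```python
import math
from typing import List, Sequence


def gaussian_kde(scores: Sequence[float], grid: Sequence[float], bandwidth: float = 0.3) -> List[float]:
    """Evaluate a Gaussian kernel density estimate of `scores` at each grid point."""
    norm = 1.0 / (len(scores) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in scores)
        for x in grid
    ]


# Made-up follower scores for one paper, on the -2 (liberal) to 2 (conservative) scale
follower_scores = [-1.4, -1.1, -0.9, -0.8, -0.2, 0.1, 0.9]
grid = [i / 10 for i in range(-30, 31)]      # -3.0 to 3.0 in steps of 0.1
density = gaussian_kde(follower_scores, grid)
peak = grid[density.index(max(density))]     # location of the tallest peak
```

A smaller bandwidth keeps more of the individual bumps, while a larger one blurs them into a single hump, so the choice affects how many peaks a paper appears to have.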

We wanted to go further than just visualizing the distribution of the papers though. In order to predict the political leaning of a paper we needed to break down this distribution into a set of features that we could use to train a model.

Feature Selection

There were a couple of ways that we broke down the political distribution into features. First, we took both the mean and variance of political scores for the followers of the news organization. We also had to account for the fact that different Twitter accounts followed different numbers of politicians. This was important because we couldn’t be as confident in the political leaning of a user who didn’t follow as many political accounts.

There were two ways we tried to account for this:

We tried a process called “regressing to the mean” by doing the equivalent of adding a constant number (in our case 3) of politically neutral accounts to a Twitter user’s following list. This moves users who don’t follow many politicians toward neutral. For example, if a Twitter user followed only one politician, then ¾ of their followed accounts would be neutral, pulling the user strongly toward neutral. If a user followed a lot of politicians, adding three neutral accounts wouldn’t make much of a difference, and the user’s political score would be determined primarily by the accounts they follow.
We removed users who didn’t follow enough political accounts (in our case fewer than 3). This was based on the hypothesis that if a user followed only a couple of accounts, we shouldn’t use that data at all. This has the advantage of not overpowering the scores of politically active Twitter users with a bunch of neutral scores from inactive users.

Aside: We considered sweeping both parameters with our decision tree (the number of neutral accounts to add and the minimum number of politicians required), but decided not to in order to avoid overfitting the model to our data during the sweep. We could have avoided this problem by creating a validation subset of our data, but with only 55 data points (newspapers that had been scored by AllSides) to begin with, we didn’t want to split the data up any further and worried that there would not be enough data in the validation set to be statistically significant.

These modifications left us with eight features for each of the three scoring mechanisms (“Regressed” means that 3 neutral accounts were added, and “Politically Active” means that users who followed fewer than 3 political accounts were removed):

Mean
Variance
Regressed Mean
Regressed Variance
Politically Active Mean
Politically Active Variance
Regressed Politically Active Mean
Regressed Politically Active Variance
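The eight features can be sketched as follows. This assumes each follower is summarized by a mean score and a count of political accounts followed, that a neutral account scores 0, and that variance means population variance. None of these details are confirmed by the article, and the helper names are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple

NEUTRAL_ACCOUNTS = 3   # neutral accounts added when regressing
MIN_POLITICAL = 3      # followed political accounts needed to count as politically active


def regressed(score: float, n_followed: int, k: int = NEUTRAL_ACCOUNTS) -> float:
    """Shrink a mean score toward 0 as if the user also followed k neutral accounts."""
    return score * n_followed / (n_followed + k)


def paper_features(users: List[Tuple[float, int]]) -> Dict[str, float]:
    """`users` holds (mean_score, political_accounts_followed) for each follower."""
    active = [(s, n) for s, n in users if n >= MIN_POLITICAL]
    features: Dict[str, float] = {}
    # Raises statistics.StatisticsError if a group is empty (e.g. no active users).
    for prefix, group in (("", users), ("Politically Active ", active)):
        raw = [s for s, _ in group]
        reg = [regressed(s, n) for s, n in group]
        features[prefix + "Mean"] = mean(raw)
        features[prefix + "Variance"] = pvariance(raw)
        features["Regressed " + prefix + "Mean"] = mean(reg)
        features["Regressed " + prefix + "Variance"] = pvariance(reg)
    return features


# Made-up followers of one paper
followers = [(-1.0, 1), (-0.5, 6), (0.8, 3), (-1.2, 12)]
features = paper_features(followers)
```

In this sketch, a user with a score of -1.0 who follows a single politician ends up at -0.25 after regression, which is the ¾-neutral case described earlier.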

Once we created our features we had to decide what machine learning technique to use to categorize the data. Since the AllSides data we were using was grouped into five distinct groups, we decided to use a decision tree.

Decision Tree

Decision trees are one of the most widely used supervised classification methods in machine learning. During training, the model sorts through many variables to find the ones that best separate the classes, and it can be constrained with a maximum depth to limit overfitting, with a train-test split used to detect it. The method suits classification better than linear regression does, since it learns the threshold for each class directly. In addition, decision trees are one of the more clearly explainable classification methods.

When creating our decision tree, we wanted a model that was both accurate and guarded against overfitting to the data. We measured success by the percent accuracy of our model on the test set of a train-test split. To implement this test, we bootstrapped the evaluation, running the decision tree 10,000 times with ~2% of the data held out as the test set each time. This limited variation and made us more confident in our results. We also limited the tree’s maximum depth to three. Keeping the depth reasonably small reduces overfitting, and three tiers was the smallest number that seemed to perform fairly well.
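One way to set up this kind of evaluation is sketched below with scikit-learn. It assumes the repeated runs are independent random train/test splits, uses synthetic data in place of the 55 labeled papers, and runs 200 repetitions instead of 10,000 to keep the example fast. The function name and the synthetic features are hypothetical.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.tree import DecisionTreeClassifier


def repeated_split_accuracy(X, y, n_repeats=200, test_size=0.02, max_depth=3, seed=0):
    """Pooled held-out accuracy over many random train/test splits."""
    splitter = ShuffleSplit(n_splits=n_repeats, test_size=test_size, random_state=seed)
    hits, total = 0, 0
    for train_idx, test_idx in splitter.split(X):
        tree = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
        tree.fit(X[train_idx], y[train_idx])
        hits += int((tree.predict(X[test_idx]) == y[test_idx]).sum())
        total += len(test_idx)
    return hits / total


# Synthetic stand-in for 55 labeled papers; the columns imitate mean and variance features
rng = np.random.default_rng(0)
y = rng.integers(0, 5, size=55)                       # five partisanship classes
X = np.column_stack([y - 2 + rng.normal(0, 0.3, 55),  # noisy class-dependent mean
                     rng.uniform(0.1, 1.0, 55)])      # uninformative variance
accuracy = repeated_split_accuracy(X, y)
```

With 55 papers, a 2% test fraction rounds up to two held-out papers per split in scikit-learn, so each individual split is very noisy and only the average over many splits is meaningful.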

How did we test our model?

There were a couple of different challenges in evaluating our model. First, we only had 55 scored papers from AllSides, so getting even one more paper correct would be a significant improvement in the model. Second, we cared not only about whether the model missed on a paper, but also about how far off it was. To account for this, we looked not only at the model’s success rate predicting all 5 of the AllSides categories, but also at its success rate when the predictions were collapsed into three categories (Left, Center, and Right).
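The two success rates can be computed from the same predictions, as in this sketch. The mapping that folds the two "lean" labels into their side is an assumption about how the five categories were collapsed, and the labels are made up.

```python
from typing import List

# Assumed mapping: each "lean" label folds into its side of the spectrum
TO_THREE = {
    "Left": "Left", "Lean Left": "Left",
    "Center": "Center",
    "Lean Right": "Right", "Right": "Right",
}


def accuracy(predicted: List[str], actual: List[str]) -> float:
    """Fraction of papers whose predicted label matches the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def three_way_accuracy(predicted: List[str], actual: List[str]) -> float:
    """Accuracy after collapsing both label lists to Left, Center, and Right."""
    return accuracy([TO_THREE[p] for p in predicted], [TO_THREE[a] for a in actual])


# Made-up labels for four papers
actual = ["Left", "Lean Left", "Center", "Right"]
predicted = ["Lean Left", "Lean Left", "Center", "Lean Right"]
five_way = accuracy(predicted, actual)             # 2 of 4 exact matches
three_way = three_way_accuracy(predicted, actual)  # all 4 land on the correct side
```

A large gap between the two numbers means most misses are near misses, such as a Left paper labeled Lean Left.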

Using the features we had calculated earlier, we tested our decision tree using the three scoring mechanisms (described in the data collection section) both individually and combined. The results were clear: the first scoring mechanism (using Barbera’s scored Twitter accounts) created the best model. It outperformed all other models by at least 5% with both three and five categories of partisanship.

Figure 5: The accuracies of our decision tree using different sets of features. Scoring method 1 outperforms all the other models fairly significantly.

We had one more idea about how to find features from the distributions though. We noticed that a lot of the distributions had multiple peaks, and we thought that these peaks could be relevant in classifying the news organizations.

The intuition behind this idea is that political leaning tends to follow a bimodal distribution: most people are either liberal or conservative, with few people in the middle. If we could identify the size and ideological center of these two peaks for each paper, maybe that would be a more useful predictor of the paper’s political leaning than our previous features.

We used k-means clustering to break the data down into two clusters. Along with the mean and variance of each cluster, we included the mean and variance of the entire distribution, giving us a total of six features for each distribution. An example of this breakdown is shown below.

Figure 6: Example of using cluster features on the New York Times.
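For one-dimensional scores, two-cluster k-means is short enough to write out by hand. The sketch below seeds the two centers at the minimum and maximum score, which is an assumption made for simplicity, and uses population variance. It needs at least two distinct scores, and the function names and data are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List


def two_means_1d(scores: List[float], iterations: int = 50) -> List[List[float]]:
    """Lloyd's algorithm for k=2 on one-dimensional data, seeded at the min and max."""
    centers = [min(scores), max(scores)]
    clusters: List[List[float]] = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for s in scores:
            nearest = 0 if abs(s - centers[0]) <= abs(s - centers[1]) else 1
            clusters[nearest].append(s)
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:       # assignments have stopped changing
            break
        centers = new_centers
    return clusters


def cluster_features(scores: List[float]) -> Dict[str, float]:
    """Mean and variance of each cluster plus those of the whole distribution."""
    left, right = two_means_1d(scores)
    return {
        "left_mean": mean(left), "left_var": pvariance(left),
        "right_mean": mean(right), "right_var": pvariance(right),
        "overall_mean": mean(scores), "overall_var": pvariance(scores),
    }


# Made-up follower scores with a liberal group and a conservative group
scores = [-1.3, -1.1, -1.0, -0.9, 0.7, 0.9, 1.1]
features = cluster_features(scores)
```

Note that k-means always returns two clusters, even for a paper whose followers form a single peak, so the cluster means can be misleading for unimodal distributions.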

Unfortunately, these features performed worse than any of the other feature sets we had tried thus far. We were happy with the results of our solution using the first scoring mechanism though, so we stuck with that set of features.

Analyzing the Results

Figure 7: Our final decision tree model using the features from scoring method 1.

Ultimately, we were very happy with the decision tree using the first method of scoring. When trained on all the data (testing and training sets combined), it correctly classified 92.7% and 83.6% of papers with three and five categories respectively. Of course, we need to account for overfitting: these numbers drop to 86.7% and 67.6% on the held-out test set, compared to baseline success rates of 40% and 23.6% respectively.

Diving further into the data, we can look at the confusion matrix to identify where the model miscategorized a paper. One good sign is that it never thought a right paper was left or that a left paper was right. This shows that our model is at least getting in the right ballpark (this is also seen by the high success rate when grouping papers into three categories).

Figure 8: Confusion Matrix for our decision tree.
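The left-versus-right check can be automated alongside the confusion matrix. This sketch uses made-up labels and assumes the five AllSides category names. The helper names are hypothetical.

```python
from collections import Counter
from typing import List

LABELS = ["Left", "Lean Left", "Center", "Lean Right", "Right"]
LEFT_SIDE, RIGHT_SIDE = {"Left", "Lean Left"}, {"Lean Right", "Right"}


def confusion_matrix(actual: List[str], predicted: List[str]) -> List[List[int]]:
    """Rows are actual labels, columns are predicted labels, both in LABELS order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in LABELS] for a in LABELS]


def cross_side_errors(actual: List[str], predicted: List[str]) -> int:
    """Count papers predicted on the opposite side of the spectrum from their label."""
    return sum(
        (a in LEFT_SIDE and p in RIGHT_SIDE) or (a in RIGHT_SIDE and p in LEFT_SIDE)
        for a, p in zip(actual, predicted)
    )


# Made-up labels for five papers
actual = ["Left", "Lean Left", "Center", "Lean Right", "Right"]
predicted = ["Lean Left", "Lean Left", "Lean Left", "Right", "Right"]
matrix = confusion_matrix(actual, predicted)
flips = cross_side_errors(actual, predicted)   # 0: no paper crosses the spectrum
```

Cross-side errors sit in the top-right and bottom-left corners of the matrix, so a model that is in the right ballpark leaves those corners empty.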

Extending our model to local newspapers

One of our goals of this analysis was to create a model that can be used to predict the political leaning of local newspapers. One drawback of our labeled data is that AllSides primarily labels national news sources, so we had very little local data to train on.

When we ran our model on local papers, our results were pretty skewed. Only 4 out of 121 papers were labeled lean right or right. When we think more about what is happening though, this isn’t very surprising. All our local papers are from New England. New England tends to be more liberal, so its papers tend to have more liberal followers: if you were to take a random subset of New England Twitter users, they’d be more liberal than the average national Twitter user.

This starts to get at what our model is really analyzing: the political breakdown of the audience of different news sources. For national coverage, it follows that people gravitate toward sources that tend to support their political leaning, and hence there is a strong correlation between a source’s user base and its political leaning. As you get into more local areas, it isn’t surprising that this breaks down: there are fewer news sources to pick from, so most sources will seem to regress toward the political leaning of the area.

This difficulty could potentially be addressed by adjusting the model based on the average political leaning of a user in a certain region, but this would be another whole project. For now, this analysis still has two advantages in local media markets:

1. It still does a good job of defining the political preferences of the people that use a source. As technology encourages people to surround themselves in partisan bubbles, this analysis can help people become more aware of who is accessing what news.

2. This analysis highlights the importance of local news. Local news is almost nonpartisan. Unlike the competitive national news market, most areas have only a few local newspapers, so they attract people across the political spectrum. Of course, a local newspaper isn’t trying to attract people from both Idaho and Massachusetts the way a national source might, but they do have the role of bringing information to their entire community.

Conclusion

Our model is very successful at predicting the political leaning of national news sources, which is an important task given the polarized atmosphere we are currently in. This is especially relevant because there has been an increased focus on political bias within journalism in the past few years. While the model doesn’t translate perfectly to the local level, it still does a good job visualizing the political leaning of the audience of different sources. After accounting for the political leaning of the region, the model shows promise for taking the next step of judging the leaning of the paper itself.

While the results are insightful, it is also important to recognize what these results aren’t showing. The model doesn’t consider journalistic integrity, which is an important variable. A news source can lean toward a political party while holding itself to high journalistic standards. Because of this, it is completely incorrect to treat the leaning of a paper as a measure of its accuracy or truthfulness.

Overall though, we thought our model was very successful at what it set out to do, which was to judge the political leaning of news sources based on their Twitter followers.