Enter the Electome
What’s the connection between a campaign issue and a vegetable? ProPublica President Richard Tofel reminded us this month. “For too many voters, as every editor in her or his darkest heart knows, reading about most ‘issues’ feels too much like eating their broccoli.”
However, at this early moment in the 2016 presidential campaign, the broccoli rule has been suspended. Consumers are gobbling up coverage of how the candidates stand on national security and immigration as fast as editors can cook it up.
For that we can “thank” national jitters in the aftermath of Paris and San Bernardino and the media-fueled candidacy of Donald Trump.
But the media’s relentless focus on national security and immigration, while shedding light on some important policy questions, may be leaving voters in the dark about other issues that our research shows are important to significant numbers of people.
Here at the Laboratory for Social Machines, part of the MIT Media Lab, the team has built a new analytics tool called the Electome, which uses big data to track the issues that matter most (and least) in the presidential campaign.
The Electome has access to the entire Twitter archive and so-called “fire hose.” Its algorithms can dive into Twitter’s massive stream — an estimated 500 million tweets per day — and fish out tweets specifically about the election.
The Electome goes much deeper than counting hashtags. Using semantic data analysis at a massive scale, it classifies the election-related tweets by topic — 21 different issues for now — and by candidate. The result: a unique picture of the issues that millions of engaged potential voters care about. [NOTE: The Electome captures U.S. users who tweet in English. An estimated 19% of U.S. adults and 23% of Internet users were on Twitter as of September 2014, according to the Pew Research Center.]
The team is also tracking political news from a dozen news organizations, as listed below. That number will triple in 2016.
With support from the Knight Foundation, the Electome team is working with media partners — The Washington Post and others — but will also publish its own findings.
To change metaphors on the run, we are covering the “horse race of ideas” to supplement and perhaps enrich news coverage of the usual horse race: polls, primaries, pundits, predictions, you know the one. Not surprisingly, since the campaign got unofficially underway early this year, the media has spent much of its time and space on the familiar horse race, as the chart below from the Electome shows. The label “Politics” includes all aspects of the campaign other than the issues themselves, from who won the latest debate to who’s ahead in the latest polls to who just called whom a “jerk.”
While the media coverage (encouragingly) has focused more on “Issues” than “Politics” to date, you can see the ratio changing as the race heats up. Once you take “Politics” out of the equation, here’s how the coverage and conversation about campaign issues have waxed and waned in the media and on Twitter in the same period.
You can see in both charts how recent news events have shifted attention to Guns, Immigration, and Foreign Policy/National Security, which includes tweets about terrorism.
But there are significant disparities between the media coverage and the Twitter conversation as well. In the charts below, the red line represents the share of media coverage, the blue line the share of conversation on Twitter. As the Electome data shows, issues like campaign finance and financial regulation command a notably larger share of media coverage than of conversation on Twitter.
Conversely, people on Twitter are noticeably more interested in racial issues, jobs, and budget and taxation issues than our media sources appear to be.
In order to tie the horse race of ideas to potential election outcomes — the conventional horse race that matters to everyone — the Electome also measures media coverage and Twitter conversation about the candidates themselves. Here’s a look at the candidates’ share of media coverage and Twitter conversation since February.
Again, the Electome charts disparities between coverage and conversation. Two examples: while Ted Cruz’s share of Twitter talk compares fairly closely to his share of media coverage — the big spike is the March 23rd announcement of his candidacy — Marco Rubio, who announced three weeks later, “underperforms” somewhat on Twitter compared to his share of press.
Trump, by contrast, is even more dominant on Twitter than in the media as the weeks go on. No wonder he’s driving the other Republicans nuts.
Over the course of the campaign, we believe the Electome will help us understand how news events, media coverage, and the national conversation about issues (at least on Twitter) influence one another — and potentially the election. Please pass the broccoli.
NOTE: Soroush Vosoughi and Prashanth Vijayaraghavan, researchers at the Laboratory for Social Machines, developed the analytics and visualizations for this post.
List of Electome media sources as of 12/28/15
Los Angeles Times
New York Times
Wall Street Journal
How does The Electome identify tweets that refer to the election?
Twitter has given the Laboratory for Social Machines access to its entire database, which is growing by an estimated 500 million tweets per day. We have been tracking election-related tweets since February of 2015. A computer program uses language analysis to identify the tweets that refer to the American election — approximately 250,000 a day at this point. Then the algorithm classifies each tweet by issue and/or candidate. Data analysts check random selections of tweets to confirm their relevance, and the computer program uses their assessments to keep improving its performance over time.
Do you count only original tweets, or are re-tweets part of the mix?
The data includes re-tweets.
Is the program counting only election-related tweets originating in the United States?
Only 1% of tweets are geo-tagged by Twitter, so in order to capture tweets from the United States, our program filters for English-language tweets coming from U.S. time zones. Because Canada shares those time zones, English-language tweets from Canada that are relevant to the U.S. election are included as well.
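As a rough illustration of that filter, here is a minimal Python sketch. It is not the Electome's code: the field names `lang` and `time_zone`, the time zone labels, and the sample tweets are all hypothetical, chosen only to show the idea of keeping English-language tweets from U.S. time zones.

```python
# Hypothetical time zone labels; the actual list the Electome uses is not given in this post.
US_TIME_ZONES = {
    "Eastern Time (US & Canada)",
    "Central Time (US & Canada)",
    "Mountain Time (US & Canada)",
    "Pacific Time (US & Canada)",
    "Alaska",
    "Hawaii",
}

def passes_us_filter(tweet: dict) -> bool:
    """Keep a tweet only if it is in English and its time zone is on the U.S. list."""
    return tweet.get("lang") == "en" and tweet.get("time_zone") in US_TIME_ZONES

# Made-up sample tweets, for illustration only.
sample = [
    {"text": "Debate night!", "lang": "en", "time_zone": "Eastern Time (US & Canada)"},
    {"text": "Noche de debate", "lang": "es", "time_zone": "Pacific Time (US & Canada)"},
    {"text": "Watching from London", "lang": "en", "time_zone": "London"},
]
kept = [t["text"] for t in sample if passes_us_filter(t)]
print(kept)  # ['Debate night!']
```

A time zone label alone cannot tell a tweet from Toronto apart from one sent in New York, which is why a filter of this kind also lets Canadian tweets through.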
What about tweets in other languages, such as Spanish?
For now, we are only analyzing tweets in English.
How does The Electome determine “share” of conversation on Twitter?
The share of conversation is the number of tweets about a specific topic or candidate divided by the total number of election-related tweets within a given time period. So, for example, if there were 100,000 election-related tweets in a given time span and 20,000 referenced Donald Trump, his “share” would be 0.2 out of 1.0, or 20%.
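The same arithmetic in a short Python sketch, using the numbers from the example above. The function name `share_of_conversation` is ours, for illustration only, and is not taken from the Electome itself.

```python
def share_of_conversation(topic_tweets: int, election_tweets: int) -> float:
    """Fraction of election-related tweets that reference a given topic or candidate."""
    if election_tweets <= 0:
        raise ValueError("election_tweets must be positive")
    return topic_tweets / election_tweets

# 20,000 tweets referencing one candidate out of 100,000 election-related tweets.
share = share_of_conversation(20_000, 100_000)
print(f"{share:.1%}")  # 20.0%
```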
How does the population using Twitter compare to the U.S. population overall?
The conversation among Twitter users is not representative of the public at large. Roughly 20 percent of U.S. adults use the platform, and their demographic makeup and levels of political interest differ from the public overall. The analysis should be viewed as a readout on the views of the platform’s users.
What about share of media coverage — how is that determined?
The Electome monitors the websites of a collection of influential media sources, currently 14 in number. That number will increase as we build out the system. The computer program identifies the articles that are about the election — roughly 200 per day out of 2,000. Then it classifies each article according to candidates and topics.
You used the plural there: how does The Electome count stories that refer to more than one topic, or more than one candidate?
The “share” is divided equally among the topics or candidates. If an article mentions five candidates, each of them gets credit for a ⅕ or 20% share. By the way, the same goes for tweets that mention more than one candidate. (Generally, tweets are too short to mention more than one topic.)
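Here is a minimal Python sketch of that equal split, under stated assumptions. The function name `fractional_shares` and the sample articles are hypothetical. We also assume that each article or tweet carries one unit of credit and that the final share is credit divided by the number of items passed in.

```python
from collections import defaultdict

def fractional_shares(items):
    """Split each item's single unit of credit equally among the names it mentions.

    `items` is a list of lists of candidate (or topic) names, one inner list
    per article or tweet. A name's share is its total credit divided by the
    number of items (an assumption made for this sketch).
    """
    credit = defaultdict(float)
    for mentions in items:
        names = set(mentions)  # count each name once per item
        for name in names:
            credit[name] += 1 / len(names)
    total = len(items)
    return {name: c / total for name, c in credit.items()} if total else {}

# Made-up sample: one article mentions two candidates, another mentions one.
articles = [
    ["Trump", "Cruz"],  # each gets 1/2 of this article
    ["Trump"],          # Trump gets all of this article
]
shares = fractional_shares(articles)
print(shares["Trump"], shares["Cruz"])  # 0.75 0.25
```

Splitting the credit this way keeps the shares summing to 1.0 whenever every item mentions at least one candidate, so an article that names five candidates does not count five times.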
Does The Electome have a way to account for headlines, photos, videos, word count, prominence of placement, and other factors that might give some stories a greater “presence” than others?
Not yet — the team is working on ways to account for most of those factors. Videos and photos are a separate data challenge.