Introducing “Tonar”

An Incivility Index for Election 2016

Hillary Clinton’s “disgusting” bathroom break. The size of Donald Trump’s hands and other appendages. An unsubtle allusion to Megyn Kelly’s menstrual cycle. Little Marco. Lyin’ Ted. Getting “schlonged.”

You don’t have to be an MIT scientist to know that the Presidential campaign has often been lewd, crude, and rude. But it helps. Here at the Laboratory for Social Machines, part of the MIT Media Lab, we’ve developed an “Incivility Index” that measures not what the candidates say about one another, but the tone of the conversation about the campaign.

The headline: April may be the cruelest month, but March was the crudest — at least so far.

The graph below shows the share of election-related conversation on Twitter that we have identified as uncivil since the beginning of March 2015. The Y-axis shows this share as a proportion of all election-related tweets; shift the decimal point two places to the right to read it as a percentage.

You can see that as the campaigns headed for New York last week, the place where the rest of America thinks civility goes to die, the level of incivility steadily increased.

On many days, the share of uncivil tweets spikes above 10 per cent of all election-related tweets, and on some days it flirts with 20, meaning that nearly one out of every five election-related tweets fails the civility test.

Share of election-related tweets that are uncivil

The Incivility Index, or “Tonar” as my colleague Soroush Vosoughi dubbed it, is the newest feature of the Electome, a tool we’ve built to chronicle the “horse race of ideas” and other features of the campaign by tracking social and mainstream media. With support from the Knight Foundation and access to all of Twitter’s output, we have been fishing out election-related tweets for more than a year now and using machine-driven semantic analysis to classify their content by issue and by candidate. (For more about the Electome and the horse race of ideas, please see the links and FAQs at the end of this post.)

To create Tonar, we trained our analytic engine to recognize vulgarisms; schoolyard insults; violent expressions; ethnic and sexual slurs; and enough profanity to make an algorithm blush. Our data scientists then classified the uncivil tweets into four categories. Here’s how they break down for March and the first part of April, through the New York Primary.
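As a rough sketch of the idea (the lab’s actual classifier is machine-trained; the category names and word lists below are purely illustrative placeholders, with real slurs and profanity omitted), a lexicon-based pass over a tweet might look like this:

```python
import re

# Illustrative placeholder lexicons, NOT the lab's actual training data.
UNCIVIL_LEXICONS = {
    "profanity": {"damn", "crap"},
    "insult": {"loser", "idiot"},
    "threat": {"punch", "destroy"},
    "slur": {"slurword"},  # actual slurs omitted here
}

def uncivil_categories(tweet):
    """Return the set of incivility categories whose lexicon the tweet matches."""
    words = set(re.findall(r"[a-z']+", tweet.lower()))
    return {category for category, lexicon in UNCIVIL_LEXICONS.items()
            if words & lexicon}
```

A production system would need far more than word matching, such as stemming, multi-word phrases, and enough context handling to avoid flagging quoted speech, which is part of why a trained model is used instead.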

Uncivil conversation in March by category
Uncivil conversation in April by category

Naturally we were curious whether some election-related conversations are less civil than others.

Take profanity, which has the dubious honor of dominating the incivility landscape. The percentage of election-related tweets containing profanity is on the rise, hovering at around 5 per cent of late. Here’s the pattern since last November.

Share of election-related conversation containing profanity since Nov 2015

That huge spike was on Saturday, March 12th — the least civil day of the campaign so far. This was the day after the Trump campaign canceled a planned rally in Chicago amidst clashes between protesters and supporters.

Tonar can analyze conversations around specific issues and candidates. When you apply the profanity filter to some of the prominent issues that the Electome has been tracking, such as Immigration, Foreign Policy/National Security, Guns, or Racial Issues, Tonar shows incivility rising across the board as the campaign heated up this year. And while the conversation about all these issues is getting saltier, comments about race have the highest percentage of profanity, some days spiking to more than a third of all race-related tweets.

The Incivility Index shows an increase in profanity-laced conversations about most of the candidates over the past months as well. It’s important to note that these findings don’t reflect “stance” — the opinions expressed in the conversation. The algorithms can’t yet determine with reasonable certainty what position a profanity-laced tweet about Trump or Clinton is taking — just that it uses rude language to take it.

Perhaps not surprisingly, the conversation around Trump is somewhat more profane than the conversation around Ted Cruz. The charts below show the pattern since November. The Trump spike on March 12th corresponds to the overall spike in profanity we identified earlier. And this may be one arena where John Kasich is happy to come in third most days — a social mirror to the more moderate tone of his campaign.

Share of conversations about the current Republican candidates that involve profanity since Nov 2015

Meanwhile, as the tone of the Democratic campaign has become harsher, you can see the conversations about Clinton and Bernie Sanders deteriorating as well — giving a new meaning to the phrase “blue states.”

The profanity spike in Clinton-related conversations on February 19th corresponds to a Las Vegas town hall in which she and Sanders sparred about immigration reform. The cuss words in Sanders-related conversations peaked on February 29th and March 1st, apparently fueled by the excitement around Super Tuesday.

Share of conversations about the current Democratic candidates that involve profanity since Nov 2015

If there’s anything to celebrate when peering at the data through this particular lens, perhaps it’s that most of the conversation about the campaign is profanity-free — and in that sense at least, civil.

Five more states hold primaries this week. And as the campaign swings into the final stages of primary season, we and our media partners will look at the Incivility Index not just for conversations about specific issues and candidates, but also for conversations among each candidate’s supporters.

Tonar will tell us whether the election conversation gets more “presidential” — or even uglier.

aheyward@media.mit.edu / @andrewheyward

NOTE: Soroush Vosoughi and Prashanth Vijayaraghavan, researchers at the Laboratory for Social Machines, developed the analytics and visualizations for this post.

More about the Electome on Medium

“Enter the Electome,” Andrew Heyward, 12/28/15

“Time to check your candidate’s chat-scan,” Andrew Heyward, 2/1/16

“Who’s Influencing Election 2016?” William Powers, 2/23/16

FAQs

How does The Electome identify tweets that refer to the election?

Twitter has given the Laboratory for Social Machines access to its entire database, which is growing by an estimated 500 million tweets per day. We have been tracking election-related tweets since February of 2015. A computer program uses language analysis to identify the tweets that refer to the American election — approximately 250,000 a day at this point. Then the algorithm classifies each tweet by issue and/or candidate. Data analysts check random selections of tweets to confirm their relevance, and the computer program uses their assessments to keep improving its performance over time.
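As a greatly simplified sketch of that first filtering step (the Electome’s actual filter is a trained language model, not a fixed keyword list; the terms below are illustrative), the idea is:

```python
# Illustrative keyword filter; the real system uses trained semantic analysis
# that is refined over time against analysts' relevance judgments.
ELECTION_TERMS = ("election2016", "trump", "clinton", "sanders", "cruz", "primary")

def is_election_related(tweet):
    """Crude first-pass filter: does the tweet mention an election term?"""
    text = tweet.lower()
    return any(term in text for term in ELECTION_TERMS)
```

The human-in-the-loop step described above is what a keyword list alone cannot provide: analysts’ spot-checks become new training labels, so the model’s notion of “election-related” keeps improving.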

Do you count only original tweets, or are re-tweets part of the mix?

The data includes re-tweets.

Is the program counting only election-related tweets originating in the United States?

Only 1% of tweets are geo-tagged by Twitter, so to capture tweets from the United States, our program filters for English-language tweets posted from U.S. time zones. That does mean that English-language tweets from Canada relevant to the U.S. election are included.
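In code, the heuristic amounts to something like the following (the field names follow Twitter’s then-current API, where user profiles carried a free-text time-zone setting; treat the exact fields and values as an assumption):

```python
# Time-zone strings as they appeared in Twitter user profiles at the time.
US_TIME_ZONES = {
    "Eastern Time (US & Canada)",
    "Central Time (US & Canada)",
    "Mountain Time (US & Canada)",
    "Pacific Time (US & Canada)",
}

def likely_from_us(tweet):
    """Heuristic: English-language tweet whose author set a U.S. time zone."""
    return tweet.get("lang") == "en" and tweet.get("user_time_zone") in US_TIME_ZONES
```

Note the trade-off the FAQ describes: because the time zones span the border, Canadian tweets in English pass the filter too.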

What about tweets in other languages, such as Spanish?

For now, we are only analyzing tweets in English.

How does The Electome determine “share” of conversation on Twitter?

The share of conversation is the number of tweets about a specific topic or candidate divided by the total number of election-related tweets within a given time period. So, for example, if there were 100,000 election-related tweets in a given time span and 20,000 were about Donald Trump, his “share” would be 0.2 out of 1.0, or 20%.
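The calculation in the example above is a direct division:

```python
def conversation_share(topic_tweets, total_election_tweets):
    """Share of conversation as a proportion between 0.0 and 1.0."""
    return topic_tweets / total_election_tweets

# 20,000 Trump tweets out of 100,000 election-related tweets
trump_share = conversation_share(20_000, 100_000)  # 0.2, i.e. 20%
```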

Does The Electome “know” what position a given tweet takes on an issue?

No. We’re not yet capable of measuring stance, namely where the tweeter stands on an issue — just engagement with it.

How does the population using Twitter compare to the U.S. population overall?

The conversation among Twitter users is not representative of the public at large. Twenty percent of Americans use the platform, and their demographic makeup and levels of political interest differ from the public overall. The analysis should be viewed as a readout on the views of the platform’s users.