The Many Related Concepts of a “Donald Trump”

Rev Dan Catt
Published in Kaleida
Dec 12, 2016

Part of what we do at Kaleida is dig around a whole bunch of news articles from a whole bunch of sources (see previous posts for more information) and try to make sense of which ones align with others and which ones are out on their own. When enough similar articles bundle together they become a topic. Topics in turn can mutate and slowly shift focus, until they reach a breaking point and split off into new topics (insert a montage of bacteria cells under a microscope dividing and multiplying). In turn these slowly (and sometimes quickly) changing topics form the backbone of a narrative or in news terms an “ongoing developing story”.

One of the core building blocks of a topic is the “concept”. It’s a slightly unwieldy term, an accurate but utilitarian one used by software engineers when trying to get computers to reduce a piece of written prose into something simple enough for algorithms to crunch on. In short, it’ll take a painstakingly written and well-considered 1,600-word article about climate change and go “Oh, that’s clearly about: trees, ice caps, London, housing market, pollution, CO2, farming subsidies and badgers”.

Find another article also about trees, pollution, crops, farming subsidies and badgers? Well then you’ve got the start of a “cluster”, another utilitarian engineering term. Two articles with completely different concepts? Then I guess they’re unrelated.
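As a rough illustration of that idea (not how Kaleida actually does it), here is a minimal Python sketch that treats each article as a set of concepts and scores the overlap with Jaccard similarity. The function names, the example concept sets and the 0.3 threshold are all assumptions made up for this sketch.

```python
def concept_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared concepts divided by all concepts seen."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Arbitrary cut-off, chosen only for illustration.
THRESHOLD = 0.3


def related(a: set[str], b: set[str], threshold: float = THRESHOLD) -> bool:
    """Two articles are candidates for the same cluster if they overlap enough."""
    return concept_overlap(a, b) >= threshold


# Hypothetical concept sets, loosely based on the examples in the text.
climate = {"trees", "ice caps", "London", "housing market",
           "pollution", "CO2", "farming subsidies", "badgers"}
farming = {"trees", "pollution", "crops", "farming subsidies", "badgers"}
football = {"football", "transfer window", "Manchester"}

print(related(climate, farming))   # the start of a cluster
print(related(climate, football))  # no shared concepts, so unrelated
```

A real system would probably weight concepts rather than count them equally, but the set version is enough to show why two articles with nothing in common never end up in the same cluster.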

You say tomato I say tomato

The other useful part about concepts (and related subject areas) is that they give us a common language (in English, and all the bias that goes with it) to connect articles from many different sources. For example, The Guardian doesn’t have a general Motoring section but still covers news about cars. By letting the computer sort the information into concepts and sections, we can normalise between publications.

“United Kingdom European Union membership referendum, 2016” a.k.a. Brexit.

An odd quirk of training your algorithms on specific corpuses of source text so as to crunch down wordy languages into snappy concepts is the disambiguation that takes place, converting slang, abbreviations & nicknames into the more correct and technical terms. You start to spot the odd ‘tell’ that gives away the sources of an algorithm’s training.

Wikipedia, for example, will attempt to nudge you away from the Brexit portmanteau towards the stunningly dull “United Kingdom European Union membership referendum, 2016”, and because of how we work out our concepts on Kaleida, we use the same term. Wikipedia is a great source of text on which to train your networks ;)

Which is why it made me smile when our CTO Graham Tackley spotted this on Google’s news trends…

I mean, it just rolls off the tongue, “oh yes, people are talking about David Lidington, Emily Thornberry, Theresa May, and the Proposed referendum on United Kingdom membership of the European Union”. A precision which is perfect for computers, but a bit jarring when presented back to humans.

Concepts connected to Trump

However the computer bit does allow us to do some fun things, such as tracking topics over time. In this instance I can look at the concept of a “Donald_Trump”, and then see which concepts are most closely related, and how those change over time.

Here are the top 10 concepts connected with Trump from 4 weeks ago (11th-17th November 2016), and the week just gone (2nd-8th December). The size of the circles shows the relative “strength” of the concept.
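The post doesn’t define how “strength” is calculated, so the sketch below swaps in the simplest stand-in I can think of: counting how many articles mention both the target concept and each other concept. The `related_concepts` function and the tiny sample week are hypothetical, not Kaleida’s data or method.

```python
from collections import Counter


def related_concepts(articles, target, top_n=10):
    """Count how often each other concept shares an article with `target`.

    `articles` is an iterable of concept sets, one set per article.
    Returns (concept, count) pairs, most frequent first.
    """
    counts = Counter()
    for concepts in articles:
        if target in concepts:
            counts.update(c for c in concepts if c != target)
    return counts.most_common(top_n)


# Made-up sample: one concept set per article in a given week.
week = [
    {"Donald_Trump", "China", "Taiwan"},
    {"Donald_Trump", "China", "Twitter"},
    {"Donald_Trump", "Mike_Pence"},
    {"Brexit", "Theresa_May"},
]

for concept, count in related_concepts(week, "Donald_Trump", top_n=3):
    print(concept, count)
```

Running the same function over two different weeks of articles gives two ranked lists that can be compared side by side, which is roughly what the circles are showing.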

While the overall “strength” of related concepts has gone down, the concepts themselves are pretty much the same; New York Magazine has been switched out for China. While those appear to be a constant background story that goes along with Trump, we can pick out short-term trends by taking out the concepts that are common to each of the last 4 weeks.

This gives us an overview of “local” concepts related to Trump, allowing us to follow long- and short-term narratives in the news, which we can view based on any concept (or combination) we wish to select.
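Taking out the concepts common to every week is a set intersection followed by a set difference. A minimal sketch, assuming each week is already reduced to a set of related concepts (the `local_concepts` name and the sample weeks are invented for illustration):

```python
def local_concepts(weekly):
    """Split weekly concept sets into shared background and per-week leftovers.

    Returns (local, background): `background` holds concepts present in
    every week, `local` holds each week's concepts with those removed.
    """
    background = set.intersection(*weekly) if weekly else set()
    local = [week - background for week in weekly]
    return local, background


# Invented example: four weeks of concepts related to one target concept.
weeks = [
    {"Hillary Clinton", "Mike Pence", "National Security"},
    {"Hillary Clinton", "Mike Pence", "Cuba"},
    {"Hillary Clinton", "Mike Pence", "Taiwan"},
    {"Hillary Clinton", "Mike Pence", "China"},
]

local, background = local_concepts(weeks)
print(sorted(background))  # the constant background story
print(local)               # what was specific to each week
```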

For reference, these are the concepts that appeared connected to Trump over all 4 weeks, which we removed, leaving just the concepts above: Republican Party (United States), Democratic Party (United States), United States Senate, Barack Obama, George W. Bush, Hillary Clinton, Mitt Romney, Mexico, Washington, D.C., Presidency of Barack Obama, United Kingdom European Union membership referendum 2016, Mike Pence, United States, Pennsylvania, Florida, The New York Times, Michigan, Wisconsin, Manhattan, Supreme Court of the United States, White House, Fox News Channel, Trump International Hotel and Tower (Chicago), Japan, Facebook, Twitter, Vladimir Putin, California, Texas, London, CNN, Syria, New York (magazine), The Washington Post, Federal government of the United States, New York City, European Union, Muslim, Presidency of George W. Bush, Patient Protection and Affordable Care Act.

Plotting the related concepts over time can help us see the shape of an evolving story. In this case we keep an eye out for new concepts appearing that weren’t there for some previous stretch of time. In the graph below we can see Cuba and Taiwan appearing; each would be part of its own topic, in some way connected to the overarching Trump narrative.

A concept like National Security which pops back into the picture twice would have an earlier topic and a later topic. The earlier one would be Donald Trump + National Security + one bunch of stuff, while the later one has probably come back into the picture because of a different bunch of stuff. “Stuff”!
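Spotting a concept that “wasn’t there for a previous amount of time” can be sketched as a sliding window: for each period, keep only the concepts missing from the few periods before it. The `newly_appearing` function, the `lookback` parameter and the sample days are assumptions for illustration, not the code behind the chart.

```python
def newly_appearing(periods, lookback=3):
    """For each period, return the concepts absent from the preceding
    `lookback` periods. The first period has no history, so everything
    in it counts as new.
    """
    result = []
    for i, current in enumerate(periods):
        # Union of the concept sets in the window just before period i.
        recent = set().union(*periods[max(0, i - lookback):i])
        result.append(current - recent)
    return result


# Invented example: related concepts per day for one target concept.
days = [
    {"Mike Pence", "National Security"},
    {"Mike Pence"},
    {"Mike Pence", "Cuba"},
    {"Mike Pence", "Taiwan", "National Security"},
]

for day, fresh in enumerate(newly_appearing(days, lookback=2)):
    print(day, sorted(fresh))
```

With a two-day window, National Security shows up as new on the first day and again on the last, because it dropped out of the window in between. That matches the idea of an earlier topic and a later topic sharing one concept.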

Quick note: When generating this chart I was using a dataset up to midday on the 6th, which is why you get that smooth drop off at the end of each line, with a cute little rounding error kick at the end.

At some point we’ll look back at a whole year’s worth of data and be able to pull out the various themes over time. The fun part comes when we feed all the data back into a learning network, to see if we can start spotting patterns in the news as it comes in.

Anyway, attempts to see into the future aside, Kaleida is quickly becoming a pretty darn good engine (even if I do say so myself), with various knobs and levers we’re fine-tuning, that gives us a great way to chop & cut up the news and see both the big picture and the small details: from when the momentum of an ongoing story starts to evolve, to spotting quick shifts from one topic to another and whole new BREAKING NEWS stories appearing!

And, it has pretty colours.

Rev Dan Catt
Kaleida

Ex-Flickr, Ex-Guardian, now playing at the intersection between data, code, journalism and art.