coveredBy and createdBy

using linked data for editorial curation

BBC News are annotating (tagging) all online content with linked data concepts — URIs that are explicitly linked (sameAs’d) to equivalent concepts in public datasets. In this post I want to describe two predicates that we are using to help journalists work with content: coveredBy and createdBy.

Why use linked data for journalism?

The motivation for this tagging work is to power dynamic aggregations of news content (topic pages) that help the audience discover related content, and to explode the long-form article format into a stream of chronological updates. Both types of collection can be curated to present a particular editorial viewpoint (a storyline). Over the past few months BBC News has started to expose some of these linked-data driven curations, for example in the May 2014 local elections or for the Scottish Referendum.

We store these tagging statements as content-about-topic RDF triples in a graph store. As well as content annotations we also use this pattern to relate topics to each other with semantically useful predicates:

<content> :about <http://dbpedia.org/resource/Tesco>

<http://dbpedia.org/resource/Tesco> :industry <http://dbpedia.org/resource/Retail>

<http://dbpedia.org/resource/Tesco> :listing <http://dbpedia.org/resource/London_Stock_Exchange>

and so on. (In practice we use BBC URIs that are sameAs’d to public identifiers; I use dbpedia here just for clarity.)
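As a rough illustration of the pattern above, the statements can be held as plain (subject, predicate, object) tuples and matched with wildcards. This is a minimal Python sketch, not the BBC graph store: the `match` helper and the `content:tesco-results` identifier are made up for the example, and predicates are written as bare strings.

```python
# Minimal in-memory triple matching. All helper names and the content
# identifier are illustrative assumptions, not part of any BBC system.
TESCO = "http://dbpedia.org/resource/Tesco"
RETAIL = "http://dbpedia.org/resource/Retail"
LSE = "http://dbpedia.org/resource/London_Stock_Exchange"

triples = {
    ("content:tesco-results", "about", TESCO),  # hypothetical content item
    (TESCO, "industry", RETAIL),
    (TESCO, "listing", LSE),
}


def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything stated about Tesco as a subject (topic-to-topic links)
tesco_facts = match(triples, s=TESCO)

# Content tagged as being about Tesco
tesco_content = [s for s, _, _ in match(triples, p="about", o=TESCO)]
```

The same `match` call serves both content annotations and topic-to-topic relationships, which is the point of storing them in one graph.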

But there’s another benefit of semantic annotation: on a typical day BBC News will produce hundreds of hours of TV and radio news, and thousands of online news updates. Semantic annotation of each content object allows us to collect together a set of related content and put it in front of a journalist for curation into the best audience experience.

coveredBy

Depending upon where in the BBC the journalist is working, they are likely to have a range of concepts that are of interest. For example, a journalist working on local news in the BBC North West region is likely to be interested in content tagged with places served by that region, along with notable people and organisations associated with the region (local government and MPs, local sports teams and players, etc). These concepts can be represented in linked data as a graph of concepts “covered by” that news service.

This graph can be used to query the graph store’s API to get content about concepts covered by that BBC news service. The RDF graph pattern is well suited to these sorts of loose couplings in an open-world domain; there are no uniqueness constraints (many teams can be interested in the same things) and the range (object) of the coveredBy predicate can be an editorial team like BBC Manchester or an individual journalist.
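The lookup described above is a two-step walk: find the concepts covered by a service, then find content about those concepts. Here is a small Python sketch of that idea over an in-memory list of triples. Every identifier (`team:BBC_North_West`, `place:Manchester`, `content:1` and so on) and both function names are assumptions for illustration; the real graph store API is not shown in this post.

```python
# Hypothetical identifiers throughout; a sketch of the two-step lookup.
triples = [
    ("place:Manchester", "coveredBy", "team:BBC_North_West"),
    ("org:Manchester_City_Council", "coveredBy", "team:BBC_North_West"),
    # No uniqueness constraint: the same concept can be covered by many teams.
    ("place:Manchester", "coveredBy", "team:BBC_Manchester"),
    ("place:Plymouth", "coveredBy", "team:BBC_Devon"),
    ("content:1", "about", "place:Manchester"),
    ("content:2", "about", "org:Manchester_City_Council"),
    ("content:3", "about", "place:Plymouth"),
]


def concepts_covered_by(store, service):
    """Concepts that point at the given service via coveredBy."""
    return {s for s, p, o in store if p == "coveredBy" and o == service}


def content_for(store, service):
    """Content tagged as about any concept the service covers."""
    concepts = concepts_covered_by(store, service)
    return sorted({s for s, p, o in store if p == "about" and o in concepts})


north_west = content_for(triples, "team:BBC_North_West")
manchester = content_for(triples, "team:BBC_Manchester")
```

Because the object of coveredBy is just another URI, swapping a team identifier for an individual journalist's identifier needs no change to the lookup.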

We can do the heavy lifting around generating some of these topic graphs programmatically (for example by geospatially linking places to a BBC News regional, national or global service). But it’s important that a journalist has the ability to define their own graphs too: the topics and events that they want to know about. This is obviously an ever-changing set, particularly when it comes to news events.

To represent Events as data we use (a slightly bastardised version of) @moustaki’s Event ontology, linking the people, organisations and places involved in an event. This allows us to use the same trick to get relevant content in front of a journalist.
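To show how the same trick might extend to events, here is a hedged Python sketch. It uses a single made-up `involves` predicate to stand in for whatever properties the event model actually uses to link people, organisations and places; the identifiers and the `relevant_content` function are likewise assumptions.

```python
# "involves" is a placeholder predicate, not a term from the Event ontology.
triples = [
    ("event:Local_Elections_2014", "involves", "place:Manchester"),
    ("place:Manchester", "coveredBy", "team:BBC_North_West"),
    ("content:results", "about", "event:Local_Elections_2014"),
    ("content:unrelated", "about", "place:Plymouth"),
]


def relevant_content(store, service):
    """Content about covered concepts, or about events involving them."""
    covered = {s for s, p, o in store if p == "coveredBy" and o == service}
    events = {s for s, p, o in store if p == "involves" and o in covered}
    topics = covered | events
    return sorted(s for s, p, o in store if p == "about" and o in topics)


for_north_west = relevant_content(triples, "team:BBC_North_West")
```

Here the election content reaches the North West team even though it is tagged only with the event, because the event involves a place that team covers.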

createdBy

It’s kind of an obvious thing to model the creator of a resource as metadata but until recently BBC News has tended not to do this. Internal content production systems usually have an idea who’s using them but publishing bylines is the exception rather than the norm, and even then if you want an aggregation of content by (for example) Emma Simpson you’d need to ask Google for it.

To remedy this we are planning to create URIs for each BBC journalist and editorial team, and (where available) to link them to equivalent public URIs. To associate a content object with a journalist or team we will use a createdBy predicate:

<http://www.bbc.co.uk/news/uk-politics-29626494> :createdBy <http://www.bbc.co.uk/things/3cfc0d75-565f-4d9d-9ae9-19e89f9cc57a>

<http://www.bbc.co.uk/things/3cfc0d75-565f-4d9d-9ae9-19e89f9cc57a> :sameAs <http://dbpedia.org/resource/Nick_Robinson>
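With those two statements in place, an aggregation of content by a journalist can be requested using either the BBC URI or the public one. The Python sketch below reuses the URIs from the example; the `content_by` function is hypothetical, and it follows sameAs for a single hop in either direction rather than computing a full closure.

```python
ARTICLE = "http://www.bbc.co.uk/news/uk-politics-29626494"
THING = "http://www.bbc.co.uk/things/3cfc0d75-565f-4d9d-9ae9-19e89f9cc57a"
PUBLIC = "http://dbpedia.org/resource/Nick_Robinson"

triples = [
    (ARTICLE, "createdBy", THING),
    (THING, "sameAs", PUBLIC),
]


def content_by(store, uri):
    """Content created by `uri` or by anything one sameAs hop away from it."""
    ids = {uri}
    ids |= {s for s, p, o in store if p == "sameAs" and o == uri}
    ids |= {o for s, p, o in store if p == "sameAs" and s == uri}
    return sorted(s for s, p, o in store if p == "createdBy" and o in ids)


by_public_uri = content_by(triples, PUBLIC)
by_bbc_uri = content_by(triples, THING)
```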

As well as providing a useful way for the audience to find more content from that journalist or team, this predicate can be used in conjunction with ‘about-ness’ metadata to add some editorial scope to a set of content that has been tagged as about a concept. For example, content about the local impact of the 2015 budget in the South West might be created by BBC Devon, while the Politics team would create content of a more general (or national) scope.

This is of course also useful for providing consistent editorial tone within a section of BBC News whilst still working from a single, shared vocabulary of tag concepts.
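The scoping idea in this section amounts to intersecting two sets: content about a concept, and content created by a team. A minimal Python sketch follows, with made-up identifiers (`event:2015_Budget`, `team:BBC_Devon`, `team:Politics`) and a hypothetical `content_about` function.

```python
# All identifiers below are illustrative assumptions.
triples = [
    ("content:budget-devon", "about", "event:2015_Budget"),
    ("content:budget-devon", "createdBy", "team:BBC_Devon"),
    ("content:budget-national", "about", "event:2015_Budget"),
    ("content:budget-national", "createdBy", "team:Politics"),
    ("content:exeter-roads", "about", "place:Exeter"),
    ("content:exeter-roads", "createdBy", "team:BBC_Devon"),
]


def content_about(store, concept, created_by=None):
    """Content about a concept, optionally narrowed to one creator."""
    about = {s for s, p, o in store if p == "about" and o == concept}
    if created_by is None:
        return sorted(about)
    authored = {s for s, p, o in store if p == "createdBy" and o == created_by}
    return sorted(about & authored)


all_budget = content_about(triples, "event:2015_Budget")
local_budget = content_about(triples, "event:2015_Budget", "team:BBC_Devon")
national_budget = content_about(triples, "event:2015_Budget", "team:Politics")
```

Both teams tag with the same budget concept; only the createdBy filter separates the local view from the national one.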

Implementation

Implementation of these two predicates and the associated instance data for journalists and teams is being worked out now, along with the corresponding API and UX work. I hope we’ll see content aggregated through these predicates in Spring 2015. If you’d like to know more check out the BBC’s ontologies website, or ping me a message on Twitter.
