Storylines as data in BBC News

notes on implementing the news storyline ontology


The BBC News website is the most-used news website in the UK, typically attracting over 5 million unique browsers daily. In the fifteen years since its inception it has grown from humble beginnings to the comprehensive service that it is today, reflecting the breadth of BBC News output from local to national and global news and the World Service. On a typical day BBC News online will produce around 500 web pages: primarily text articles with images, but increasingly audio, video and graphics are used to tell the story, sharing content from broadcast television and radio news.

But while the front-end design has changed over the past fifteen years the underlying content model hasn’t. The BBC News website behaves rather like a newspaper, publishing articles about stories and (in case the reader missed yesterday’s edition) telling the story anew every time. This results in tens or even hundreds of BBC News web pages that all tell part of the same story up to the moment in time when they were published.

This isn’t such a problem if we assume that the BBC audience all follow a nice linear journey that begins with the News home page (or front page as it’s known internally…) or a section index and follows a carefully selected and up-to-date set of links. But more and more of the audience are reaching BBC News through links shared via social media, search, email and other platforms rather than coming in through the front door.

Search engines do come in through this metaphorical front door, and will tend to favour recency of publication date in presenting results to a query. But they also hold every previous BBC article in their own indexes — for example here’s Google’s index of BBC news content about the crisis in Ukraine — thousands of fragmented versions of the story so far.

Linking up BBC News with metadata

I’ve written on the BBC’s Internet Blog about how BBC News are starting to introduce metadata, annotating content with identifiers for real-world topics (people, organisations, places and themes):

<contentURI> <mentions> <Edward Snowden>
<contentURI> <mentions> <the NSA>
<contentURI> <mentions> <Russia>

These annotations can be used to automate aggregations of related content, for example by asking the back-end data store for the most recent BBC content annotated with Manchester United. BBC News online currently maintains hundreds of topic indexes like this, all curated manually. An automated, annotation-driven approach can remove this burden and allow many more such aggregations to be created.
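As a rough illustration of this kind of automated aggregation, here is a minimal Python sketch. It is not the BBC's implementation: the in-memory list stands in for the back-end data store, and the content URIs, dates and the latest_for_topic helper are all invented for the example.

```python
from datetime import datetime
from typing import NamedTuple


class Annotation(NamedTuple):
    content_uri: str      # subject: the piece of content
    predicate: str        # relationship, e.g. "mentions"
    topic: str            # object: the real-world topic
    published: datetime   # publication time of the content


# Hypothetical annotations; URIs and dates are invented for illustration.
ANNOTATIONS = [
    Annotation("/sport/article-1", "mentions", "Manchester United", datetime(2014, 3, 1, 9, 0)),
    Annotation("/sport/article-2", "mentions", "Manchester United", datetime(2014, 3, 3, 18, 30)),
    Annotation("/news/article-3", "mentions", "Russia", datetime(2014, 3, 2, 12, 0)),
    Annotation("/sport/article-4", "mentions", "Manchester United", datetime(2014, 3, 2, 7, 45)),
]


def latest_for_topic(annotations, topic, limit=10):
    """Return content URIs annotated with `topic`, newest first."""
    matches = [a for a in annotations if a.topic == topic]
    matches.sort(key=lambda a: a.published, reverse=True)
    return [a.content_uri for a in matches[:limit]]


print(latest_for_topic(ANNOTATIONS, "Manchester United", limit=2))
```

A topic index page built this way needs no manual curation: publishing a newly annotated article is enough to change the result of the query.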

Another useful aspect of these metadata annotations is that they make a clear statement about the relationship between the BBC’s content and the real-world topic: this is one of the advantages of using the RDF model over simple key-value tagging. For example compare these three statements:

1. <contentURI> <isTaggedWith> <Russia>
2. <contentURI> <isAbout> <Russia>
3. <contentURI> <mentions> <Russia>

In the case of the example article above statement 2 would be inappropriate: Russia is mentioned certainly but it is not a web page about Russia. Statement 1 could be used for a simple tag-based aggregation, but statement 3 gives the audience a more accurate indication of the relationship between the BBC article and the topic of Russia. In a topic-driven aggregation the semantic annotations might look like this:
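To make the difference between the three predicates concrete, here is a small Python sketch of how a topic-driven aggregation could treat them differently. The triples, the content URIs and the group_by_relationship helper are assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples; the URIs are invented.
TRIPLES = [
    ("/news/snowden-leaves-airport", "mentions", "Russia"),
    ("/news/snowden-leaves-airport", "mentions", "Edward Snowden"),
    ("/news/russia-country-profile", "isAbout", "Russia"),
    ("/news/old-tagged-article", "isTaggedWith", "Russia"),
]


def group_by_relationship(triples, topic):
    """Group content URIs for one topic by the predicate that links them."""
    groups = defaultdict(list)
    for subject, predicate, obj in triples:
        if obj == topic:
            groups[predicate].append(subject)
    return dict(groups)


russia = group_by_relationship(TRIPLES, "Russia")
print(russia.get("isAbout", []))    # content that is primarily about the topic
print(russia.get("mentions", []))   # content that only mentions the topic
```

With simple key-value tagging all three articles would land in one undifferentiated list; keeping the predicate lets a topic page lead with content that is about Russia and list mentions separately.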

Storylines as metadata

But these annotations can be improved: if the example article above is about anything it is a development in the storyline of the US spy scandal. The News Storyline ontology provides a way to describe a storyline externally to an article context, and then use it as an organising principle to link related developments into a narrative. The BBC News website already does this for a small subset of important stories, but these are manually generated and so time-consuming to produce and curate. By using a Linked Data approach and externalising the storyline as an organising principle we can make it the default approach for every story that we cover.

http://purl.org/ontology/storyline

Using the Storyline data model we can define a storyline instance called ‘Snowden leaves airport’, make it a new development in the ‘US spy scandal’ storyline, and associate the storyline with the topics of ‘Russia’ and ‘Edward Snowden’. Using this storyline as a tag makes for a much more accurate semantic annotation — the content is about the storyline rather than a group of topics, and can be part of a wider narrative arc:
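The relationships described here can be sketched in a few lines of Python. This is only an illustration of the shape of the data: the Storyline class, its field names and the content URI are hypothetical and are not the classes or properties defined by the News Storyline ontology itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Storyline:
    title: str
    topics: set = field(default_factory=set)      # real-world topics it involves
    parent: Optional["Storyline"] = None          # the wider storyline it develops
    content: list = field(default_factory=list)   # URIs of content about it

    def narrative_arc(self):
        """Return titles from the widest storyline down to this one."""
        arc, node = [], self
        while node is not None:
            arc.append(node.title)
            node = node.parent
        return list(reversed(arc))


scandal = Storyline("US spy scandal")
development = Storyline(
    "Snowden leaves airport",
    topics={"Russia", "Edward Snowden"},
    parent=scandal,
)
development.content.append("/news/example-article")  # hypothetical content URI

print(development.narrative_arc())
```

The article is annotated once, with the storyline; its topics and its place in the wider narrative then come from the storyline rather than from the article.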

This presents some significant advantages over the current approach:

  • developments in context — as the story develops and new updates are created they can be presented in the context of a chronological narrative about that storyline, allowing the audience to see what came before and after without the need to retell the story
  • web friendly — a single URL that shows the audience the development of the storyline from when it first broke right up to the latest development; users can share individual developments on social media (as addressable #fragments of a page), while search engines can index the entire BBC storyline on a single URL
  • better use of resources — currently links between BBC News articles must be made manually; annotating storylines with topics effectively automates this process and frees up journalists to focus on content creation rather than curation.
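The first two points above can be sketched as follows: one storyline URL, with each development addressable as a #fragment and presented in chronological order. The URL scheme, fragment ids, headlines and dates are all invented for the example.

```python
from datetime import datetime

STORYLINE_URL = "https://example.org/storylines/us-spy-scandal"  # hypothetical

# Hypothetical developments: (fragment id, headline, publication time).
DEVELOPMENTS = [
    ("dev-3", "Third development", datetime(2013, 8, 1, 15, 0)),
    ("dev-1", "First development", datetime(2013, 6, 6, 8, 0)),
    ("dev-2", "Second development", datetime(2013, 6, 23, 11, 0)),
]


def timeline(base_url, items):
    """Return (time, headline, shareable URL) tuples, oldest first."""
    ordered = sorted(items, key=lambda item: item[2])
    return [(when, headline, f"{base_url}#{frag}") for frag, headline, when in ordered]


for when, headline, url in timeline(STORYLINE_URL, DEVELOPMENTS):
    print(when.date(), headline, url)
```

A search engine only ever sees the base URL, while a reader sharing one development shares the same page with a fragment appended.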

There is a further advantage, perhaps less obvious in terms of immediate audience impact but maybe of great significance longer term: the BBC newsroom will be building up a knowledge graph of news storylines, topics and associated BBC content. Currently all of BBC News online’s knowledge exists in the heads of its journalists and in siloed internal production systems; by capturing these associations between content, storylines and topics in the BBC’s Linked Data Platform they can be made available to journalists approaching a new storyline to better inform their work, providing them with a semantic database of news storylines and related topics.
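As a sketch of the kind of question such a knowledge graph could answer for a journalist, the Python below looks up the storylines that involve a topic and the other topics that share a storyline with it. The dictionary stands in for the Linked Data Platform, and its contents and the two helper functions are assumptions for illustration.

```python
from collections import Counter

# Hypothetical storyline-to-topic associations; invented for illustration.
STORYLINE_TOPICS = {
    "US spy scandal": {"Edward Snowden", "the NSA", "Russia"},
    "Crisis in Ukraine": {"Russia", "Ukraine"},
    "Example football storyline": {"Manchester United"},
}


def storylines_for(topic):
    """Return the storylines associated with `topic`, alphabetically."""
    return sorted(s for s, topics in STORYLINE_TOPICS.items() if topic in topics)


def related_topics(topic):
    """Count how many storylines each other topic shares with `topic`."""
    counts = Counter()
    for topics in STORYLINE_TOPICS.values():
        if topic in topics:
            counts.update(topics - {topic})
    return counts


print(storylines_for("Russia"))
print(related_topics("Russia"))
```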

Implementing storyline metadata: the stream

Content on the BBC News website is produced with a home-grown Content Production System (CPS) which allows journalists to create articles and other content types such as video collections or live event streams. Once published, content is linked to from manually curated indexes, allowing users and search engines to discover it. The manual curation of these indexes is the mechanism employed for setting the relative importance of running storylines: higher up the page + bigger = more important. (This last point might seem rather obvious but it’s important to recognise this as we move towards metadata-driven content aggregations: databases and search engines can do subject-matter-by-recency very well, but relative importance is a human editorial judgement.)

Semantic annotation of BBC content with real-world topics has been implemented in CPS for some time, and is already used to drive dynamic aggregations of content on the BBC Sport website. At content authoring time a journalist can search the BBC’s Linked Data Platform (LDP) for concepts that are mentioned in the content, and then add them as metadata annotations. When the content is published the annotations are written to the LDP in the form of the subject-predicate-object triples described above. The idea is to make the annotation process introduce as little disruption as possible, to allow journalists to keep focused on what they do best, creating news content.

In its simplest form a storyline can be thought of as just another kind of tag. It’s possible to employ the same low-maintenance annotation process as the BBC currently uses for topic-tagging in CPS, and aggregate content chronologically by storyline. This is the approach taken by ITV News to present storylines as a stream of developments, with an emphasis on short-form updates and in-line multimedia.

This represents a paradigm shift in online news: instead of a new article (and web page) for each development linked to from a manually curated index, news indexes can be a dynamic list of recent storyline updates, with each update linking back to the full storyline (the context). All a journalist has to do is annotate their content with a storyline to surface it on that storyline page (and its related index). Additionally these indexes can link to persistent storyline page URLs, not needing to be updated with a new link every time a new piece of content is published.
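A dynamic index of this kind might be sketched as below. The update data, the URL scheme and the build_index helper are hypothetical; the point is only that every update links to a storyline URL that stays the same however many updates are published.

```python
from datetime import datetime


def storyline_url(slug):
    """Persistent page URL for a storyline (hypothetical URL scheme)."""
    return f"https://example.org/storylines/{slug}"


# Hypothetical updates: (storyline slug, update text, publication time).
UPDATES = [
    ("us-spy-scandal", "Update A", datetime(2014, 3, 1, 9, 0)),
    ("ukraine-crisis", "Update B", datetime(2014, 3, 1, 10, 0)),
    ("us-spy-scandal", "Update C", datetime(2014, 3, 1, 11, 0)),
]


def build_index(updates, limit=10):
    """Return the newest updates, each paired with its storyline's page URL."""
    newest_first = sorted(updates, key=lambda u: u[2], reverse=True)
    return [(text, storyline_url(slug)) for slug, text, _ in newest_first[:limit]]


index = build_index(UPDATES)
for text, url in index:
    print(text, url)
```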

This approach encourages the use of short-form updates to iteratively tell the story as it unfolds: text, audio/video captured from mobile devices, social media updates, and so on. These storyline updates can work well on mobile and tablet devices, and encourage sharing on social media, particularly of multimedia content. Long-form pieces are still there of course, but they are employed by expert correspondents for in-depth analysis rather than being the de facto medium of news reporting.

BBC News has been running a pilot of this storyline-stream based approach in local news with a small group of audience members. Their initial reaction has been very positive, but also valuable in pointing to the areas where the stream on its own doesn’t serve their needs.

Beyond the stream — recency vs. importance

Being able to collect content around a storyline and present it to the audience as a chronological stream can quickly aggregate a lot of content for an important news story. The challenge for the consumer then becomes one of understanding: in all of these many updates which ones are the most important? What were the key events? Which pieces of content are analysis and explanation rather than straightforward fact reporting?

As Alexis C. Madrigal has pointed out, a problem for the stream approach is that it values recency above all else:

“The Stream represents the triumph of reverse-chronology, where importance—above-the-foldness—is based exclusively on nowness.”

For smaller news providers this can work well, but the BBC is a large news organisation with the resources to quickly collect a great deal of content about a story, from fast-moving short-form updates at one end of the spectrum through to visual journalism infographics and expert correspondent analysis at the other. How can we provide signposts to help our audience navigate the stream, and work out what’s important?

The diagram above shows one way to approach this problem. Instead of simply stating that a piece of content is ‘about’ a storyline, a journalist could say that it was of particular significance — an ‘editorial pick’ for example — and as such it could be promoted as key content whether or not it is the most recent update in a chronological stream.

This works by varying the linked data predicate, the part of the semantic annotation that describes the relationship between the subject (the piece of News content) and the object (the storyline). By extending the choice of predicates we can say more about the relationship between the content and its storyline: ‘summary’, ‘editorial pick’ and so on. These different relationships could then be surfaced on a metadata API that would allow audience-facing applications to handle those pieces of content differently.
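Here is one way an audience-facing application might act on those predicates, as a minimal Python sketch. The predicate names ("about", "editorialPick", "summary"), the content URIs and the storyline_view helper are illustrative assumptions, not terms from the ontology or from any BBC API.

```python
from datetime import datetime

# Hypothetical predicates that mark content as key to the storyline.
PROMOTED = {"editorialPick", "summary"}

# Hypothetical annotations on one storyline: (content URI, predicate, published).
ANNOTATIONS = [
    ("/news/update-1", "about", datetime(2014, 3, 1, 9, 0)),
    ("/news/analysis", "editorialPick", datetime(2014, 3, 1, 10, 0)),
    ("/news/update-2", "about", datetime(2014, 3, 1, 11, 0)),
    ("/news/story-so-far", "summary", datetime(2014, 2, 28, 17, 0)),
]


def storyline_view(items):
    """Split a storyline's content into promoted key content and the stream."""
    newest_first = sorted(items, key=lambda item: item[2], reverse=True)
    key = [uri for uri, pred, _ in newest_first if pred in PROMOTED]
    stream = [uri for uri, pred, _ in newest_first if pred not in PROMOTED]
    return {"key_content": key, "stream": stream}


view = storyline_view(ANNOTATIONS)
print(view["key_content"])
print(view["stream"])
```

Separating promoted content from the stream is just one design choice; an application could equally keep picks in the stream and pin a copy at the top. Either way the editorial judgement is recorded once, in the predicate, rather than in the layout of a hand-built index.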

(This is one of the areas that BBC News will be exploring at the next NewsHack event (May 2014) where we will be working with other news organisations to explore content modelling within the context of the stream.)