Augmenting local newsrooms

London-based news agency Urbs Media, runner-up in last year’s Startups for News competition, is using automation to strengthen local newsrooms.

By mining large, largely ignored datasets, the agency uncovers localised news stories that otherwise wouldn’t see the light of day. Human reporters write the stories, and natural language generation (NLG) technology produces many versions of each one, adapted to different localities. Imagine, for example, a dataset about crime rates in the UK: Urbs’ technology could automatically create an article about falling crime rates in Liverpool and another about rising crime rates in Manchester.
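The idea of one human-written story yielding many local versions can be illustrated with a simple template filled from per-locality rows of a dataset. This is a minimal sketch under stated assumptions, not Urbs Media’s NLG software, which the interview does not describe in detail; the `LocalityStat` type, its field names, the wording and the figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LocalityStat:
    """One row of a hypothetical crime-rate dataset, one row per locality."""
    locality: str
    previous_rate: float  # offences per 1,000 residents, earlier period
    current_rate: float   # offences per 1,000 residents, latest period


def describe_change(stat: LocalityStat) -> str:
    """Fill one story template, choosing wording from the direction of change."""
    change = stat.current_rate - stat.previous_rate
    if change == 0 or stat.previous_rate == 0:
        # No change, or no baseline to compute a percentage against.
        return (f"Crime rates in {stat.locality} stand at "
                f"{stat.current_rate:.1f} offences per 1,000 residents.")
    direction = "fallen" if change < 0 else "risen"
    pct = abs(change) / stat.previous_rate * 100
    return (f"Crime rates in {stat.locality} have {direction} by {pct:.1f}% "
            f"to {stat.current_rate:.1f} offences per 1,000 residents.")


def generate_versions(rows: list[LocalityStat]) -> dict[str, str]:
    """One template, many local versions: one story per dataset row."""
    return {row.locality: describe_change(row) for row in rows}


if __name__ == "__main__":
    # Illustrative figures only, not real crime statistics.
    rows = [
        LocalityStat("Liverpool", previous_rate=50.0, current_rate=45.0),
        LocalityStat("Manchester", previous_rate=40.0, current_rate=44.0),
    ]
    for text in generate_versions(rows).values():
        print(text)
```

The point of the design is that the reporter’s effort goes into the template and the choice of wording, while the number of versions is set only by the number of rows in the dataset.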

We chatted with Urbs Media CEO Alan Renwick and editor-in-chief Gary Rogers about their RADAR project (Reporters and Data and Robots), the many pros and few cons of working with automation, and coffee.

(The interview has been edited and shortened for clarity.)

GEN: What is your ambition behind the RADAR project?

Urbs Media: Our ambition is to run an agency-like model where we produce content for people or for newsrooms. Our partnership with the Press Association on the RADAR project, which is currently underway in the UK, is an agency-supply model: we use Natural Language Generation (NLG) to produce copy and then send it to multiple newsrooms in a ready-to-use way.

We started the project on the basis of making NLG a tool for journalists, not for developers: so that a journalist like myself can use it at their desktop. We therefore looked for an NLG solution in the market that we could make more user friendly.

So far, we have a sample group of publishers to whom we started sending sample copy about two weeks ago. Some of the copy is being used just as it is — in the way that agency copy works — and some of it has been edited. Some publishers have added additional local context, which we can’t do ourselves as we’re a centralised agency.

Is your service scalable?

We’re slightly limited by our tech development at the moment in terms of full scale distribution: we have an automated process to produce stories but we don’t have an automated process to distribute stories. We produce copy using NLG and then email it out to people! This means that at the moment, we’re only distributing about a tenth of what we could.

A core part of what we’re doing with RADAR is building a smart distribution tool that will enable us to match up copy with localities. Say we have produced 500 local versions of a story: the tool will know which locality each version is attached to and deliver it to the correct news outlet. We expect to have that operational at some point next year.
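The matching step described above, pairing each locality-tagged version with the outlets that cover that locality, can be sketched as a lookup over a coverage map. This is a hypothetical illustration, not the RADAR tool itself: the `route_versions` function, the shape of the coverage map and the outlet names are assumptions. It also reports versions that no outlet covers, since those would otherwise be silently dropped.

```python
from collections import defaultdict


def route_versions(
    versions: dict[str, str],
    coverage: dict[str, set[str]],
) -> tuple[dict[str, list[str]], list[str]]:
    """Match locality-tagged story versions to the outlets that cover them.

    versions: locality -> story text for that locality
    coverage: outlet -> set of localities the outlet serves (hypothetical data)
    Returns (versions to deliver per outlet, localities with no matching outlet).
    """
    deliveries: dict[str, list[str]] = defaultdict(list)
    covered: set[str] = set()
    for outlet, localities in coverage.items():
        # Sort so each outlet receives its versions in a stable order.
        for locality in sorted(localities):
            if locality in versions:
                deliveries[outlet].append(versions[locality])
                covered.add(locality)
    # Versions whose locality no outlet serves are reported, not dropped.
    unmatched = sorted(set(versions) - covered)
    return dict(deliveries), unmatched
```

For example, with versions for Liverpool, Manchester and Leeds, and a coverage map in which a hypothetical "Outlet A" serves Liverpool and "Outlet B" serves both Liverpool and Manchester, Outlet B receives two versions, Outlet A receives one, and Leeds is flagged as unmatched.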

‘The thing which is in some way different to other entries in the GEN Startup competition is that we provide a service rather than a product. We don’t give you our platform and off you go: we are developing a set of tools and a set of skills to produce a service and we then provide the service to you.’

We’d like to have two partners in each international marketplace. Effectively, you use the same time and resources to produce 500 versions of a story as you would to produce ten. One of the perks of NLG software is that it works in numerous languages. We are starting in the UK and Ireland with the Press Association as a partner, but we are looking for other large-scale providers in other markets, like other national news agencies or large media companies who want to distribute across their whole territory.

Another thing to consider when it comes to scalability is the data itself. The majority of national markets are not as well served as we are here in the UK in terms of clean data. So far, we’ve identified ten countries in the whole world where the data quantity and quality are at a level where it’d be possible for us to set up a service quickly — these are mostly European and North American.

What type of data do you use in the UK and how do you prepare it?

  • We’re using open data from very established sources: official statistics and government sources. We have a very high calibre of data sourcing at the moment, but errors in the datasets are still possible. This means that there is a certain degree of diligence required when we analyse and process it.
  • We do a certain amount of data cleaning on every dataset to prepare it for the NLG process.
  • We structure the data to allow us to run the stories out of it.
  • We’re pretty confident in the provenance of the datasets we are using, but as we expand to look at wider ranges of data, including raw data, we will undoubtedly have to do more data work.

Do you see any concerns with regard to using automation?

The biggest obstacle is that we are scaling what we do. One of our team writes a story and thanks to the technology, we can produce 500 versions of that story. This is great as that really leverages our work… but if you make an error, it really leverages your error. If you make one mistake, you’ve got 500 stories wrong!

If you send one version of a story about child obesity to Birmingham and another to Liverpool — is there a worry that all of the content becomes a little bit too similar — like Starbucks for journalism?

I suppose everyone likes coffee!

Isn’t it better to tell the people of Birmingham and Liverpool how fat their kids are? We give publishers the story and they can then decide what to do with it. I understand your point about homogeneity of news — but we’re talking about stories that aren’t being covered at all.

What we try to do — this is part of what we think will be the human magic of working with automation — is to find a way to develop different angles for stories. The differences in the stories will not only be the numerical differences, which give you different data points, but the angle and approach to the story those data points lead you to. While we might do a story about child obesity across a number of different cities in the UK, we might have a slightly different angle to the story for Liverpool than for Birmingham. We’re trying to find the best story for a particular newsroom in the way that a reporter would. This is what makes our use of NLG more complex: we’re not just trying to turn data into words and give everyone a version, but we’re trying to do a very journalistic job around the data.

How has the engagement been with the stories you’ve sent so far?

Feedback on the content has been very positive with our pilot group — most of them have been asking for more copy or more versions of the copy. The first story that went out got a lot of comment on the newspaper’s own website and even more comment when it was shared through Facebook. We’ve only been operating for two weeks in production and we’ve only sent out five stories in total, so it is still a little too early for us to draw conclusions about engagement. At this point, we just need to establish a strong journalistic product and then look at how it engages.

How many stories will you release per month?

Our goal over the next few months is to gradually build up the production and the pilot group interaction we have, so that in a few months’ time we’ll be producing around five stories per day. These five stories will have, on average, about 200 different versions, so we’ll actually be producing around 1,000 stories a day or 30,000 a month for the marketplace.
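The production figures follow from simple multiplication. The monthly total assumes stories go out on roughly 30 days a month, which the figures imply but the interview does not state explicitly:

```latex
\begin{aligned}
5~\text{stories/day} \times 200~\text{versions/story} &= 1{,}000~\text{versions/day} \\
1{,}000~\text{versions/day} \times 30~\text{days/month} &= 30{,}000~\text{versions/month}
\end{aligned}
```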

The idea is to have another two people on the team (at the moment there are four of us), to build a workflow that we’re comfortable with. We will gradually scale up a little bit so that we’ll each be writing a story a day. Later next year, we will also start adding assets — like charts and video — into the packages we send out. They will also be localised versions of the same material.

How do you make money?

Our pilot group works on a free-of-charge basis and we will continue to do that for a while even as we scale up. The idea is that in the middle of next year we’ll start testing some different pricing models and approaches: subscription and pay-per-use. Then we can determine what kind of models and pricing levels are useful for our different types of customers. The customer type varies greatly from very large groups of titles which may have 200 newsrooms and are part of big PLCs here in the UK, right through to very small independent publishers, and even hyper-local bloggers. Our methodologies and output are highly localised so our content is hopefully quite valuable for these sorts of very localised publishers.

Anything else you would like to add?

You can say how fabulous we are.


Alan Renwick is CEO of Urbs Media. He was previously head of strategy at Local World Media.

Gary Rogers is co-founder and editor-in-chief at Urbs Media. He was previously owner of GRMedia and has worked for the BBC.