There are 28 countries within the European Union, and 29 conversations about where it’s heading. Advanced language technologies could link these conversations together and help create the demos the EU needs.
Note: this is a (heavily) edited repost from BlogActiv.
Across the 28 countries within the European Union (EU), there are 29 conversations about where it’s heading. These conversations exist in national media: in every country, a constant stream of newspaper and magazine articles, journal and conference papers, books, television & radio programmes about the EU and that country’s role within it.
But the EU as a whole has no media, the recent arrival of Politico Europe notwithstanding. The most intense conversation about the EU is the 29th, occurring inside the Brussels Bubble - a 50+ year-old echo chamber within which generations of EU professionals have been discussing the EU’s future in language no one else understands, even though it is spoken in impeccable English or French.
While the walls of culture, language and context separating these 29 conversations are not solid, it is fair to say that conversations about Europe do not exist on a European scale.
No democracy can function when those governed have no idea what’s going on above their heads. Unfortunately, that’s how the EU evolved, which is perhaps why the last European Parliament elections showed around three-quarters of the electorate believing the EU to be unimportant or harmful and/or undemocratic.
Few people learn much from their peers across the border
But it doesn’t have to be like this. After all, in each EU country, people tackle the same problems — energy, employment, gender balance, art, industry, human rights … — and all of them are supposed to cooperate on these issues within the EU framework.
But most arguments around these topics remain stubbornly national, or hermetically sealed within the Brussels Bubble. Few people exchange much with their peers in other countries, and the European implications of national policies usually enter national conversations only in times of crisis.
Advanced language technologies could help bridge these barriers, in the process bringing people together from across Europe around shared problems.
It’s simply a question of improving content discovery across Europe
Curating ideas, networking people
The idea’s starting point was BloggingPortal, launched by a loose community of volunteer Eurobloggers to curate blog posts about EU issues. Before these volunteers finished their PhDs and got jobs, spouses, kids and lives, the site (now dead) worked like this:
- initial curation: volunteers identify and add sources of relevant content to the engine
- the title and opening words of each new post are automatically piped into the engine
- volunteers manually tag each post, and highlight the best ones
- the posts are published on the site, pointing back to the original content (the idea is to help publishers find audiences, not rob them of traffic).
- the best (i.e., manually highlighted) posts appear on the Home Page and are promoted by enewsletter and Twitter
Over 317,000 posts from 1,100+ blogs were processed in around 4 years.
But when the volunteers stopped manually tagging each post, BP became nothing more than a glorified RSS feed for the Brussels Bubble, for whom “everything about the EU” is still relevant, interesting and useful.
Which is a shame, because outside the Bubble, most people are interested in something, just not the EU. Provide them with a source of interesting content from across Europe relevant to their interests (environment, employment law, research, human rights …), and they may discover ideas, and their authors, from other countries. They may even better understand the European aspect of their field of interest (see Specialists required to build bridges).
Add automated semantic analysis …
So what happens when you add semantic technologies and widen the scope to all longform content?
Answer: ‘machine-assisted human curation’:
- As before, sources are identified manually — human curation is still essential. However, the scope is widened to all types of longform content (news, analyses, research, feature articles…).
- The entirety of each article is crawled into the engine as it is published (but never displayed in full on the site)
- Language recognition and semantic analysis software (I tried Apache Stanbol and ConText) then automatically tags each article using a multilingual, policy-oriented taxonomy (EUROVOC). These tags then map each article to 1–2 high-level Themes for the site navigation (see below).
- The title and opening lines of each article are auto-published on the relevant Theme and tag menus. The search engine can search the entirety of each article, while registered users can highlight favourites.
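The tag-to-Theme step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TAG_TO_THEME` table, the `Article` class and the `assign_themes` function are hypothetical names invented here, the tags are made-up stand-ins for real EUROVOC descriptors, and the tagging itself (the job of the semantic analysis software) is assumed to have already happened.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical tag-to-Theme table: illustrative entries only, standing in
# for real EUROVOC descriptors and the site's actual Themes.
TAG_TO_THEME = {
    "windfarm": "Environment",
    "wildlife": "Environment",
    "renewable energy": "Energy",
    "unemployment": "Employment",
}


@dataclass
class Article:
    title: str
    language: str
    tags: list[str]  # assumed to come from the semantic analysis step
    themes: list[str] = field(default_factory=list)


def assign_themes(article: Article, max_themes: int = 2) -> Article:
    """Map an article's tags to the Themes they most often point to."""
    counts = Counter(
        TAG_TO_THEME[tag] for tag in article.tags if tag in TAG_TO_THEME
    )
    # Keep at most `max_themes` Themes, most frequently matched first.
    article.themes = [theme for theme, _ in counts.most_common(max_themes)]
    return article


article = assign_themes(
    Article(
        title="Offshore windfarms and seabird migration",
        language="en",
        tags=["windfarm", "wildlife", "renewable energy"],
    )
)
print(article.themes)  # ['Environment', 'Energy']
```

Capping the result at two Themes mirrors the "1–2 high-level Themes" rule; tags with no entry in the table are simply ignored rather than guessed at.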
… powering faceted search
With each article consistently tagged, faceted search makes it incredibly easy to discover Who is saying What in any policy area, today, yesterday or even years previously, in many languages.
Time for some wireframes from the specs:
- Welcome to the Home Page. Site navigation is actually a natural language phrase: “Show me the best content about all themes about all countries written originally in any language any time”
- Each key phrase (in colour) is a filter, with mouseovers allowing users to change their settings. The only filter active on the Home Page is “Best/All”, so the Home Page shows only content highlighted manually by the Editors
- So let’s tweak those filters to: All content classified Environment published this week. There are 16 results; the ‘Refine tool’ (right) now shows the tags auto-assigned to them, and used to map them under ‘Environment’
- So click on ‘+ Windfarm’ to replace the ‘Environment’ Theme with the far narrower ‘windfarm’ tag. We now have only 4 results …
- … so let’s click ‘+ Wildlife’ to narrow the search further, but widen the time horizon to: All content tagged Windfarm AND Wildlife published This Year
Wireframe testing showed this allows users to find the exact resources they need, starting from hundreds of thousands of articles in 20+ languages, in under a minute.
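For the technically minded, here is a minimal sketch of how such filters could combine. It is not taken from the specs: the `Article` fields, the `search` and `refine_options` functions and the four sample articles are all invented for illustration. Each filter left at its default is treated as inactive, which is roughly what "all themes", "any language" and "any time" mean in the navigation phrase.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    title: str
    theme: str
    tags: frozenset
    language: str
    published: date
    highlighted: bool = False


def search(articles, theme=None, tags=(), language=None, since=None, best_only=False):
    """Return the articles that pass every active filter."""
    required = set(tags)
    return [
        a for a in articles
        if (theme is None or a.theme == theme)
        and required <= a.tags  # article must carry ALL requested tags
        and (language is None or a.language == language)
        and (since is None or a.published >= since)
        and (not best_only or a.highlighted)
    ]


def refine_options(results):
    """Count the tags on the current results, as a 'Refine tool' might list them."""
    counts = {}
    for a in results:
        for tag in a.tags:
            counts[tag] = counts.get(tag, 0) + 1
    return counts


# Invented sample data, for illustration only.
ARTICLES = [
    Article("Windfarms and seabirds", "Environment",
            frozenset({"windfarm", "wildlife"}), "en", date(2015, 4, 20)),
    Article("Éoliennes en mer", "Environment",
            frozenset({"windfarm"}), "fr", date(2015, 4, 22)),
    Article("Wolf populations return", "Environment",
            frozenset({"wildlife"}), "de", date(2015, 2, 3)),
    Article("Youth unemployment figures", "Employment",
            frozenset({"unemployment"}), "es", date(2015, 4, 21)),
]

# "All content classified Environment published this week"
this_week = search(ARTICLES, theme="Environment", since=date(2015, 4, 19))
# "All content tagged Windfarm AND Wildlife published This Year"
narrowed = search(ARTICLES, tags={"windfarm", "wildlife"}, since=date(2015, 1, 1))
```

A real platform would push these filters down to a search engine rather than loop over a list, but the logic of stacking filters and recounting the remaining tags would be the same.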
But we’re just getting started…
Human curation & community
People remain part of the process ... ideally.
While the platform requires humans to choose the sources of content, it could then be ‘set and forget’: the semantic analysis engine could chug away, auto-curating content, untended.
However, such a platform would be ideal for creating and sustaining Communities of Interest, who would self-assemble to identify sources of content and use the resulting services.
In the process, they would refine the automated systems and provide human added value, increasing content discovery further.
In any case, the content wouldn’t just stay on the site:
- Community members can both highlight the best articles to the Home Page and validate/edit the tags assigned by the machines. I calculate that human tag validation could easily ‘train’ Apache Stanbol in 9–10 more languages, ensuring open-source semantic analysis software covers all European languages.
- Highlighted articles, as before, are promoted by enewsletter and Twitter, but now newsletters and Twitter feeds per Theme become possible, allowing users to follow specific themes (and making the ‘Rebelview’ interface, below, possible)
- Also possible: an API, allowing other publishers to pipe the firehose of semantically-enriched content into their CMS for further processing and syndication, increasing content discovery further.
Community members can also provide other ‘added value’ activities, from promoting the service to manually curating specific Themes to establish their authority in a specific field:
The Rebelview interface
The above wireframe uses a second, completely different way to present and consume this content, courtesy of RebelMouse.
The Home Page, each Theme and each Country each get a “Rebelview”:
- As mentioned above, each article is auto-Tweeted by the Twitter accounts associated with each Theme and Country the article is classified under. The best articles are also Tweeted by the principal platform account
- Each Twitter account drives a RebelMouse account, embedded on the site, giving us a ‘Rebelview’ of the best content on Rebelview Home …
- … and Rebelviews of all content per Theme …
- … and a Rebelview for each country.
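The routing behind these views is simple enough to sketch. The function below only works out which accounts would tweet a given article; it does not call Twitter or RebelMouse, and the `accounts_for` name and the `@platform_…` handles are hypothetical, made up for this example.

```python
def accounts_for(themes, countries, highlighted, main_account="@platform"):
    """List the accounts that would tweet an article (all handles are hypothetical)."""
    # One account per Theme and per Country the article is classified under.
    accounts = [
        "@platform_" + name.lower().replace(" ", "_")
        for name in [*themes, *countries]
    ]
    # Only the best (highlighted) articles also go out on the principal account.
    if highlighted:
        accounts.append(main_account)
    return accounts


routes = accounts_for(["Environment"], ["France", "Germany"], highlighted=True)
print(routes)
# ['@platform_environment', '@platform_france', '@platform_germany', '@platform']
```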
As Rebelview doesn’t let you drill down into the tags, it’s less powerful as a library than the ‘Refine View’, above, but is a much more fun way to consume the content as news (more on the distinction).
Autosummary, Sentiment Analysis & Machine Translation technologies
Improving content discovery through multilingual, machine-assisted curation is just the beginning:
In the above wireframe, users select articles for processing (checkboxes, left) and choose a premium service (dropdown, right) to apply to them. These services could include:
Machine translation
Because the taxonomy is multilingual, search results will be in many languages (unless you use the navigation to filter by Language first). With this feature, users can select interesting-looking articles to have their titles and abstracts auto-translated, giving a better idea of whether the full article is worth visiting on the publisher’s site.
Autosummary
This one’s fun. Choose a few resources and have a summary report produced for you, using technologies similar to Summly.
Sentiment analysis / Opinion mining
Are the articles positive or negative? If you’re the sort of person who prefers checking out someone’s Klout score rather than actually reading what they produce, this is definitely for you.
These are, of course, just a few of the huge number of language processing technologies out there under development, so if you can think of any others which could be useful, drop me a line.
All of them open up interesting questions in the areas of copyright and content monetisation: offering premium services on a subscription or micropayment basis (cf. Blendle), for example, could allow revenue sharing between the site and the curated publishers, helping the latter monetise their back-catalogue.
Such an approach would appear to be a win-win for media and democracy, as well as helping communities to form and grow across boundaries of language and culture to tackle transnational problems.
And yet the only example I’ve noticed so far is Echos360, a French ‘aggrefilter’ for business content (above):
“… unlike Google News that crawls an unlimited trove of sources, my original idea was to extract good business stories from both algorithmically and manually selected sources… to effectively curate specialized sources — niche web sites and blogs — usually lost in the noise”
– Building a business news aggrefilter (Monday Note, February 2014)
So my original 2009 idea turned out to be an ‘aggrefilter’? Please, there’s got to be a better name for it than that. All suggestions gratefully received.
Originally published at mathew.blogactiv.eu on April 27, 2015.