For quite a few years I’ve worked on a variety of projects attempting to create tools to help with the interpretation of news. Not that there aren’t any out there. There are plenty, but the deluge of breaking stories makes practical news analysis and consumption exasperating. A great quote from a friend in 2008 (Clay Shirky) highlights the persistent problem — “It’s Not Information Overload. It’s Filter Failure”.
For this particular project, these are some of the issues I was trying to solve:
- Determining why certain stories are trending — establishing the genesis and the trajectory.
- Identifying who or what is involved in trending stories, and mapping relationships between these entities over time.
- Establishing patterns of bias. While truth is subjective, I think surfacing obvious partiality helps with interpreting stories.
I started by crawling the top Kenyan news sites and analyzing each article and source individually.
The first challenge is how to create an entry point for news analysis. Some of the ideas I iterated over:
- Document clustering — analyzing the text and putting articles into thematic groups
- Automatic topic modeling — generating article summaries then discovering and assigning topics
- Social media — rank/group articles by social media interactions
I went with the last option as it was the lowest-hanging fruit.
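A minimal sketch of what ranking by social media interactions could look like. The `Article` fields and the `WEIGHTS` values are illustrative assumptions, not the actual signals or weighting used on the site:

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    source: str
    shares: int = 0
    comments: int = 0
    reactions: int = 0


# Hypothetical weights: a share is assumed to signal more interest than a
# comment, and a comment more than a reaction. Tune these against real data.
WEIGHTS = {"shares": 3.0, "comments": 2.0, "reactions": 1.0}


def interaction_score(article: Article) -> float:
    """Weighted sum of the interaction counts for one article."""
    return sum(weight * getattr(article, name) for name, weight in WEIGHTS.items())


def rank_by_interactions(articles, limit=10):
    """Return up to `limit` articles, highest interaction score first."""
    return sorted(articles, key=interaction_score, reverse=True)[:limit]
```

A single weighted score keeps the ranking easy to reason about, at the cost of treating every network and interaction type as comparable.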
Document clustering yielded results that I didn’t find very useful. The results were almost too general and skewed too much toward highlighting frequent mentions such as the president and political parties. The algorithms clearly need some work. Automatic topic modeling, on the other hand, requires some effort in training and testing models. I haven’t yet gone too far down this path. I suspect, however, that one of these options will be better in the long run.
The first iteration of the landing screen presents the most popular articles, which provide a base for further analysis. The screen:
- displays the top entities involved, ranked from most frequently seen.
- displays the sources of all the currently trending articles.
- lists the trending articles along with the social media interactions we use to rank them.
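The entity and source summaries on the landing screen can be sketched as simple counts over the trending articles. This assumes each article is a dict with hypothetical `entities` and `source` keys, and that an entity is counted once per article rather than once per mention:

```python
from collections import Counter


def summarize_trending(articles, top_n=5):
    """Return (top entities, source counts) for a list of trending articles."""
    entity_counts = Counter()
    source_counts = Counter()
    for article in articles:
        # set() so that repeated mentions within one article count only once
        entity_counts.update(set(article["entities"]))
        source_counts[article["source"]] += 1
    return entity_counts.most_common(top_n), source_counts.most_common()
```

Counting per article rather than per mention is one way to stop a single long story from dominating the entity ranking.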
An entity is a person, place or organization. Entities are extracted from each article and relationships between entities are tracked over time. In the above example:
- Sachangwan, the entity we’re analyzing, is a black spot in Kenya that claims a lot of lives, especially over the holidays. Tracking its relationships over time, we can see that it is frequently associated with the NTSA (the National Transport and Safety Authority of Kenya) and Modern Coast (a bus company) in many of the articles.
- The right column allows you to scroll through stories associated with Sachangwan and rank them by published date or by social media interactions.
- A slider at the top allows you to define the depth of Sachangwan’s entity relationships over time.
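One way to model the relationship view is a co-occurrence graph: two entities are linked whenever they appear in the same article, and the slider's depth is read as the number of hops from the selected entity. Both of those readings are assumptions, as are the `entities` and `published` keys on each article dict:

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence(articles, start=None, end=None):
    """Map each entity to {neighbor: number of articles mentioning both}.

    `start` and `end` are optional datetime.date bounds (inclusive) on the
    article's published date, used to restrict the graph to a time window.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for article in articles:
        published = article["published"]
        if (start and published < start) or (end and published > end):
            continue
        for a, b in combinations(sorted(set(article["entities"])), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def related_entities(graph, entity, depth=1):
    """Breadth-first walk: {related entity: hops from `entity`}, up to `depth`."""
    hops = {entity: 0}
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        if hops[current] == depth:
            continue  # do not expand beyond the requested depth
        for neighbor in graph.get(current, {}):
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    del hops[entity]
    return hops
```

Calling `related_entities(graph, "Sachangwan", depth=1)` would return only directly linked entities, while a larger depth pulls in entities linked through intermediaries. The edge counts in the graph can be used to decide which links are strong enough to draw.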
Selecting an individual article triggers a scan of articles from other sources to find similar stories. Overlaying them helps with a broader analysis. We can then present summaries, ranked entities, maps, etc., and find developing patterns. I also experimented with presenting aggregate sentiment analyses of stories, but I’m not yet sure how much value it provides.
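A minimal sketch of finding similar stories from other sources, using cosine similarity over raw word counts. This is a deliberately simple stand-in rather than the matching method actually used, and the `text` and `source` keys and the `threshold` value are assumptions:

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_stories(selected, candidates, threshold=0.3):
    """Return (score, article) pairs from other sources, most similar first."""
    target = term_vector(selected["text"])
    matches = []
    for article in candidates:
        if article["source"] == selected["source"]:
            continue  # only compare against other sources
        score = cosine_similarity(target, term_vector(article["text"]))
        if score >= threshold:
            matches.append((score, article))
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return matches
```

Raw word counts are sensitive to common words, so weighting terms (for example with TF-IDF) or comparing extracted entities instead of full text would likely give cleaner matches.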
Visit Chyulu.com to see what I have so far. Any feedback is very welcome.
Chyulu is a panoramic mountain range of volcanic origin, southeast of Nairobi in Kenya. I definitely encourage you to visit this beautiful part of the world.