From 1914 to 1945 in minutes: An entity-centric view of history

Ambiverse
Natural Language Understanding
4 min read · Oct 24, 2016


Entities tell us a lot about what is written in a text. Just by looking at entities and their categories, we may grasp the topic of the document, get a spatiotemporal idea of the depicted events, and even understand the views of the author. Entities provide an accurate summary of the information inside textual data and can highlight unexpected insights.

In this post we will analyze big collections of textual data in a matter of minutes. How long would it take you to read and collect data for 30 years of history?

We used Ambiverse’s Natural Language Understanding API to process the book The Age of Extremes by the British historian Eric Hobsbawm. Specifically, we focused on the section covering 1914–1945 (The Age of Catastrophe). The idea is to identify every entity mentioned in the text (persons, locations, organizations, etc.) so that the data in the text can be easily summarized. The following snippet shows an example of the system’s output, with the identified entity in square brackets after each mention. Of particular interest is that no matter how an entity is mentioned (e.g., Petrograd or Saint Petersburg), we can identify it correctly.

It was the longer-term prospect that was problematic, even supposing that the power seized in Petrograd [Saint Petersburg] and Moscow [Moscow] could be extended to the rest of Russia [Russia] and maintained there against anarchy and counter-revolution. Lenin [Vladimir Lenin]’s own programme of committing the new Soviet [Soviet Union] (i.e. primarily Bolshevik Party [Communist Party of the Soviet Union]) government ...
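To make the bracket notation concrete, here is a minimal sketch of how the canonical entity names could be pulled out of annotated text like the snippet above. It assumes the output has already been rendered as plain text with the entity name in square brackets after each mention; it does not call the Ambiverse API, and the function name `canonical_entities` is our own invention for illustration.

```python
import re
from collections import Counter

# Two fragments quoted from the annotated snippet above.
annotated = (
    "the power seized in Petrograd [Saint Petersburg] and Moscow [Moscow] "
    "could be extended to the rest of Russia [Russia] ... "
    "Lenin [Vladimir Lenin]'s own programme of committing the new "
    "Soviet [Soviet Union] government"
)

def canonical_entities(text):
    """Return the entity names found inside square brackets, in order.

    Hypothetical helper: it relies only on the 'mention [Entity]' notation
    used in this post, not on any real API response format.
    """
    return re.findall(r"\[([^\[\]]+)\]", text)

entities = canonical_entities(annotated)

# Counting canonical names means that different surface forms of the same
# entity (e.g. "Petrograd") are tallied under one key ("Saint Petersburg").
counts = Counter(entities)
print(entities)
```

Because the count is keyed on the canonical name rather than the surface form, any two spellings that are linked to the same entity end up in the same bucket.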

One straightforward analysis is to look at the prominence of entities by comparing their frequencies. The next chart shows the 20 most frequently mentioned persons in the book between 1914 and 1945. The first 10 or so may seem obvious. Further down the list, however, we find not only politicians, who tend to be more noticeable (think of the everyday newspaper), but also intellectuals who, at least in the author’s view, were highly influential in the period. According to our knowledge graph, the list contains almost as many politicians (14) as intellectuals (12), and these are the only two categories that dominate it.

One may think that ideologists are probably as important as executives. These results may give some support to a famous quote by one of the intellectuals in the list:

The ideas of economists and political philosophers, both when they are right and when they are wrong, are more powerful than is commonly understood. Indeed the world is ruled by little else. Practical men, who believe themselves to be quite exempt from any intellectual influences, are usually the slaves of some defunct economist.

John Maynard Keynes

Most mentioned persons
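A ranking like the one in this chart boils down to counting mentions per entity and, optionally, filtering by category. The sketch below shows one way this could be done, assuming each mention has already been resolved to an (entity, category) pair. The input list is toy data invented for illustration, not the counts from the book, and `top_entities` is a hypothetical helper rather than part of any API.

```python
from collections import Counter

# Toy input: one (entity, category) pair per mention.
# Invented for illustration; these are NOT the book's actual counts.
mentions = [
    ("Vladimir Lenin", "politician"),
    ("John Maynard Keynes", "intellectual"),
    ("Vladimir Lenin", "politician"),
    ("Walter Gropius", "architect"),
    ("Vladimir Lenin", "politician"),
    ("John Maynard Keynes", "intellectual"),
]

def top_entities(mentions, k, category=None):
    """Rank entities by mention count, optionally restricted to one category."""
    counts = Counter(
        entity
        for entity, cat in mentions
        if category is None or cat == category
    )
    return counts.most_common(k)

top_overall = top_entities(mentions, k=2)
top_intellectuals = top_entities(mentions, k=5, category="intellectual")
print(top_overall)
print(top_intellectuals)
```

The same function covers both the overall ranking and the per-category view, which is the comparison the politician versus intellectual observation relies on.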

Let’s now see what happens in a more specific field like the arts. The results for the chapter The Arts 1914–1945 are displayed in the next figure. It shows the most prominent entities in the chapter, including artistic disciplines, movements and persons. Interestingly, although US cinema seems to be the most prominent artistic entity, there are no people related to it in the rest of the list (American actors, producers, directors, etc.). Similarly, our knowledge graph tells us that architects are few but very prominent (Walter Gropius, Le Corbusier and Ludwig Mies van der Rohe are at the top of the list), while the most frequent categories are those involving authors.

Most frequent artistic entities

The next graph shows the top 20 locations mentioned for this period. It is clear that the events are mostly centered in Europe and Asia, and that the Western European countries plus the United States and Russia played a central role. One may be surprised by the prominence of India in the list compared to the participants in the world wars, which seem to be the most prominent events. The importance of India, however, comes from chapter 7, where the British author discusses the decolonization process, to which he gives great weight.

Most frequent locations

As mentioned, the author named his book The Age of Extremes, reflecting a period in which societies around the world followed extreme views. The author’s idea is clearly reflected in the top 5 political categories that our technology discovered in the 1914–1945 period.

Most frequent political categories
