Wikipedia Mining Algorithm Reveals The Most Influential People In 35 Centuries Of Human History

The top ranked men and women will surprise you


Whatever your interest in history, most of what you have learned will be strongly influenced by your language and your cultural background. The historical figures who feature strongly in Chinese schools will differ dramatically from those that feature in US schools, or Indian schools or Russian ones.

This kind of bias is also reflected in the various language editions of Wikipedia. For example, it’s not surprising to find that the Chinese language version contains more links to Chiang Kai-shek, who once led the Republic of China, than the German language edition does.

That raises an interesting prospect. Perhaps the network of links between the Wikipedia articles about historical figures provides an objective way to assess their importance. So a person such as Napoleon, who is highly ranked in many different language editions, would be more influential than, say, Park Chung-hee, the South Korean president and general who was assassinated in 1979 and who is top-ranked in only the Korean language edition.

Today, Young-Ho Eom at the University of Toulouse in France and a few pals publish just such a list. These guys have used network theory to rank historical figures by importance in each one of 24 different language editions of Wikipedia. They then compare the lists to see which figures span different cultures, allowing them to calculate the most influential.

What’s more, by looking at the birth dates of these figures, the team are able to tease apart the way different cultures have interacted in the past and how the influence of different cultures has waxed and waned throughout history.

This list throws up some surprises. Depending on the ranking algorithm these guys use, the most influential figure in human history is either Carl Linnaeus, the 18th century Swedish botanist who developed the modern naming scheme for plants and animals, followed by Jesus; or Adolf Hitler followed by Michael Jackson.

First, some background. Eom and co begin by extracting over 1 million biographical articles from the English language version of Wikipedia. They then look for the same people in other versions of Wikipedia and rank them all using the famous PageRank algorithm that Google uses to rank webpages, as well as a couple of similar algorithms.

(They say that fewer than 2 per cent of people who are top ranked in non-English versions of Wikipedia have no corresponding entry in English.)

The ranking process is crucial. It works by considering the network of links between articles in Wikipedia. PageRank considers somebody important if the article about them is linked to by other important articles. In English, the top 5 most influential people by this ranking method are: Napoleon, Barack Obama, Carl Linnaeus, Elizabeth II and George W Bush.
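To see how this works, here is a minimal sketch of PageRank on a tiny link graph. It is an illustration, not Eom and co's code: the five-article graph and its links are invented for the example, and the damping factor of 0.85 is simply the conventional choice, assumed here.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Score the nodes of a directed graph given as {source: [targets]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every article gets a small baseline share (the "random jump").
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # An article passes its score on, split evenly over its links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # An article with no outgoing links spreads its score evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical links between articles, made up for illustration only.
toy_links = {
    "Napoleon": ["Alexander the Great"],
    "Alexander the Great": ["Aristotle"],
    "Aristotle": ["Alexander the Great"],
    "Josephine": ["Napoleon"],
    "Wellington": ["Napoleon"],
}

scores = pagerank(toy_links)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy graph, Aristotle has a single incoming link, but it comes from the top-ranked article, so Aristotle ends up above Napoleon, who has two incoming links from articles that nobody links to.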

However, PageRank emphasises the importance of incoming links. So Eom and co also use a ranking algorithm called 2DRank, which places weight on outgoing links as well. By this method, the top 5 most influential people in the English version of Wikipedia are: Frank Sinatra, Michael Jackson, Pope Pius XII, Elton John and Elizabeth II.
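How 2DRank does this is not spelled out here. In the network-science literature it is usually described as combining PageRank with CheiRank, which is PageRank computed on the same graph with every link reversed, so that articles with many outgoing links score highly. The sketch below follows that idea under loud assumptions: the four-article graph is hypothetical, and ordering articles by the worse of their two positions is a simplified stand-in for the published 2DRank procedure, not a reproduction of it.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Score the nodes of a directed graph given as {source: [targets]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # Articles without outgoing links spread their score over all nodes.
            targets = links.get(node, []) or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def reverse_links(links):
    """Flip every link, so outgoing links become incoming ones."""
    reversed_links = {}
    for source, targets in links.items():
        reversed_links.setdefault(source, [])
        for target in targets:
            reversed_links.setdefault(target, []).append(source)
    return reversed_links


def positions(scores):
    """Turn scores into 1-based positions; ties are broken by name."""
    ordered = sorted(scores, key=lambda node: (-scores[node], node))
    return {node: i + 1 for i, node in enumerate(ordered)}


# Hypothetical graph: "Hub" links out a lot, "Star" is linked to a lot.
toy_links = {
    "Hub": ["Star", "X", "Y"],
    "X": ["Star"],
    "Y": ["Star"],
    "Star": ["Hub"],
}

pr_pos = positions(pagerank(toy_links))                    # incoming links
chei_pos = positions(pagerank(reverse_links(toy_links)))   # outgoing links

# Simplified combination (an assumption): an article is only as good as
# the worse of its two positions; ties go to the better PageRank position.
two_d = sorted(pr_pos, key=lambda node: (max(pr_pos[node], chei_pos[node]), pr_pos[node]))
print(pr_pos, chei_pos, two_d)
```

Here "Star" tops the PageRank order and "Hub" tops the reversed-link order, which is the distinction the two rankings are meant to capture. Articles that do well on both measures rise in the combined order.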

They go on to use both these algorithms to rank people in 24 different language editions, including Chinese, Russian, Hindi, Korean, Arabic, German and so on.

Here are the Top 5 PageRank people from the Chinese Wikipedia: Carl Linnaeus, Mao Zedong, Napoleon, Aristotle and Chiang Kai-shek.

From the Russian Wikipedia, they are: Peter the Great, Carl Linnaeus, Napoleon, Alexander Pushkin and Joseph Stalin.

And from the Hindi Wikipedia, they are: Jesus, Carl Linnaeus, Gautama Buddha, William Shakespeare and Alexander the Great.

The lists for the other language versions are here.

But this is just the beginning for Eom and co. They go on to count the people from one culture who are influential in other cultures. And the more cultures that an individual spans, the more influential that person must be.

That leads to a list of the top 100 most influential people in history. According to PageRank, the top 5 are: Carl Linnaeus, Jesus, Aristotle, Napoleon and Adolf Hitler. According to 2DRank, the leaders include Adolf Hitler, Michael Jackson, Madonna (the singer) and Ludwig van Beethoven.
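The counting step can be sketched using the four top-5 PageRank lists quoted above. This is a rough illustration only: it simply counts how many editions each name appears in, whereas the real calculation covers 24 editions and longer lists and may weight names by their position, so the variable names and the counting rule here are assumptions.

```python
from collections import Counter

# Top-5 PageRank lists as quoted in this article.
top5_pagerank = {
    "English": ["Napoleon", "Barack Obama", "Carl Linnaeus",
                "Elizabeth II", "George W Bush"],
    "Chinese": ["Carl Linnaeus", "Mao Zedong", "Napoleon",
                "Aristotle", "Chiang Kai-shek"],
    "Russian": ["Peter the Great", "Carl Linnaeus", "Napoleon",
                "Alexander Pushkin", "Joseph Stalin"],
    "Hindi": ["Jesus", "Carl Linnaeus", "Gautama Buddha",
              "William Shakespeare", "Alexander the Great"],
}

# Count the number of editions in which each person makes the top 5.
appearances = Counter(
    name for names in top5_pagerank.values() for name in names
)

# Keep only the people who span more than one edition.
spanning = [(name, count) for name, count in appearances.most_common() if count > 1]
print(spanning)  # [('Carl Linnaeus', 4), ('Napoleon', 3)]
```

Even with just four editions, Carl Linnaeus and Napoleon stand out as the figures who cross language boundaries, while names such as Mao Zedong or Peter the Great appear in only one list.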

But how good is this list? To find out, Eom and co compare the full 100 names to other well-known but subjective lists of the 100 most influential people in history. One of these lists was compiled by the American researcher Michael Hart and the other by researchers at MIT.

The overlap is significant: just under half the people appear on both lists. In fact, Eom and co say the overlap between their list and the MIT list is greater than the overlap between the MIT list and Hart’s list. That provides an interesting perspective on the results.

There are some additional points worth highlighting too. It won’t have escaped your attention that Eom and co’s lists are heavily skewed in favour of men. Eom and co say this is not surprising given that women have had little chance to make an impact throughout history.

When men are excluded, the top 5 lists look like this. For PageRank: Elizabeth II, Mary (mother of Jesus), Queen Victoria, Elizabeth I of England and Maria Theresa of Austria.

And for 2DRank: Madonna (entertainer), Elizabeth II, Mary (mother of Jesus), Queen Victoria and Agatha Christie.

Finally, Eom and co calculate which cultures have most influenced others and how this has varied over time. Overall, the most influential language is English followed by German.

“It is interesting to note that the ranking of cultures changes significantly in time,” say Eom and co.

When only people born before the 19th century are considered, English drops to fourth place in the list, behind Italian, German and French.

And in general, Western cultures only become important after the 17th century. Before that, Greek, Turkish and Arabic cultures dominate.

Of course, Eom and co are acutely aware of the limitations of this kind of study. They point out, for example, the limitations of linking language and culture and of linking historical birthplace with the language spoken there now. But these are the necessary restrictions of a computational approach. Then there is the absence of certain language versions of Wikipedia, such as Ukrainian, Serbian and so on.

Given these limitations, the bottom line is this: “Our analysis shows that most important historical figures across Wikipedia language editions are born in Western countries after the 17th century, and are male,” say Eom and co.

That’s a fascinating insight into the history of humankind as it plays out in the pages of one of the world’s greatest repositories of knowledge. We’ll look forward to seeing what other gems network science can tease out of Wikipedia in the future.

Ref: arxiv.org/abs/1405.7183 : Interactions Of Cultures And Top People Of Wikipedia From Ranking Of 24 Language Editions

