Historical intellectual connections
There are some remarkable facts among these statements. For instance, chemist and cognitive scientist Christopher Longuet-Higgins, who did amazing work on the psychology of music, was the doctoral advisor of both Peter Higgs (known for proposing the Higgs field and Higgs boson) and Geoffrey Hinton (a huge name in artificial neural nets and machine learning).
However, I’m more curious about grand-doctoral-advisors, great-grand-doctoral-advisors, and so on: relations that connect people across centuries. These are implicit in Wikipedia, in that you could get this information by clicking through lots of articles, but Wikidata lets us query and visualise them.
Wikidata Graph Builder
Carl Friedrich Gauss, one of the greatest mathematicians of all time, was a doctoral advisor to several other mathematicians, so he makes a natural starting point. His name goes in the “root node” box (it auto-completes Wikidata entry names, which speeds things up). I choose “Doctoral advisor” for the traversal property, and “Reverse” as the direction, because I’m looking for people advised by Gauss. Once I press “build”, my screen explodes into a massive, shimmering tree of PhDs, and takes a while to settle down. Here’s the active link.
Making sense of this massive tree takes time and plenty of use of the mouse wheel to zoom in and out. Gauss himself is highlighted with a blue dot, and clicking on any name brings up the person’s Wikidata entry. At the ends of the tree are some people living today, including the highly notable physicists Lawrence A. Krauss and Michio Kaku.
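For anyone who would rather write a query than fill in a form, here is a minimal sketch of how the same traversal might be expressed for the Wikidata Query Service. I am assuming that P184 is the "doctoral advisor" property and Q6722 is the item for Gauss, and the function name `traversal_query` is my own invention. I do not know what query the Graph Builder itself sends, so treat this as an approximation rather than a description of the tool.

```python
def traversal_query(root_qid: str, prop: str = "P184", direction: str = "reverse") -> str:
    """Build a SPARQL query that walks `prop` one or more steps from a root item.

    Wikidata records the claim on the student: ?student wdt:P184 ?advisor.
    "reverse" therefore finds people who reach the root by following the
    property (academic descendants); "forward" follows the property away
    from the root (academic ancestors).
    """
    if direction == "reverse":
        pattern = f"?person wdt:{prop}+ wd:{root_qid} ."
    elif direction == "forward":
        pattern = f"wd:{root_qid} wdt:{prop}+ ?person ."
    else:
        raise ValueError("direction must be 'reverse' or 'forward'")
    return "\n".join([
        "SELECT DISTINCT ?person ?personLabel WHERE {",
        f"  {pattern}",
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }',
        "}",
    ])


# Assumed identifier: Q6722 for Carl Friedrich Gauss.
print(traversal_query("Q6722"))
```

The printed text can be pasted into the query service. The `+` after the property means "one or more steps", which is what turns a list of direct students into the whole tree.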
I wanted to find connections between well-known people in different centuries: connections that would not be obvious just from reading Wikipedia articles about these people. Linking Gauss (born 1777) to Krauss (born 1954) is the sort of thing I was after. But there’s more!
When I demonstrated this graph builder at the Oxford XML Summerschool, the audience pointed out that with the “influenced by” (P737) property it should be possible to make a graph of music artists influenced by Pink Floyd, by Jimi Hendrix, and so on. While it’s straightforward to build the query, the data just aren’t there yet, and a lot of musical influences are presumably hard to find objective references for.
A couple of caveats
How trustworthy are the results of these queries? We have to consider the slight possibility of hoaxes and the more realistic possibility of confused identities, so if the query yields a new discovery, we would have to verify the references for each individual claim. Doctoral advisor relationships are usually imported from Wikipedia infoboxes, so checking would mean checking the references in each article. Other properties may have their sources better represented in Wikidata itself. Fortunately, Wikidata queries can list the sources that the claims are based on, which I hope to cover in future posts.
It’s worth noting that when querying Wikidata we are looking at chains of notable individuals. Merely having a doctorate, or being an academic with doctoral students, is not sufficient for notability: notable academics are outstanding in some way, often by getting professional awards. English Wikipedia has a notability policy specifically about academics which is worth a look. Wikidata’s definition of notability is not exactly the same, but when a notable scholar is the doctoral advisor to a run-of-the-mill academic, who in turn is doctoral advisor to another notable scholar, that connection will not usually be represented in Wikidata. My chain of advisors stops after two steps, because there does not seem to be a source for Nancy Cartwright’s doctoral advisor and they may not be notable.
Finding the longest chain
I became curious to see longer chains of doctoral advisors. One next step is to change the direction of the C. F. Gauss query from “Reverse” to “Forward”. The meaning of the doctoral advisor relation implies that this will take us backwards in time: Gauss’ doctoral advisor, that person’s doctoral advisor, that person’s doctoral advisor, and so on.
I’m not just interested in the longest chain involving a particular person, but in the longest chain that Wikidata knows about. For this we need a query to retrieve all long chains, and order them by length. A simple measure of chain length is the time between the births of the two people at either end of the chain. (I don’t yet know how to order by number of steps.)
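Counting steps is easier outside the query service, once the advisor pairs have been downloaded. The following is a small sketch of one way it might be done, not the query I actually ran. The data is a made-up toy example with placeholder names, and the code assumes the advisor relation contains no cycles (a data error that made someone their own academic ancestor would make it recurse forever).

```python
from functools import lru_cache

# Hypothetical toy data: each person maps to their doctoral advisors.
# The single-letter names are placeholders, not real Wikidata claims.
ADVISORS = {
    "E": ["D"],
    "D": ["C", "X"],
    "C": ["B"],
    "B": ["A"],
    "X": ["A"],
    "A": [],
}


@lru_cache(maxsize=None)
def longest_chain(person: str) -> tuple:
    """Return the longest advisor chain ending at `person`, earliest advisor first."""
    best = ()
    for advisor in ADVISORS.get(person, []):
        candidate = longest_chain(advisor)
        if len(candidate) > len(best):
            best = candidate
    return best + (person,)


# Rank everyone by the length of the longest chain that ends with them.
chains = sorted((longest_chain(p) for p in ADVISORS), key=len, reverse=True)
longest = chains[0]
steps = len(longest) - 1  # a chain of n people has n - 1 advisor links
print(" -> ".join(longest), f"({steps} steps)")
```

The cache means each person's longest chain is computed only once, which matters when many students share the same distant ancestors.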
Numerous currently-living people appear in these long chains. One familiar name that stood out is Wikidata founder and former Wikimedia Foundation board member Denny Vrandečić.
The longest chain starts with Gregory Choniades, a Byzantine astronomer born somewhere from 1240 to 1250, and ends with Sabrina Gonzalez Pasterski, a Cuban-American physicist born in 1993 who, though young, has won several awards. Some birth dates in the results appear more recent than 1993, but those are actually non-specific dates, stored in Wikidata as “20th century”. My query forces them to appear as a specific date, which is output as “1 Jan 2000”.
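The reason a century can masquerade as a day is that Wikidata stores each date as a full timestamp plus a separate precision code. As far as I know, the codes are 7 for century, 8 for decade, 9 for year, 10 for month and 11 for day. Here is a rough sketch of how a date might be displayed honestly once both pieces are available. The function names are my own, and it only handles CE dates.

```python
import re

# Wikidata precision codes, as I understand them.
CENTURY, DECADE, YEAR, MONTH, DAY = 7, 8, 9, 10, 11


def ordinal(n: int) -> str:
    """Return n with an English ordinal suffix, e.g. 20 -> '20th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def describe_date(time: str, precision: int) -> str:
    """Render a Wikidata-style timestamp no more precisely than it was recorded."""
    match = re.match(r"^\+(\d+)-(\d\d)-(\d\d)T", time)
    if match is None:
        raise ValueError(f"unsupported time value: {time!r}")
    year, month, day = (int(group) for group in match.groups())
    if precision == CENTURY:
        return f"{ordinal((year + 99) // 100)} century"
    if precision == DECADE:
        return f"{year // 10 * 10}s"
    if precision == YEAR:
        return str(year)
    if precision == MONTH:
        return f"{year}-{month:02d}"
    return f"{year}-{month:02d}-{day:02d}"


print(describe_date("+2000-01-01T00:00:00Z", CENTURY))
print(describe_date("+1993-00-00T00:00:00Z", YEAR))
```

Filtering the results to year precision or better would be another way to keep the vague dates from outranking the real ones.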
Now I’m curious about how these two people are connected. I don’t want a tree of “descendants” of Choniades, or “ancestors” of Pasterski, but a line connecting them both. This calls for a query finding each pair A and B where A is the doctoral advisor of B, A is a “descendant” of Choniades and B is an “ancestor” of Pasterski. Among the options for the Wikidata Query Service is to present the results as nodes in a graph, making something similar to the Wikidata Graph Builder, less immediate but much more customisable.
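Here is a sketch of what such a query could look like, built as a string so the two endpoints are easy to swap. I am again assuming P184 for "doctoral advisor". `Q_START` and `Q_END` are placeholders rather than real item identifiers, and `connecting_chain_query` is a name I made up. The first line is the query service's comment syntax for choosing the graph view.

```python
def connecting_chain_query(start_qid: str, end_qid: str, prop: str = "P184") -> str:
    """Build a SPARQL query for advisor/student pairs lying between two people.

    ?a is the start person or one of their academic descendants, ?b is the
    end person or one of their academic ancestors, and ?a advised ?b.
    """
    return "\n".join([
        "#defaultView:Graph",
        "SELECT ?a ?aLabel ?b ?bLabel WHERE {",
        f"  ?a wdt:{prop}* wd:{start_qid} .",  # A descends from the start
        f"  wd:{end_qid} wdt:{prop}* ?b .",    # B is an ancestor of the end
        f"  ?b wdt:{prop} ?a .",               # A is B's doctoral advisor
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }',
        "}",
    ])


# Placeholders: substitute the real item identifiers for the two endpoints.
print(connecting_chain_query("Q_START", "Q_END"))
```

Using `*` (zero or more steps) rather than `+` lets the two endpoints themselves appear as nodes in the graph.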
C. F. Gauss appears in this chain, so my earlier choice of him as a starting point was a lucky guess. I was hoping for a graph a few centuries long, but wasn’t expecting a 33-step chain covering three-quarters of a millennium. What’s more, there is no way I would have clicked through 33 Wikipedia links (and hundreds more dead-ends) to find it.