When computer scientists want to assess the efficiency of algorithms, they employ the notions of time and space complexity. These measures help answer familiar questions like "Why is this taking so long?" and "Should we go buy more RAM or more CPUs?"
When movie reviewers write about a new movie, they often refer to old movies: reviews of Finding Dory refer to Finding Nemo, and reviews of Ghostbusters (2016) inevitably mention Ghostbusters (1984). Roger Ebert’s review of Apocalypse Now mentions Platoon, The Deer Hunter, Full Metal Jacket and Casualties of War, each of which has its own Ebert review and list of references.
Imagine a rule: before you watch a movie, you must first read the review of that movie on rogerebert.com. Easy: people do that all the time.
Now imagine a second rule: before you watch a movie, you must first watch all the movies mentioned in its review.
Putting the two rules together, you would have to read the reviews of those movies first, and then the reviews in the reviews, and the reviews in the reviews in the reviews.
Computer scientists will recognize this as a graph traversal problem. We know that the recursion is not infinite, because the set of movies is finite. We know that it is possible to watch all the movies that Ebert watched, because Ebert (1942–2013) was, bless his memory, also finite.
But, by the time you caught up on all the prerequisites for the movie you actually wanted to watch, would that movie still be in theaters?
The bottleneck here is human attention. It takes two hours to watch a two-hour movie. (Skipping ahead, or watching at 2x speed, is an abomination for the purposes of this paper.) So it is possible to compute the Ebert citational complexity for a given movie: it is the sum of the runtimes of all of the movies in the graph of Ebert reviews rooted at the original movie.
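As a minimal sketch, suppose the review graph were available as a plain adjacency list mapping each movie to the movies its review mentions, together with a table of runtimes; both structures are hypothetical, since rogerebert.com exposes no such data set. The complexity is then just a traversal that sums runtimes over the reachable set:

```python
def ebert_citational_complexity(movie, mentions, runtime_minutes):
    """Sum of runtimes over the graph of Ebert reviews rooted at `movie`.

    mentions:        dict mapping a movie title to the titles mentioned in
                     its review (hypothetical adjacency list)
    runtime_minutes: dict mapping a movie title to its runtime in minutes
    """
    seen = set()
    stack = [movie]
    total = 0
    while stack:
        current = stack.pop()
        if current in seen:  # the set of movies is finite, so this terminates
            continue
        seen.add(current)
        total += runtime_minutes[current]
        stack.extend(mentions.get(current, []))
    return total


# Toy example; runtimes are approximate and the mention edges merely illustrative.
mentions = {"Apocalypse Now": ["Platoon", "The Deer Hunter"]}
runtime_minutes = {"Apocalypse Now": 147, "Platoon": 120, "The Deer Hunter": 183}
print(ebert_citational_complexity("Apocalypse Now", mentions, runtime_minutes))  # 450 minutes
```

Whether the traversal runs depth-first or breadth-first makes no difference to the total; the visited set is what keeps the recursion from circling forever.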
In XKCD 214, the Tacoma Narrows Bridge is shown to have a Wikipedia citational complexity of three hours and counting — one presumes Randall Munroe decided to prune the graph at an arbitrary depth and go to bed.
What is the citational complexity of a paper in a given discipline? The bibliography of any given paper expands to a stack of further papers, each of which lies atop a stack of its own.
How many years would it take to read all the foundational texts? Way too many.
Obviously, some kind of pruning process is constantly at work: journals and university presses regret that they cannot publish everything that comes to them; librarians sigh that they cannot acquire everything for their collections; professors walking through the stacks sigh that they can take only so many books back to their office; they sigh again when they pare down reading lists for the incoming class, knowing that students can absorb at most a handful of books and papers before the semester ends; the students sigh and skim the introductions, because they need to get on with job applications.
Even in a master’s or PhD program, is there enough time to cover all the relevant research? Way back in 1689, John Locke had to ask Huygens whether the proofs in Newton’s Principia could be trusted; Huygens assured him that they were sound.
Maybe different fields have different pruning strategies. A young discipline might have few enough papers that a researcher could expect to truly master them in a few years. An old discipline is like a huge tree with many branches: an insect could live out its entire life without exploring more than a dozen leaves. You have to skim the summaries, accept the abridgements, and take the larger part of an entire field on faith.
How would we measure the citational complexities of different fields? Traversing the citation graph at http://citeseerx.ist.psu.edu/index could offer a visualization of the shape of different fields. It could also yield a recommended reading list of the most-cited works in the history of a field, which would permit the most effective pruning of the dependency graph. Has such work been done?
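As a sketch of what such a traversal might produce, assume a field’s citation graph has been exported as a mapping from paper identifiers to the identifiers they cite (a hypothetical format; CiteSeerX does not hand out exactly this structure). Ranking papers by how often they are cited within the field gives a crude recommended reading list, which is one way to prune the dependency graph:

```python
from collections import Counter

def recommended_reading(citations, top_k=10):
    """Rank papers by how often they are cited within the field.

    citations: dict mapping a paper id to the list of paper ids it cites
               (hypothetical export of a field's citation graph)
    Returns the top_k most-cited papers, a crude but finite reading list.
    """
    counts = Counter()
    for cited in citations.values():
        counts.update(cited)
    return [paper for paper, _ in counts.most_common(top_k)]


# Toy field in which three papers all cite one foundational text.
field = {
    "paper-A": ["classic-1"],
    "paper-B": ["classic-1", "paper-A"],
    "paper-C": ["classic-1", "paper-B"],
}
print(recommended_reading(field, top_k=2))  # e.g. ['classic-1', 'paper-A']
```

Raw citation counts are only one pruning heuristic; any bibliometric ranking, such as PageRank over the same citation graph, would slot into the same place.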