Humanities [Research] in the 21st Century

Ian Matthew Miller
Published in Roots and Branches
Aug 28, 2014 · 11 min read

There is a great deal of thinking going on about the state of the academy in the 21st century. As many of you — my core readership of 9 (by latest stats) — know well, I occasionally participate in this type of speculation. Whether this is merely the type of academic changing-of-the-guard essay that occurs about once a generation (often with its own set of journals), or premonitions of some greater shift is yet to be fully known.

Luckily, I have tools to deal with the uncertainties of the future. They are the same tools I use to deal with the uncertainties of the past: random snippets of tangentially related data, and bare speculation. Like any good historian, I will address what I see as paradoxes in my understanding of the situation, hoping that unraveling these specifics will yield a greater comprehension of the broader trends.

I actually wrote most of this essay before posting any other content on this site, but the volume of things I want to say has exceeded what I am currently able to articulate. This is apparently a common problem on the internet. Rather than letting it fall entirely by the wayside, my stop-gap solution is to divide this into two (or more) essays. This first (arguably complete) one will focus on aspects of doing humanities research; the second (as-yet-incomplete) foray will return to these issues with a focus on questions of pedagogy.

Paradox 1: Volume of total information grows, size of information segments shrinks

The amount of information in the world is huge and growing faster than ever: 2.7 zettabytes (2.7 × 10^21 bytes, or 27 followed by 20 zeros) in 2012 by one estimate. Most of this information is now native digital; a full 94% already was in 2007, a share that has only increased. The massive growth in information is fueled by a number of interrelated trends: mass literacy and the growth in internet access (especially in the developing world) have caused an explosion in the number of potential information producers; this is paralleled by a massive increase in server space to hold all this data (and a somewhat chilling concentration of these servers in a few corporate hands).

Yet we cannot escape the feeling that this ever-growing body of information is shared in ever-smaller blips.

This latter impression is hard to confirm with data. Some trends (an increase in the number of objects per webpage) suggest that information packets are actually getting bigger. Yet for counterexamples we need think only of the tweet, the text message, WeChat (now under crackdown by Beijing), or any of the other constant and instantaneous distractions awaiting all of us the moment we look at our smartphones to check the time.

This presents two challenges: a general challenge to organizing all information — that there is not only more of it, but it is broken into more pieces; and a specific challenge to organizing the types of information of interest to humanities scholars — that we like to write at length about self-contained texts.

What is the old-school…er…school to do? Can classical humanism — last bastion of the 60-page article on a two-page essay, lovingly crafted over seven years of intense research— adapt to the pace and sheer volume of information in the third millennium?

Resolution 1: Searching, filtering, structuring become critical skills, but our beloved close reading remains a key approach.

It strikes me that the days of the 60-page article are, in fact, numbered, but that does not necessarily spell the death of in-depth, focused research. Search technology has gotten massively better over the past two decades (remember HotBot? It still exists…), and gives even the relatively inexperienced researcher a reasonably good way to find useful texts amid large corpora of useless ones. Automated translation keeps improving, further opening otherwise inaccessible textual realms to non- or semi-specialized researchers. Other, younger technologies, including topic modeling and the semantic web (the cold fusion of Digital Humanities), offer even better ways of filtering and parsing large bodies of [textual] data, albeit methods whose true potential has yet to be entirely realized.

But as of today, the best technologies DH has to offer still cannot read.

In my experience, the best results are still obtained when a trained specialist sits down and takes a close look at something. Technologies have greatly reduced the barriers to entry, largely by making it easier to find topical texts. But they are not a total substitute for the classical skill of careful reading.

Looking at my mentors, it seems to me that every successive generation’s Sinological skills are a bit worse. I am regularly astounded by the abilities of certain sexagenarian scholars to recall, from memory, the specific textual context of a seemingly random passage. Technology, coupled with the availability of digitized texts, is getting better and better at approximating this ability — largely by finding references amid the digital haze. But so far technology is like a replacement limb — better than no leg at all, but still worse than the original. It opens up the possibility of a much larger community of scholars, each of whom has less specialized content knowledge than previous generations, but who — as a network of cyborgs — have the potential to build on and exceed their accomplishments.

Paradox 2: Technological generations shorten, human generations lengthen

By many measures human generations have been getting longer. The age at which women first give birth has been rising; age at first marriage has been rising too. Both of these trends contribute to, but are distinct from, the simultaneous trends toward lower birth rates and marriage rates overall. People have been spending more years getting educated. Life expectancy has also been rising (although this trend may soon reverse), and the average age of retirement has (probably) risen with it.

Collectively, these trends mean that many of life’s milestones are further apart for each individual: birth to graduation, to marriage, to first child, to retirement. Generations are longer by many interpersonal measures as well (the age difference between parent and child, for example, or between the youngest and oldest members of a workplace). Yet we cannot escape the feeling that generations are actually getting shorter, that people a mere five or ten years apart lack the same cultural touchstones. To some degree this is because different age groups live in distinct technological worlds: baby boomers watch CBS, Millennials watch YouTube; no one under twenty uses Facebook anymore, and few people over twenty even know what Snapchat is.

These “tech generations” are harder to quantify. Intuitively, there is a sense that new technologies are introduced faster, and that old ones grow obsolete faster, than they did when I was a kid (in the 80s), and certainly than they did when my parents (60s) or grandparents (30s and 40s) were children. This is not just a function of Moore’s Law, the observation that transistor density, a rough proxy for processing power, doubles about every two years. Moore’s Law is why my four-year-old computer has trouble handling more recent, more resource-intensive programs. But my four-year-old laptop has been replaced not so much by a faster laptop as by tablets and smartphones, which are themselves growing rapidly obsolete. Tech generations therefore have more to do with the pace of introduction (or perhaps diffusion) of totally novel categories of product.

Beyond hardware, which appears to improve at a roughly fixed pace, the rate of change in apps and digital media seems to be increasing. This at least can be quantified. By two measures (the number of new apps per month and, again, the number of objects per webpage), the pace at which new digital goods are introduced is steadily increasing. This presumably means that the average “lifetime” of each of these has declined. The rate of turnover in digital media is accelerating both within platforms, and probably (this is harder to quantify) across platforms. Not only are twenty-somethings not talking about the same TV shows as forty-somethings, not only are they not talking as much about TV at all, but the new media they do talk about turn over so fast that it is challenging for anyone to keep up.

Later and more extended periods of school, work, maternity/paternity, and retirement mean that the library, office, playground and other peer groups are full of adults of very different ages. After what is often an extended period of adolescence — much spent with age-peers — young people enter arenas full of people twenty, thirty or even forty years older. But if the gulf between older generations versed in “traditional culture” and younger ones native to “digital culture” is growing, the onus is both on the old to adapt to the new, and on the young to understand the aged.

Resolution 2: Teamwork and understanding become more important than ever, especially between people of different ages and media familiarities.

We are all familiar with the trope of parents asking their children to show them how to work the computer; this needs to be transferred to professional realms as well. Conversely, it is easy to forget about all the work parents do to initiate their children into the dominant adult culture. This is also the way it should (and does) work in the workplace. Ultimately this means that older people, or those specialized in older media forms, should be open to learning from or working with younger people, or those specialized in newer media. But it also shows that there is ongoing value for traditional culture in a digital age.

Paradox 3: Older sources are fewer in number but harder to access

By comparison to the amount of digital information in the world (2.7 zettabytes in 2012 according to the estimate above), the amount of non-digital information is puny. One estimate holds the total amount of this information at about 100 petabytes (http://www.lesk.com/mlesk/ksg97/ksg.html), which is about 0.0037% of the digital figure, or roughly one part in 27,000. Of this, only a small amount is text: the same source estimates that the Library of Congress holds about 20 terabytes of text and that about 1 terabyte of printed text was produced per year in 1991, before the onset of widespread digital text. Even if we figure that the Library of Congress holds only 1% of all extant text, all the text in the world would still amount to only 2 petabytes. This is puny.
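
For anyone who wants to check the back-of-the-envelope arithmetic, here is a minimal Python sketch. It uses only the estimates quoted in this paragraph; the variable names are mine, decimal (SI) units are an assumption, and the 1% Library of Congress share is the deliberately conservative hypothetical from the text rather than a measured figure.

```python
# Rough check of the estimates quoted above.
# Assumption: decimal (SI) units, i.e. 1 TB = 10**12 bytes.
TB = 10**12
PB = 10**15
ZB = 10**21

digital_total = 2.7 * ZB       # estimated digital information, 2012
non_digital_total = 100 * PB   # estimated non-digital information
loc_text = 20 * TB             # estimated text held by the Library of Congress
loc_share_of_all_text = 0.01   # hypothetical: LoC holds only 1% of extant text

# Non-digital information relative to the digital estimate.
non_digital_share = non_digital_total / digital_total

# If 20 TB is 1% of all extant text, the whole is 100 times larger.
all_text = loc_text / loc_share_of_all_text

print(f"non-digital share: {non_digital_share:.4%}")
print(f"one part in about {1 / non_digital_share:,.0f}")
print(f"all extant text: {all_text / PB:.1f} PB")
```

Changing `loc_share_of_all_text` shows how sensitive the final figure is to that guess: even at a tenth of a percent, all extant text would come to about 20 petabytes, still tiny next to the digital total.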

Yet largely because there is less of it, older information — most of which is text — is more valuable than new information. A manuscript from 1000 CE may be one of only thousands available from that year; a text from the year 0 might be one of tens; by comparison, a website from 2012 was only one in fifty-one million new websites created that year, not to speak of pages added to old websites, posts on Facebook, Tweets and any number of other textual media (http://royal.pingdom.com/2013/01/16/internet-2012-in-numbers/). This makes that thousand-year-old manuscript more valuable as a relatively rare source of information on the time when it was produced. Also, given its age and survival this manuscript is almost certainly a more influential text than most books of more recent vintage, let alone your average web page.

But even among historical sources, there are major differences in both accessibility and importance. Paradoxically, it is not the oldest sources that are hardest to access, but the old-ish sources. We might summarize the situation somewhat like this:

Newest sources (21st century) are mostly native digital.

Newish sources (some 18th-19th and most 20th century) are well-printed and relatively easily converted to digital, but many post-1930s texts pose intellectual property rights issues.

Oldish sources (most 14th to 18th century) are manuscripts or less well-printed texts and are hard to convert to digital, but some have been transcribed.

Oldest sources (most pre-14th century) are manuscripts or metal/stone inscriptions and are very hard to convert to digital, but many have been transcribed.

In other words, there is a “late modern gap” that makes modern, pre-digital sources hard to access and manipulate digitally due to IP rights issues. Many of these are widely available in library collections, but many are not. In any case, the total volume of texts produced in this period was very large so there is a substantial “great unread.”

There is also an “early modern gap” that makes manuscripts and early printed sources (especially woodblock prints) hard to access and manipulate digitally due to limited availability and technological issues in converting them via OCR. Some core texts are widely available in digital formats or at least typeset formats that can be easily OCRed, but the relatively large volume of early modern texts means that many others are in the “great unread.”

By contrast, many texts predating the early modern period (i.e. before about the 14th or 15th century), and almost all early medieval or classical texts (i.e. before about the 7th or 8th centuries) are widely transcribed and digitized. The relatively small total volume of texts means that a fairly large percentage are available digitally. Despite the difficulties posed in reading these texts in the original, most have been studied — and digitized — to the point that many are fairly accessible, even to non-specialists.

The so-called “digital humanities” has addressed these gaps largely through digitization and curation projects. These projects are highly valuable — indeed we owe the substantial and growing corpora of contemporary and historical texts to the last two decades of largely unsung effort by librarians and curators. But large-scale, centralized digitization projects are beginning to reach diminishing marginal returns — the texts that remain are great in number but increasingly difficult to digitize and with decreasing scholarly value, at least on an individual basis.

Resolution 3: Distribution of traditional humanist labor as well as digitization efforts

There are two substantial bodies of information that are hard to access. The “late modern” gap is largely a legal problem. The “early modern” gap is partially a technological problem. But both, and especially the “early modern” gap, are largely areas that are well addressed by traditional humanist skills: finding texts; reading them; summarizing, translating and explaining them. It is still worth continuing the work of digitizing and curating these texts, but this is a project to which centralized bodies are poorly adapted. Instead, the work of reading, sharing, and, yes, digitizing these increasingly specialized texts should fall to those with the necessary specialization. Technologists should (and do) work to develop simple ways for individual researchers to process materials for digitization. Humanists should (and for the most part do not) work to share not only their conclusions, but their materials and methods as well. Opening up the process and sources of research will not only enable other specialists to make use of the same content for different purposes, but also give technologists more information, enabling better development.

Synthesis: A place for networks of traditional and less traditional forms of humanism

There is still a place for traditional humanist expertise in the 21st century world, but the forms taken by humanist inquiry must change. It cannot be a pursuit confined to individual researchers toiling away behind towers of books, or to specialized lectures in ivory towers. Yet humanist expertise is, still, largely individual. My solution is this: sharing in-progress work, including sources, open to commentary and revision. Everyone a researcher, everyone a curator, everyone an author, everyone a commenter.

This understanding has led me to begin the work myself, by establishing this collection as a platform to share my own work, and hopefully the work of other like-minded people. Medium is not the perfect — um — medium for this project — I would especially like to see better ways to format text to include translations — but it is quite a good start. It is certainly better than many of the alternatives. I am especially drawn by the ease of getting and giving notes — public and private, before and after publication — which seems perfectly designed for promoting ongoing criticism and commentary, and diminishing the distinction between works published and works-in-progress.

I am not totally committed to Medium: I admire several other sites of various sizes and flavors. I tend to think that the commenting and editing features work better than most older blogging media, and the end result is both cleaner and easier to produce. On the other hand, there is a great deal less customizability, at least that I have found. But in any case, rather than waiting for the perfect tool to come along I propose to use one ready out of the box. Please comment, please question, and please share your own work!
