How to Read Books with Computers — and Teach Kids to Do It Too

For two years I’ve taught a digital humanities elective for high-school seniors. The subject is somewhat contentious. (When one colleague agreed to teach a second section, another colleague looked stricken and said, “Et tu?”) Surely it’s in part because of the title: Distant Reading. English is, after all, a discipline that prizes close reading.

Computers and books. The technocracy has risen, indeed, and English classes — that last bastion against the stampede of technology into high-school classrooms, that place where a student and their peers can work and learn unimpeded by pings and bloops and animations — English classes, too, are fallen.

Who needs to read a book anymore, when a computer can read it for you? A computer can read a barcode, after all, or a file. Why not a book? It would take seconds.

What? Distant Reading?

In fact, it does only take seconds for a computer to read a book. But of course, this is only distant reading, not close reading.

The course we have taught for two years now at Deerfield Academy takes its name, Distant Reading, from the title of Franco Moretti’s treatise that helped define a new branch of literary criticism.

Distant reading is the literary corollary of the big data movement. This kind of reading doesn’t need to be done by computers — it has been going on for centuries — but technology sure makes things a heck of a lot easier. Rather than deeply explicating focused passages, distant reading instead synthesizes massive amounts of text. A critical argument for distant reading is that studying huge quantities of texts — novels, letters, articles, anything with words — enables scholars to more comprehensively and (therefore) accurately identify trends and patterns, exceptions, anomalies, and other features of literary collections. Shouldn’t a study of the novel include all novels? Couldn’t an analysis of Shakespeare benefit from the synthesis of every word in every play he wrote? Moretti calls it a “more rational literary history.”

But while it may speak to us about patterns and exceptions, distant reading surely does o’erleap the very foundations of what reading was meant to be. The act of reading — which was, first and always, close reading — and of inhabiting another’s thoughts and voice is what critic and provocateur William Deresiewicz calls “the slow accumulation of the soul.” He describes it particularly in the context of reading a novel, but it is true of shorter works too. Careful reading means understanding every word and its range of meanings, informed by its context.

In the end, Deresiewicz is right. Reading is an intimate, personal act. But Moretti is right too. Distant reading enables us to see cold and quantitative fact in the word. The word: an otherwise innately symbolic, subjective and human invention.

“Distant Reading” put a name on a new approach to looking at literature.

So Why Teach Distant Reading in High School?

You’ve heard all the reasons before: in an increasingly technological world, blah blah blah… Jobs of the future will require blah blah blah… Aren’t kids digital natives, blah blah blah? Yes, yes, sure.

These are tropes and rationales, reasons and policies, and they’re meaningful — but climate and context don’t sell well to kids. Telling kids that they should do something because of cultural trends is a sure way to drain the energy out of them.

Why teach distant reading? Why teach digital humanities? Because once you can computationally engage the word, then, suddenly, you can do really interesting things!

When I was in college in the late ’90s, while writing an essay on Heart of Darkness, I spent hours scouring the text for every use of the word “darkness.” Then I realized that, in those early days of the internet, I could find the full text online and just hit “find in page.” Boom! There are all 26! Now I can look closely at every single one. That’s interesting!

And distant reading applies even beyond novels: seven years ago, using word frequency lists and spreadsheets, I scanned 14 million words of teacher comments to find out which character traits teachers noted in students who excelled and which they encouraged in students who struggled. That’s interesting!

And it’s also applicable elsewhere: artist Jonathan Harris wrote scripts that trawled thousands of blogs on the web, searching for the words “I feel” and cataloging the words that followed. Run multiple times every day for years, the scripts built an aggregated dataset, now on a website called We Feel Fine, that offers an hour-by-hour representation of the mood of the world. That’s interesting!

Kids love interesting things, and distant reading shows them new, interesting things.

And yet, in all these examples, after identifying passages via distant reading techniques, close reading those passages is where the nuanced meaning-making happens. This is a model one increasingly hears to describe the digital humanities: it’s the telescope that helps find where to point the microscope.

In many ways, my close-reading colleagues are right: this kind of data doesn’t help us get to know each other or ourselves with much depth. But it does offer access to rich veins of meaning that, with careful subsequent attention, can help us see patterns and understand fundamental principles, or can introduce us to new places and passages where we might read more closely.
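The telescope-then-microscope workflow can be sketched in a few lines. The example below is a minimal illustration in Python (an assumption of this sketch; the course itself uses Wolfram Mathematica): it finds every occurrence of a word and returns each one with a little surrounding context, so a reader knows where to look closely. The `keyword_in_context` helper and the sample sentences are made up for illustration and are not quoted from any novel.

```python
import re

def keyword_in_context(text, keyword, width=40):
    """Return each occurrence of `keyword` with up to `width` characters of context on each side."""
    hits = []
    # \b keeps the match to whole words; IGNORECASE catches capitalized uses too.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        # Collapse line breaks and extra spaces so each hit reads as one line.
        hits.append(" ".join(text[start:end].split()))
    return hits

# Made-up sample text standing in for the full text of a book.
sample = (
    "The darkness deepened over the river. "
    "We could not tell whether the darkness was in the place or in us."
)

hits = keyword_in_context(sample, "darkness", width=20)
for hit in hits:
    print(hit)
```

The computer does the finding; deciding what each passage means is still the reader's job.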

Results from an analysis of 14 million words of end-of-term teacher comments.

Doesn’t This Require Complicated Programming Skills?

Not anymore.

Once upon a time, programming was complex, full of convoluted syntax and arcane commands. Now, if you want to make a word cloud of a text, you can find the text online, copy it into a Word or plain-text file, import it into a program like Wolfram Mathematica using the Import function, and type a single short command.

And poof! You get a word cloud. If you want to see how long a text is, you type another one-line command.

Poof! And if you want to see how many times each word appears in a text, you type one more (brace yourself here).

Poof! It’s like magic. No “for loops,” no “recursive algorithms,” no “sorting functions.” Just, in many cases, a single-line command, one that does what it sounds like it should do.
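Other languages offer close analogues. As a rough sketch in Python (an assumption of this example, not the Wolfram Mathematica commands used in the course), measuring a text's length and tallying its word frequencies takes only a few lines of the standard library. The `words` helper and the sample sentence are illustrative, and a word cloud is left out because it would need a third-party plotting package.

```python
import re
from collections import Counter

def words(text):
    """Split text into lowercase words, keeping internal apostrophes (e.g. "don't")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

# Sample sentence standing in for a text loaded from a file, for example:
# text = open("my_book.txt", encoding="utf-8").read()   # hypothetical filename
text = "It was the best of times, it was the worst of times."

tokens = words(text)
length = len(tokens)            # how long the text is, in words
frequencies = Counter(tokens)   # how many times each word appears

print(length)
print(frequencies.most_common(3))
```

`Counter` does the counting and sorting behind the scenes, which is why no explicit loop or sorting function appears.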

In short, coding doesn’t look much like coding anymore.

A word cloud of the lyrics from the complete Beatles discography.

Isn’t That Oversimplifying It a Little?

Not really.

Two-thirds of the students we have taught so far had zero coding experience, and you’ll read in a moment about the kinds of things they learned to do. Yes, the programming does get more sophisticated and complex, but that doesn’t mean it gets complicated.

Here’s an intimidating thing about computer science: we sometimes teach it by introducing all kinds of jargon and processes. “Let’s use a Java virtual machine and introduce arrays before we practice using pseudocode to avoid having to debug programs.” Whoa.

Or we teach classifications (floats, integers, strings, lists, arrays) before students understand how or why we use them.

But the mind is drawn first to action, not abstraction. Our eyes glaze over when we’re introduced to ideas that we have no grounding in.

Show first. Do next. Describe later.

We use an apprenticeship approach, not a taxonomic or theoretical approach. We start by asking students simply to copy and tweak code, and then they learn about what they just did and why it worked (or didn’t).

Network graphs from the first five books of the New Testament (KJV).

Immersion Computer Science

It’s like a language immersion classroom, but with a programming language.

In a language immersion class, you show up the first day, and instead of being told, “Here’s a basic conversation sequence and a verb conjugation, and here’s what they mean in English,” the language teacher just starts talking to you in another language. You imitate what you hear, drawing meaning from how you interact with the words and the sounds and the person, and gradually you discern the rules, which are sometimes explicitly explained after you’ve used them.

The same thing can be done with programming.

We start by picking a text and modeling several pages of analysis, typing code and talking through what we’re doing. Then for homework, we give kids a PDF of what we did in class. They do the same thing at home, but they choose their own texts — ones they’re interested in. They imitate the commands, but they get different results. They learn new things about an area of their own interest on day one of coding. They wield great power, even if they don’t yet know why it works.

Or like apprentice blacksmiths at work in a forge, they watch how a master handles the hammer and anvil, and then try it themselves, imitating what they’ve seen. Immersed in the setting, they will learn later about temperatures and the properties of metals, about how to sharpen their edges and about cooling techniques. And in a few weeks, they’ll be able to improvise a little. But at first, it’s all rote.

And when they do start improvising with their programming and running into problems — when they try to put a string into a function designed for lists — that’s when they learn about classifications and taxonomies. In that moment, when learning about classifications solves a problem for them, they care about it much more and they learn it well.
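Here is a small illustration of that kind of moment, sketched in Python rather than the course's Wolfram Mathematica (an assumption of this example). In Python the mismatch often shows up as a surprising result rather than an error message: a counting tool handed a raw string counts characters, while the same tool handed a list counts words.

```python
from collections import Counter

line = "to be or not to be"

# A string is a sequence of characters, so this counts letters and spaces.
by_character = Counter(line)

# Splitting first produces a list of words, which is what we meant to count.
by_word = Counter(line.split())

print(by_character.most_common(2))
print(by_word.most_common(2))
```

Seeing the two results side by side gives the difference between a string and a list a purpose, which is the point at which the classification is worth naming.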

Phoneme distribution from a selection of songs by Kendrick Lamar.

What Can You Really Do in High-School Digital Humanities, Anyway?

Quite a bit, it turns out.

Questions asked and explored by high-school seniors include the following:

  • Which rap artists have the greatest use of internal rhyme?
  • Who are the central figures of social networks in each book of the Old and New Testaments?
  • What have been the most significant changes to Donald Trump’s Wikipedia page since he announced his candidacy for president?
  • Which civil rights speakers focus more on themselves, on other people or on the collective?
  • How have lyrics evolved over the artistic lives of the Beatles and Kanye West?
  • In what ways has the New York Times’s coverage of Harvey Weinstein changed since the emergence of the #MeToo movement?
  • How do home-city newspapers write differently about their sports teams compared to newspapers in rival cities?
  • How has the school newspaper evolved under different editors, and how has it represented male and female voices on campus?
  • What are the common threads and evolutions throughout the history of the American presidential inaugural and farewell addresses?

How did they perform these analyses? Here’s an abbreviated version of their methodologies:

  • Import rap lyrics, convert words to phonemes, count the number and locations of individual phonemes, discuss.
  • Import the full text of the Bible, pull out all the names in the order in which they appear, use name adjacency to represent a relationship, make a social network graph of the relationships, discuss.
  • Pull from Wikipedia the full text of Donald Trump’s page on significant dates (after announcing his bid, after nomination, after election, etc.), identify all words that were added or removed at each interval, count and sort, discuss.
  • Import speeches by civil rights leaders, calculate the ratio of first-person singular pronouns (“I”) to first-person plural pronouns (“we”) to third-person singular and plural pronouns (“he/she/they”), discuss.
  • Import the full lyrics of every Beatles or Kanye West album, identify words that are unique to each album as compared to all other albums, identify words that exist in all albums, discuss.
  • Import the full text of every article under the Times Topic “Harvey Weinstein,” calculate the frequency of articles over time, identify commonly used words before the #MeToo movement and after, discuss.
  • Import articles about significant games from across a range of newspapers, compare word frequency in hometown newspapers against word frequencies in other newspapers, discuss.
  • Import each article in the campus newspaper archives, calculate and average the length of every article, calculate the ratio of “Mr.” to “Mrs.” and “Ms.” and “Dr.” to understand how each issue represents male and female adults on campus, discuss.
  • Import the full texts of all available inauguration and farewell addresses by US presidents, identify common and unique words across the history of the addresses, discuss.
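Two of the methods above, the pronoun comparison and the unique-versus-shared vocabulary comparison, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the students' actual code (the course uses Wolfram Mathematica): the pronoun lists, the helper names `pronoun_shares` and `unique_and_shared`, and all sample texts are made up for the example.

```python
import re
from collections import Counter

def words(text):
    """Split text into lowercase words, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

# Illustrative pronoun groups; a real study would refine these lists.
PRONOUN_GROUPS = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "third_person": {"he", "him", "his", "she", "her", "hers",
                     "they", "them", "their", "theirs"},
}

def pronoun_shares(text):
    """Return each pronoun group's share of all counted pronouns in `text`."""
    counts = Counter()
    for word in words(text):
        for group, members in PRONOUN_GROUPS.items():
            if word in members:
                counts[group] += 1
    total = sum(counts.values())
    return {g: (counts[g] / total if total else 0.0) for g in PRONOUN_GROUPS}

def unique_and_shared(collections):
    """Given {name: text}, return (words unique to each name, words found in every text)."""
    vocab = {name: set(words(text)) for name, text in collections.items()}
    unique = {}
    for name, own in vocab.items():
        # Everything that appears in any of the other texts.
        others = set().union(*(v for other, v in vocab.items() if other != name))
        unique[name] = own - others
    shared = set.intersection(*vocab.values()) if vocab else set()
    return unique, shared

# Made-up sample texts standing in for imported speeches and lyrics.
speech = "We will march because our cause is just, and I will march with them."
shares = pronoun_shares(speech)

albums = {
    "early": "sun river love song morning",
    "late": "night river love road rain",
}
unique, shared = unique_and_shared(albums)

print(shares)
print(unique["late"], shared)
```

In both cases the code only produces counts and word sets; the final step in every method, "discuss," is where interpretation happens.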

These are some projects pursued by students after several weeks of warming up. We begin with an analysis of literature and of the students’ own writing (they compare their sentences, vocabulary and paragraphs to those of great writers), and then we let them loose to ask their own questions and identify their own texts.
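A warm-up comparison of this kind might look like the sketch below, written in Python as an assumption of this example rather than as the course's actual exercise. It reports average sentence length and a simple vocabulary measure (distinct words divided by total words). The `style_profile` helper and the sample paragraph are made up, and the sentence splitter is deliberately rough: it will stumble on abbreviations such as "Mr."

```python
import re

def sentences(text):
    """Split after ., ! or ? when followed by whitespace (a rough rule)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def words(text):
    """Split text into lowercase words, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def style_profile(text):
    """Return sentence count, average sentence length in words, and type-token ratio."""
    sents = sentences(text)
    tokens = words(text)
    return {
        "sentences": len(sents),
        "avg_sentence_length": len(tokens) / len(sents) if sents else 0.0,
        # Distinct words divided by total words; this value drops as texts get longer,
        # so compare samples of similar length.
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
    }

# Made-up sample standing in for a student's paragraph.
student = "I walked home. The rain fell and the street was quiet."
profile = style_profile(student)
print(profile)
```

Running the same function on a passage by a published author gives students two profiles to set side by side and discuss.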

Community graph plots from the full text of two view books for our school (2010 and 2017).

In the End, It’s about Ways of Thinking

Through it all, we’re not really focused on learning programming, actually.

Programming is the medium, not the message. (I know, I know. Wait for it…)

Instead, we’re focused on skills and character traits.

Helpfully, succeeding at programming requires more than just learning syntax and commands. It requires asking good questions, breaking down problems and making cases based on evidence.

Question formulation, problem decomposition and argumentation: these are the real goals.

So, yes, succeeding at the medium does teach the message. (Marshall McLuhan was right, after all.)

And in developing these skills in a project-based curriculum that offers autonomy and choice, we aim to further foster the character traits of curiosity through unstructured play, discipline through structured experimentation and persistence through problem solving.

Those are the deeper goals.

And it takes closely reading student performance and engagement to know whether we succeed at all.

Editorial note: To see the author put these distant reading techniques into practice, you can watch his SXSW EDU 2018 presentation, “High School Digital Humanities: Live-Coding, a Course, and Next Steps.”

About the blogger:

Peter Nilsson

Peter Nilsson teaches English (digital and print) and is the Director of Research, Innovation, and Outreach at Deerfield Academy. If you’re interested in innovation in education, subscribe to his newsletter, The Educator’s Notebook, a weekly email that collects education- and learning-related news from around the web for the purpose of promoting innovation in education. Peter also founded and directs Athena, a nonprofit platform for teachers to share practices. He is on sabbatical for the 2018–2019 school year to grow Athena. Find out more at