Punctuation in novels
Adam J Calhoun

Punctuation in some more novels.

I, too, was inspired by that series of posters and started a project but then set it aside. Adam did a great job moving forward with “heatmaps” and I thought I’d throw my hat in the ring.

Grabbing texts from Project Gutenberg, I stripped away everything that wasn’t one of

. , ‘ “ : ; ! ? — _ ( ) * & [ ]
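For anyone who wants to try the stripping step, here is a minimal sketch in Python. It is not the code from my repo: the function name `strip_to_punctuation` is made up for illustration, and the set of kept characters is just the list above, copied as printed.

```python
# Characters to keep, copied from the list in the post.
# Assumption: anything not in this set (letters, digits, whitespace,
# and any other symbols) is simply dropped.
KEEP = set(".,‘“:;!?—_()*&[]")


def strip_to_punctuation(text: str) -> str:
    """Return only the characters of `text` that appear in KEEP, in order."""
    return "".join(ch for ch in text if ch in KEEP)


if __name__ == "__main__":
    sample = "Hello, world! (Is anyone there?)"
    print(strip_to_punctuation(sample))
```

Note that with the list exactly as printed, closing curly quotes and straight ASCII quotes are dropped too, so a real run over Gutenberg text may want a wider set.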

And made some “heatmaps” with a similar color scheme:
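Roughly, a heatmap like this assigns each punctuation mark a color and lays the marks out left to right, top to bottom, in a fixed-width grid. The sketch below shows that idea only; the specific colors, the `to_grid` name, and the grid width are assumptions for illustration, not the exact scheme used in the images.

```python
# Hypothetical color scheme (RGB): sentence enders red, commas and quotes
# green, colons and semicolons blue, everything else gray.
RED, GREEN, BLUE, GRAY = (255, 0, 0), (0, 160, 0), (0, 0, 255), (128, 128, 128)
COLORS = {
    **dict.fromkeys(".!?", RED),
    **dict.fromkeys(",‘“", GREEN),
    **dict.fromkeys(":;", BLUE),
}


def to_grid(marks: str, width: int) -> list[list[tuple[int, int, int]]]:
    """Map each mark to a color and wrap the sequence into rows of `width`.

    The last row may be shorter than `width`.
    """
    cells = [COLORS.get(ch, GRAY) for ch in marks]
    return [cells[i:i + width] for i in range(0, len(cells), width)]


if __name__ == "__main__":
    for row in to_grid(",.;—!", 2):
        print(row)
```

The resulting rows of RGB tuples can be handed to whatever image or plotting library you prefer to render the picture.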


I noticed someone’s response was curious about comparing novels across authors, so here goes Mark Twain:

The Adventures of Tom Sawyer
The Adventures of Huckleberry Finn
The Prince and The Pauper
Life on the Mississippi
A Connecticut Yankee in King Arthur’s Court

I don’t see much similarity at first glance, other than that shared red line in Huck Finn and CT Yankee.


Here’s Charles Dickens:

A Tale of Two Cities
Nicholas Nickleby
David Copperfield
Great Expectations
Oliver Twist

Don’t know that I’d say there’s a lot of similarity. Nicholas Nickleby looks like it has a bunch of really long sentences though!

Some more analysis is called for, obvs. Proportion of colons/semicolons in all punctuation for each author? Average sentence length? Definitely there are more dimensions to examine.
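As a starting point, both of those numbers can be sketched in a few lines of Python. This is a rough cut under stated assumptions: the function names are hypothetical, "all punctuation" means the character list from earlier in the post, and a sentence is naively taken to end at any run of periods, exclamation marks, or question marks (so abbreviations like "Mr." will split sentences early).

```python
import re

# Same punctuation list as used for the heatmaps.
PUNCT = set(".,‘“:;!?—_()*&[]")


def colon_share(text: str) -> float:
    """Fraction of all punctuation marks that are colons or semicolons."""
    marks = [ch for ch in text if ch in PUNCT]
    if not marks:
        return 0.0
    return sum(ch in ":;" for ch in marks) / len(marks)


def average_sentence_length(text: str) -> float:
    """Mean number of whitespace-separated words per sentence (naive split)."""
    # Split at runs of sentence-ending marks and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.split()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


if __name__ == "__main__":
    text = "It was late; we left. Why? No reason: none."
    print(colon_share(text), average_sentence_length(text))
```

Running these per novel and then averaging per author would give one number per author to compare.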


A brief look at the frequency charts for Mark Twain:

Tom Sawyer
Huck Finn
Prince/Pauper
CT Yankee
Life on MS

Almost definitely needs to be compared to other authors.
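To make books of different lengths comparable, the counts behind a frequency chart can be normalized to shares of all punctuation. Here is a small sketch of that, assuming the same character list as above; `punctuation_frequencies` is a name invented for this example.

```python
from collections import Counter

# Same punctuation list as used for the heatmaps.
PUNCT = ".,‘“:;!?—_()*&[]"


def punctuation_frequencies(text: str) -> dict[str, float]:
    """Relative frequency of each mark in PUNCT among all marks in `text`."""
    counts = Counter(ch for ch in text if ch in PUNCT)
    total = sum(counts.values())
    # Marks that never occur get 0.0; an empty text yields all zeros.
    return {mark: (counts[mark] / total if total else 0.0) for mark in PUNCT}


if __name__ == "__main__":
    freqs = punctuation_frequencies("a, b, c. d!")
    for mark, share in freqs.items():
        if share:
            print(repr(mark), share)
```

Because every book yields a value for the same set of marks, the resulting dictionaries line up directly for side-by-side bar charts across authors.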


My code is available on GitHub.