Drawing insights from any book with Text Mining

Rafael Belokurows
Published in Analytics Vidhya
8 min read · Dec 17, 2020


Text Mining and Natural Language Processing are two of the most interesting fields in Data Mining right now. Whether you are working with tweets from a controversial political candidate or going through books hundreds of pages long to discover some kind of pattern, there’s a lot you can do with the algorithms and packages available in R (or Python, for that matter).

For me, as a bookworm, one of the most interesting possibilities is to take a single book and analyze it thoroughly, working mainly towards visualizing and discovering relevant information within the text. In this article, I’ll cover how to transform and prepare the data, and show some meaningful relations between words that I’ve discovered.

Let me know what you think of this post; any feedback is welcome!

Getting Started

What you’ll need: a book in PDF or TXT format

Programming language and IDE: R and the IDE of your choice

Packages we’re gonna use: tm, stopwords, tidytext, tidyverse, wordcloud2, ggplot2, and others.
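
To give a rough idea of what these packages do together, here is a minimal sketch that splits text into words, drops common English stop words, and counts what is left. It is not necessarily the pipeline used later in this article: the helper name count_words, the sample sentences, and the file name in the comment are all made up for illustration, and it assumes the book is already in TXT format (a PDF would first need its text extracted).

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical helper: one row per word, stop words removed,
# sorted from most to least frequent.
count_words <- function(lines) {
  tibble(line = seq_along(lines), text = lines) %>%
    unnest_tokens(word, text) %>%            # lowercases and strips punctuation
    anti_join(stop_words, by = "word") %>%   # stop word list shipped with tidytext
    count(word, sort = TRUE)
}

# With a real book, the lines would come from the TXT file
# (the path below is an assumption, not a file from the article):
# book_lines <- readLines("carrie.txt", encoding = "UTF-8")
sample_lines <- c("The prom was a disaster.",
                  "The prom ended in fire.")

word_counts <- count_words(sample_lines)
print(word_counts)
```

The result is an ordinary data frame with a word column and a count column n, so it can be passed on to plotting packages such as ggplot2 or wordcloud2.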

The book I have chosen to analyze this time is Carrie, by Stephen King. While it’s actually not my favorite book, it was written by my favorite author of all time, Mr. Stephen King. The story of the…

