Drawing insights from any book with Text Mining
Text Mining and Natural Language Processing are two of the most interesting fields in Data Mining right now. Whether you are working with tweets from a controversial political candidate or going through books hundreds of pages long to discover patterns, there’s a lot you can do with the algorithms and packages available in R (or Python, for that matter).
For me, as a bookworm, one of the most interesting possibilities is to take one book and thoroughly analyze it, working mainly towards visualization and discovery of relevant information within the text. In this article, I’ll address the transformations and preparation of the data and show some meaningful relations between words I’ve discovered.
Let me know what you think of this post; any feedback is welcome!
Getting Started
What you’ll need: a book in PDF or TXT format
Programming language and IDE: R and the IDE of your choice
Packages we’re going to use: tm, stopwords, tidytext, tidyverse, wordcloud2, ggplot2, and others.
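To make the setup concrete, here is a minimal sketch of checking for and loading the packages named above. The helper name missing_packages is my own invention for illustration, and the "others" are left out because they are not named here. The install line is commented out so nothing is downloaded unless you choose to.

```r
# Packages named in the list above (the unnamed "others" are omitted).
pkgs <- c("tm", "stopwords", "tidytext", "tidyverse", "wordcloud2", "ggplot2")

# Hypothetical helper: returns the entries of `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment to install whatever is missing from CRAN:
# install.packages(to_install)

# Attach only the packages that are already available.
for (p in setdiff(pkgs, to_install)) {
  library(p, character.only = TRUE)
}
```

Note that ggplot2 is attached as part of tidyverse, so listing it separately is harmless but not strictly required.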
The book I chose to analyze this time is Carrie, by Stephen King. It’s not actually my favorite book, but it was written by my favorite author of all time. The story of the…