Who is Holden Caulfield? A Natural Language Processing Approach

Recently, I was introduced to the IBM Bluemix Platform through a Natural Language Processing (NLP) course. I received a trial student subscription to ‘play around with it’ *cough* and use it for the assignment I’m currently procrastinating on *cough*. Anyhow, I stumbled on a ‘Personality Insights’ (PI) service and began fiddling around with the demo. The service uses NLP, with tokenization and LIWC algorithms, to present insights into someone’s personality given text they wrote or spoke. Preferably the text should be personal: a blog entry, a reflective essay, etc. After trying out the usual interesting candidates (Figure 1.0), I had a thought: how would a psycholinguistic, LIWC-based NLP model, trained on modern-day blogs, posts, emails, and the infamous Twitter, fare against the symbolic Holden Caulfield?

Figure 1.0: Bluemix Personality Insights Service on Donald Trump’s Iowa Freedom Summit Speech. Don’t take my word for it; try it out yourself.


While it was exceedingly tempting to do a chapter-by-chapter analysis and re-read the entirety of Salinger’s masterpiece along the way, I couldn’t forget those two assignments entirely. Therefore, as a preliminary step, using one or two examples for each, I’ll go over three approaches I took to explore and relate NLP textual analysis to human literary analysis:

  1. Find large shifts in the NLP model across chapters and consider them as prompts for a deeper literary analysis into those specific parts of the story. Here, I compared and contrasted NLP insights with a human reading of the text.
  2. The opposite of (1): find large shifts in the character/story through a human reading and observe how they match up with our algorithmically determined insights.
  3. Consider overall trends in personality over time seen through NLP analysis and compare and contrast them with a human understanding of the character.

I will assume in my writing that you have read J. D. Salinger’s The Catcher in the Rye and know the overall story. That being said, the analysis should make sense even if you haven’t, although you may not get some tiny references here and there. I also try to present relatively comprehensive summaries before delving into any example or chapter so you stay in the loop.

I would like to stress here that this is not a rigorous scientific procedure, but neither is it entirely hand-wavy. My engineering background does bring in some technical lingo and attitudes, but that shouldn’t stop readers without a technical background from appreciating the overall analysis.

The Big Five Personality Model

Before I go further, it is important to understand the overall structure of the Personality Model we are working with. You can read more about the science and justification behind it on Bluemix’s service page, but for our purposes, all we care about for now is the structure. Personally, after trying to understand this model through words, I realized a picture does a lot better:

In Picture:

Figure 1.1: The Big Five Model Structure. The algorithm selects the most prominent dimension among the ‘Big Five’ as a Primary Personality (Purple Box) and labels the rest as Secondary Personalities (4 out of 5 Grey Boxes). Each dimension is described further by 6 unique traits (Green Boxes). (Created with Mindmup)

In Words:
The PI service’s algorithm selects a primary personality (an extreme high or low in one of the ‘Big Five’ dimensions) but gives the percentages of all the other dimensions as well, labeling them as secondary personalities. The PI-Dimensions-Characteristic Table (note that Neuroticism is also referred to as ‘Emotional Range’) shows how to interpret a personality using the primary-secondary high-low value combinations. Within each of the Big Five (O.C.E.A.N.) dimensions, there are 6 minor ‘traits’ that describe each dimension in more detail. Each of the traits is a choice between two attitudes (a ‘forced pairing’); you can see the exact descriptions of each trait in the PI-Facets-Characteristic Table. For each dimension and trait, the algorithm outputs a decimal between 0 and 1 to describe the personality along a spectrum. As for structure, that’s pretty much it. I guess it isn’t too bad with words, but in my defense, I was trying to figure it out through the JSON structure, so yeah, my fault.
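To make the structure concrete, here is a minimal Python sketch of how one chunk of scores might be held and how a primary personality could be picked from it. It assumes that ‘most prominent’ means the dimension whose score sits farthest from the 0.5 midpoint; that rule, the `Dimension` class, and the example numbers (apart from the 98% Agreeableness discussed later) are illustrative assumptions, not the PI service’s documented behavior.

```python
from dataclasses import dataclass, field

@dataclass
class Dimension:
    name: str
    score: float                                # 0..1, as the service reports
    traits: dict = field(default_factory=dict)  # six trait scores, e.g. {"Trust": 0.8}

def primary_personality(dimensions):
    """Pick the dimension farthest from the 0.5 midpoint (assumed rule)."""
    primary = max(dimensions, key=lambda d: abs(d.score - 0.5))
    level = "high" if primary.score >= 0.5 else "low"
    # Every other dimension is reported as a secondary personality.
    secondaries = [d.name for d in dimensions if d is not primary]
    return f"{level}-{primary.name}", secondaries

# Illustrative scores only (not real PI output).
chunk = [
    Dimension("Openness", 0.20),
    Dimension("Conscientiousness", 0.35),
    Dimension("Extraversion", 0.60),
    Dimension("Agreeableness", 0.98, {"Trust": 0.80, "Sympathy": 0.70}),
    Dimension("Neuroticism", 0.70),
]
print(primary_personality(chunk))
```

With these numbers Agreeableness is 0.48 away from the midpoint, more than any other dimension, so it comes out as a high-Agreeableness primary personality with the other four as secondaries.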

Process, Method, and Data

Just to give an idea of what I’m talking about when I say ‘analysis’, I thought I’d provide a high level description of my process before presenting the extracted data I will be working with.

I created Python scripts (code can be requested) to divide the novel into chapters and used cURL requests to communicate with Bluemix’s PI service. The service recommends using 3,500+ words for a ‘decent’ analysis, so I combined some chapters before performing any analysis on them, except for Chapter 26. Chapter 26 is a little different because Holden has finished telling his story by the end of Chapter 25, and in Chapter 26 the setting changes from past to present. Therefore, I decided to exclude Chapter 26 (only 200–300 words) from the discussion in general, but still included it in the technical data… for fun. As far as the NLP side of exploration goes, however, Chapter 26 really doesn’t count, and I’ll consider Chapter 25 the end of the story for any overall trend analysis.
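The chapter splitting and grouping step could look roughly like the sketch below. It assumes a plain-text copy of the novel where each chapter starts on its own line such as `Chapter 7` (a hypothetical format; real editions differ), and it greedily merges consecutive chapters until each group reaches the 3,500-word recommendation, keeping Chapter 26 on its own. The function names are illustrative, not the author’s actual scripts.

```python
import re

MIN_WORDS = 3500  # the service's recommended minimum for a 'decent' analysis

# Assumed heading format: a line containing only "Chapter <number>".
HEADING = re.compile(r"^\s*chapter\s+(\d+)\s*$", re.IGNORECASE | re.MULTILINE)

def split_chapters(text):
    """Return {chapter_number: chapter_text}."""
    # With one capture group, split yields [preamble, num, body, num, body, ...].
    parts = HEADING.split(text)
    return {int(num): body.strip() for num, body in zip(parts[1::2], parts[2::2])}

def group_chapters(chapters, min_words=MIN_WORDS, keep_alone=(26,)):
    """Merge consecutive chapters until each group has at least min_words words."""
    groups, current, count = [], [], 0
    for num in sorted(chapters):
        if num in keep_alone:
            if current:               # close off the group in progress first
                groups.append(current)
                current, count = [], 0
            groups.append([num])      # analyzed on its own, whatever its length
            continue
        current.append(num)
        count += len(chapters[num].split())
        if count >= min_words:
            groups.append(current)
            current, count = [], 0
    if current:                       # leftover chapters at the very end
        groups.append(current)
    return groups
```

A group that is still short when a kept-alone chapter (or the end of the book) is reached is emitted as-is; folding it into the previous group instead would be an equally reasonable design choice.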

Upon receiving the text, the PI service responds with a JSON-formatted string containing an instance of the Big Five personality model. From the string, I extracted the personality data to plot the trait/dimension percentages across the chapters. Figure 2.0 depicts the ‘Primary Personality’ over chapters. Figures 2.1–2.5 show the percentages of each of the Big Five dimensions (the dashed lines) as secondary personalities and their traits (solid lines). Finally, I went back and forth between reading The Catcher in the Rye and SparkNotes, drinking coffee, and getting wistfully lost in Holden’s angst and anxiety.

Figure 2.0: Primary Personality across chapters. The letter above each tick indicates which Big Five dimension is primary: Agreeableness (A), Conscientiousness (C), or Neuroticism (N), also called Emotional Range.
Figure 2.1: Agreeableness and Respective Traits
Figure 2.2: Conscientiousness and Respective Traits
Figure 2.3: Extraversion and Respective Traits
Figure 2.4: Neuroticism (also called Emotional Range) and Respective Traits
Figure 2.5: Openness and Respective Traits
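Turning the JSON responses into the per-chapter lines plotted in Figures 2.1–2.5 is mostly a matter of flattening a tree. The sketch below assumes a simplified response shape: a top-level `personality` list whose entries carry `name`, `percentile`, and a `children` list of traits. These field names are assumptions for illustration, so check them against whatever the service actually returns.

```python
import json
from collections import defaultdict

def extract_scores(response_text):
    """Flatten one response into {name: score}.

    Assumed shape: {"personality": [{"name", "percentile", "children": [...]}, ...]}
    """
    data = json.loads(response_text)
    scores = {}
    for dimension in data["personality"]:            # the Big Five
        scores[dimension["name"]] = dimension["percentile"]
        for trait in dimension.get("children", []):  # its six traits
            scores[trait["name"]] = trait["percentile"]
    return scores

def build_series(responses):
    """responses: JSON strings, one per chapter group, in story order."""
    series = defaultdict(list)
    for response_text in responses:
        for name, value in extract_scores(response_text).items():
            series[name].append(value)
    return dict(series)
```

Each list in the result can then be handed to a plotting call such as matplotlib’s `plt.plot`, one line per dimension or trait.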


Approach One (1): NLP Insights as Prompts for Human Analysis

NLP Insights

There are a few things that visibly stand out in the above figures, but the first to catch my eye was the change of Primary Personality (PP) and the steep drop in Trust, Sympathy, and Altruism in Chapters 6–7 (Figure 2.1). Correlating that with the other figures, we can collectively observe the following for Chapters 6 and 7 (all percentages are approximate):

  • 40% drop in Trust, Sympathy, and Altruism (Figure 2.1)
  • 20% drop in Morality and Cooperation (Figure 2.1)
  • A change in PP from an extreme high-Agreeableness (A) to an extreme low-Conscientiousness (C) (Figure 2.0)
  • 20% drop in Self-Efficacy (Figure 2.2)
  • 10% drop in Dutifulness and Cautiousness (Figure 2.2)
  • 30% rise in Excitement-Seeking (Figure 2.3)
  • 10–20% rise in all minor traits of Neuroticism (Anger, Depression, etc.) except Self-Consciousness (Figure 2.4)
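Approach (1) boils down to scanning every series for big jumps between consecutive chapter groups. A minimal sketch, assuming scores on a 0–1 scale and treating a ‘40% drop’ as 40 percentage points (an assumption; a relative drop would be computed differently). The 0.20 threshold, the function name, and the example numbers are hypothetical.

```python
def large_shifts(series, labels, threshold=0.20):
    """List (name, from_group, to_group, change) for jumps of at least `threshold`.

    series: {name: [score per chapter group]} on a 0..1 scale
    labels: chapter-group labels in the same order, e.g. "6-7"
    """
    flagged = []
    for name, values in series.items():
        for i in range(1, len(values)):
            # Rounding guards against float noise (0.4 - 0.6 is not exactly -0.2).
            change = round(values[i] - values[i - 1], 4)
            if abs(change) >= threshold:
                flagged.append((name, labels[i - 1], labels[i], change))
    # Largest moves first: these are the candidates for a closer human reading.
    return sorted(flagged, key=lambda row: abs(row[3]), reverse=True)

example = {
    "Trust": [0.80, 0.40, 0.45],
    "Self-Efficacy": [0.60, 0.40, 0.41],
    "Intellect": [0.30, 0.32, 0.30],
}
for row in large_shifts(example, ["1-5", "6-7", "8-9"]):
    print(row)
```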

Intuitively, these changes seem to make sense. A rise in Anger, Depression, and Excitement-Seeking comes alongside a decrease in some of the more orderly traits like Cautiousness, Cooperation, and Self-Efficacy. We also see a shift from concern for others to concern for oneself, illustrated by the decline of Sympathy, Trust, Altruism, and Morality. Let us now look at what we can infer directly from reading the book.

Human Insights

First, let’s look at the actions that take place and give a short summary of what exactly is happening in Chapters 6 and 7. As background, Chapter 5 ends with Holden introspecting, recalling deeply emotional events like the death of Allie (Holden’s brother, who died of leukemia), and ultimately just staring out the window. He also finishes a writing assignment for Stradlater, who is out on a date with Jane, Holden’s childhood-ish ex-girlfriend-ish… friend. Well, he sort of finishes it: he didn’t exactly follow the ‘rubric’ (one of his many charms) and just wrote what he felt like (another charming feature). Then, come Chapters 6 and 7, the following takes place:

  • Stradlater, coming back from his date, is annoyed with Holden for not doing the assignment properly
  • Holden, deeply hurt since he wrote about Allie, tears up the assignment and throws it away
  • Holden smokes; there is tension in the room
  • Holden asks Stradlater about his date with Jane but doesn’t get an appropriate response
  • Holden attacks Stradlater, upset about a variety of things, but Stradlater pins him down and punches him, bloodying his face
  • Holden, angry, goes to Ackley’s room to talk to him, but then tries to sleep
  • Holden, restless, can’t sleep and keeps thinking about Jane
  • Ackley is annoyed by Holden, and Holden is annoyed by Ackley
  • Holden suddenly decides to just screw it all and leave the school earlier than he had originally planned
  • He is ‘sort of crying’ while leaving the school but he doesn’t know why
  • He yells out ‘Sleep tight, ya morons!’ and ‘gets the hell out’ of the school

Needless to say, things happened! Personally, this is one of my favorite ‘scenes’ of the book; it also contains my favorite quote from Holden. It’s filled with little hints for the reader to discover in order to reach into the secret compartments of Holden’s character. While on the surface it seems like chaos, there’s an underlying order to the actions in these chapters. Many people read this section only to conclude, “Man, Holden is whiny, impulsive, and pointlessly difficult. I can’t stand this guy. Who does he think he is?” (Yes, it’s those same people that were probably supporting Stephen Harper.) When you look a little deeper, you actually start to feel sorry for Holden. He’s acutely sensitive, despite his suppressive storytelling (Figure 2.5), and he’s been attacked on two deeply emotional fronts.

“Do you have the time / To listen to me whine / About nothing and everything /All at once?” — Green Day’s “Basket Case” lyrics have a strong connection to Holden.

First, Stradlater completely disregards and trashes Holden’s writing about Allie, the brother he lost to leukemia and whom he had been thinking about just a while ago. Then, because apparently one douchebaggery wasn’t enough, Stradlater begins to taunt Holden about Jane. We can infer from the conversation that Stradlater may have been the complete opposite of a gentleman on his date with her. In Holden’s eyes, Stradlater’s attitude and words are disrespectful to Jane, whom Holden treats as a link to a nostalgic innocence he is unwilling to let go of. On top of it all, there’s an annoying Ackley next door, and let’s not forget Holden was just expelled from school. At this point, his humorous, defensive personality crashes, and Holden recklessly attacks Stradlater, only to end up with a bloodied nose. But don’t be fooled by the seemingly lose-lose scenario, because in the end, trust me, Holden is the real winner.

“I told him he [Stradlater] didn’t ever care if a girl kept all her kings in the back row or not, and the reason he didn’t care was because he was a goddamn stupid moron.” — Holden Caulfield.
In my opinion, THE greatest comeback ever. Smart, concise, point-proof analysis. Bam!


The NLP insights, interestingly, caught on to the rising action quite well. Since the algorithm is more or less based on word counts, I would guess that all the ‘shut ups’ and the talk about punching, blood, and ‘getting the hell out’ gave the reckless behavior away. This is why, in the PI Dimensions Table, a low-Conscientiousness (C) PP paired with a high-Agreeableness (A) Secondary Personality results in an ‘unruly, reckless, devil-may-care’ attitude. Clearly, Holden’s actions also show a lack of cooperation, and his agreeable behavior, along with Trust, Sympathy, and Morality, drops to give way to Neuroticism, or highly emotional activity. The matter of self-doubt, present in Holden’s questioning, crying, and overthinking (with Ackley and the Catholic monastery), is also somewhat captured by the drop in Self-Efficacy. One thing I found especially impressive was the model’s ability to discern between high Excitement-Seeking, which it relates more to ‘risk-taking’ (the opposite of boredom), and a high Activity-Level, which is more related to ‘fast-paced’ activities and multitasking. In this case, looking at Figure 2.3, it makes sense that Excitement-Seeking rises in Chapters 6–7 without an increase in Activity-Level, since the reckless encounter with Stradlater is better described as ‘risky’ and exciting than as ‘fast-paced’ and multitasking. Interestingly, we can observe Activity-Level rise later in Chapters 10–11 (Figure 2.3), correlating with Holden’s environment of a fast-paced nightclub, a hotel with alcohol and women, and extravagant socialization.
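To illustrate how word choice alone can give reckless behavior away, here is a toy dictionary-based counter in the spirit of LIWC: for each category, it reports the fraction of words that match the category’s word list. The three-category lexicon is invented for this example (LIWC’s own dictionary is much larger and different), and the trailing-`*` prefix matching is a simplification.

```python
import re
from collections import Counter

# Invented mini-lexicon for illustration; LIWC's real dictionary is far larger.
# A trailing '*' matches any word that starts with the given stem.
LEXICON = {
    "anger": ["punch*", "hate*", "moron*", "goddamn"],
    "negation": ["not", "never", "don't", "can't"],
    "tentative": ["maybe", "perhaps", "guess*"],
}

def matches(word, entry):
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry

def category_rates(text, lexicon=LEXICON):
    """Fraction of words in `text` that fall into each lexicon category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, entries in lexicon.items():
            if any(matches(word, entry) for entry in entries):
                counts[category] += 1   # at most once per word per category
    total = len(words) or 1             # avoid dividing by zero on empty text
    return {category: counts[category] / total for category in lexicon}

print(category_rates("He was a goddamn stupid moron. Maybe I punched him."))
```

Three of the ten words in that made-up sentence land in the anger list, which is exactly the kind of signal a count-based model can pick up without understanding the scene.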

Overall, in this case, taking the NLP insights as prompts did lead us to a significant change in character and plot through human literary analysis. The prominent actions and events seem to be captured especially well by the drop in Agreeableness (A) traits and the rise in Neuroticism (N) traits (Figures 2.1 and 2.4). The details, including Holden’s deeper sensitivity, are not fully recognizable through word choice alone and require knowledge of context, which makes them difficult for algorithms to spot. While NLP analytics do not (for now) capture the profundity of keeping your kings in the back row, they seem valuable as prompts that can aid literary analysis and help us understand the big picture of a character’s immediate personality.

Approach Two (2): Human Literary Analysis validated by NLP Insights

Human Insights

Unlike Approach (1), where I chose one of the more visible personality shifts, the example selection for this approach was, more or less, a personal choice. Chapter 24 is a relatively long chapter (the only chapter that has its own SparkNotes section), and it contains an interesting dialogue I tend to jump to when I want a quick reminder of the book’s themes. In this chapter, we meet Mr. Antolini, through whom Salinger gives us a more orderly and structured insight into Holden’s mind. The discussion that takes place is definitely noteworthy, and there are traces of a meaningful connection. For the most part, this chapter is a lot of talk (not at all boring, though) about Holden’s internal pain and disillusionment and his life choices. Once again, let’s summarize the key points:

  • Holden is welcomed into the home of Mr. Antolini, his former English teacher, and senses that Mr. Antolini has been drinking, describing him as a heavy drinker
  • They exchange greetings and talk about Holden flunking out of school
  • They discuss Holden’s English classes and his disdain for rules and regulations, which digresses into a discussion about digression
  • Holden shares his views on digression through an anecdote, claiming that he prefers discussions that digress over ones that always ‘stick to the point’
  • Mr. Antolini senses (or at least is reminded of) Holden’s internal struggle and attempts to give him some sound advice about his life path, starting off by attacking his argument for digression
  • Mrs. Antolini comes into the room with coffee, and Mr. Antolini mentions Holden’s father and how concerned he is about Holden’s choices
  • Holden is getting uncomfortable about the discussion that is now becoming quite serious and revealing
  • Mr. Antolini gives a nice short speech with positive elements, trying to inspire, but also sternly warn, Holden about the future if he doesn’t change his attitude
  • Holden is listening but also shown to be tired and yawns unwillingly
  • They talk a little about girls, Sally and Jane, and Holden dating
  • Holden goes to sleep but wakes up in the middle of the night to Mr. Antolini beside him, ‘petting’ his head
  • Holden is… flabbergasted! He’s confused, uncomfortable, and rushes out of Mr. Antolini’s apartment thinking Mr. Antolini is making some creepy advances at him

Overall, the chapter is calm except for the final reaction. In terms of language and wording, there’s a lot of wise, rhetorical discussion happening, and the wording reflects that with mentions of knowledge, hate, intelligence, etc. Mr. Antolini also comes off as a sophisticated man and uses wording like “pedagogical” and “beautiful reciprocal arrangement”. Moreover, there’s a considerable sense of cooperation, respect, and genuine sympathy in language like “I understand”, “Yes, I know”, or “I realize that”. Finally, in bits and pieces between the dialogue, we see language describing an underlying exhaustion as Holden narrates about ‘feeling lousy’, ‘tired’, ‘headaches’, ‘sleepy’, ‘exhausted’, etc.

Contextually, a lot of what Holden has taken for granted throughout the book is being challenged by Mr. Antolini, whose opinion actually matters to Holden. Generally speaking, Holden is able to brush off a lot of opinions by rationalizing some sort of weakness or ‘phoniness’ in the other person’s character, so he goes unchallenged. However, with Mr. Antolini, the manner of interaction is different, and Holden, despite his efforts at rebuttal, begins to internalize and see weaknesses in his frame of mind. Mr. Antolini also occasionally mentions a ‘fall’ that Holden can expect if he continues his ruffian ways. We see more self-awareness on Holden’s part, prompted by the discussions, but this also makes him uncomfortable, anxious, and a little edgy. Another interesting element to consider is Holden’s defensive reactions to some of Mr. Antolini’s observations. Despite seeing a weakness in his own arguments, Holden tries to show Mr. Antolini at least some value in his desire for digression. He also tries to explain that he doesn’t ‘hate everyone’ but actually misses them at times. Then there’s his heartbreaking response (yes, I mean it literally breaks my heart) to Mr. Antolini’s pushy declaration:

Mr. Antolini: “Frankly, I don’t know what the hell to say to you, Holden.”
Holden: “I know. I’m very hard to talk to. I realize that.”

To a human reader, this shows that Holden’s external shell is breaking: he’s becoming less repressive, but also more vulnerable. Later on, the last two pages tell of a more rash and impulsive Holden who, although confused, isn’t really trying to think the matter through but is focused only on escaping the situation. Since this is a small part of a large chapter, I do not expect the NLP model to capture it well.

NLP Insights

For this approach, rather than exclusively concern ourselves with drops and rises, we can interpret the data a little more objectively. These are the relevant observations (in Chapter 24) from Figures 2.0–2.5:

  • The PP is high-Agreeableness (A), which is the PP for most of the book (Figure 2.0)
  • 15–20% drop in Trust and Altruism, bringing them down to a mid-range of 50% (Figure 2.1)
  • 15% increase in Sympathy, bringing it to its second-highest value of 85% (Figure 2.1)
  • An average level (relative to the entire novel) of 20% Cooperation (Figure 2.1)
  • A 5–20% drop in all traits under Extraversion, the largest drop being in Activity-Level (Figure 2.3)
  • 10% rise in Self-Consciousness (Figure 2.4)
  • 10% drop in Vulnerability (Figure 2.4)
  • Openness is overall unchanged and relatively low (Figure 2.5)
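Reading a chapter ‘relative to the entire novel’, as in the list above, can be made explicit by comparing one chapter group’s score with the rest of its series. This sketch uses Python’s `statistics` module; the output fields and the example series are illustrative assumptions, not the values behind Figure 2.1.

```python
from statistics import mean, pstdev

def describe_group(values, idx):
    """Compare chapter group `idx` with the rest of one trait's series (0..1 scale)."""
    value = values[idx]
    avg, spread = mean(values), pstdev(values)
    return {
        "value": value,
        "change_from_previous": round(value - values[idx - 1], 4) if idx > 0 else None,
        "novel_mean": round(avg, 4),
        # How many standard deviations above (+) or below (-) the novel's average.
        "z_score": round((value - avg) / spread, 2) if spread else 0.0,
        # 1 means the highest score among all chapter groups.
        "rank": sorted(values, reverse=True).index(value) + 1,
    }

sympathy = [0.60, 0.70, 0.85, 0.90, 0.70]   # made-up series, not Figure 2.1 data
print(describe_group(sympathy, 2))
```

A rank of 2 here is what a phrase like ‘second-highest value’ means in this framing, and the z-score makes ‘average level’ or ‘unusually high’ a little less subjective.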

Taken at face value, the data describes a closed-off individual who is sympathetic to others but indifferent in terms of Trust. His Conscientiousness and Openness traits show an underlying self-doubt. Furthermore, the drops in Extraversion hint at a shift towards calm or slow-paced activity.

The shifts aren’t as noticeable as in Chapters 6–7, and I think the lack of action may be the cause. A lot of the conversation, as mentioned before, is contextual and very content-based, which makes it difficult for an NLP algorithm based on word counting to notice the thematic elements in the chapter. Nevertheless, there are changes worth mentioning, some matching our human analysis and some contradicting it. Keep in mind that even a human interpretation, after all, contains bias and can vary from person to person.


First, the PP remains high-Agreeableness, which in itself tells us that there isn’t a large shift in Holden’s character. While the reader gets more information about Holden’s mindset in this chapter, it is reasonable to say that his overall personality hasn’t shifted greatly. We will discuss this majority PP later on in Approach (3). Second, while the drop in Altruism coincides with a human reading, since the spotlight of this chapter is Holden’s own life choices, the drop in Trust is debatable. I wouldn’t agree that Holden trusts Mr. Antolini only at a mid-range level relative to the other chapters. As a human reader, I would actually infer that Holden’s use of ‘Mr.’ and his interest in the discussion are signs of a deeper respect for and trust in his old teacher. However, from a language and wording standpoint, the way Holden talks produces a lot of ‘I don’t know’s and ‘Maybe’s that could lead the algorithm to assume Holden is wary of Mr. Antolini’s intentions. Personally, I see such wording as a defense mechanism against further introspection, not a matter of weak trust.

On the other hand, Sympathy, for the first time, has reached as high as 85%. A high Sympathy value indicates ‘You feel what others feel and are compassionate toward them’, so it is promising that the model moves in this direction. While both Phoebe (Holden’s loving sister) and Mr. Antolini can be considered to genuinely care for Holden, Mr. Antolini expresses it to the reader through his own lengthy speeches, so the reader does not have to rely on Holden’s narration outside the dialogue. Holden responds to this genuine care with defensive behavior, but also with an acknowledgement of his own difficult attitudes, although he feels somewhat justified in having them. In this sense, Holden empathizes with Mr. Antolini by recognizing both his teacher’s desire and his struggle as an adviser. The sympathetic language Holden uses to express that empathy seems to be captured well by the NLP insights.

An average level of Cooperation fits well with Chapter 24’s focus on dialogue and discussion.

The drop in Extraversion is most simply explained by the introspective dialogue as well as the lack of action. For a majority of the chapter, as we previously discussed, the situation is calm and collected. Even the arguments between Mr. Antolini and Holden are not really in a debate format but closer to a Socratic method of arriving at deeper truths (e.g., Mr. Antolini questions Holden about digression to poke holes in his theory). Furthermore, a rise in Self-Consciousness (similar to self-awareness) is also captured by the model. As the definition of high Self-Consciousness suggests (“You are sensitive about what others might be thinking of you”), Holden’s defensive arguments are indeed a display of sensitivity towards others’ opinions, specifically his old English teacher’s.

The drop in Vulnerability slightly contradicts our human analysis, which saw Holden becoming more vulnerable as his shell breaks, but this matter could be argued either way.

However, the increase in Openness, which is quite obvious to a human reader, does not seem to be sufficiently captured by NLP analysis. We will discuss the consistent low-value of Openness later, but for now it is enough to mention that even an algorithm wasn’t able to break through Holden’s outer shell.

Approach Three (3): Overall NLP Trends vs. Character Development

In this approach, instead of scoping down to specific chapters, we can take a more holistic look at the personality of Holden. While there are undoubtedly many aspects to discuss and critique, we will only focus on two notable and high level observations: the constant low-Openness (O) and the constant high-Agreeableness (A).

Constant Low-Openness (O)

Figure 2.5 shows the percentages of Openness (O) and we can generally accept that there haven’t been any significant changes throughout the chapters. Interestingly, all Openness traits are ranked low by the PI model except for Emotionality. Although it is cumbersome, it would be helpful for this section if we listed the specific definitions of the traits from the PI-Traits Table. However, rather than match Holden with the low definitions, we will take a different approach — because it will make the causal workings of the model clearer — and compare Holden with the high definitions, showing why he may not fit most of them.

Definitions of Openness traits:
High Adventurousness: You are eager to experience new things
High Artistic Interest: You enjoy beauty and seek out creative experiences
High Imagination: You have a wild imagination
High Intellect: You are open to and intrigued by new ideas and love to explore them
High Liberalism: You prefer to challenge authority and traditional values to effect change
High Emotionality: You are aware of your feelings and how to express them

These definitions are critical for our analysis because they turn the vague traits into attitudes we can compare a personality against. The lazier among us may already have noticed, with some scorn, a fairly recognizable pattern in most of these definitions: they seem to revolve around some kind of ‘energetic’ attitude. Wordings like ‘seek out’, ‘effect change’, ‘eager to experience’, and ‘love to explore’ instill a certain peppiness in these high traits. Under these definitions, open people would most likely be applying themselves in school, and with Holden, that is certainly not the case. Most would also agree that Holden is not the liveliest of people, and that fact alone could account for most of the low percentages in Openness. As for the low Imagination, we can accept that although Holden is rash at times, he usually refers only to the very real and sometimes crude aspects of life. Emotionality, however, looks for awareness and expression of feeling in general (not specifically in a context of beauty, authority, or awe), which makes it a little different from the other traits. Truthfully, part of what makes this novel so entertaining is Holden’s closed but dramatic narration, in which he lays out his anger and contempt for society in very clear terms. It is possible that Holden’s semi-hateful, attacking language is what prompts a high Emotionality. Lastly, since Openness is very closely tied to Holden’s unique storytelling style, which is maintained throughout, a constant low-Openness seems to be an appropriate and reasonable depiction of the novel.
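Calling a dimension ‘constant’ across the story can also be checked numerically. The sketch below summarizes each series after dropping excluded chapter groups (Chapter 26, as discussed earlier) and flags a series as roughly constant when its max-min spread stays under a cut-off; the 0.15 cut-off, the label format, and the example numbers are hypothetical choices.

```python
from statistics import mean, pstdev

def overall_trend(series, labels, exclude=("26",), flat_range=0.15):
    """Summarize each series across the story, skipping excluded chapter groups.

    flat_range is a hypothetical cut-off: a series whose max-min spread
    stays below it is reported as roughly constant.
    """
    keep = [i for i, label in enumerate(labels) if label not in exclude]
    summary = {}
    for name, values in series.items():
        kept = [values[i] for i in keep]
        summary[name] = {
            "mean": round(mean(kept), 3),
            "stdev": round(pstdev(kept), 3),
            "constant": max(kept) - min(kept) < flat_range,
        }
    return summary

# Made-up numbers; the last group stands in for the excluded Chapter 26.
scores = {"Openness": [0.20, 0.25, 0.22, 0.90], "Neuroticism": [0.30, 0.70, 0.50, 0.40]}
print(overall_trend(scores, ["1-3", "4-5", "6-7", "26"]))
```

Dropping the Chapter 26 group matters here: with it included, the made-up Openness series would no longer look flat.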

Regrettably, the resulting description of a high-Agreeableness (A) and a low-Openness (O) claims Holden is ‘Dependent, Simple’, and that may be pushing it a bit. There are some tactless elements present in Holden that are difficult to deny from a human literary reading. In the model’s defense, though, recognizing sarcasm and lying remains a challenge for NLP, and those are the primary methods Holden employs for callousness (and he isn’t even British!). However, this contradiction may have less to do with low-Openness (O) and more to do with high-Agreeableness (A), which leads us to our next pattern.

Constant High-Agreeableness (A)

If it isn’t obvious already, we have avoided mentioning the explicit implications of high-Agreeableness (A) throughout our earlier analysis. Unfortunately, the definition PI assigns to high-Agreeableness (A) is pretty straightforward, which makes it all the harder for us to weave our way out of this issue. ‘Getting along with others’ and ‘an optimistic view of human nature’ are the defining points of high-Agreeableness (A) and, obviously, severely at odds with Holden’s personality. Surprisingly, in Figure 2.1, the fluctuations in Trust, Sympathy, etc. paint a fairly accurate picture of Holden’s personality throughout the chapters, making the unreasonably high Agreeableness (A) only more confusing. On its own, a high Agreeableness (A) may not do much damage to the credibility of this model, but this anomaly poses greater problems when it is selected as the PP for a majority of Salinger’s novel. With high-Agreeableness (A) as the PP, a high-Extraversion (E) Secondary Personality is interpreted to mean ‘Happy, Friendly, and Merry’, which is downright insulting to Holden (an insult that extends to his devotees). It is worth noting, however, that the low-Agreeableness (A) side, whose description includes ‘valuing self-interest over others’, would be equally unbefitting. Overall, a 98% value for Agreeableness (A) just seems outrageously high for our character, and even a modest reduction to about 80% would be more bearable.
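The primary-secondary lookup behind these descriptions can be sketched as a small table keyed on (primary, secondary) pairs. Only the three pairings quoted in this post are included; the 0.5 cut-off used to call a secondary dimension ‘high’ or ‘low’ is an assumption, and the service’s full table contains many more combinations.

```python
# Only the three pairings quoted in this post; the service's table has many more.
INTERPRETATIONS = {
    ("low-Conscientiousness", "high-Agreeableness"): "unruly, reckless, devil-may-care",
    ("high-Agreeableness", "high-Extraversion"): "Happy, Friendly, and Merry",
    ("high-Agreeableness", "low-Openness"): "Dependent, Simple",
}

def interpret(primary, secondary_scores, cutoff=0.5):
    """Label each secondary dimension high/low and look up its pairing with the PP.

    cutoff is an assumed boundary between 'low' and 'high' on the 0..1 scale.
    """
    found = []
    for name, score in secondary_scores.items():
        label = f"{'high' if score >= cutoff else 'low'}-{name}"
        description = INTERPRETATIONS.get((primary, label))
        if description is not None:
            found.append((label, description))
    return found

print(interpret("high-Agreeableness", {"Extraversion": 0.7, "Openness": 0.2, "Neuroticism": 0.6}))
```

Because the descriptions come from a fixed table, an inflated primary score drags every paired description along with it, which is why the 98% Agreeableness hurts so much.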

“Don’t know about the people,
but all the scarecrows
are crooked.” —Haiku by Kobayashi Issa

It is difficult to explain the cause of such a high Agreeableness without knowing the inner workings of LIWC. Further research, or a deeper study of the science behind LIWC, could give us a better understanding. One thing to note is that the overall method of analysis relies on word counting and sentence structure, which is not always the best indicator of context. One possibility that comes to mind is that the model is being misled by the repetition of certain bi-grams and figurative phrases. In the next section, we will see some of the conditions and assumptions behind an NLP textual analysis and how they relate to The Catcher in the Rye.
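One cheap way to probe the bi-gram hypothesis would be to count which adjacent word pairs repeat most often in Holden’s narration, for example a verbal tic like his habitual ‘and all’. A minimal sketch using only the standard library; the sample sentence is made up, and the counts say nothing about how the PI model actually weighs such phrases.

```python
import re
from collections import Counter

def top_bigrams(text, n=5):
    """Return the n most frequent adjacent word pairs with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    pairs = Counter(zip(words, words[1:]))   # each (word, next_word) pair
    return [(" ".join(pair), count) for pair, count in pairs.most_common(n)]

sample = "It killed me. It really killed me, and all. I mean it, and all."
print(top_bigrams(sample, 2))
```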

Final Discussion & Conclusion

To begin, I should mention that The Catcher in the Rye may have been the worst book for testing the PI service, since recognizing sarcasm and lying requires specifically trained NLP models. And perhaps for that same reason, it is the best book for testing the PI service. Either way, it doesn’t really matter, since the selection of this book was 90% due to personal interest and 10% due to its first-person narrative. I was originally debating between this and The Bell Jar (89.99% personal interest), but I ultimately settled on The Catcher in the Rye because its chapters were more easily divisible through Python (really, there’s no other reason). I could list more reasons why this book is a good candidate for this sort of study, but they would all be after the fact.

Through this work, I also re-realized just how differently Holden has been interpreted by various people (from a Hippie to a Bad Role Model), so I don’t expect universal agreement on everything I’ve talked about (and that’s okay!). This may be one reason fiction readers shy away from numerical analysis: they fear that cold, hard answers will triumph over those that are personal, experienced, and opinion-based. However, it is important to remember that even algorithms (which humans create, by the way) possess bias, though defined differently from human bias, and struggle with similar uncertainty. They are not absolute, or even ‘true’ for that matter. Numerical analysis is simply a different lens through which we can see a novel or character. And if anyone should know how to appreciate a new perspective, it’s a reader.

“…fiction is obliged to stick to possibilities” — Mark Twain


A lot of the final analysis was conclusive, so there isn’t a whole lot left to do in a conclusion other than make things up and use big words to sound awesome. So, in this section, I just want to mention some interesting lessons and realizations I stumbled upon through this study. Some of what I mention might have been better placed in the introduction, but since I learned or realized it during my analysis, I feel mentioning it in a conclusion is more authentic (and less phony); it’s what Holden would have wanted, after all.

Lessons, Thoughts, and Realizations in No Specific Order (could be right or wrong):

  • Humans, although possessing a keen eye for general detail in literary analysis, tend to see characters in the moment, which can lead us to miss a more holistic view of their personality. Computers don’t have that problem, and we could use their aid to cover our weakness. It is not about delegating the thinking to machines; it’s about using machines to enhance our thinking.
  • We do not read to only understand, we also read to feel. That being said, an analysis of the text doesn’t seem to take away from the feeling but rather helps to express it.
  • Dr. Sarah Graham wrote, “Catcher was not intended for a teenaged audience… Salinger consistently writes about children and adolescents, but not for them.” Interesting…
  • An author must exhibit great control in defining a character; simply analyzing one can lead to so many tangents. And here lies, I believe, Salinger’s genius. There’s an orderliness to Holden, he has limits, and that is what makes him so real and relatable (for me at least).
  • LIWC inherently assumes that we can gain insights into a person (cognitively, socially, psychologically, etc.) by noting their natural use and style of language. Therefore, some of the contradictions are understandable, considering the heavy use of dated language, slang, and colloquialisms in The Catcher in the Rye, which makes it difficult for NLP to extract contextual meaning through LIWC analysis alone. Despite that, I found the model impressive.
  • Interestingly, LIWC began with researchers noticing that when people wrote about their intense emotions, there was evident improvement in their physical health.
  • Can we go a step further and analyze the personality of the writer through the personality of his characters? We’ve done that at times as humans where it made sense (confessional literature like The Bell Jar and Steppenwolf), but what about through computers? I don’t know. Sounds dangerous.
  • Could we use computational linguistics to strengthen our characters and make them more appealing? Would this be inauthentic and close to phoniness? But could we not help writers get their ideas out? But wouldn’t that knowledge end up being used for ridiculously useless, unethical, and harmful entertainment since that’s what makes money? Can we actually model successful novels statistically?


I would have liked to mention Jane and Phoebe more. Maybe analyze the overall NLP model trends in the personality of Holden when he thinks about them or interacts with Phoebe. But the way the analysis was going, it seemed too artificial to force them in. Maybe some other day.