Enneagram Personality Classification on Text

Fernn · Published in Analytics Vidhya · 8 min read · Jan 4, 2021

Hi~ It’s Fern here! This is my first piece of writing on NLP, and it is basically a brief summary of my Master’s thesis.

Overall: So, what was my goal? In simple words, I was trying to ‘classify a user’s personality’ based on what they talked/wrote about on social media.

If you are not familiar with this topic, I do not want to scare you off yet. The only thing you need to understand here is that I want to tell which personality someone belongs to based on the statistics of their writing.. that’s it! Simple, right? :D

For people who are familiar with personality classification topics, you have probably seen a lot of Big Five (a lot!) and MBTI (increasingly popular) theory, and may wonder why I chose the Enneagram. The first and honest reason: it is my pure interest *laugh* (hoping my advisor is not reading this *dry smile*). Secondly, to the best of my knowledge, no one has done text mining with this personality theory (if you find one, please please please let me know!). One reason could be data availability, and another is that the Enneagram is much more complicated compared to the other two theories.

A little bit about the Enneagram: the theory describes 9 different personality types, which are grouped into 3 core focus areas of an individual called “Centers of Intelligence”. Due to the issue of an unbalanced data set, I decided to focus on the 3 Centers instead of the 9 types. The Centers are the Thinking center, the Feeling center and the Instinct center.

After finishing the experiment, I can say that it is much more challenging to directly identify an Enneagram type based solely on what people express in writing. But don’t be discouraged yet, since life is about the journey, not the destination XD. There are interesting things we found along the way of our experiment, and if you are ready.. let’s begin our journey!

Here is our journey’s agenda:

  1. Data set and Pre-processing
  2. Feature Extractions
  3. My experiments

Data set and Pre-processing

Data set: I crawled the data from 2 personality community websites where users publicly provide their Enneagram personality. So, there are 2 main pieces of data I worked on: 1) the user’s Enneagram type (as our label) and 2) the user’s posts/comments in the website forum (as our training data). Below are brief statistics of our data set.

Number of Posts per Center per Website
Example of data set

For the pre-processing steps, I went through many trials and errors and finally ended up with all the processes listed below. There were things I tried, e.g. removing punctuation or lemmatizing the words, but the results got worse, so I decided to leave those out. I started with general data cleaning, including removing newlines and tabs and reducing multiple spaces to a single space, then continued with the specific steps below.

  1. Generalize quotations, user mentions, usernames, images, hashtags and URL links by giving each a single tag name.
  2. Group emoticons into 12 categories.
  3. Group personality-related references (such as INFP, 9w1, sp/so, Te, Si). Here we mostly have MBTI, Enneagram, Enneagram sub-types, tritypes and wings, plus a bit of Jung’s cognitive functions and Freud’s Id, Ego and Super-ego.
  4. Remove other distinctive noises/patterns that could bias identifying the user and personality.

This is how the data looks before cleaning.

@XgraceRA 7s are better in bed than 2s ?? Seriously? XD

And after cleaning

MENTION PERSONALITY_REF are better in bed than PERSONALITY_REF ?? Seriously? LAUGH_EMOTICON
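The cleaning steps above could be sketched with a few regex substitutions. The tag names (MENTION, PERSONALITY_REF, LAUGH_EMOTICON) follow the example output, but the exact patterns here are my own illustrative guesses, not the thesis code — the real lists of personality references and emoticons were much more complete.

```python
import re

# Hypothetical pattern -- the thesis covered far more references
# (MBTI types, tritypes, wings, cognitive functions, etc.).
PERSONALITY_REF = re.compile(r"\b(?:[1-9]w[1-9]|[1-9]s?|INFP|ENTJ|sp/so|Te|Si)\b")

def clean_post(text: str) -> str:
    text = re.sub(r"\s+", " ", text)             # collapse newlines/tabs/spaces
    text = re.sub(r"https?://\S+", "URL", text)  # generalize URL links
    text = re.sub(r"@\w+", "MENTION", text)      # generalize user mentions
    text = re.sub(r"#\w+", "HASHTAG", text)      # generalize hashtags
    text = PERSONALITY_REF.sub("PERSONALITY_REF", text)
    # a tiny stand-in for one of the 12 emoticon categories
    text = re.sub(r"\b(?:XD|xD|lol|haha+)\b", "LAUGH_EMOTICON", text)
    return text.strip()

print(clean_post("@XgraceRA 7s are better in bed than 2s ?? Seriously? XD"))
```

Running it on the example post above reproduces the cleaned version shown.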

My experiment belongs to supervised learning, for which we need the ‘label’ of the data. I converted each user’s personality type into a ‘Center of Intelligence’ based on the main number listed on their profile, regardless of their wings, sub-types and tritypes. 8, 9 and 1 are the Instinct Center (aka Gut); 2, 3 and 4 are the Feeling Center (aka Heart); and 5, 6 and 7 are the Thinking Center (aka Head). So, we will get something like..
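That relabeling fits in a few lines of Python — a sketch of the type-to-Center mapping described above, not the thesis code:

```python
# Map a user's main Enneagram type to its Center of Intelligence,
# ignoring wings, sub-types and tritypes.
CENTERS = {
    8: "Instinct", 9: "Instinct", 1: "Instinct",  # aka Gut
    2: "Feeling",  3: "Feeling",  4: "Feeling",   # aka Heart
    5: "Thinking", 6: "Thinking", 7: "Thinking",  # aka Head
}

def to_center(profile_type: str) -> str:
    """'9w1' or '9' -> 'Instinct': only the main number matters."""
    return CENTERS[int(profile_type[0])]

print(to_center("9w1"))  # Instinct
print(to_center("4"))    # Feeling
```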

After cleaning, I did a bit of data exploration for each Center. Here are the word clouds for each.

Thinking, Feeling and Instinct Center wordclouds

Feature Extractions

There are 5 main features contributing to the classification:

  1. Bag of words (TF-IDF) — a basic feature for anyone working with text mining. More information is available here; I think that article is very easy to understand.
  2. LIWC — Linguistic Inquiry and Word Count contains a group of built-in dictionaries. There are more than ten thousand words grouped into 2 major categories: Linguistic Processes and Psychological Processes. The LIWC main website can be found here.
Example of LIWC

We can see that the example word ‘haha’ falls into 3 categories: Affect, Positive Emotion (aka posemo) and Assent (which basically means agreeing words). What we have to be careful about here are the sub-categories; in this case, posemo is a sub-category of Affect. This could create bias/input duplication, so when you deal with input selection, depending on the purpose of your experiment, be aware of that.

3. POS Tag — Part-of-speech tagging is another well-known feature for text mining, as it provides fundamental linguistic patterns and a basic form of the language. I used the NLTK library, which provides 37 POS tags in total. Learn more about this feature here.

Example for POS Tag

4. Emoticons — I referenced the emoticon patterns from Wikipedia, manually explored the data set, then added what I found.

5. General statistics — This was also done manually, by extracting meta statistics per single post: 1) word count, 2) sentence count, 3) newline count, 4) quote count, 5) mention count and 6) personality reference count.

The numbers of features are 10,000, 64, 37, 12 and 6 respectively. I also had another feature set, Combined Statistics aka ‘stat_all’, which is basically the combination of all features except TF-IDF. Its number of features is 119.
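The general-statistics feature (item 5) can be sketched with plain Python. The six counts follow the list above; the exact counting rules (e.g. splitting sentences on terminal punctuation, and counting the tags produced by cleaning) are my assumptions, not the thesis code:

```python
import re

def general_stats(post: str) -> list[int]:
    """Six per-post meta statistics, in the order listed above."""
    return [
        len(post.split()),                      # 1) word count
        len(re.findall(r"[.!?]+", post)) or 1,  # 2) sentence count (rough)
        post.count("\n"),                       # 3) newline count
        post.count("QUOTE"),                    # 4) quote count (cleaned tag)
        post.count("MENTION"),                  # 5) mention count
        post.count("PERSONALITY_REF"),          # 6) personality reference count
    ]

print(general_stats("MENTION PERSONALITY_REF are better in bed than "
                    "PERSONALITY_REF ?? Seriously? LAUGH_EMOTICON"))
```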

Finally, all features were converted into numerical representations, and then we were ready to rock!
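For the TF-IDF part, that numerical conversion could look like the following sketch, assuming scikit-learn is available; the three toy posts and the 10,000-feature cap mirror the setup described above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

posts = [
    "MENTION PERSONALITY_REF are better in bed than PERSONALITY_REF",
    "thinking about logic and systems all day",
    "i feel so much for my friends",
]

# Cap the vocabulary at 10,000 terms, as in the experiment.
vectorizer = TfidfVectorizer(max_features=10_000)
X = vectorizer.fit_transform(posts)  # sparse matrix: (n_posts, n_terms)
print(X.shape)
```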

Experiments

I utilized the data from each website both jointly and separately, and likewise applied each feature both jointly and separately. For simplicity, I will briefly describe the conditions for each experiment, followed by the results. There are 4 machine learning models used here: Logistic Regression (LR), Multinomial Naive Bayes (MNB), Support Vector Machine (SVM) and eXtreme Gradient Boosting (XGBoost).

1. Multi-class Classification

Let’s start with the basic one. Since we have 3 different classes/labels (Instinct, Feeling and Thinking), I started with multi-class classification. Both data sets were combined, then I split the training and testing data in an 85:15 proportion. Here is our result example.

KFold from sklearn was used for the validation score.
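That setup could be sketched as follows, assuming scikit-learn; random toy data stands in for the real feature matrices, and the variable names are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))    # stand-in for a feature matrix
y = rng.integers(0, 3, size=300)  # 3 classes: Instinct / Feeling / Thinking

# 85:15 train/test split, as in the experiment.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42, stratify=y
)
print(len(X_train), len(X_test))  # 255 45

# k-fold cross-validation score on the training portion (here LR as an example).
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X_train, y_train, cv=5)
print(scores.mean())
```

On random labels like these the cross-validation accuracy hovers around chance (~0.33); on the real features it is of course what the result charts report.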

I used seaborn for our instant lovely chart showing the overall performance of our multi-class experiment. As expected, TF-IDF outperformed the rest of the features. The number of features does affect the result in this case: we can see that the accuracy descends in parallel with the number of features, from TF-IDF (10,000) > Combined Statistics (119) > LIWC (64) / POS Tag (37) > Emoticon (12) / Stat (6).

Test Set Accuracy

2. Binary Classification

Binary classification was used to separate each Center from the other two. We had similar features and ML models, and I also coupled some features with the Emoticon feature (2nd figures), but the results are not very distinctive.
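Separating one Center from the other two just means relabeling the 3-class labels into three binary tasks (one-vs-rest). A minimal sketch, with made-up labels:

```python
# Turn the 3-class Center labels into a binary task for one target Center.
centers = ["Thinking", "Feeling", "Instinct", "Thinking", "Feeling"]

def binary_labels(labels, target_center):
    """1 if the post's author belongs to target_center, else 0."""
    return [1 if c == target_center else 0 for c in labels]

print(binary_labels(centers, "Thinking"))  # [1, 0, 0, 1, 0]
```

The same models and features as in the multi-class experiment can then be trained once per Center on these binary labels.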

Thinking center

Normal features
Combination of main features with emoticon feature

Feeling center

Combination of main features with emoticon feature

Instinct center

Combination of main features with emoticon feature

3. Binary Classification: Cross Website

Unlike the two experiments above, I separated the data usage: Personality Cafe data was used as the training data and Typology Central data was used as the testing data. By using different websites for the training and testing sets, we can test the generalisability and robustness of our training data. Personally, I think this experiment is the most interesting, as we can see in the results below.

The accuracy on the testing data is stable across all models, features and Centers. In contrast with the experiments above, TF-IDF did not outperform the other features and is actually slightly worse, especially with the LR model.

Actually, this is not everything I did, but the rest is more detailed, so I think it’s better to keep it this way. To conclude what is shown above, I separated the big sections into

  1. Data set and Pre-processing, where I showed you some of the cleaning and processing techniques I used
  2. Feature Extractions, including BOW (TF-IDF), LIWC, POS Tag, Emoticons and general statistics
  3. My experiments, which are multi-class classification, normal binary classification and cross-website binary classification.

To be honest, I think the results of my experiment did not reach a satisfactory level. One of the complications is that personality cannot be identified solely from writing style when the data set and the environment in which people posted their comments are not controlled. How people write can depend on their native language, their current emotional state, how well they know English and much more. However, I’m very happy that I at least started exploring this, and that I get to share the experience with you. Comments, questions and suggestions are very welcome! I would love to discuss other possibilities!
