Methodology of ACUMAN’s psychometric algorithms

Apr 24, 2015

Note: ACUMAN’s corpus data is available to download near the end of this article.

Note: This article does not cover ACUMAN’s natural language processing algorithms for its AI chat feature. It focuses only on the accuracy rates of its machine learning algorithms and the corpora used to train them.

ACUMAN sorts psychometric data into two categories, each with its own criteria. For each category it builds a machine learning classifier from a corpus of data, using an algorithm designed to reach a higher accuracy rate on natural language dialogue. The machine learning and text classification algorithm I employ is my own derivative of Naive Bayes, with added tokenization, named entity recognition, and Laplace smoothing. For each of these classifiers, I have plotted the true positive rate against the false positive rate as a ROC curve and report the area under that curve (AUC).
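The author’s exact Naive Bayes derivative is not published here, so the following is only a minimal sketch of the standard technique it builds on: a multinomial Naive Bayes text classifier with Laplace (add-one) smoothing, written in Python. The regex tokenizer, the `NaiveBayesClassifier` class, and the toy phrases are hypothetical, and named entity recognition is omitted. Smoothing estimates each word probability as (count of the word in the class + α) / (total words in the class + α·|V|), where |V| is the vocabulary size, so a word never seen with a class does not force that class’s score to zero.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # number of documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.total_words = Counter()             # total tokens per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.total_words[label] += len(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(token | class) for each class."""
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / n_docs)
            denom = self.total_words[label] + self.alpha * vocab_size
            for tok in tokenize(text):
                # Laplace smoothing: an unseen token gets alpha / denom, never zero
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy corpus, for illustration only
texts = ["a wonderful, moving film", "great acting and a great story",
         "a dull and tedious mess", "boring plot, terrible acting"]
labels = ["positive", "positive", "negative", "negative"]
clf = NaiveBayesClassifier().fit(texts, labels)
print(clf.predict("a great story"))           # positive
print(clf.predict("a tedious, boring mess"))  # negative
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together over a long chat message.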

ACUMAN’s text classification and machine learning algorithms sort the user’s chat input into the following categories:

Mood and Sentiment Polarity Classifier:

  • positive
  • negative

Personality Analysis Conducted with the Five-Factor Model (FFM):

Openness to experience

  • inventive/curious
  • consistent/cautious

Conscientiousness

  • efficient/organized
  • easy-going/careless

Extraversion

  • outgoing/energetic
  • solitary/reserved

Agreeableness

  • friendly/compassionate
  • analytical/detached

Neuroticism

  • sensitive/nervous
  • secure/confident

Mood and Sentiment Polarity Classifier

Because ACUMAN’s sentiment analysis is binary (each phrase is classified as either Positive or Negative), every classification is simply correct or incorrect. The text classification algorithm categorized 77.9% of the corpus phrases correctly. The corpus this classifier was trained on was introduced by Pang and Lee and contains 3,800 phrases. Roughly one third of them (1,280 sentences) were held out from the classifier’s training set, and the ROC curve was calculated on these held-out sentences. Because their true labels were known, the classifier’s predictions could be compared against them to calculate the true and false positive rates and the accuracy rate.
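As a minimal sketch of this kind of evaluation, assuming the classifier outputs a real-valued score per phrase (higher meaning more likely Positive, such as a Naive Bayes log-odds), the Python code below holds out about one third of a labelled corpus, computes accuracy at a fixed threshold, builds the ROC curve by sweeping the threshold, and takes the AUC with the trapezoidal rule. The function names, the 0.5 threshold, and the toy labels and scores are hypothetical; they are not ACUMAN’s data or code.

```python
import random


def holdout_split(items, test_fraction=1 / 3, seed=0):
    """Shuffle a copy of the items and hold out about test_fraction for evaluation."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]  # (training set, held-out set)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_curve(y_true, scores):
    """Return ROC points as (false positive rate, true positive rate).

    y_true holds 1 for Positive and 0 for Negative; both classes must occur.
    The threshold sweeps from the highest score to the lowest.
    """
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all tied scores at once, so ties become one diagonal step
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical held-out labels and classifier scores
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, y_pred))        # 0.8333333333333334
print(auc(roc_curve(y_true, scores)))  # about 0.889 (8/9)
```

The AUC equals the probability that a randomly chosen Positive phrase is scored above a randomly chosen Negative one: in the toy data, 8 of the 9 Positive/Negative pairs are ordered correctly, giving 8/9.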

These results put the accuracy rate of ACUMAN’s sentiment analysis algorithm at 77.9%, just 1.1 percentage points below the 79% average accuracy of humans judging whether a text is positive or negative.

The raw, unparsed, and unorganized survey data used in part as the corpus for training this algorithm is available here as a ZIP file.

Personality Analysis Conducted with the Five-Factor Model (FFM) Classifier

The corpus for the five-factor psychometric machine learning algorithms was collected through a survey of 1,741 participants on the ACUMAN website, and ACUMAN’s five-factor classifier was built from this data. As with the Mood and Sentiment Polarity Classifier, one third of the original corpus was held out from the classifier’s training set and used to calculate the ROC curve. The raw, unparsed, and unorganized survey data used in part as the corpus for training this algorithm is available here as a ZIP file.
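This article does not say how the five traits are modelled, so the following is only one plausible structure, assuming each trait is treated as a binary choice between its two listed poles: train five independent two-class classifiers, one per trait. The Python sketch below does this with a two-class Naive Bayes stored as per-token log-odds. The trait keys, the function names, and the toy texts and labels are hypothetical, and the way ACUMAN turned survey answers into labels is not described here.

```python
import math
import re
from collections import Counter

# The five FFM traits; True stands for the first pole listed for each trait
TRAITS = ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train_binary(texts, labels, alpha=1.0):
    """Two-class Naive Bayes with Laplace smoothing, stored as per-token log-odds.

    Both True and False must occur in labels.
    """
    counts = {True: Counter(), False: Counter()}
    for text, label in zip(texts, labels):
        counts[label].update(tokenize(text))
    vocab = set(counts[True]) | set(counts[False])
    denom_t = sum(counts[True].values()) + alpha * len(vocab)
    denom_f = sum(counts[False].values()) + alpha * len(vocab)
    weights = {
        w: math.log((counts[True][w] + alpha) / denom_t)
        - math.log((counts[False][w] + alpha) / denom_f)
        for w in vocab
    }
    n_true = sum(1 for label in labels if label)
    bias = math.log(n_true / (len(labels) - n_true))  # prior log-odds
    return bias, weights


def log_odds(model, text):
    bias, weights = model
    # Tokens never seen in training are skipped (they contribute 0)
    return bias + sum(weights.get(tok, 0.0) for tok in tokenize(text))


def train_ffm(texts, trait_labels):
    """trait_labels holds one dict per text, mapping each trait to True/False."""
    return {t: train_binary(texts, [row[t] for row in trait_labels]) for t in TRAITS}


def predict_ffm(models, text):
    """Return True (first pole) or False (second pole) for every trait."""
    return {t: log_odds(model, text) > 0 for t, model in models.items()}


# Hypothetical toy survey answers and trait labels, for illustration only
texts = ["new ideas excite me, i love parties",
         "i like routine and quiet nights alone",
         "curious about new things and new people",
         "familiar routine, quiet, alone is best"]
flags = {"openness": [1, 0, 1, 0], "conscientiousness": [1, 1, 0, 0],
         "extraversion": [1, 0, 1, 0], "agreeableness": [1, 0, 0, 1],
         "neuroticism": [0, 1, 1, 0]}
trait_labels = [{t: bool(flags[t][i]) for t in TRAITS} for i in range(len(texts))]
models = train_ffm(texts, trait_labels)
print(predict_ffm(models, "new ideas")["openness"])            # True
print(predict_ffm(models, "quiet routine alone")["openness"])  # False
```

Keeping one classifier per trait lets each trait be evaluated on its own held-out data and ROC curve, which matches the per-classifier evaluation described above.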

Empirical research shows that the five-factor model is the most effective psychometric personality model for measuring distinct, non-overlapping traits. It has proven consistent across multicultural studies, including one with participants from over 50 countries, which suggests that its personality criteria are remarkably universal. Many psychologists believe these traits have a biological origin; psychologist David Buss, for example, has proposed an evolutionary explanation for them, arguing that these five core traits represent the most important qualities that shape our social landscape.

The results of both the Mood and Sentiment Polarity Analysis and the Personality Analysis Conducted with the Five-Factor Model are evidence that machine learning and classification algorithms can reach results comparable to those of administered psychometric and psychoanalytic tests. The advantage of this approach is that these results come from natural language conversation with ACUMAN, and that the technology works autonomously. Given that 18% of businesses rely on some form of psychometric testing, I see potential in this project as a more efficient, flexible, and convenient method of psychometric testing than traditional approaches.
