The algorithm behind the Harry Potter Sorting Hat
The shot above is taken from one of the most famous scenes in the Harry Potter movies. In this scene, the Sorting Hat places Harry Potter in one of the four houses, determining his future friends and the kind of person he will become.
Interestingly, I personally associate this archetypal figure with one of the eyes of Horus. In service of the ruler (Albus Dumbledore), it watches from above (for the Egyptians it was the hawk; in this mythology it is the hat) and checks on the stability of the kingdom. In this case, it also personifies the element of selection.
I am sure the hat is a robot
Unfortunately, I do not believe in magic. I am convinced that, in reality, the hat is nothing more than a simple classification algorithm (with decent use of RNNs, Recurrent Neural Networks, for speech generation).
Big 5 selection
If you are familiar with the world of Harry Potter, you will know that the Sorting Hat sorts people by their personality traits (an interesting twist on the concept of free will: the hat makes for you the choices you would be likely to make yourself). I will assume it uses the Big 5 personality traits: Openness, Conscientiousness, Extroversion, Agreeableness, Neuroticism.
Fortunately, we already know the corresponding personality models of the four Harry Potter houses thanks to this research (University of California, 2019):
Unfortunately, we do not have a large DataFrame to train an AI. However, we still have valuable data: from the boxplots above we can estimate the mean and standard deviation of every trait for every house. That is all we need to define the probability distributions for a classification algorithm.
#Openness, Conscientiousness, Extroversion, Agreeableness, Neuroticism
[41.03, 6.32], [35.58, 7.13], [31.02, 12.20], [40.51, 7.35], [30.66, 11.17], #Gryffindor
[39.41, 9.33], [36.76, 6.17], [27.79, 13.16], [41.61, 7.27], [30.00, 12.13], #Hufflepuff
[41.98, 8.23], [36.76, 6.17], [28.52, 11.17], [40.51, 7.35], [29.55, 11.17], #Ravenclaw
[40.51, 7.35], [39.70, 6.10], [27.42, 11.17], [36.17, 10.22], [28.97, 12.20] #Slytherin
Each pair is the [mean, standard deviation] of one trait, in the order given in the first line, with one row per house. Together they form a single array.
Creating the Normal Distributions
Unfortunately, the classifier I will be using from scikit-learn is trained on individual samples (for example, the rows of a DataFrame), not on distribution parameters. We cannot train it directly on the means and standard deviations of the normal distributions: we will need to create the DataFrame ourselves.
In order to have usable data for my AI, I will need to sample from a normal distribution for each feature (Openness, Conscientiousness, Extroversion, Agreeableness, Neuroticism) and for each of the four classes (‘Gryffindor’, ‘Hufflepuff’, ‘Ravenclaw’, ‘Slytherin’).
Given the means and standard deviations above, I drew 1,000,000 random samples from the corresponding normal distributions. We now have a DataFrame to train our model.
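The sampling step could look roughly like the sketch below. It is not the original code: the names `HOUSE_PARAMS`, `TRAITS` and `simulate_houses`, the `House` label column, the fixed seed, and the smaller sample size of 10,000 rows per house are all assumptions made for illustration. The numbers are copied from the array above.

```python
import numpy as np
import pandas as pd

TRAITS = ["Openness", "Conscientiousness", "Extroversion", "Agreeableness", "Neuroticism"]

# [mean, standard deviation] per trait, one entry per house (values from the array above).
HOUSE_PARAMS = {
    "Gryffindor": [[41.03, 6.32], [35.58, 7.13], [31.02, 12.20], [40.51, 7.35], [30.66, 11.17]],
    "Hufflepuff": [[39.41, 9.33], [36.76, 6.17], [27.79, 13.16], [41.61, 7.27], [30.00, 12.13]],
    "Ravenclaw": [[41.98, 8.23], [36.76, 6.17], [28.52, 11.17], [40.51, 7.35], [29.55, 11.17]],
    "Slytherin": [[40.51, 7.35], [39.70, 6.10], [27.42, 11.17], [36.17, 10.22], [28.97, 12.20]],
}


def simulate_houses(n_per_house=10_000, seed=0):
    """Draw n_per_house synthetic students per house (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    frames = []
    for house, params in HOUSE_PARAMS.items():
        params = np.asarray(params)
        means, stds = params[:, 0], params[:, 1]
        # Each trait is drawn independently from its own normal distribution.
        samples = rng.normal(loc=means, scale=stds, size=(n_per_house, len(TRAITS)))
        frame = pd.DataFrame(samples, columns=TRAITS)
        frame["House"] = house
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


df = simulate_houses()
print(df.groupby("House")[TRAITS].mean().round(2))
```

The per-house means printed at the end should land close to the values in the array, which is a quick sanity check that the simulation matches its inputs. Note that sampling can produce values outside the [0, 50] range of the scale; this sketch does not clip them.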
Training the AI
For my purpose, I will use a Gaussian Naive Bayes classifier. This is a multi-class classification problem. After training, my AI will be able to estimate the Hogwarts house given the personality traits as inputs.
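A minimal training sketch with scikit-learn's `GaussianNB` follows. The train/test split, the seed and the 10,000 samples per house are assumptions added here so the fit can be checked on held-out data; they are not taken from the original workflow.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

HOUSES = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]

# Shape (4 houses, 5 traits, 2): [mean, standard deviation], values from the array above.
PARAMS = np.array([
    [[41.03, 6.32], [35.58, 7.13], [31.02, 12.20], [40.51, 7.35], [30.66, 11.17]],
    [[39.41, 9.33], [36.76, 6.17], [27.79, 13.16], [41.61, 7.27], [30.00, 12.13]],
    [[41.98, 8.23], [36.76, 6.17], [28.52, 11.17], [40.51, 7.35], [29.55, 11.17]],
    [[40.51, 7.35], [39.70, 6.10], [27.42, 11.17], [36.17, 10.22], [28.97, 12.20]],
])

# Simulate the training data (assumed size: 10,000 samples per house).
rng = np.random.default_rng(0)
n_per_house = 10_000
X = np.vstack([rng.normal(p[:, 0], p[:, 1], size=(n_per_house, 5)) for p in PARAMS])
y = np.repeat(HOUSES, n_per_house)

# Hold out 20% of the samples to see how well the houses can be told apart.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = GaussianNB().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(f"held-out accuracy: {accuracy:.3f}")
# theta_ holds the per-class feature means the model has learned.
print(np.round(model.theta_, 2))
```

Because the four houses overlap heavily on most traits, the held-out accuracy is a useful reminder of how uncertain any single sorting is. The learned means in `model.theta_` should come out close to the published ones, since the model is fitting Gaussians to data that was generated from Gaussians.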
Testing the AI
These are my personality traits. I will use them to see which Harry Potter house I belong to.
If you want to discover your personality, TAKE THE TEST.
Scaling the Data
My scores for the Big 5 on a scale of [0, 120] are:
[85, 111, 78, 47, 74]
Rescaled to the range [0, 50], they become:
[35.41, 46.25, 32.5, 19.58, 30.83]
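The rescaling can be reproduced with a small helper. This sketch assumes both scales start at 0, so the conversion is a plain multiplication by 50/120; the function name `rescale` is hypothetical.

```python
def rescale(scores, old_max=120, new_max=50):
    """Linearly map scores from [0, old_max] to [0, new_max]."""
    return [score * new_max / old_max for score in scores]


raw_scores = [85, 111, 78, 47, 74]
scaled = rescale(raw_scores)
print([round(s, 2) for s in scaled])
```

Rounding to two decimals gives 35.42 for the first value, where the list above shows it truncated to 35.41. The difference is far too small to affect the prediction.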
I can make a prediction based on these data:
['Slytherin']
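To see where this answer comes from, the Gaussian Naive Bayes computation can be reproduced by hand. The sketch below does not use the trained model: it assumes equal priors for the four houses and plugs the published means and standard deviations straight into the normal density, which is approximately what the classifier learns from the simulated data.

```python
import numpy as np

HOUSES = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]

# Shape (4 houses, 5 traits, 2): [mean, standard deviation], values from the array above.
PARAMS = np.array([
    [[41.03, 6.32], [35.58, 7.13], [31.02, 12.20], [40.51, 7.35], [30.66, 11.17]],
    [[39.41, 9.33], [36.76, 6.17], [27.79, 13.16], [41.61, 7.27], [30.00, 12.13]],
    [[41.98, 8.23], [36.76, 6.17], [28.52, 11.17], [40.51, 7.35], [29.55, 11.17]],
    [[40.51, 7.35], [39.70, 6.10], [27.42, 11.17], [36.17, 10.22], [28.97, 12.20]],
])
means, stds = PARAMS[:, :, 0], PARAMS[:, :, 1]

# Scaled Big 5 scores from the text.
scores = np.array([35.41, 46.25, 32.5, 19.58, 30.83])

# Log of the normal density for every (house, trait) pair.
log_pdf = -0.5 * np.log(2 * np.pi * stds**2) - (scores - means) ** 2 / (2 * stds**2)

# "Naive" assumption: traits are independent, so the log-densities simply add up.
log_likelihood = log_pdf.sum(axis=1)

# With equal priors, the posterior is the normalised likelihood.
posterior = np.exp(log_likelihood - log_likelihood.max())
posterior /= posterior.sum()

for house, p in zip(HOUSES, posterior):
    print(f"{house:<11} {p:.3f}")

prediction = HOUSES[int(np.argmax(log_likelihood))]
print(prediction)  # Slytherin for these scores
```

The low Agreeableness score does most of the work here: Slytherin has both the lowest mean and the widest spread on that trait, so a score of 19.58 is much less surprising there than in the other three houses.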
I will tell you: I expected it.