The algorithm behind the Harry Potter Sorting Hat

A baby programmer
4 min read · Dec 16, 2023


The shot above is taken from one of the most famous scenes in the Harry Potter movies. In this scene, the Sorting Hat places Harry Potter in one of the four houses, determining his future friends and the kind of person he will become.

Interestingly, I personally associate this archetypal figure with one of the eyes of Horus. In service of the ruler (Albus Dumbledore), it watches from above (for the Egyptians it was the hawk; in this mythology it is the hat) and checks on the stability of the kingdom. In this case, it also personifies the element of selection.

I am sure the hat is a robot

Unfortunately, I do not believe in magic. In fact, I am convinced that the hat is really nothing more than a simple classification algorithm (with a decent use of RNNs, Recurrent Neural Networks, for speech generation).

Big 5 selection

If you are familiar with the world of Harry Potter, you will know that the Sorting Hat sorts students by their personality traits (an interesting twist on the concept of free will: the hat makes for you the choices you are likely to make for yourself). I will assume it uses the Big 5 personality traits: Openness, Conscientiousness, Extroversion, Agreeableness, Neuroticism.

Fortunately, we already know the corresponding personality models of the four Harry Potter houses thanks to this research (University of California, 2019):

Unfortunately, we do not have a large DataFrame to train an AI. However, we still have valuable data: from the boxplots above we can estimate the mean and standard deviation of every trait for every house. That is all we need to create a probability distribution for a classification algorithm.

# Each pair is [mean, standard deviation], in the order:
# Openness, Conscientiousness, Extroversion, Agreeableness, Neuroticism
# (the variable name house_stats is illustrative)
house_stats = [
    [[41.03, 6.32], [35.58, 7.13], [31.02, 12.20], [40.51, 7.35], [30.66, 11.17]],  # Gryffindor
    [[39.41, 9.33], [36.76, 6.17], [27.79, 13.16], [41.61, 7.27], [30.00, 12.13]],  # Hufflepuff
    [[41.98, 8.23], [36.76, 6.17], [28.52, 11.17], [40.51, 7.35], [29.55, 11.17]],  # Ravenclaw
    [[40.51, 7.35], [39.70, 6.10], [27.42, 11.17], [36.17, 10.22], [28.97, 12.20]],  # Slytherin
]

Each pair is the mean and standard deviation of one feature, and each row collects the five features of one house in a single array.

Creating the Normal Distributions

Unfortunately, the scikit-learn classifier I will be using is trained on individual samples, such as the rows of a DataFrame. We cannot train it directly on the parameters (mean and standard deviation) of the normal distributions: we will need to create the DataFrame ourselves.

In order to have usable data for my AI, I will need to sample a normal distribution for each feature (Openness, Conscientiousness, Extroversion, Agreeableness, Neuroticism) of each of the four classes (‘Gryffindor’, ‘Hufflepuff’, ‘Ravenclaw’, ‘Slytherin’).

Given the means and standard deviations above, I simulated normal distributions with 1,000,000 random samples. We now have a DataFrame to train our model on.

Training the AI

For my purpose, I will use a Gaussian Naive Bayes classifier. This is a multi-class classification problem. After training, my AI will be able to estimate the Hogwarts house given the personality traits as inputs.
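The simulation and training steps can be sketched roughly as follows. This is a minimal sketch, not the original notebook: the names TRAITS, HOUSE_STATS, simulate_houses and train_sorting_hat are made up for illustration, the house names are spelled as in the books, and 50,000 samples per house is an assumption chosen to keep the run fast. It assumes numpy, pandas and scikit-learn are installed.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB

TRAITS = ["Openness", "Conscientiousness", "Extroversion",
          "Agreeableness", "Neuroticism"]

# (mean, standard deviation) per trait, copied from the list of estimates.
HOUSE_STATS = {
    "Gryffindor": [(41.03, 6.32), (35.58, 7.13), (31.02, 12.20), (40.51, 7.35), (30.66, 11.17)],
    "Hufflepuff": [(39.41, 9.33), (36.76, 6.17), (27.79, 13.16), (41.61, 7.27), (30.00, 12.13)],
    "Ravenclaw": [(41.98, 8.23), (36.76, 6.17), (28.52, 11.17), (40.51, 7.35), (29.55, 11.17)],
    "Slytherin": [(40.51, 7.35), (39.70, 6.10), (27.42, 11.17), (36.17, 10.22), (28.97, 12.20)],
}


def simulate_houses(n_per_house=50_000, seed=0):
    """Draw independent normal samples for every trait of every house."""
    rng = np.random.default_rng(seed)
    frames = []
    for house, stats in HOUSE_STATS.items():
        means = [mean for mean, _ in stats]
        stds = [std for _, std in stats]
        # The five means/stds are broadcast across n_per_house rows.
        samples = rng.normal(means, stds, size=(n_per_house, len(TRAITS)))
        frame = pd.DataFrame(samples, columns=TRAITS)
        frame["House"] = house
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


def train_sorting_hat(df):
    """Fit a Gaussian Naive Bayes classifier on the simulated students."""
    model = GaussianNB()
    model.fit(df[TRAITS], df["House"])
    return model


df = simulate_houses()
sorting_hat = train_sorting_hat(df)

# Sanity check: the simulated means should sit close to the estimates.
print(df.groupby("House")[TRAITS].mean().round(2))
```

Because every house gets the same number of simulated students, the classifier learns equal priors, so the sorting is driven only by the personality scores.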

Testing the AI

These are my personality traits. I will use them to see which Harry Potter house I belong to.

If you want to discover your personality, TAKE THE TEST.

Scaling the Data

My scores for the Big 5 on a scale of [0, 120] are:

[85, 111, 78, 47, 74]

Rescaled to [0, 50], they become:

[35.41, 46.25, 32.5, 19.58, 30.83]
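The rescaling is a simple linear map: multiply each score by 50/120. A small sketch (the function name rescale is made up for illustration):

```python
def rescale(scores, old_max=120, new_max=50):
    """Linearly map scores from [0, old_max] to [0, new_max]."""
    return [score * new_max / old_max for score in scores]


raw_scores = [85, 111, 78, 47, 74]
scaled = rescale(raw_scores)
print([round(value, 2) for value in scaled])
# prints [35.42, 46.25, 32.5, 19.58, 30.83]
```

Rounding gives 35.42 for the first score, while the list above shows 35.41, which suggests the values there were truncated rather than rounded. The difference is far too small to affect the prediction.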

I can make a prediction based on these data:

['Slytherin']
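As a cross-check, the same decision can be reproduced by hand without scikit-learn. Gaussian Naive Bayes scores each house by summing the log of a normal density per trait and picks the highest score. The sketch below is not the code used for the prediction above: it plugs the estimated means and standard deviations straight into that formula, assumes every house is equally likely a priori, and uses made-up names (HOUSE_STATS, log_likelihood, sort_student).

```python
import math

# (mean, standard deviation) per trait, in the order Openness,
# Conscientiousness, Extroversion, Agreeableness, Neuroticism.
HOUSE_STATS = {
    "Gryffindor": [(41.03, 6.32), (35.58, 7.13), (31.02, 12.20), (40.51, 7.35), (30.66, 11.17)],
    "Hufflepuff": [(39.41, 9.33), (36.76, 6.17), (27.79, 13.16), (41.61, 7.27), (30.00, 12.13)],
    "Ravenclaw": [(41.98, 8.23), (36.76, 6.17), (28.52, 11.17), (40.51, 7.35), (29.55, 11.17)],
    "Slytherin": [(40.51, 7.35), (39.70, 6.10), (27.42, 11.17), (36.17, 10.22), (28.97, 12.20)],
}


def log_likelihood(scores, stats):
    """Sum of per-trait normal log-densities (traits treated as independent)."""
    total = 0.0
    for x, (mean, std) in zip(scores, stats):
        total -= math.log(std * math.sqrt(2 * math.pi))
        total -= (x - mean) ** 2 / (2 * std ** 2)
    return total


def sort_student(scores):
    """Return the house with the highest log-likelihood (equal priors assumed)."""
    return max(HOUSE_STATS, key=lambda house: log_likelihood(scores, HOUSE_STATS[house]))


my_scores = [35.41, 46.25, 32.5, 19.58, 30.83]
print(sort_student(my_scores))
# prints Slytherin
```

The low Agreeableness score (19.58) does most of the work here: Slytherin has both the lowest mean and the widest spread on that trait, so it penalizes the score far less than the other three houses.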

I will tell you: I expected it.

A big thanks to you for reading this post!

You can follow me at | Telegram | GitHub | Medium |
