The Words of Witches & Wizards

Using Data Science techniques to analyse scripts and language from the Harry Potter film franchise

Chris Brownlie
Oct 5, 2019

As one of the most successful movie franchises of all time, Harry Potter is ingrained in the psyche of several generations of adults. Having grossed over $9bn worldwide, it’s safe to say that if you don’t know what Harry Potter is, you must have been living under a rock for the last 25 years. Nevertheless, if for some reason you haven’t got round to watching or reading it yet, be warned: this article contains spoilers…

Those of you who have seen my work will have an idea of how this is going to go: I’m simply going to apply what I’ve done before to a new dataset. To date I’ve published 5 analytical pieces that pick apart scripts from the Game of Thrones TV series using Data Science techniques, 3 of which look at the words used and how they reflect characters and storylines (see the original article, part 2 and Season 8, or view my profile for all of my analytical articles). I intend to do the same thing here, looking at how patterns in the use of language can bring new insight to the world of Harry Potter.

Tech Note/Caveat

As with my Game of Thrones analysis, I sourced and compiled the data myself, primarily using the R programming language. This time I used a variety of sources for the scripts themselves (see footnote) and again spent a long time transforming them into a dataset that could be analysed. There is one important caveat: I was unable to find a usable script for the 5th film, Order of the Phoenix, so all analysis here excludes that film. It is also worth noting that one or two of the scripts are ‘final drafts’ or equivalent, so they may contain the odd line that differs slightly from the actual film. Despite these caveats, the 7 remaining scripts (6,300 lines, comprising over 66k words, spoken by 180 different characters) can be analysed to give some very interesting results.
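The article’s dataset was built in R and that code isn’t published. As a rough illustration of the kind of transformation involved, here is a minimal Python sketch that assumes each spoken line in a script looks like NAME: dialogue (an assumption; real scripts vary far more) and skips everything else, such as bracketed stage directions. Line, parse_script and summarise are hypothetical names, and the sample is toy input.

```python
import re
from dataclasses import dataclass

# Assumed (hypothetical) format: each spoken line is "NAME: dialogue".
LINE_PATTERN = re.compile(r"^\s*([A-Z][A-Za-z .'-]*?)\s*:\s*(.+)$")
WORD_PATTERN = re.compile(r"[A-Za-z']+")


@dataclass
class Line:
    film: int
    character: str
    text: str


def parse_script(film: int, raw: str) -> list[Line]:
    """Return one record per spoken line; non-matching rows (e.g. stage directions) are skipped."""
    records = []
    for row in raw.splitlines():
        match = LINE_PATTERN.match(row)
        if match:
            name, text = match.groups()
            records.append(Line(film, name.strip().title(), text.strip()))
    return records


def summarise(records: list[Line]) -> dict[str, int]:
    """Headline totals: number of lines, words and distinct speakers."""
    return {
        "lines": len(records),
        "words": sum(len(WORD_PATTERN.findall(r.text)) for r in records),
        "characters": len({r.character for r in records}),
    }


sample = """HARRY: I'm not going home. Not really.
[Harry looks out of the window]
RON: Bloody hell.
HERMIONE: It's LeviOsa, not LevioSA."""

records = parse_script(1, sample)
print(summarise(records))  # {'lines': 3, 'words': 12, 'characters': 3}
```

A regex-per-line approach like this is fragile (multi-line speeches, names in mixed case), which is one reason cleaning scripts into an analysable dataset takes so long.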

The star of the show

To begin with, I decided to stick to the simple but effective technique of looking at the most common word spoken by each character. Below you can see the results for 24 of the most prominent characters. I’m sure you’ll be able to spot the pattern…

The most common word used by each character in the 7 Harry Potter films analysed (all except film 5, Order of the Phoenix). All stopwords (e.g. he, it, the, of) have been removed. *Proportion of non-stopwords accounted for by the most common word.
  • Perhaps unsurprisingly, 13 of the 19 main characters mention the eponymous hero more than they talk about anything else.
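The ‘most common word’ technique is simple to reproduce. The sketch below is in Python rather than the R used for the article, and it is only an assumption about how such a table could be built: it lower-cases each character’s lines, drops stopwords, counts what is left and reports the top word with its share of the remaining words (the asterisked proportion in the caption). The stopword set is a deliberately tiny hypothetical list, and most_common_word and the sample lines are illustrative only.

```python
import re
from collections import Counter

# Deliberately tiny, hypothetical stopword list; a real analysis would use a full one.
STOPWORDS = {"a", "and", "be", "have", "he", "i", "in", "is", "it", "me",
             "my", "of", "the", "to", "up", "we", "you", "your"}


def tokenise(text):
    """Lower-case a line and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def most_common_word(lines_by_character, stopwords=STOPWORDS):
    """Map each character to (top non-stopword, its share of their non-stopwords)."""
    result = {}
    for character, lines in lines_by_character.items():
        counts = Counter(
            word for line in lines for word in tokenise(line) if word not in stopwords
        )
        if counts:
            word, n = counts.most_common(1)[0]
            result[character] = (word, n / sum(counts.values()))
    return result


# Toy input, not real script data.
lines = {
    "Ron": ["Harry, wake up!", "Bloody hell, Harry."],
    "Hermione": ["Read the book, Harry.", "The book is in the library."],
}
for character, (word, share) in most_common_word(lines).items():
    print(f"{character}: {word} ({share:.0%})")
# Ron: harry (40%)
# Hermione: book (40%)
```

Counter.most_common keeps first-seen order for tied counts, so a real analysis would want an explicit tie-break rule.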

Given how dominant this Potter-centric pattern is amongst the characters, I thought it would be interesting to see what their most common word would be if it weren’t ‘Harry’ or ‘Potter’. This can be seen below for the 13 characters mentioned above:

The most common word used by each character in the 7 Harry Potter films analysed (all except film 5, Order of the Phoenix). All stopwords (e.g. he, it, the, of) have been removed. *Proportion of non-stopwords, excluding ‘harry’ and ‘potter’, accounted for by the most common word.
  • Some of these are unsurprising (Luna), while others aren’t particularly insightful, even if I do personally find them amusing (Lockhart & Hagrid).
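Excluding ‘harry’ and ‘potter’ only changes the counting step: the two names are dropped before taking the top word, and the proportion is recomputed over what remains, as the caption’s footnote describes. A minimal Python sketch, assuming each character’s non-stopword counts are already available as a Counter; top_word_excluding and the counts are hypothetical.

```python
from collections import Counter


def top_word_excluding(counts, excluded=("harry", "potter")):
    """Drop the excluded words, then return (top word, its share of the remaining words)."""
    remaining = Counter({w: n for w, n in counts.items() if w not in excluded})
    if not remaining:
        return None
    word, n = remaining.most_common(1)[0]
    return word, n / sum(remaining.values())


# Hypothetical non-stopword counts for a single character.
ron_counts = Counter({"harry": 12, "potter": 2, "hermione": 5, "bloody": 3, "hell": 3})
word, share = top_word_excluding(ron_counts)
print(word, round(share, 3))  # hermione 0.455
```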

Hagrid’s Slow Death

Another way to look at the data is to see how it varies over the course of the films, which gives an indication of how prominent each character is in each film. This is shown below for the 5 characters who spoke the most across all 7 films:

Word count per film for the 5 characters with the highest total word count (each spoke over 3,000 words across the 7 films).
  • Harry is the most frequent talker in every film except Goblet of Fire (which is more action-based and in fact has the lowest overall word count of the 7 films) and Deathly Hallows Part 1 (where Ron and Hermione feature more prominently, as the trio are on the run).
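The per-film word counts behind this chart amount to a group-by on film and character. The Python sketch below assumes the data is a list of (film, character, text) tuples and counts words by splitting on whitespace, which is a simplification. word_counts_by_film, top_speakers and leader_by_film are hypothetical helper names, and the records are made up.

```python
from collections import defaultdict


def word_counts_by_film(records):
    """records: (film, character, text) tuples -> {character: {film: word count}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for film, character, text in records:
        counts[character][film] += len(text.split())  # whitespace split: a simplification
    return counts


def top_speakers(counts, n=5):
    """Characters ranked by total words over all films; ties broken alphabetically."""
    totals = {c: sum(per_film.values()) for c, per_film in counts.items()}
    return sorted(totals, key=lambda c: (-totals[c], c))[:n]


def leader_by_film(counts):
    """The most talkative character in each film."""
    films = sorted({f for per_film in counts.values() for f in per_film})
    return {f: max(counts, key=lambda c: counts[c].get(f, 0)) for f in films}


# Made-up records: film 1 and film 7 (Deathly Hallows Part 1 in the article's numbering).
records = [
    (1, "Harry", "I'm not going home"),
    (1, "Ron", "Bloody hell"),
    (7, "Ron", "We need to go now, Harry"),
    (7, "Hermione", "We have to keep moving"),
]
counts = word_counts_by_film(records)
print(top_speakers(counts, n=2))  # ['Ron', 'Hermione']
print(leader_by_film(counts))     # {1: 'Harry', 7: 'Ron'}
```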

‘Accio spells’

The next thing I was interested in was the use of spells throughout the films. Magic is such an important part of the world of Harry Potter that no analysis of the films would be complete without considering it.

Going by the data I’ve sourced, a total of 97 spells were clearly spoken aloud as they were cast across the 7 films in question. Below you can see the 10 most popular:

The 10 most common spells spoken aloud according to the scripts for Harry Potter films 1–8, excluding film 5. ‘Most frequent caster’ indicates the character who spoke the name of the spell the most; ties for ‘Most frequent caster’ are broken alphabetically.
  • Lumos is an incredibly useful spell (it produces light), and Harry uses it 4 times in a single scene at the beginning of Prisoner of Azkaban, so it is unsurprising that it comes first.
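Counting spells needs a list of spell names to search for, which the article does not publish, so the SPELLS list below is a hypothetical stand-in. This Python sketch matches each spell as a whole word in every line, tallies overall and per-speaker counts, and picks the ‘most frequent caster’ for each spell with ties broken alphabetically, as the caption describes. count_spells, spell_table and the records are illustrative only. Note that a spell merely mentioned in conversation would also be counted, so the article’s ‘clearly spoken aloud’ filter would need extra handling.

```python
import re
from collections import Counter, defaultdict

# Hypothetical spell list; the article does not publish the one it used.
SPELLS = ["accio", "alohomora", "expecto patronum", "expelliarmus", "lumos", "stupefy"]


def count_spells(records, spells=SPELLS):
    """records: (film, character, text). Tally each spell name spoken, overall and per speaker."""
    totals = Counter()
    casters = defaultdict(Counter)
    for _film, character, text in records:
        for spell in spells:
            hits = len(re.findall(rf"\b{re.escape(spell)}\b", text.lower()))
            if hits:
                totals[spell] += hits
                casters[spell][character] += hits
    return totals, casters


def spell_table(totals, casters, n=10):
    """Top-n spells as (spell, count, most frequent caster).
    Equal spell counts are ordered alphabetically (an assumption); caster ties go alphabetically."""
    rows = []
    for spell, count in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]:
        best = max(casters[spell].values())
        caster = min(c for c, k in casters[spell].items() if k == best)
        rows.append((spell, count, caster))
    return rows


# Made-up records for illustration only.
records = [
    (3, "Harry", "Lumos!"),
    (3, "Harry", "Lumos."),
    (3, "Hermione", "Lumos"),
    (2, "Snape", "Expelliarmus!"),
    (2, "Hermione", "Expelliarmus!"),
]
print(spell_table(*count_spells(records)))
# [('lumos', 3, 'Harry'), ('expelliarmus', 2, 'Hermione')]
```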

Alohomora? That’s so last year

Another way to break down the data is to look at the overall number of spells cast in each film and how the popularity of individual spells fluctuates. This can be seen below, with a label denoting the most commonly used spell in each film.

The total number of spells cast in each film, with a label denoting the most used spell in each.
  • The number of spells cast tends to increase as the series progresses, with two major exceptions. This makes sense, as the later films contain more fighting and duelling scenes.
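Given one record per spell spoken aloud, the per-film totals and labels in the chart come from grouping by film. A minimal Python sketch, assuming the casts are (film, spell) pairs; the alphabetical tie-break for the label is an assumption (the article only specifies one for the caster table), and spells_per_film and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def spells_per_film(casts):
    """casts: (film, spell) pairs, one per spell spoken aloud.
    Returns {film: (total casts, most used spell)}; label ties broken alphabetically (an assumption)."""
    by_film = defaultdict(Counter)
    for film, spell in casts:
        by_film[film][spell] += 1
    summary = {}
    for film in sorted(by_film):
        counts = by_film[film]
        label = min(counts, key=lambda s: (-counts[s], s))
        summary[film] = (sum(counts.values()), label)
    return summary


# Made-up data, not the article's figures.
casts = [
    (1, "alohomora"), (1, "wingardium leviosa"), (1, "alohomora"),
    (3, "lumos"), (3, "expecto patronum"), (3, "lumos"), (3, "lumos"),
]
print(spells_per_film(casts))  # {1: (3, 'alohomora'), 3: (4, 'lumos')}
```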

Summary

Thank you for reading! If you enjoyed this, please do ‘clap’ below and give Data Slice a follow for similar articles on all sorts of topics. Leave a comment below if there’s a particular subject you’d like me to analyse and present in a future post. Finally, thank you to my good friend Lauren for giving me the idea to apply my script analysis techniques to the Harry Potter films.


Data Slice

This is the first post to be published in my new publication, ‘Data Slice’. DS will be a home for short pieces of analysis that are focussed on empirical evidence and present their findings in a short, engaging article. Topics will range across all aspects of pop culture, topical events and beyond. If you like this style of presenting ideas, please consider following the publication and sharing it. Also make sure to comment with any topics you would like me to analyse. I am also looking to accept submissions of similar pieces of work; email editor@data-slice.com with any information requests or queries.


Written by Chris Brownlie, Data Scientist working in the UK public sector. https://www.linkedin.com/in/chris-brownlie-bb5812b7/
