19 More Game of Thrones Data Visualizations
Including new script data with languages, gender breakdowns of screen time and words spoken, and what other people are doing with this dataset
This is the third in a series of posts using custom datasets I’ve built for Game of Thrones. Check out “The Ultimate Game of Thrones Dataset” if you want to learn more about the datasets and an extensive narrative chart that maps characters’ paths throughout the show, and have a look at “32 Game of Thrones Data Visualizations” for a bunch of visualizations using those datasets.
With the Season 8 premiere queued up for tonight, I wanted to share another batch of visualizations based on the datasets I’ve been crafting for the last long while. My plan has been to make as many visualizations as I could before Season 8, and then to update each of them as I add new data from the season.
Read on to see the new visualizations, a description of an exciting new dataset, and some visualizations made by others using the datasets. The data and code for each of my visualizations are available on GitHub at jeffreylancaster/game-of-thrones.
A Celestially Inspired Heatmap
In “The Ultimate Game of Thrones Dataset”, I described imagining a three-dimensional matrix with characters along the axes, and how I used that to lay out the Narrative Chart. After seeing a great visualization of a Hubble Image of Galaxy Cluster Converted Into Sound, I was inspired to flatten the 3D matrix into 2D. Here, scenes (time) run along the x-axis and locations along the y-axis (ordered geographically north to south, then west to east), and color encodes the number of people in each location at a given time. You can see groups of people ebb and flow in and out of certain locations: Castle Black ebbs when the Night’s Watch heads north of the Wall, and Qarth ebbs when Daenerys and her entourage leave for Slaver’s Bay. The white blocks overlaying the lanes mark when a scene in that location is on screen.
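This isn’t the code from the repository, just a minimal sketch of how a grid like this could be computed. It assumes each character stays at the last location they were seen until they appear somewhere else, which is what lets Castle Black “ebb” while the Night’s Watch is off screen beyond the Wall. The `(location, characters)` scene layout and the `location_heatmap` helper are hypothetical and much simpler than episodes.json.

```python
from collections import Counter

# Toy scenes in on-screen order: (location, characters).
# This layout is an assumption, far simpler than the real episodes.json.
scenes = [
    ("Castle Black", ["Jon Snow", "Samwell Tarly", "Jeor Mormont"]),
    ("Winterfell", ["Bran Stark"]),
    ("Beyond the Wall", ["Jon Snow", "Jeor Mormont"]),
    ("Castle Black", ["Samwell Tarly"]),
]

def location_heatmap(scenes, locations):
    """Build a locations x scenes grid of head counts.

    Assumes a character remains at their last-seen location until they
    appear elsewhere, so off-screen characters still count.
    """
    last_seen = {}                        # character -> last known location
    grid = {loc: [] for loc in locations}
    for loc, chars in scenes:
        for name in chars:
            last_seen[name] = loc
        counts = Counter(last_seen.values())
        for row in locations:
            grid[row].append(counts.get(row, 0))
    return grid

# Rows in a fixed (e.g. geographic) order, one column per scene.
order = ["Beyond the Wall", "Castle Black", "Winterfell"]
for loc, row in location_heatmap(scenes, order).items():
    print(f"{loc:16} {row}")
```

Each row can then be mapped onto a color scale; replacing the scene index with scene timestamps would turn the columns into a true time axis.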
After playing around with extracting data from streaming shows for a different series, I became curious about what else could be done with closed caption (.srt) data. At the time, some text scripts for Game of Thrones could be found online (at sites like genius.com and Springfield! Springfield!), and you could download .srt files (at sites like subscene.com), but both sources had gaps. The text scripts were missing entire episodes (and, on review, were often flawed), and .srt files don’t include the name of the person speaking.
So I made a new dataset. This dataset:
- is in JSON so it’s easier to work with (as opposed to plaintext .srt files)
- includes speaker names
- includes language spoken and a translation for any non-English lines (more on that below)
- includes lines as they would be spoken, as prose, instead of broken up to fit the screen (as they are in closed captions)
- doesn’t include sound descriptions (e.g. chains clanking) like those in subtitles for the hearing impaired (HI)
- will (in the future) include timestamps
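To make the difference between .srt captions and prose lines concrete, here is a rough sketch of the mechanical part of that cleanup, not the process I actually used (which involved rewatching and manual checking). It assumes standard .srt blocks (cue number, timing line, caption text) and treats anything in square brackets or parentheses as a sound cue; `srt_to_prose` is a hypothetical helper. Speaker names still have to be added separately, since .srt files don’t carry them.

```python
import re

# Treat [bracketed] or (parenthesized) text as HI sound cues; an assumption.
SOUND_CUE = re.compile(r"\[[^\]]*\]|\([^)]*\)")

def srt_to_prose(srt_text):
    """Turn .srt captions into prose-like lines (a rough sketch).

    Drops cue numbers, timing lines, and sound cues, then joins caption
    fragments until one ends in '.', '!' or '?'.
    """
    fragments = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Line 0 is the cue index and line 1 the "start --> end" timing.
        body = SOUND_CUE.sub("", " ".join(lines[2:]))
        body = " ".join(body.split())  # collapse leftover whitespace
        if body:
            fragments.append(body)

    prose, current = [], []
    for frag in fragments:
        current.append(frag)
        if frag.endswith((".", "!", "?")):
            prose.append(" ".join(current))
            current = []
    if current:  # trailing fragment without end punctuation
        prose.append(" ".join(current))
    return prose

sample = (
    "1\n00:00:01,000 --> 00:00:03,000\n[chains clanking]\n\n"
    "2\n00:00:03,500 --> 00:00:06,000\nI've never seen Wildlings\ndo a thing like this.\n\n"
    "3\n00:00:06,500 --> 00:00:09,000\nI've never seen a thing like this,\n\n"
    "4\n00:00:09,000 --> 00:00:11,000\nnot ever in my life.\n"
)
for line in srt_to_prose(sample):
    print(line)
```

Splitting on end punctuation is only an approximation; a speaker’s turn can span several sentences, which is why the dataset groups text by speaker rather than by sentence.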
So a basic entry looks like this:
"text": "I've never seen Wildlings do a thing like this. I've never seen a thing like this, not ever in my life.",
I rewatched the series to cross-check the text and speakers, and used this new dataset (not currently shared online) to ask some new questions.
How many words does each character speak?
Is it any surprise that Tyrion says about 9,000 more words than any other character in the show? And that Cersei is quite wordy as well?
We can also slice the same data by season. Here’s the word count for Season 1: Tyrion’s a talker, that’s for sure, but did you realize Ned says only about 400 fewer words?
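A tally like this boils down to grouping lines by speaker and counting words. The sketch below assumes a list of entries with `name` and `text` keys plus a hypothetical `season` key; since the script dataset isn’t public, that layout and the toy entries are illustrative only.

```python
from collections import Counter, defaultdict

# Toy script entries; the "season" key and overall layout are assumptions.
entries = [
    {"season": 1, "name": "Tyrion Lannister", "text": "A Lannister always pays his debts."},
    {"season": 1, "name": "Eddard Stark", "text": "Winter is coming."},
    {"season": 6, "name": "Tyrion Lannister", "text": "I drink and I know things."},
]

def word_counts(entries):
    """Total words per speaker, treating whitespace-separated tokens as words."""
    counts = Counter()
    for entry in entries:
        counts[entry["name"]] += len(entry["text"].split())
    return counts

def word_counts_by_season(entries):
    """Map each season to a Counter of words per speaker."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["season"]].append(entry)
    return {season: word_counts(group) for season, group in grouped.items()}

print(word_counts(entries).most_common())
# [('Tyrion Lannister', 12), ('Eddard Stark', 3)]
print(word_counts_by_season(entries)[1].most_common())
```

How a “word” is defined (contractions, hyphenated names, punctuation) nudges the totals, so gaps like Tyrion’s 9,000-word lead depend somewhat on the tokenizing rule.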
Beyond word counts, the dataset also records each line in its original language along with a translation. Sites like wiki.dothraki.org and other online forums were essential for getting the words and translations right (I do not speak High Valyrian, Dothraki, or any of the other languages of the show).
Here’s an example of what this looks like in the dataset:
{
  "text": "Me zisosh, zhey jalan atthirari anni.",
  "translation": "A scratch, moon of my life.",
  "name": "Khal Drogo"
}
What’s the language breakdown of what each character says?
Perhaps not surprisingly, there are only a few multilingual characters, including Daenerys (who speaks the Common Tongue, Dothraki, and High Valyrian, and even says some words in Astapori Valyrian) and Missandei (who speaks the Common Tongue, Dothraki, and various dialects of Valyrian, including High, Astapori, and Meereenese). But you might also remember Tyrion misspeaking High Valyrian (17 words) or Melisandre introducing herself to Daenerys in High Valyrian (130 words). My favorite insight from this breakdown might be the three words spoken by Wun Wun in the Old Tongue (it’s toward the bottom).
If you view this one on its own, it may take a second to load since it’s crunching the data on the fly.
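The per-character breakdown is a two-level tally: speaker, then language. In this sketch the `language` key is an assumed name for the language-spoken field, the entries are toy data, and words are counted in the original-language `text` rather than the translation.

```python
from collections import Counter, defaultdict

# Toy entries; "language" is an assumed key name for the language field.
entries = [
    {"name": "Khal Drogo", "language": "Dothraki",
     "text": "Me zisosh, zhey jalan atthirari anni."},
    {"name": "Daenerys Targaryen", "language": "Common Tongue",
     "text": "Where are my dragons?"},
    {"name": "Daenerys Targaryen", "language": "High Valyrian",
     "text": "Dracarys."},
]

def language_breakdown(entries):
    """Words per language for each speaker: {name: {language: words}}."""
    table = defaultdict(Counter)
    for entry in entries:
        # Count words in the original-language text, not the translation.
        table[entry["name"]][entry["language"]] += len(entry["text"].split())
    return {name: dict(langs) for name, langs in table.items()}

breakdown = language_breakdown(entries)
# Multilingual speakers are simply those with more than one language.
multilingual = [name for name, langs in breakdown.items() if len(langs) > 1]
print(breakdown)
print(multilingual)
```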
What’s the language breakdown per episode?
In addition to looking at what each character says, we can also look at the language distribution per episode. Here you can see how languages map to particular periods in the show: there’s more Dothraki throughout Season 1, more Valyrian (and variants) around Seasons 3 and 4, and a mix early in Season 6.
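Switching the grouping key from speaker to episode, and converting counts to percentages, gives a per-episode language distribution. The `season`, `episode`, and `language` keys are assumptions, and the episode numbers in the toy entries are placeholders rather than where these lines actually occur.

```python
from collections import Counter, defaultdict

# Toy entries; key names and episode numbers are placeholders.
entries = [
    {"season": 1, "episode": 1, "language": "Dothraki",
     "text": "Me zisosh, zhey jalan atthirari anni."},
    {"season": 1, "episode": 1, "language": "Common Tongue",
     "text": "A Lannister always pays his debts."},
    {"season": 3, "episode": 4, "language": "High Valyrian",
     "text": "Dracarys."},
]

def language_share_by_episode(entries):
    """Percent of each episode's words spoken in each language."""
    words = defaultdict(Counter)
    for entry in entries:
        key = (entry["season"], entry["episode"])
        words[key][entry["language"]] += len(entry["text"].split())
    shares = {}
    for key, counts in sorted(words.items()):
        total = sum(counts.values())
        if total:  # skip episodes with no transcribed words
            shares[key] = {lang: 100 * n / total for lang, n in counts.items()}
    return shares

for episode, split in language_share_by_episode(entries).items():
    print(episode, split)
```

Percentages keep long, talky episodes from drowning out short ones when you are looking for patterns such as the Dothraki-heavy stretch of Season 1.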
I also realized that some of the pre-processed files I had made to build the narrative chart could be useful on their own; for example, I could use characters-gender.json to see how male or how female Game of Thrones is.
What’s the gender balance of each season?
The visualization below charts how male-dominated or female-dominated scenes are throughout the show. It will come as no surprise that screen time in every season is skewed heavily toward scenes with more men (or only men).
What’s the gender balance of each episode?
Same conclusion here: more time is spent on scenes with more men than women on screen. But it’s interesting to see the trend toward increasingly female screen time through the end of Season 5.
And for good measure, here’s that same data as a percentage of each episode, to account for longer and shorter episodes.
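One way to produce these breakdowns is to bucket each scene by how its on-screen cast splits by gender, sum scene durations per bucket, and optionally normalize per episode. The five buckets (“all male” through “all female”), the `GENDER` lookup, and the `(season, episode, seconds, characters)` scene tuples are my own simplifications; the real characters-gender.json and episodes.json are laid out differently, and the post doesn’t spell out how the charts bin scenes.

```python
from collections import Counter, defaultdict

# Hypothetical lookup and toy scenes; real file layouts differ.
GENDER = {"Jon Snow": "male", "Samwell Tarly": "male",
          "Sansa Stark": "female", "Arya Stark": "female",
          "Tyrion Lannister": "male"}

# (season, episode, duration in seconds, characters on screen)
scenes = [
    (1, 1, 120, ["Jon Snow", "Samwell Tarly"]),
    (1, 1, 60, ["Sansa Stark", "Arya Stark", "Tyrion Lannister"]),
    (1, 2, 30, ["Sansa Stark"]),
    (1, 2, 90, ["Jon Snow", "Arya Stark"]),
]

def scene_category(characters, gender):
    """Bucket a scene by how its on-screen cast splits by gender."""
    men = sum(1 for c in characters if gender.get(c) == "male")
    women = sum(1 for c in characters if gender.get(c) == "female")
    if not men and not women:
        return "unknown"
    if not women:
        return "all male"
    if not men:
        return "all female"
    if men > women:
        return "more male"
    if women > men:
        return "more female"
    return "equal"

def screen_time(scenes, gender, per_episode=False):
    """Seconds per category, keyed by season or by (season, episode)."""
    totals = defaultdict(Counter)
    for season, episode, seconds, chars in scenes:
        key = (season, episode) if per_episode else season
        totals[key][scene_category(chars, gender)] += seconds
    return totals

def as_percentages(totals):
    """Normalize each key's seconds so longer episodes don't dominate."""
    return {key: {cat: 100 * s / sum(c.values()) for cat, s in c.items()}
            for key, c in totals.items()}

per_episode = as_percentages(screen_time(scenes, GENDER, per_episode=True))
for ep, split in sorted(per_episode.items()):
    print(ep, {cat: round(pct, 1) for cat, pct in split.items()})
```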
What’s the gender balance of words spoken?
Unlike screen time, where many characters can share a scene, usually only one person speaks at a time in Game of Thrones. So how does the number of words spoken by men compare to the number spoken by women?
Based on these counts, the total number of words per season has steadily declined; maybe that’s due to more time spent in action scenes (and a shorter Season 7)? Meanwhile, the number of words spoken by women per season has increased slightly while the number spoken by men has decreased dramatically.
Let’s look at that same data by episode.
Although no episode is close to parity, some are certainly more askew than others.
And for completeness, let’s look at that same data as percentages. There’s a general trend toward parity in the number of words spoken, but the split is still roughly 60/40 in favor of male speakers.
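Because each line has a single speaker, words can be attributed to one gender without the overlap problem that screen time has. This sketch tallies words per season by speaker gender and converts them to percentages; the `GENDER` lookup, entry layout, and toy lines are assumptions, and unmapped names fall into an "unknown" bucket so gaps in the lookup show up instead of silently skewing the split.

```python
from collections import Counter, defaultdict

# Hypothetical gender lookup and toy script entries.
GENDER = {"Eddard Stark": "male", "Tyrion Lannister": "male",
          "Cersei Lannister": "female"}

entries = [
    {"season": 1, "name": "Eddard Stark",
     "text": "The man who passes the sentence should swing the sword."},
    {"season": 1, "name": "Tyrion Lannister",
     "text": "A Lannister always pays his debts."},
    {"season": 1, "name": "Cersei Lannister",
     "text": "When you play the game of thrones, you win or you die."},
]

def words_by_gender(entries, gender):
    """Words spoken per season, split by the speaker's gender."""
    totals = defaultdict(Counter)
    for entry in entries:
        g = gender.get(entry["name"], "unknown")
        totals[entry["season"]][g] += len(entry["text"].split())
    return totals

def to_percentages(totals):
    """Each season's split as a percentage of that season's words."""
    return {season: {g: 100 * n / sum(c.values()) for g, n in c.items()}
            for season, c in totals.items()}

totals = words_by_gender(entries, GENDER)
print(dict(totals[1]))
print({g: round(p, 1) for g, p in to_percentages(totals)[1].items()})
```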
Another pre-processed file I realized I hadn’t quite leveraged yet was characters-groups.json (generalized from characters-houses.json). Although some groups have more members than others, we can still look at when each group has been shown on-screen, how long each has been shown over the course of the show, and how many words each group has collectively spoken.
When are group members on screen?
How long are group members on screen?
This one may be especially unfair since some groups have more members than others, but it’s still worth a look. It’s also important not to over-count when multiple members of the same group are on screen at the same time. For instance, if Yara Greyjoy and Theon Greyjoy are on screen together for 30 seconds, I count that as 30 seconds, not 30 seconds for each Greyjoy.
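Avoiding that double count amounts to taking the union of each group’s on-screen intervals rather than summing them. Here is a minimal sketch using the classic sort-and-merge approach; the `(character, start, end)` appearance tuples on a single show-wide timeline and the `GROUPS` mapping are hypothetical stand-ins for episodes.json and characters-groups.json.

```python
from collections import defaultdict

# Hypothetical inputs: appearances on one show-wide timeline (seconds)
# and a character -> group mapping.
GROUPS = {"Yara Greyjoy": "House Greyjoy", "Theon Greyjoy": "House Greyjoy",
          "Jon Snow": "Night's Watch"}

appearances = [
    ("Yara Greyjoy", 100, 130),
    ("Theon Greyjoy", 100, 130),   # same 30 s as Yara: counted once
    ("Theon Greyjoy", 125, 160),   # overlaps the end of that span
    ("Jon Snow", 0, 45),
]

def group_screen_time(appearances, groups):
    """Seconds each group is on screen, merging overlapping intervals so
    members on screen together are not double counted."""
    spans = defaultdict(list)
    for name, start, end in appearances:
        if name in groups:
            spans[groups[name]].append((start, end))
    totals = {}
    for group, intervals in spans.items():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        total = 0
        for start, end in intervals[1:]:
            if start > cur_end:            # gap: close the current run
                total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:                          # overlap: extend the run
                cur_end = max(cur_end, end)
        totals[group] = total + (cur_end - cur_start)
    return totals

print(group_screen_time(appearances, GROUPS))
```

A naive sum would credit the Greyjoys with 30 + 30 + 35 = 95 seconds; merging the intervals gives the 60 seconds they are actually on screen.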
How many words do group members speak?
Perhaps unsurprisingly, House Lannister is a chatty pride of lions.
And for those wondering: sure, White Walkers don’t talk. But the three words spoken by the Night King were “No! No! No!” when the Children of the Forest stabbed him in the heart with a dragonglass dagger, so there’s that.
Still, some groups have more members than others, so this is a chart we can make, not necessarily a meaningful one.
Some other people have found the dataset and used it to make some interesting Game of Thrones visualizations, so I wanted to share those, too.
Type A Media
The team over at Type A Media put together a timelapse animation using the data for how long each character is on screen.
My only gripe with the video is that some of Jon Snow’s data from Season 7, Episode 7 seems to get lost. By my count, he is actually only about 2 minutes behind Tyrion as of the end of Season 7.
Viz of Thrones
Adam Groner reached out recently to let me know about a project he’s been working on: Viz of Thrones. He’s impressively layered a bunch of data into a single interface, including a geographic map with characters in locations, sex count, death count, scenes in which characters show up, and more.
Check it out at vizofthrones.com.
The team at Kineviz — CEO Weidong Yang and Director of Business Development Sony Green — also got in touch recently to let me know they were using the dataset to demo their new tool, GraphXR, during a Neo4j online meetup. Check out the screencast of the meetup below to see GraphXR in action.
Lia Petronio (Northeastern University)
Finally, Lia Petronio was the first person (I think) other than me to use my datasets for visualization. She’s made the two shown below: the first is about character travels through various regions, and the second is a social network of characters.
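I don’t know exactly how Lia built her network, but a common way to derive a social network from this kind of data is to link characters who appear in the same scene and weight each edge by the number of shared scenes. The plain lists of scene casts below are a hypothetical simplification of the scenes in episodes.json.

```python
from collections import Counter
from itertools import combinations

# Hypothetical scene casts (lists of names), simplified from episodes.json.
scene_casts = [
    ["Jon Snow", "Samwell Tarly"],
    ["Jon Snow", "Samwell Tarly", "Jeor Mormont"],
    ["Arya Stark", "Sansa Stark"],
]

def cooccurrence_edges(scene_casts):
    """Weighted edges: how many scenes each pair of characters shares."""
    edges = Counter()
    for cast in scene_casts:
        # Sort so each pair has a single canonical (a, b) key.
        for a, b in combinations(sorted(set(cast)), 2):
            edges[(a, b)] += 1
    return edges

for (a, b), weight in cooccurrence_edges(scene_casts).most_common():
    print(f"{a} -- {b}: {weight}")
```

The resulting weighted edge list can be fed to most graph-layout tools.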
If you end up doing something with this data, please let me know. I’d love to see what you make and include it in a future roundup.
So what’s next?
Well, like many of you, I’m looking forward to Season 8! But I’ve also got a few other things in the works you may want to know about.
I plan to update the episodes.json file as soon as I can after each episode of the final season. That will in turn update all of the visualizations (because really, that’s the point of data visualization, right?). I’ll also update characters.json with any new characters who show up, though aside from the Golden Company, I’m curious to see how many new characters get introduced in the final season.
I’m also going to share the full-text search of the script data I referenced above soon. I built a search interface using Algolia, but my free trial ran out and I’m not convinced it’s worth paying to host and index that data. I’m also in the process of adding timestamps to the script data, so more on that soon.
Lastly, I’m going to post a data visualization recap after each episode with who was on screen when, total character time per episode, locations on screen in the episode, time spent in each location per episode, the opening sequence of each episode, and maybe some other visualizations. Feel free to follow me if you want to catch those.
If you have ideas for other visualizations or improvements to the data that I’ve shared, please leave a comment. Thanks for reading!
Update (May 2019): “Introducing Game of Thrones Script Search”
Update (Apr. 2019): Now that the final season has begun, I’m posting a weekly data-driven recap of each episode:
- “Winterfell” (Season 8, Episode 1) Data Visualization Recap
- “A Knight of the Seven Kingdoms” (Season 8, Episode 2) Data Visualization Recap
- “The Long Night” (Season 8, Episode 3) Data Visualization Recap
- “The Last of the Starks” (Season 8, Episode 4) Data Visualization Recap
- “The Bells” (Season 8, Episode 5) Data Visualization Recap
- “The Iron Throne” (Season 8, Episode 6) Data Visualization Recap