19 More Game of Thrones Data Visualizations

Including new script data with languages, gender breakdowns of screentime and words spoken, and what other people are doing with this dataset

Jeffrey Lancaster
10 min read · Apr 14, 2019

This is the third in a series of posts using custom datasets I’ve built for Game of Thrones. Check out “The Ultimate Game of Thrones Dataset” if you want to learn more about the datasets and an extensive narrative chart that maps characters’ paths throughout the show, and have a look at “32 Game of Thrones Data Visualizations” for a bunch of visualizations using those datasets.

With the Season 8 premiere queued up for tonight, I wanted to share another batch of visualizations based on the datasets I’ve been crafting for the last long while. The plan has been to make as many visualizations before season 8 as I could, and then for each of the visualizations to update as I add new data from the season.

Read on to see the new visualizations, a description of an exciting new dataset, and some visualizations made by others using the datasets. The data and code for each of my visualizations are available on GitHub.

A Celestially-Inspired Heatmap

In “The Ultimate Game of Thrones Dataset”, I described imagining a three-dimensional matrix with scenes, locations, and characters along the axes, and how I used that to lay out the Narrative Chart. After seeing a great visualization of a Hubble Image of Galaxy Cluster Converted Into Sound, I was inspired to flatten the 3D matrix into 2D. Here, scenes/time is along the x-axis and locations are along the y-axis (ordered geographically north to south then west to east), and the color encodes the number of people in that location at a particular time. You can see how groups of people ebb and flow in and out of certain locations: Castle Black ebbs when the Night’s Watch heads north of the Wall and Qarth ebbs when Daenerys and her entourage leave for Slaver’s Bay. The white blocks overlaying the lanes represent when that scene is on screen.
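The flattening step can be sketched roughly as follows. This is a minimal illustration, not the code behind the chart: the `presence` records, their field order, and the function name are hypothetical stand-ins for the real dataset, and locations are simply sorted alphabetically here rather than geographically.

```python
from collections import defaultdict

# Hypothetical records: (scene_index, location, character).
presence = [
    (0, "Castle Black", "Jon Snow"),
    (0, "Castle Black", "Samwell Tarly"),
    (0, "Qarth", "Daenerys Targaryen"),
    (1, "Castle Black", "Samwell Tarly"),
    (1, "Qarth", "Daenerys Targaryen"),
    (1, "Qarth", "Jorah Mormont"),
]

def flatten_to_heatmap(records):
    """Collapse the character axis: count distinct characters per (location, scene)."""
    seen = defaultdict(set)
    for scene, location, character in records:
        seen[(location, scene)].add(character)
    locations = sorted({location for _, location, _ in records})
    n_scenes = max(scene for scene, _, _ in records) + 1
    # One row per location, one column per scene; the cell value drives the color.
    return {
        location: [len(seen[(location, scene)]) for scene in range(n_scenes)]
        for location in locations
    }

heatmap = flatten_to_heatmap(presence)
```

Each row of `heatmap` corresponds to one lane on the y-axis, and each value in the row is the head count that the color would encode for that scene.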

Wordcount

After having a play with extracting data from streaming shows for a different series, I became curious about what else could be done using closed caption (.srt) data. At the time, some text scripts for Game of Thrones could be found online (at sites like genius.com and Springfield! Springfield!) and you could download .srt files (at sites like subscene.com), but there were gaps in each. Text scripts were missing entire episodes (and on review were often flawed), and .srt files don’t identify who is speaking each line.
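To make the .srt limitations concrete, here is a small sketch of reading caption cues and joining them back into prose. It is an illustration under stated assumptions, not the pipeline used for this dataset: the sample cues reuse a line quoted later in this post, the timings are invented, and `parse_srt` and `to_prose` are hypothetical helper names.

```python
import re

# Invented timings; the spoken text matches a line quoted later in this post.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
I've never seen Wildlings
do a thing like this.

2
00:00:03,600 --> 00:00:06,000
[chains clanking]

3
00:00:06,100 --> 00:00:09,000
I've never seen a thing like this,
not ever in my life.
"""

def parse_srt(srt_text):
    """Return a list of (start, end, text) cues from .srt content."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = (part.strip() for part in lines[1].split("-->"))
        # Captions are wrapped to fit the screen; rejoin the wrapped lines.
        text = " ".join(line.strip() for line in lines[2:])
        cues.append((start, end, text))
    return cues

def to_prose(cues):
    """Drop bracketed sound descriptions and join the remaining cue text."""
    spoken = []
    for _, _, text in cues:
        text = re.sub(r"\[[^\]]*\]", "", text).strip()
        if text:
            spoken.append(text)
    return " ".join(spoken)

cues = parse_srt(SAMPLE_SRT)
prose = to_prose(cues)
```

Even after this cleanup, nothing in the cues says who is speaking, which is the gap that had to be filled by hand.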

So I made a new dataset. This dataset:

  • is in JSON so it’s easier to work with (as opposed to plaintext .srt files)
  • includes speaker names
  • includes language spoken and a translation for any non-English lines (more on that below)
  • includes lines as they would be spoken, as continuous prose, instead of broken up by screen capacity (as they are in closed captions)
  • doesn’t include sound descriptions (e.g. chains clanking) like those in subtitles for the hearing impaired (HI)
  • will (in the future) include timestamps

So a basic entry looks like this:

  ...
}, {
  "text": "I've never seen Wildlings do a thing like this. I've never seen a thing like this, not ever in my life.",
  "name": "Will"
}, {
  ...

I rewatched the series to cross-check the text and speakers, and used this new dataset (not currently shared online) to ask some new questions.

How many words does each character speak?

Is it any surprise that Tyrion says about 9,000 more words than any other character in the show? And that Cersei is quite wordy as well?

View it in a new tab. Based on Andrew Reid’s Horizontal Stacked Bar Chart.
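A count like this can be sketched in a few lines. The sample below is built from the two entries quoted in this post; treating the dataset as a flat list of such entries and splitting on whitespace to count words are both assumptions, since the exact file layout and counting rule aren't spelled out here.

```python
from collections import Counter

# Sample entries in the shape quoted in this post (assumed to be a flat list).
script = [
    {
        "text": "I've never seen Wildlings do a thing like this. "
                "I've never seen a thing like this, not ever in my life.",
        "name": "Will",
    },
    {
        "lang": "Dothraki",
        "text": "Me zisosh, zhey jalan atthirari anni.",
        "translation": "A scratch, moon of my life.",
        "name": "Khal Drogo",
    },
]

def words_per_character(entries):
    """Sum whitespace-separated words of the spoken text per speaker."""
    counts = Counter()
    for entry in entries:
        counts[entry["name"]] += len(entry["text"].split())
    return counts

counts = words_per_character(script)
ranking = counts.most_common()  # sorted from most to fewest words
```

Note that the count uses the `text` field (the words as spoken), not the `translation`, so non-English lines are counted in their original language.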

We can also take the same data sliced by season, so here’s the wordcount for Season 1; Tyrion’s a talker, that’s for sure, but did you realize Ned says only about 400 fewer words?

Season 1. View it in a new tab. Based on Andrew Reid’s Horizontal Stacked Bar Chart.

Language

Along with just counting the words, the dataset also includes the words spoken in the original language and a translation. Sites like wiki.dothraki.org and other online forums were essential to including accurate words and translations (I do not speak High Valyrian or Dothraki, or any of the other languages of the show).

Here’s an example of what this looks like in the dataset:

}, {
  "lang": "Dothraki",
  "text": "Me zisosh, zhey jalan atthirari anni.",
  "translation": "A scratch, moon of my life.",
  "name": "Khal Drogo"
}, {

What’s the language breakdown of what each character says?

Perhaps not surprisingly, there are only a few multi-lingual characters, including Daenerys (who speaks the Common Tongue, Dothraki, and High Valyrian and even says some words in Astapori Valyrian) and Missandei (who speaks the Common Tongue, Dothraki, and various dialects of Valyrian, including High, Astapori, and Meereenese). But you might also remember Tyrion misspeaking High Valyrian (17 words) or Melisandre introducing herself to Daenerys in High Valyrian (130 words). My favorite insight from this breakdown might be the three words spoken by Wun Wun in Old Tongue (it’s toward the bottom).

View it in a new tab. Based on Andrew Reid’s Horizontal Stacked Bar Chart.

If you view this one on its own, it may take a second to load since it’s crunching the data on the fly.
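The kind of crunching involved might look like the sketch below. It assumes that entries without a `lang` field are in the Common Tongue (the Will entry earlier in this post has no `lang`), and that words are counted by splitting on whitespace; both are assumptions rather than documented behavior of the dataset.

```python
from collections import Counter, defaultdict

DEFAULT_LANG = "Common Tongue"  # assumption: a missing "lang" means English

# Sample entries in the shape quoted in this post.
script = [
    {
        "text": "I've never seen Wildlings do a thing like this. "
                "I've never seen a thing like this, not ever in my life.",
        "name": "Will",
    },
    {
        "lang": "Dothraki",
        "text": "Me zisosh, zhey jalan atthirari anni.",
        "translation": "A scratch, moon of my life.",
        "name": "Khal Drogo",
    },
]

def language_breakdown(entries):
    """Words spoken per character, split by language."""
    breakdown = defaultdict(Counter)
    for entry in entries:
        lang = entry.get("lang", DEFAULT_LANG)
        breakdown[entry["name"]][lang] += len(entry["text"].split())
    return breakdown

breakdown = language_breakdown(script)
```

Each character's `Counter` maps directly onto one stacked bar, with one segment per language.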

What’s the language breakdown per episode?

In addition to looking at what each character says, we can also look at the language distribution per episode. Here you can see how languages map to particular periods in the show: there’s more Dothraki throughout Season 1, more Valyrian (and variants) around Seasons 3 and 4, and a mix early in Season 6.

View it in a new tab. Based on Andrew Reid’s Horizontal Stacked Bar Chart.

By Gender

I also realized that I had made a few pre-processed files to make it easier to build the narrative chart; I could use files like characters-gender.json to see how male or how female Game of Thrones is.

What’s the gender balance of each season?

The visualization below charts how male-dominated and how female-dominated scenes are throughout the show. It will come as no surprise that time spent on screen throughout each season is skewed heavily toward more men (or all men) in scenes.

View it in a new tab. Based on wpoely86’s Diverging Stacked Bar Chart.

What’s the gender balance of each episode?

Same conclusion here: more time is spent with more men on-screen than women. But it’s interesting to see the trend of increasingly female time on-screen through the end of Season 5.

View it in a new tab. Based on wpoely86’s Diverging Stacked Bar Chart.

And for good measure, here’s that same data as percentages of each episode to account for longer/shorter episodes.

View it in a new tab. Based on wpoely86’s Diverging Stacked Bar Chart.
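One way to get from scenes to per-episode percentages is sketched below. The scene records, the five bucket names, and the rule for assigning a scene to a bucket are all assumptions made for illustration; the charts above may group scenes differently.

```python
from collections import Counter

# Hypothetical scenes from one episode: duration plus genders of characters present.
scenes = [
    {"seconds": 120, "genders": ["male", "male", "male"]},
    {"seconds": 60, "genders": ["male", "female", "male"]},
    {"seconds": 90, "genders": ["female", "male"]},
    {"seconds": 30, "genders": ["female", "female"]},
]

def bucket(genders):
    """Assign a scene to a bucket based on who is present (assumed rule)."""
    men = genders.count("male")
    women = genders.count("female")
    if women == 0:
        return "all men"
    if men == 0:
        return "all women"
    if men > women:
        return "more men"
    if women > men:
        return "more women"
    return "equal"

def episode_percentages(episode_scenes):
    """Share of the episode's screen time that falls in each bucket."""
    seconds = Counter()
    for scene in episode_scenes:
        if not scene["genders"]:
            continue  # skip scenes with no characters listed
        seconds[bucket(scene["genders"])] += scene["seconds"]
    total = sum(seconds.values())
    return {name: 100 * value / total for name, value in seconds.items()}

shares = episode_percentages(scenes)
```

Dividing by each episode's own total is what makes a 50-minute episode comparable to an 80-minute one.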

What’s the gender balance of words spoken?

Unlike time spent onscreen, usually there’s only one person speaking at a time throughout Game of Thrones, so how does the number of words spoken by men compare to the number spoken by women?

Based on these counts, the total number of words per season is on a steady decline; maybe that’s due to more time spent in action scenes (and a shorter Season 7)? Still, the number of words spoken by women per season has increased slightly while the number spoken by men has decreased dramatically.

View it in a new tab. Based on wpoely86’s Diverging Stacked Bar Chart.

Let’s look at that same data by episode.

View it in a new tab. Based on wpoely86’s Diverging Stacked Bar Chart.

Although no episode is close to parity, some are certainly more askew than others.

And for completeness, let’s look at that same data as percentages. There’s a general trend toward parity in the number of words spoken, but still only maybe 60/40 in favor of male speakers.

View it in a new tab. Based on wpoely86’s Diverging Stacked Bar Chart.

By House

Another pre-processed file I realized I hadn’t quite leveraged yet was characters-groups.json (generalized from characters-houses.json). Although some groups have more members than others, we can still look at when each group has been shown on-screen, how long each has been shown over the course of the show, and how many words each group has collectively spoken.

When are group members on screen?

How long are group members on screen?

This one may be especially unfair since some groups have more members than others, but it’s still worth a look. It’s also important not to over-count when multiple members of the same group are onscreen at the same time. For instance, if Yara Greyjoy and Theon Greyjoy are onscreen together for 30 seconds, I’ve counted that as 30 seconds for the group, not 30 seconds for each Greyjoy.

View it in a new tab. Based on Andrew Reid’s Horizontal Stacked Bar Chart.
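Avoiding the over-count amounts to taking the union of each member's on-screen intervals before summing. Here is a minimal sketch of that idea; the timings are invented and the function names are hypothetical, so treat it as one possible approach rather than the code behind the chart.

```python
def merge_intervals(intervals):
    """Union of (start, end) intervals, returned as non-overlapping spans."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous span: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def group_screen_time(appearances):
    """Total seconds during which at least one group member is on screen."""
    return sum(end - start for start, end in merge_intervals(appearances))

# Invented timings in seconds: both Greyjoys share 0-30, then Theon appears alone.
greyjoys = {
    "Yara Greyjoy": [(0, 30)],
    "Theon Greyjoy": [(0, 30), (45, 60)],
}

all_spans = [span for spans in greyjoys.values() for span in spans]
total = group_screen_time(all_spans)                   # shared time counted once
naive = sum(end - start for start, end in all_spans)   # shared time counted twice
```

With these numbers the merged total is 45 seconds, while naively summing each member's time gives 75.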

How many words do group members speak?

Perhaps expectedly, House Lannister is a chatty pride of lions.

And for those wondering: sure, White Walkers don’t talk. But the 3 words spoken by The Night King were “No! No! No!” when the Children of the Forest were stabbing him in the heart with a dragonglass dagger, so there’s that.

View it in a new tab. Based on Andrew Reid’s Horizontal Stacked Bar Chart.

But still, some groups have more people than others, so this is a chart we can make, not necessarily one that’s meaningful in any way.

Others

Some other people have found the dataset and used it to make some interesting Game of Thrones visualizations, so I wanted to share those, too.

Type A Media

The team over at Type A Media put together a timelapse animation using the data for how long each character is on screen.

View it on YouTube. (Type A Media)

My only gripe with the video is that it looks like some of the data for Jon Snow gets lost from Season 7, Episode 7. By my count, he is actually only about 2 min behind Tyrion as of the end of Season 7.

Viz of Thrones

Adam Groner reached out recently to let me know about a project he’s been working on: Viz of Thrones. He’s impressively layered a bunch of data into a single interface, including a geographic map with characters in locations, sex count, death count, scenes in which characters show up, and more.

Check it out at vizofthrones.com.

View it in a new tab. (Adam Groner)

GraphXR (Kineviz)

The team at Kineviz — CEO Weidong Yang and Director of Business Development Sony Green — also got in touch recently to let me know they were using the dataset to demo their new tool, GraphXR, during a Neo4j online meetup. Check out the screencast of the meetup below to see GraphXR in action.

View the webcast in a new tab. (Weidong Yang & Sony Green, Kineviz)

Lia Petronio (Northeastern University)

Finally, Lia Petronio was the first person (I think) other than me to use my datasets for visualization. She’s made the two shown below: the first is about character travels through various regions, and the second is a social network of characters.

View it in a new tab. (Lia Petronio)
View it in a new tab. (Lia Petronio)

If you end up doing something with this data, please let me know. I’d love to see what you make and include it in a future roundup.

Looking Ahead

So what’s next?

Well, like many of you, I’m looking forward to Season 8! But I’ve also got a few other things in the works you may want to know about.

I plan to update the episodes.json file as soon as I can following each episode of the final season. That will in turn update all of the visualizations (because, really, that’s the point of data visualization, right?). I’ll also update characters.json with any new characters that happen to show up, but aside from the Golden Company, I’ll be curious to see how many new characters get introduced in the final season.

I’m also going to share the full-text search of the script data I referenced above soon. I built a search interface using Algolia, but my free trial ran out and I’m not convinced it’s worth paying to host/index that data. I’m also in the process of adding timestamps to the script data, so more on that soon.

Lastly, I’m going to be posting an episode data visualization recap after each show with who was on screen when in the episode, total character time per episode, locations on screen in the episode, time spent in each location per episode, the opening sequence of each episode, and maybe some other visualizations. Feel free to follow me if you want to catch those.

If you have ideas for other visualizations or improvements to the data that I’ve shared, please leave a comment. Thanks for reading!
