The Ultimate Game of Thrones Dataset

and an Interactive Game of Thrones Narrative Chart

Notes: (1) the dataset contains spoilers, (2) this is distinctly about the television series, and (3) forgive any tangents; I’ve tried to collect my thinking below so others can benefit from what I’ve done.

Full interactive narrative chart:

xkcd and Sankey Diagrams

Randall Munroe, xkcd

Like many fans, I also recognized the beauty of the narrative threads in which characters’ paths crisscross each other throughout the Lands of Ice and Fire. As a lover of maps (especially subway maps), I wondered whether anyone had visualized this crisscrossing, and so I stumbled into the world of narrative charts, most recently popularized by Randall Munroe’s xkcd charts of movies like Star Wars and The Lord of the Rings (above). These charts are derivatives of Sankey diagrams, which are typically used to illustrate the flow of materials, money, or other data from one state, location, or category to another. Although it was created about 30 years before Sankey’s, the diagram of Napoleon’s March by Charles Joseph Minard will be familiar to many fans of data visualization; it is perhaps the best-known example of a Sankey-like diagram showing the movement of people through a defined geography.

The Web did not disappoint me in my search for an xkcd-style map for Game of Thrones. Seasons 1–3 had already been charted by a graphic designer (full map). However, it seemed no one had yet updated the chart through the end of season 6 (where I was picking things up).

Still others sought to automate the layout/creation of these sorts of narrative charts, and those projects have been shared here, here, here, and here.

A Path Forward

  1. I could use a force-directed approach, which other folks have done and are doing (see above). It’s certainly a more generalizable approach, especially when there’s no set geography or interest in encoding meaning in a vertical position. OR:
  2. I could use the specific geography of Westeros and Essos to create a location-specific chart where vertical position actually does have meaning. This is the same approach used in the Season 1–3 chart, where the vertical position follows a line drawn roughly from The Lands of Always Winter in the north to Dorne in the south of Westeros, then east from Pentos to Qarth in Essos.

I had another look to try to find a dataset from which I could build this visualization, and failing, I realized the project would really be two-fold:

  1. Create datasets for Game of Thrones, and try to make them as complete as possible, then
  2. Using that data, create a data-driven narrative chart so it will update with new data when seasons 7 and 8 come out.

The Data

But what I really wanted was scene-by-scene information: which characters are together in each scene (co-occurrence/co-appearance). So I rewatched all six seasons and typed out some JSON by hand (because why not?). For each character in a scene, I included their name, whether they were alive (or, really, dead), whether they had a title (limited to Hand, Khal, Khaleesi, and King), and whether they had been born (or, really, not yet born).
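
To make the shape of this data concrete, here is a minimal sketch of what a hand-typed scene record might look like and how co-appearances could be counted from it. Only `alive`, `location`, and `subLocation` are names that appear in this post; the `characters` and `name` keys, the sample records, and the `countCoAppearances` helper are assumptions for illustration and may not match the real episodes.json.

```javascript
// Hypothetical scene records; the real episodes.json schema may differ.
const sampleScenes = [
  {
    location: "The North",
    subLocation: "Winterfell",
    characters: [
      { name: "Jon Snow", alive: true },
      { name: "Arya Stark", alive: true },
      { name: "Eddard Stark", alive: true },
    ],
  },
  {
    location: "The North",
    subLocation: "Winterfell",
    characters: [
      { name: "Jon Snow", alive: true },
      { name: "Arya Stark", alive: true },
    ],
  },
];

// Count how many scenes each unordered pair of characters shares.
function countCoAppearances(sceneList) {
  const counts = new Map();
  for (const scene of sceneList) {
    // Sort names so each pair always produces the same key.
    const names = scene.characters.map((c) => c.name).sort();
    for (let i = 0; i < names.length; i++) {
      for (let j = i + 1; j < names.length; j++) {
        const key = `${names[i]} | ${names[j]}`;
        counts.set(key, (counts.get(key) || 0) + 1);
      }
    }
  }
  return counts;
}

const pairCounts = countCoAppearances(sampleScenes);
console.log(pairCounts.get("Arya Stark | Jon Snow")); // 2
```

A `Map` keyed by the sorted pair keeps the count symmetric, so "Jon with Arya" and "Arya with Jon" land in the same bucket.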

Ice and Fire World Map

A Wiki of Ice and Fire and the Game of Thrones Wiki were fantastic resources to figure out who characters are and where scenes were taking place when it wasn’t clear from the context clues in the show. I also used a variety of maps (Map of Essos, Ice and Fire World Map, Regions of Westeros, Season 1 Locations Map, Season 2 Locations Map) to locate each scene with a location and a subLocation.

The resulting episodes.json and characters.json datasets are available on GitHub along with a few others, including: locations.json (including a north-to-south arrangement of location and subLocation), characters-houses.json (grouped for styling in the visualization), and characters-include.json (in case I wanted to focus only on main characters).

Aside: Some Simple Counting

Top 10 Characters (by screen time in Seasons 1–6)

  1. Tyrion Lannister: 27,107 sec = 7 hr 31 min 47 sec
  2. Jon Snow: 24,781 sec = 6 hr 53 min 1 sec
  3. Cersei Lannister: 20,545 sec = 5 hr 42 min 25 sec
  4. Daenerys Targaryen: 19,427 sec = 5 hr 23 min 47 sec
  5. Sansa Stark: 18,329 sec = 5 hr 5 min 29 sec
  6. Arya Stark: 17,214 sec = 4 hr 46 min 54 sec
  7. Jaime Lannister: 15,657 sec = 4 hr 20 min 57 sec
  8. Jorah Mormont: 13,364 sec = 3 hr 42 min 44 sec
  9. Theon Greyjoy: 12,234 sec = 3 hr 23 min 54 sec
  10. Samwell Tarly: 11,500 sec = 3 hr 11 min 40 sec
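
The conversion from raw seconds to hours, minutes, and seconds used in the list above is plain integer arithmetic. A small sketch (the `formatDuration` name is my own, not something from the project code):

```javascript
// Convert a whole number of seconds into "H hr M min S sec".
function formatDuration(totalSeconds) {
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  return `${hours} hr ${minutes} min ${seconds} sec`;
}

console.log(formatDuration(27107)); // 7 hr 31 min 47 sec
console.log(formatDuration(11500)); // 3 hr 11 min 40 sec
```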

Bottom 10-ish Characters (by screen time in Seasons 1–6)

  1. Ironborn in Skiff: 11 sec
  2. Guymon: 12 sec
  3. Olly’s Mother: 14 sec
  4. Tyrell Guard: 14 sec
  5. Vayon Poole: 15 sec
  6. Rickard Stark: 15 sec
  7. Stark Messenger: 18 sec
  8. Sorcerer: 20 sec
  9. Fruit Vendor: 20 sec
  10. Simpson: 21 sec
  11. Gordy: 21 sec
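
Totals like these come from summing the length of every scene a character appears in. Here is a rough sketch of that tally, assuming each scene carries start and end offsets in seconds; the `start`, `end`, and `characters` keys, the toy scenes, and the `screenTimeByCharacter` helper are all hypothetical and not taken from the actual dataset.

```javascript
// Hypothetical scenes with start/end offsets in seconds (toy data).
const timedScenes = [
  { start: 0, end: 90, characters: ["Jon Snow", "Samwell Tarly"] },
  { start: 90, end: 150, characters: ["Jon Snow"] },
  { start: 150, end: 171, characters: ["Samwell Tarly"] },
];

// Sum scene durations per character, then rank from most to least screen time.
function screenTimeByCharacter(sceneList) {
  const totals = new Map();
  for (const { start, end, characters } of sceneList) {
    for (const name of characters) {
      totals.set(name, (totals.get(name) || 0) + (end - start));
    }
  }
  return [...totals.entries()].sort((a, b) => b[1] - a[1]);
}

const ranking = screenTimeByCharacter(timedScenes);
console.log(ranking); // [ [ 'Jon Snow', 150 ], [ 'Samwell Tarly', 111 ] ]
```

Reversing the sort order (or reading the ranking from the end) gives the bottom of the list instead of the top.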

The Visualization

I imagined a 3-dimensional array where the axes are scenes, uniqueCharacters, and locations. In that array, the code in process.js does the following:

  1. Fill all characters with 0 in the first scene in all locations.
  2. For each scene, if a character is present, enter a 1 in the location (for that character in that scene), then fill all other locations (for that character in that scene) with 0’s because a character cannot be in two places at once.
  3. If a character is dead in a scene ("alive":false), then enter 0 for that character in the following scene (for that character in that same location).
  4. Fill forward: if a character is not in a location (is a 0), they won’t be in that location until they arrive there. Beginning in the first scene, if a character in a location is 0, look at the next scene — if that value is empty, make it 0.
  5. Fill backward: if a character is not in a location (is a 0), check to see if they were in that location previously. Beginning in the last scene, if a character in a location is 0, look at the previous scene — if that value is empty, make it 0.
  6. Any remaining empty values must be 1’s.
  7. Count the maximum number of characters in a scene at each particular location. This number will be used to determine the y-position for each character and the y-range of each geographical region.
  8. Calculate the middle of each geographical region and the y-axis offset for each character in the region (characters in a location during a scene are sorted alphabetically), then assign each character a y-value for that scene based on all of the characters in that location at that time.
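
Steps 1 through 6 above could be sketched roughly as follows. This is one reading of the description, not the actual contents of process.js: the `buildPresence` name, the input shape (`location`, `characters`, `name`), and the toy scenes are assumptions, and `null` stands in for an "empty" cell. Where the steps leave the order ambiguous, a recorded appearance in a later scene overwrites the 0 written by step 3.

```javascript
// Build presence[scene][character][location] holding 1, 0, or null (empty).
function buildPresence(scenes, characters, locations) {
  const S = scenes.length;
  const presence = scenes.map(() =>
    characters.map(() => locations.map(() => null))
  );

  // Step 1: every character is 0 at every location in the first scene.
  for (const row of presence[0]) row.fill(0);

  scenes.forEach((scene, s) => {
    const l = locations.indexOf(scene.location);
    for (const person of scene.characters) {
      const c = characters.indexOf(person.name);
      // Step 2: 1 at the scene's location, 0 at every other location.
      presence[s][c] = locations.map((_, i) => (i === l ? 1 : 0));
      // Step 3: a dead character is 0 at that location in the following scene.
      if (person.alive === false && s + 1 < S) presence[s + 1][c][l] = 0;
    }
  });

  for (let c = 0; c < characters.length; c++) {
    for (let l = 0; l < locations.length; l++) {
      // Step 4: fill forward, carrying 0 into empty cells.
      for (let s = 0; s < S - 1; s++) {
        if (presence[s][c][l] === 0 && presence[s + 1][c][l] === null) {
          presence[s + 1][c][l] = 0;
        }
      }
      // Step 5: fill backward, carrying 0 into empty cells.
      for (let s = S - 1; s > 0; s--) {
        if (presence[s][c][l] === 0 && presence[s - 1][c][l] === null) {
          presence[s - 1][c][l] = 0;
        }
      }
      // Step 6: anything still empty must be a 1.
      for (let s = 0; s < S; s++) {
        if (presence[s][c][l] === null) presence[s][c][l] = 1;
      }
    }
  }
  return presence;
}

// Toy input: five scenes, two characters, two locations.
const toyLocations = ["Winterfell", "King's Landing"];
const toyCharacters = ["Arya Stark", "Eddard Stark"];
const toyScenes = [
  { location: "Winterfell", characters: [{ name: "Arya Stark" }] },
  { location: "King's Landing", characters: [{ name: "Eddard Stark" }] },
  { location: "Winterfell", characters: [{ name: "Arya Stark" }] },
  { location: "King's Landing", characters: [{ name: "Eddard Stark", alive: false }] },
  { location: "Winterfell", characters: [{ name: "Arya Stark" }] },
];

const toyPresence = buildPresence(toyScenes, toyCharacters, toyLocations);
// Eddard at King's Landing, scene by scene:
console.log(toyPresence.map((scene) => scene[1][1])); // [ 0, 1, 1, 1, 0 ]
```

In the toy output, the empty cell between Eddard's two King's Landing scenes becomes a 1 (he stays put), and the 0 written after his death scene keeps him out of later scenes.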

This strategy generates the y-coordinate for a character in a scene, and the x-coordinate is simply related to the overall timestamp of the scene relative to the beginning of the first episode of season 1.
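
As an illustration of how those coordinates might be computed, here is a minimal sketch under some assumptions of my own: a fixed `SPACING` between adjacent character lines, bands stacked from north to south, characters centered on the middle of their band, and a linear pixels-per-second scale for x. The helper names and sample numbers are hypothetical, not taken from process.js.

```javascript
const SPACING = 10; // hypothetical distance between adjacent character lines

// Step 7: size each location's band by the most characters it ever holds at once.
function layoutBands(orderedLocations, maxCounts) {
  let top = 0;
  const bands = {};
  for (const loc of orderedLocations) { // assumed to be ordered north to south
    const height = maxCounts[loc] * SPACING;
    bands[loc] = { top, middle: top + height / 2 };
    top += height;
  }
  return bands;
}

// Step 8: spread alphabetically sorted characters evenly around the band's middle.
function yPositions(names, middle) {
  const sorted = [...names].sort();
  const y = {};
  sorted.forEach((name, i) => {
    y[name] = middle + (i - (sorted.length - 1) / 2) * SPACING;
  });
  return y;
}

// x: seconds elapsed since the start of season 1, episode 1, scaled to pixels.
function xPosition(episodeOffsetSeconds, sceneStartSeconds, pxPerSecond) {
  return (episodeOffsetSeconds + sceneStartSeconds) * pxPerSecond;
}

const bands = layoutBands(["The Wall", "The North"], { "The Wall": 4, "The North": 2 });
const ys = yPositions(["Jon Snow", "Arya Stark"], bands["The North"].middle);
console.log(bands["The North"].middle); // 50
console.log(ys); // { 'Arya Stark': 45, 'Jon Snow': 55 }
console.log(xPosition(3600, 40, 0.5)); // 1820
```

Sizing each band by its busiest scene means no location ever overflows into its neighbor, at the cost of some empty vertical space in quieter scenes.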

I then rendered the data using d3.js (h/t Mike Bostock, et al.) in the visualization below. In true Minard-ian (Minard-esque?) fashion, it includes information about which characters are in which scene, their location, their family/affiliation, their title (Hand, Khaleesi, etc.), and their death, along with information about the season, episode, and episode title. The full interactive visualization is here:

Game of Thrones Narrative Chart (Seasons 1–6)

What’s Next & Other Stuff

The visualization itself will also continue to develop. I plan to add information for each death (from the killedBy field in characters.json), some fancier animations, and maybe even additional visuals pulled from IMDB (via actorLink in characters.json).

I’m perhaps most excited to see how other people take, reuse, and add to the data. The data could be used to recreate (and augment) HBO’s Tower of Joy infographic or any one of the many other Game of Thrones infographics, or it could be used to make a supercut of your favorite character’s story arc; the Sansa Stark movie is currently just over 5 hours through the end of season 6! Maybe you could even begin to do predictive analytics on who will die next (and when)? ⚔️

What do you think you could do with this data?

The full project code is on GitHub, and I’m happy to answer any questions about it here or there.

p.s. I like these, too: Westeros Transit Map and Known World Transit Map.

Update (Feb. 2019): I’ve now written a follow-up to this: “32 Game of Thrones Data Visualizations”. Enjoy!

Update (Apr. 2019): And here’s another follow-up: “19 More Game of Thrones Data Visualizations”.

Update (May 2019): “Introducing Game of Thrones Script Search”.
