The Elder Scrolls V: Skyrim Special Edition — Analysis of Dialogues

Read this on my blog

Background

After my last visualisation of Harry Potter data, I decided to use some other data to create word clouds. I am a huge Skyrim fan and always wanted to learn how to use xEdit Scripts. As a result, here I am with another Word Cloud.

Visualisations

Getting the data

I don’t like opening my Windows installation (I have a dual boot setup, and use Manjaro mainly), and looked around the internet for some sort of data dump of Skyrim Dialogues. Unfortunately, I couldn’t find any and then decided to extract the data myself. I had recently formatted my Windows Partition so had to reinstall the game. It also provided the benefit that no mods would pollute the data. (I had over 150 mods before the format). I downloaded the latest xEdit and used the Export dialogues.pas script that comes with it to export all the dialogues. (It took me 22:05 minutes).

Processing the data

In the CSV, there were two columns of data I was interested in RESPONSE TEXT & TOPIC TEXT. Response Text was the larger one, with over 40k unique dialogues. Topic Text had only around 5.5K unique dialogues and also needed some additional processing. Topic Text contained some game constants such as RoomCost , HorseCost, and other prices, which had to be filtered out. I did all that in csv_to_json.py.

Counting the Words

Like the previous visualisation, I used nltk’s stopwords corpus, along with a modified version of 20k most common words by Google. Interestingly, the modifications I did for Harry Potter were valid for Skyrim as well, because there is no dialogue with names like Harry, Ron, Arthur, etc and they share words like vampires, magic, etc.

Making the WordCloud

I used pretty much the same process as the last visualisation. I changed the maximum font size to depict the variation properly and used a custom font this time.

Making the Graph

I initially planned on making a set of graphs from the data, but wasn’t able to due to two reasons:

  1. Some of the data doesn’t; produce interesting visualisations. Nords have the highest dialogue count, and after the difference between the first few races and the remaining is so huge that a lot of the races aren’t visible.

Future Plans

I will look into making custom scripts (if someone already has them, do share it with me) to extract other interesting data from Skyrim and see what I can do with them.

References:

Student | Developer | Photographer

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store