Network Analysis & Visualisation: Game of Thrones Character Network

Published in Analytics Vidhya · 7 min read · Mar 30, 2022

This article will help you understand the basics of network centrality measures and how to compute them with the networkx library on a Game of Thrones character dataset. You will also learn how to visualise networks with pyvis and how to create and deploy your own network dashboard on Streamlit.

What you will be able to create at the end of this:

Introduction to Network Science

We are surrounded by systems that are increasingly complex. For example, our ability to reason and understand the world around us requires the coherent activity of billions of neurons in our brain, and our biological existence is the product of seamless interactions between thousands of genes within our cells. Such systems are called complex systems. They play an important role in our day-to-day life, and understanding them is one of the major intellectual challenges of the 21st century. [1]

Behind each complex system is an intricate network that encodes the interactions between the system’s components. Networks often represent real complex systems such as social networks, metabolic networks, character networks and citation networks. Network science is an interdisciplinary, highly quantitative field with tremendous societal impact. It has applications in neuroscience, the military and epidemic prediction, to name a few. For example, detailed maps of mammalian brains could lead to a revolution in brain science, social networks were used in finding Saddam Hussein, and mobile call networks were examined to identify those responsible for the March 11, 2004 Madrid train bombings. These are just some examples of the wide applicability and utility of this field. [1]

We refer to the objects of a network as nodes, and links connect nodes together to form the network. A graph, in turn, is a mathematical representation of a network: its objects are called vertices, and edges connect vertices to form the graph. The two sets of terms are often used interchangeably.

Network Properties & Centrality Measures

Background and Theory

In network analysis, centrality measures are used to identify how central or important the nodes are. Depending on how we define “importance” or centrality, different nodes can be considered more or less significant. Calculating centrality can help identify which people are most influential in a social media network, which papers are most widely cited in a citation network, which criminals are most powerful in a crime network, and so on.

Some commonly used centrality measures are described below:

Node Degree/Degree Centrality: The degree of a node is the number of edges attached to it. Dividing the node degree by n-1, where n is the number of nodes, gives a standardized score called the degree centrality.
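As a quick illustration of this definition, here is a minimal sketch in plain Python. The four edges are a made-up toy graph for illustration only, not rows from the dataset, and the function name is hypothetical.

```python
# Toy undirected edge list (hypothetical, for illustration only).
edges = [
    ("Ned", "Robert"),
    ("Ned", "Catelyn"),
    ("Ned", "Jon"),
    ("Robert", "Cersei"),
]

def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)  # number of nodes that appear in at least one edge
    return {node: d / (n - 1) for node, d in degree.items()}

centrality = degree_centrality(edges)
for node, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(node, score)
```

In this toy graph Ned is linked to three of the four other characters, so his degree centrality is 3/4.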

Closeness Centrality: Closeness centrality measures the importance of a node by how close it is to all other nodes in the graph.

Let $d_{ij}$ be the length of the shortest path between nodes i and j. The average distance from node i to the other n-1 nodes is given by

$$\ell_i = \frac{1}{n-1} \sum_{j \neq i} d_{ij}$$

The closeness centrality is the reciprocal of this average distance (the farness), and is thus given by

$$C_i = \frac{1}{\ell_i} = \frac{n-1}{\sum_{j \neq i} d_{ij}}$$
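The definition can be sketched in plain Python with a breadth-first search. This is only an illustration: it assumes an unweighted, connected graph and averages over the other n-1 nodes, and the toy adjacency list and function names are made up.

```python
from collections import deque

# Toy undirected graph as an adjacency dict (hypothetical edges).
graph = {
    "Ned": ["Robert", "Catelyn", "Jon"],
    "Robert": ["Ned", "Cersei"],
    "Catelyn": ["Ned"],
    "Jon": ["Ned"],
    "Cersei": ["Robert"],
}

def shortest_path_lengths(graph, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def closeness_centrality(graph, node):
    """(n - 1) divided by the sum of distances; assumes a connected graph."""
    dist = shortest_path_lengths(graph, node)
    return (len(graph) - 1) / sum(dist.values())

ned = closeness_centrality(graph, "Ned")
cersei = closeness_centrality(graph, "Cersei")
print(ned, cersei)
```

Ned sits in the middle of the toy graph and gets a higher score than Cersei, who is only reachable through Robert.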

Betweenness Centrality: Betweenness centrality is a measure of importance of a node in terms of the connection it creates among other nodes. For example, a node can have a small degree centrality but it might play an important role in keeping together clusters of several nodes. Betweenness centrality quantifies the number of times a node acts as a bridge along the shortest path between two other nodes.

The betweenness centrality of a node i in a graph G is computed as follows:

For each pair of vertices h and j (both different from i), find all shortest paths between them.
For each pair, determine the fraction of those shortest paths that pass through i.
Sum this fraction over all pairs of vertices.

$$b_i = \sum_{h \neq i \neq j} \frac{\sigma_{hj}(i)}{\sigma_{hj}}$$

where the denominator $\sigma_{hj}$ represents the total number of shortest paths between h and j and the numerator $\sigma_{hj}(i)$ represents the number of those paths that pass through node i.
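The three steps can be sketched directly in plain Python. This is a brute-force illustration rather than an efficient algorithm: it assumes an unweighted, undirected graph, counts each unordered pair once and does not normalise the result. The toy graph and function names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Toy graph (hypothetical): a square Ned-Catelyn-Tyrion-Robert plus Cersei attached to Tyrion.
graph = {
    "Ned": ["Catelyn", "Robert"],
    "Catelyn": ["Ned", "Tyrion"],
    "Robert": ["Ned", "Tyrion"],
    "Tyrion": ["Catelyn", "Robert", "Cersei"],
    "Cersei": ["Tyrion"],
}

def bfs_counts(graph, source):
    """Return (dist, sigma): hop distances and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                sigma[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                # Every shortest path to node extends to a shortest path to nb.
                sigma[nb] += sigma[node]
    return dist, sigma

def betweenness(graph):
    """Sum over pairs (h, j) of the fraction of shortest h-j paths through each node."""
    info = {s: bfs_counts(graph, s) for s in graph}
    score = {node: 0.0 for node in graph}
    for h, j in combinations(graph, 2):
        dist_h, sigma_h = info[h]
        if j not in dist_h:
            continue  # h and j are not connected
        for i in graph:
            if i in (h, j) or i not in dist_h:
                continue
            dist_i, sigma_i = info[i]
            # i lies on a shortest h-j path iff the distances add up exactly.
            if dist_h[i] + dist_i[j] == dist_h[j]:
                score[i] += sigma_h[i] * sigma_i[j] / sigma_h[j]
    return score

scores = betweenness(graph)
print(scores)
```

Tyrion scores highest because every path to Cersei passes through him, while Catelyn and Robert each carry only half of the shortest paths that cross the square.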

Eigenvector Centrality: Eigenvector centrality defines the centrality of a node as proportional to its neighbours’ importance. It is based on the idea that connections to high-scoring nodes contribute more to the importance of a node than the same number of connections to low-scoring nodes.

For a graph G with vertices V and edges E, the entry $a_{ij}$ of the adjacency matrix A has the value 1 if vertex i is linked to vertex j and the value 0 otherwise.

The relative eigenvector centrality score of a vertex i can be defined as

$$x_i = \frac{1}{\lambda} \sum_{j \in M(i)} x_j = \frac{1}{\lambda} \sum_{j \in V} a_{ij} x_j$$

where M(i) is the set of all neighbours of i and $\lambda$ is a constant.

This equation is defined recursively: the score of a node depends on the eigenvector centrality of all its neighbours. In vector notation, it can be written as the eigenvector equation

$$A\mathbf{x} = \lambda \mathbf{x}$$

This equation can be solved using linear algebra. The eigenvector associated with the largest eigenvalue $\lambda$ gives us the centrality scores; by the Perron-Frobenius theorem its entries can all be chosen to be non-negative. The condition is that A is a non-negative matrix, which holds for an adjacency matrix. For a connected graph, the scores are strictly positive and unique up to scaling.
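One common way to find this eigenvector numerically is power iteration. The sketch below is an illustration in plain Python, not the article's own code. It iterates with A + I instead of A (an assumption made here so that the iteration also settles on bipartite graphs; both matrices share the same eigenvectors), and the toy graph and function name are hypothetical.

```python
import math

# Toy graph (hypothetical): a square Ned-Catelyn-Tyrion-Robert plus Cersei attached to Tyrion.
graph = {
    "Ned": ["Catelyn", "Robert"],
    "Catelyn": ["Ned", "Tyrion"],
    "Robert": ["Ned", "Tyrion"],
    "Tyrion": ["Catelyn", "Robert", "Cersei"],
    "Cersei": ["Tyrion"],
}

def eigenvector_centrality(graph, iterations=200, tol=1e-10):
    """Power iteration on (A + I); scores are scaled to unit Euclidean length."""
    x = {node: 1.0 for node in graph}
    for _ in range(iterations):
        # Multiply by (A + I): keep the node's own score and add its neighbours' scores.
        new = {node: x[node] + sum(x[nb] for nb in graph[node]) for node in graph}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {node: v / norm for node, v in new.items()}
        if sum(abs(new[node] - x[node]) for node in graph) < tol:
            return new
        x = new
    return x

scores = eigenvector_centrality(graph)

# Recover lambda from A x = lambda x using any single node.
lam = sum(scores[nb] for nb in graph["Ned"]) / scores["Ned"]
print(scores, lam)
```

Dividing the summed neighbour scores by a node's own score gives the same value of lambda for every node, which is a handy check that the iteration has converged.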

Let’s start writing some code!

We will be using the Game of Thrones character dataset for our analysis. This is a co-occurrence network of the characters in the Game of Thrones books. Two characters are considered to co-occur if their names appear within 15 words of one another in the books.

Let us import the required libraries and view the dataset.
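A minimal sketch of this step is shown below. To keep it self-contained it reads a few made-up rows from an in-memory string; the column layout (Source, Target, Type, weight, book) is an assumption about the Kaggle edge lists, and with the real data you would pass the path of a book's CSV file to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical rows in the column layout assumed for the edge lists.
# The weights are made up; the real files contain many more rows.
toy_csv = io.StringIO(
    "Source,Target,Type,weight,book\n"
    "Eddard-Stark,Robert-Baratheon,Undirected,30,1\n"
    "Eddard-Stark,Catelyn-Stark,Undirected,20,1\n"
    "Eddard-Stark,Jon-Snow,Undirected,15,1\n"
    "Robert-Baratheon,Cersei-Lannister,Undirected,10,1\n"
)

# With the real dataset this would be pd.read_csv("<path to a book's csv>").
book1 = pd.read_csv(toy_csv)

print(book1.head())
print(book1.shape)
```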

We can now create a network for each dataset using the networkx library and compute the different centrality measures on it.
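A sketch of how this could look for a single book is given below. The toy dataframe, its column names and the helper top_characters are assumptions for illustration; the networkx functions themselves are part of the library. The centralities here ignore edge weights.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy edge list; column names are assumed and weights are made up.
book1 = pd.DataFrame(
    {
        "Source": ["Eddard-Stark", "Eddard-Stark", "Eddard-Stark", "Robert-Baratheon"],
        "Target": ["Robert-Baratheon", "Catelyn-Stark", "Jon-Snow", "Cersei-Lannister"],
        "weight": [30, 20, 15, 10],
    }
)

# Build an undirected graph, one edge per row, keeping the weight as an edge attribute.
G_book1 = nx.from_pandas_edgelist(book1, source="Source", target="Target", edge_attr="weight")

centralities = {
    "degree": nx.degree_centrality(G_book1),
    "closeness": nx.closeness_centrality(G_book1),
    "betweenness": nx.betweenness_centrality(G_book1),
    "eigenvector": nx.eigenvector_centrality(G_book1, max_iter=1000),
}

def top_characters(scores, n=3):
    """Return the n highest-scoring (character, score) pairs."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]

for name, scores in centralities.items():
    print(name, top_characters(scores))
```

Repeating the same steps for each book's dataframe gives one graph and one set of scores per book, which is what makes the comparison across books possible.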

We can print the centrality values. For example, for eigenvector centrality we see that Ned Stark has the highest score in book 1, while Daenerys has the highest score in book 5.

Displaying eigenvector centralities for Book 1 and Book 5

RIP Ned Stark :(

We can also compute network statistics such as number of nodes, number of edges, diameter etc. for each of the networks.
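One possible way to collect these statistics is a small helper like the one below. The function name network_stats and the toy edges are hypothetical; the diameter is only defined for a connected graph, so the sketch returns None otherwise.

```python
import networkx as nx

def network_stats(G):
    """Collect a few basic statistics for an undirected networkx graph."""
    return {
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        "density": nx.density(G),
        # The diameter is undefined for disconnected graphs.
        "diameter": nx.diameter(G) if nx.is_connected(G) else None,
    }

# Toy graph with made-up edges, for illustration only.
G = nx.Graph(
    [
        ("Ned", "Robert"),
        ("Ned", "Catelyn"),
        ("Ned", "Jon"),
        ("Robert", "Cersei"),
    ]
)

stats = network_stats(G)
print(stats)
```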

Network Visualization using Pyvis

The pyvis library is meant for quick generation of visual network graphs and is designed as a wrapper around the popular JavaScript library vis.js. You can build interactive visualizations with pyvis in just a few lines of code!

You can create a simple graph using the following lines of code and save it as an HTML file.

When you open the HTML page, you will be able to view a panel where you can change the node size, colors and several other parameters.

Once you decide on the customization for your graph, you can save these settings by clicking on generate options at the bottom of the page and pasting the generated options into your source code.

Below is a function which generates a graph for a given dataframe from the GOT dataset.

Create a dashboard using Streamlit

Streamlit is an open-source Python library that can be used to create web apps in just minutes. Deploying your network graphs on Streamlit allows users to interact with them directly.

Now let’s combine everything above and create a streamlit dashboard!

Character Information from Wikipedia

This is a pretty cool API you can use to load information directly from Wikipedia.

Customizing the app layout

Streamlit allows you to create containers, set up multi-select sidebars, write LaTeX, and embed saved HTML files as components in the dashboard.

The complete code is available in the GitHub repo listed under Useful Links.

Setting the theme for your Streamlit Dashboard

Once you run your dashboard locally, you can select your theme under Settings -> Edit Active Theme.
Customize the theme and select Copy to clipboard.
Create a .streamlit/config.toml file in your repo and paste the copied settings there.

[theme]
# The preset Streamlit theme that your custom theme inherits from. One of "light" or "dark".
base = "dark"
# Primary accent color for interactive elements.
primaryColor = "#35b6dc"

Deploying your dashboard via GitHub

Once you commit your datasets, code, config.toml and requirements.txt file, you can use the GitHub repo link to deploy your streamlit dashboard.

Go to Streamlit Share
Login via GitHub
Click on New App, select the repo, the main file path (your .py file) and click on Deploy!

It might take a while to deploy the app. Once deployed, the app will be available on your Streamlit Share page and can be viewed by anyone with the link. Any changes made in your GitHub repo will automatically be reflected in your app.

Ta-da! You have successfully created and deployed your network dashboard!

References

[1] A.L. Barabási, Network Science Book (2015), Cambridge University Press

Useful Links

GitHub Repo: https://github.com/nairnayana3/dashboard_network_got
Streamlit Dashboard: https://share.streamlit.io/nairnayana3/dashboard_network_got/main/got_network.py
Dataset: https://www.kaggle.com/code/mmmarchetti/game-of-thrones-network-analysis/notebook
Deploying Pyvis on Streamlit (detailed tutorial): https://towardsdatascience.com/how-to-deploy-interactive-pyvis-network-graphs-on-streamlit-6c401d4c99db
The coolest citation network ever: https://www.connectedpapers.com

Hope you found this article helpful. Feel free to leave your feedback below and make sure to follow my Medium account for more!

Thank you and have a nice day :)