Star Wars Data Science

Network Analysis, Topic Modeling, and a Wordcloud

Dennis Bakhuis
Towards Data Science
14 min read · May 3, 2021

Star Wars 🌌 is the most epic fantasy space adventure of all time (strongly biased). Why save only one world when you can rescue whole galaxies? Each year, millions of fans celebrate Star Wars Day on May the Fourth. Last year I had some fun and wrote a blog post on deciphering a secret message from Mustafar using a neural network (built from scratch).

Figure 1: The wordcloud and network graph created in this article.

To start a tradition, this year I combined Star Wars with data science yet again. A well-known source of Star Wars information is Wookieepedia, a Fandom site with thousands of pages. It is an amazing source to investigate using data science tools such as topic modeling and network analysis.

As this is a rather long article, I have divided it into a number of topics, so you can easily skip down to the ones that interest you most.

An overview of topics:

  1. Scraping and building the dataset
  2. Wookieepedia data exploration
  3. We need a wordcloud!
  4. Topic modeling
  5. Wookieepedia network analysis

The GitHub repository with all notebooks and the dataset can be found here.

👉 Interactive network graph
