Star Wars Data Science
Network Analysis, Topic Modeling, and a Wordcloud
Star Wars 🌌 is the most epic fantasy space adventure of all time (strongly biased). Why save only one world when you can rescue whole galaxies? Each year, millions of fans celebrate Star Wars Day on May the fourth. Last year I had some fun and wrote a blog post that deciphered a secret message from Mustafar using a neural network (built from scratch).
To start a tradition, this year I combined Star Wars with Data Science yet again. A well-known source of Star Wars information is Wookieepedia, a Fandom site with thousands of pages. It is an amazing source to investigate using Data Science tools such as topic modeling and network analysis.
As this is a rather long article, I have divided it into a number of topics. That way you can easily skip to the topics that interest you most.
An overview of topics:
- Scraping and building the dataset
- Wookieepedia Data exploration
- We need a wordcloud!
- Topic modeling
- Wookieepedia network analysis
The GitHub repository with all notebooks and the dataset can be found here.