Visualizing the XKCD comics network using Google Vision, spaCy and d3

Billy Mosse
Nov 30, 2020 · 3 min read
The XKCD comics network. Check it out here!

I love XKCD. According to its website, the webcomic is about romance, sarcasm, math, and language, but over the years Randall Munroe has explored many other topics as well, some of them more than once.

I wanted to understand the structure of the fantastic stick-figure world he created, so in my spare time I scraped all his webcomics from the interblag (or blagosphere), extracted the relevant words from each one, and plotted the result. In the graph, each node represents a comic, and two comics share an edge if they contain words in common. The graph is interactive: you can drag it around the screen (always a fun thing to do), clicking an edge shows the words the two comics have in common (click again to hide them), and clicking a node displays the corresponding comic below the graph.

Link to graph. It uses JavaScript, so Medium doesn't let me embed it here; I'm hosting it on my blog instead.

Also, for readability I'm showing only about 20% of the full graph. It's a random 20%, so if you refresh the page you'll see a different subset!

If you scroll down a bit, I will explain how I did all this and share code snippets.

Scraping

I first used bs4 and some regular expressions to download all the comics listed at https://www.explainxkcd.com/wiki/index.php/List_of_all_comics_(full). I could have used xkcd.com instead, and in retrospect that would have been easier, but working with a table seemed simpler at the beginning, and then I just stuck with it.
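The original bs4 scraper isn't reproduced here. As a hedged sketch of the "easier" xkcd.com route, the site serves per-comic metadata as JSON at `https://xkcd.com/<num>/info.0.json` (and the latest comic at `https://xkcd.com/info.0.json`). The helper names below are hypothetical, and this is a swapped-in approach, not the author's explainxkcd scraper. The JSON gives the image URL, title and alt text, but the words inside the image still need OCR.

```python
import json
import time
import urllib.request

LATEST_URL = "https://xkcd.com/info.0.json"


def comic_json_url(num: int) -> str:
    """URL of the JSON metadata for comic number `num`."""
    return f"https://xkcd.com/{num}/info.0.json"


def parse_comic(raw: str) -> dict:
    """Keep only the fields needed later: number, title, alt text, image URL."""
    data = json.loads(raw)
    return {
        "num": data["num"],
        "title": data["safe_title"],
        "alt": data.get("alt", ""),
        "img": data["img"],
    }


def fetch_text(url: str) -> str:
    """Download a URL and decode it as UTF-8 text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def scrape_all(delay: float = 0.5) -> list[dict]:
    """Fetch metadata for every comic from 1 up to the latest one."""
    latest = parse_comic(fetch_text(LATEST_URL))["num"]
    comics = []
    for num in range(1, latest + 1):
        if num == 404:  # xkcd deliberately has no comic 404
            continue
        comics.append(parse_comic(fetch_text(comic_json_url(num))))
        time.sleep(delay)  # be polite to the server
    return comics
```

Calling `scrape_all()` makes one request per comic, so the delay keeps the crawl gentle at the cost of a slower run.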

Extracting texts with Google Vision

Then I used Google Vision to extract the text from each image. Google Vision is free for the first 1000 images (at least as of 2020). The extraction script took about an hour in my notebook; parallelising it would have made it faster.
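The original script isn't reproduced here. The sketch below assumes the `google-cloud-vision` client library (v2 or later, where `vision.Image` exists) and default Google Cloud credentials; `vision_text` and `extract_all` are hypothetical helper names. It also shows the parallelisation mentioned above, using a thread pool because each OCR call spends most of its time waiting on the network.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def vision_text(client, image_bytes: bytes) -> str:
    """OCR one image with Cloud Vision's TEXT_DETECTION feature."""
    # Imported here so extract_all can be used (and tested) without the library.
    from google.cloud import vision

    response = client.text_detection(image=vision.Image(content=image_bytes))
    if response.error.message:
        raise RuntimeError(response.error.message)
    annotations = response.text_annotations
    # The first annotation holds the full detected text; the rest are single words.
    return annotations[0].description if annotations else ""


def extract_all(
    images: dict[int, bytes],
    ocr: Callable[[bytes], str],
    max_workers: int = 8,
) -> dict[int, str]:
    """Run `ocr` over many images concurrently; returns {comic number: text}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {num: pool.submit(ocr, data) for num, data in images.items()}
        return {num: fut.result() for num, fut in futures.items()}
```

With real credentials you would create `client = vision.ImageAnnotatorClient()` once and call `extract_all(images, lambda data: vision_text(client, data))`. Sharing one client across threads is an assumption worth checking against the library documentation.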

Extracting important words with spaCy

I fell in love with spaCy when I compared it against NLTK, so I used spaCy to remove stopwords from each text. I then calculated the edges between pairs of comics, now starting to think of the collection as a graph: two comics share an edge if they have words in common. But not stopwords! It's not interesting to know that comics 345 and 876 both use the words “and”, “the” and “I”.
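A minimal sketch of both steps, assuming spaCy v3 with a blank English pipeline (tokenizer only, no model download) and spaCy's built-in English stop-word list. Keeping only alphabetic tokens is an extra assumption of this sketch, and the helper names are hypothetical. Edges carry the shared words so a viewer can show them when an edge is clicked.

```python
from itertools import combinations

import spacy
from spacy.lang.en.stop_words import STOP_WORDS

# A blank English pipeline is enough for tokenization.
nlp = spacy.blank("en")


def keywords(text: str) -> set[str]:
    """Lower-cased alphabetic tokens that are not English stop words."""
    return {
        tok.lower_
        for tok in nlp(text)
        if tok.is_alpha and tok.lower_ not in STOP_WORDS
    }


def shared_word_edges(texts: dict[int, str]) -> list[tuple[int, int, list[str]]]:
    """Connect two comics when their keyword sets intersect; keep the shared words."""
    words = {num: keywords(text) for num, text in texts.items()}
    edges = []
    # Compare every unordered pair of comics once.
    for a, b in combinations(sorted(words), 2):
        common = words[a] & words[b]
        if common:
            edges.append((a, b, sorted(common)))
    return edges
```

Checking every pair is quadratic in the number of comics, which is still manageable for a few thousand of them; an inverted index from word to comics would scale better.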

Plotting with d3

For plotting, I considered Python's NetworkX, but I wanted something interactive, and NetworkX made it hard for me to work with callbacks. Someone in my network then recommended d3, which looked promising, so I decided to learn a bit of it. Note: I really like Gephi, but unfortunately it doesn't offer an easy way to make pictures appear on screen!

The code that generates the visualization is self-contained, meaning that if you download it you can view the graph locally in your browser as well.
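The author's d3 page isn't reproduced here, and the following is not it. It is a hedged Python sketch of one way to hand the graph to d3: keep a random fraction of the comics (about 20%, as on the page, although this samples once at export time rather than on every refresh), keep only edges between kept comics, and write the `{"nodes": [...], "links": [...]}` JSON shape common in d3-force examples. The helper names and the `img` and `words` fields are hypothetical.

```python
import json
import random


def sample_subgraph(nodes, edges, fraction=0.2, seed=None):
    """Keep a random `fraction` of nodes and only the edges between kept nodes."""
    rng = random.Random(seed)
    k = min(len(nodes), max(1, round(len(nodes) * fraction))) if nodes else 0
    kept = set(rng.sample(sorted(nodes), k))
    return kept, [e for e in edges if e[0] in kept and e[1] in kept]


def to_d3_json(nodes, edges, img_urls):
    """Serialize nodes and (source, target, shared words) edges for d3."""
    graph = {
        "nodes": [{"id": n, "img": img_urls.get(n, "")} for n in sorted(nodes)],
        "links": [
            {"source": a, "target": b, "words": words} for a, b, words in edges
        ],
    }
    return json.dumps(graph)
```

On the d3 side, `d3.forceLink(links).id(d => d.id)` tells the link force to match `source` and `target` against each node's `id` field instead of its array index.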

Once more, for those who scrolled straight to the bottom to see the graph:

Link to graph. It uses JavaScript, so Medium doesn't let me embed it here; I'm hosting it on my blog instead.

Originally published at https://www.fluentdata.tech on November 30, 2020.

Nerd For Tech

From Confusion to Clarification

Billy Mosse

Written by

Mathematician. Data Scientist at fromAtoB. Machine Learning freelancer. Expat. I like bots.

Nerd For Tech

We are tech nerds because we believe in reinventing the world with the power of Technology. Our articles talk about some of the most disruptive ideas, technology, and innovation.
