Making a Whimsical Data Visualization
A few months ago, I saw Randy Rainbow’s live comedy show, in which he incorporates some of his song parody videos. It was wonderful to laugh so much, and with a whole theater full of people. I left feeling a little lighter, and curious about how such a great show came to be. In particular, I wondered how Randy Rainbow developed his video style.
He couldn’t have just appeared out of the blue. How did his style and technique evolve over time? How many of his videos could I find?
A quick search brought up several of his most recent and most popular videos. To find more, I explored the ‘up next’ list. But to find older videos, I had to dig around.
It took a while, but eventually I’d found around 100 videos by Randy Rainbow on YouTube. To keep track of them, I created a simple spreadsheet and added each video’s link as I found it.
Starting out, my data set was just a list of links. From there, I collected some additional information for each video: video length, views, transcripts, publish date, and so on.
For my initial visualizations, I looked at how popular each video was, using the video’s view count as the metric. In 2016, the frequency of his video releases increased, so the points bunch up from then on. I’m quite fond of bee-swarm plots, and it made sense to use one with such clumped data.
This was when I noticed that most of his recent and popular videos were song parodies, and that these really started to take off in late 2015 and early 2016. Maybe I could draw that out in the visualization.
Then I found his press-kit, with these fun images. The colors were perfect! It gave me the idea of color coding which videos were song parodies and which had different formats.
Then I noticed that a video I was watching had a transcript. Oh! That gave me ideas! Did they all have transcripts?
Well, a lot of them did. But not all. I’d have to figure something out later for the missing transcripts. Enough videos had them that exploring the transcripts was worth the effort.
It had been a while since I’d done any NLP coding, so I spent some time exploring the basics in Python and NLTK. Pretty quickly, I had a little script that looped over the transcript files and generated a simple JSON-formatted report with the most common words in each video.
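For the curious, here is a minimal sketch of what a script like that could look like. It is not my actual code: it swaps NLTK for the standard library (a regular expression for tokenizing and collections.Counter for counting), and the folder layout, the function names, and the tiny stop-word list are all assumptions made up for the example.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately tiny stop-word list for illustration.
# NLTK's stopwords corpus would be a much fuller replacement.
STOP_WORDS = {"the", "a", "an", "and", "i", "to", "of", "in", "it", "is", "you", "that"}


def most_common_words(text, n=5):
    """Return the n most frequent non-stop-words in a transcript."""
    # Lowercase, then pull out runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


def build_report(transcript_dir, n=5):
    """Map each transcript file (assumed to be one .txt per video) to its top words."""
    report = {}
    for path in sorted(Path(transcript_dir).glob("*.txt")):
        report[path.stem] = most_common_words(path.read_text(encoding="utf-8"), n)
    return report
```

Passing the result of build_report("transcripts") to json.dumps would give the JSON-formatted report, assuming a folder named "transcripts" that holds the text files.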
So now I could start playing with using these words in the visualization. My first attempts were very basic. But they helped me decide how to deal with the missing transcripts…by showing “no transcript” instead of the video’s most frequent words.
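The fallback for missing transcripts can be sketched in a few lines of Python. This is only an illustration under assumed names: word_report stands for a dictionary of video id to top words, and label_for and build_labels are hypothetical helpers, not functions from my project.

```python
NO_TRANSCRIPT = "no transcript"  # placeholder text shown when a video has no words


def label_for(video_id, word_report):
    """Return the text to display for one video."""
    words = word_report.get(video_id)
    if not words:
        # Covers both a missing entry and an empty word list.
        return NO_TRANSCRIPT
    return " ".join(words)


def build_labels(video_ids, word_report):
    """Build a display label for every video, with or without a transcript."""
    return {vid: label_for(vid, word_report) for vid in video_ids}
```

Treating an empty word list the same as a missing entry keeps the chart from showing a blank label for a transcript that turned out to have no usable words.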
While technically interesting, this was visually boring. So I started playing with some other ideas. I ended up creating little word-burst animations. If the video was a song parody, there would also be little musical notes, in addition to words.
The annotation, on the side, shows a little information on each video. It’s triggered by a mouse event on each bubble, but I’ve also coded it to have a default annotation when the page loads. I think it looks best when the default annotation is pointing to a bubble on the left side of the chart.
Recently, I added thumbnail images (from the videos) to the annotation. I’m still working on that, but the current online version has it.
The current version of this whimsical data visualization is online! Mouse over the bubbles to trigger the word-bursts.
I’m working on a major addition to the visualization, but it’s not quite ready yet. In the meantime, here are some out-takes and pretty bugs I’ve created so far.
Out-takes and Bugs
I’ve also written a little about my data collection, from a more technical perspective, if you’re interested. It’s more for my own reference, when I need to update the data set. Be warned: it’s a bit clunky and a work-in-progress.