Building a real-time word cloud from Twitch.tv chat with Node.js and Redis
A long time ago, I built a “Twitch Plays 2048” server that used Twitch’s IRC interface to let viewers play the game together. It was a hacky mess: Chromium didn’t support remote control by automated software, the chat protocol was difficult to implement, and lag made the experience subpar for viewers.
A few days ago, a friend of mine and I talked about what it would take to build a real-time word cloud from chat messages on Twitch. My goal was to get a prototype working in less than an hour. Here we go 🚀.
Want to jump straight into the code? You can check out the full GitHub repo here.
Architecture
The service consists of a Node.js server that uses tmi.js to connect to Twitch.tv’s IRC-based chat with a bot account and listen for chat messages. The service also connects to Redis to rank words as they arrive and store the rankings for later retrieval. Lastly, the service uses Express to host a static HTML file that uses d3-cloud to generate word clouds.
Implementation
Server
Let’s begin by creating the server component. To recap, the server has two main functions: a) provide endpoints that serve the client HTML and the REST API, and b) connect to Twitch IRC and record keywords in Redis as they come in.
In this section, I set up a basic Express server that reads in a template index.html (more on that later) and serves it at the root path (/) when the server is running. It also replaces the $CHANNEL_NAME placeholder in the template so the browser loads a particular channel.
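A minimal sketch of the placeholder substitution, assuming the placeholder is the literal string $CHANNEL_NAME (the name used in the client section below). The helper names renderIndex and sanitizeChannel, and the rule of keeping only letters, digits, and underscores, are hypothetical choices for this example rather than the repo’s code.

```javascript
// Placeholder string as mentioned in the article.
const PLACEHOLDER = '$CHANNEL_NAME';

// Hypothetical sanitizing rule: lowercase the name and keep only letters,
// digits and underscores, because the value is pasted straight into HTML/JS.
function sanitizeChannel(channel) {
  return String(channel).toLowerCase().replace(/[^a-z0-9_]/g, '');
}

// Replace every occurrence of the placeholder in the template.
// split/join is used instead of String.prototype.replace with a string
// pattern, which would only replace the first match.
function renderIndex(template, channel) {
  return template.split(PLACEHOLDER).join(sanitizeChannel(channel));
}

// Possible use in the Express app (TWITCH_CHANNEL is a hypothetical env var):
// const template = require('fs').readFileSync('index.html', 'utf8');
// app.get('/', (req, res) => res.send(renderIndex(template, process.env.TWITCH_CHANNEL)));
```

Reading the template once at startup and substituting per request keeps the file on disk untouched, so the same index.html could serve different channels.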
Next, let’s create the Twitch chat event handlers and our connection to Redis.
First, I’m using bluebird to promisify the Redis package (it does not support Promises at the time of this writing). This allows us to use the Redis client with async/await.
Next, I’m setting up options for the tmi.js package, which conveniently wraps the communication logic for the Twitch IRC protocol so we don’t have to worry about it. I’ve created a bot account and requested an OAuth token for it, which we also pass into the configuration.
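A minimal sketch of what that configuration might look like. The connection, identity (username, password) and channels fields follow the option names in the tmi.js documentation, whose examples pass the token with an oauth: prefix. The function buildTmiOptions and the environment variable names are hypothetical, chosen only for this example.

```javascript
// Hypothetical helper that assembles a tmi.js options object.
// Assumed env vars: TWITCH_BOT_USERNAME, TWITCH_OAUTH_TOKEN, TWITCH_CHANNEL.
function buildTmiOptions(env) {
  const token = env.TWITCH_OAUTH_TOKEN || '';
  return {
    // Reconnect automatically and use a TLS connection.
    connection: { reconnect: true, secure: true },
    identity: {
      username: env.TWITCH_BOT_USERNAME,
      // Add the "oauth:" prefix if the stored token lacks it.
      password: token.startsWith('oauth:') ? token : `oauth:${token}`,
    },
    // Channels the bot joins when it connects.
    channels: [env.TWITCH_CHANNEL],
  };
}

// Usage (requires the tmi.js package and real credentials):
// const tmi = require('tmi.js');
// const client = new tmi.Client(buildTmiOptions(process.env));
// client.connect();
```

Keeping the token in the environment rather than in the source avoids committing the bot’s credentials to the repo.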
The Redis client is configured by simply passing a connection string to the createClient function.
For tmi.js to work properly, we have to set up event handlers, most interestingly for the message event. This event fires for every message posted in a particular streamer’s channel. In our onMessageHandler, we take the message, split it into individual words, and call Redis’ ZINCRBY command for each word. ZINCRBY increments the word’s score in the channel’s sorted set (ZSET) by one, adding the word with a score of 1 if it isn’t in the set yet. Over time, this builds a sorted “high-score” list of words for a particular channel.
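A sketch of such a handler, under a few stated assumptions: the Redis client was promisified with bluebird’s promisifyAll, so it exposes zincrbyAsync(key, increment, member); the sorted-set key is the channel name without the leading # that tmi.js includes (so it matches the :channel route parameter); and the tokenizing rules (lowercase, split on whitespace, trim punctuation) are our own, not necessarily the repo’s. The factory createOnMessageHandler is a hypothetical name that lets the Redis client be passed in.

```javascript
// Lowercase the message, split on whitespace, trim leading/trailing
// characters that are not letters or digits, and drop empty tokens.
function tokenize(message) {
  return message
    .toLowerCase()
    .split(/\s+/)
    .map((word) => word.replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, ''))
    .filter((word) => word.length > 0);
}

// tmi.js calls message handlers with (channel, tags, message, self).
function createOnMessageHandler(redisClient) {
  return function onMessageHandler(channel, tags, message, self) {
    if (self) return Promise.resolve(); // ignore the bot's own messages
    const key = channel.replace(/^#/, ''); // "#bobross" -> "bobross" (assumed key scheme)
    // One ZINCRBY per word: score + 1, or a new member with score 1.
    return Promise.all(
      tokenize(message).map((word) => redisClient.zincrbyAsync(key, 1, word))
    );
  };
}

// Wiring (requires tmi.js and a promisified Redis client):
// client.on('message', createOnMessageHandler(redisClient));
```

Because ZINCRBY is atomic on the Redis side, many messages arriving at once can increment the same word without the Node process keeping any counters of its own.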
Now that we have the data, let’s add an API route to our Express server to respond with the 50 most frequent words from the chat.
This endpoint relies on Redis’ ZREVRANGE command, which returns the members of our sorted set ordered from highest to lowest score. Because the stop index is inclusive, requesting indices 0 through 49 returns the top 50 words. The WITHSCORES argument includes each word’s score in the response, which we later use to calculate the word’s relative size when we display it. The route contains a :channel parameter, which we use to select the right sorted set from Redis.
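With WITHSCORES, Redis replies with a flat list that alternates member and score, and the Node redis client returns the scores as strings. A minimal sketch of turning that reply into the [{ text, size }] shape the client uses, assuming the conversion happens on the server (it could equally live in the browser). toWordList and the /words/:channel path are hypothetical names.

```javascript
// Convert a flat ZREVRANGE ... WITHSCORES reply such as
//   ['happy', '12', 'trees', '5']
// into [{ text: 'happy', size: 12 }, { text: 'trees', size: 5 }].
function toWordList(reply) {
  const words = [];
  // Walk the reply two entries at a time: member, then its score.
  for (let i = 0; i + 1 < reply.length; i += 2) {
    words.push({ text: reply[i], size: Number(reply[i + 1]) });
  }
  return words;
}

// Possible route (assumes the bluebird-promisified client):
// app.get('/words/:channel', async (req, res) => {
//   // Indices are inclusive: 0..49 returns the 50 highest-scoring words.
//   const reply = await redisClient.zrevrangeAsync(req.params.channel, 0, 49, 'WITHSCORES');
//   res.json(toWordList(reply));
// });
```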
That’s it for the server implementation; let’s focus on the client next. To keep things simple, I embedded a <script> tag in the HTML. Normally, we would serve it as a separate file 😅. You can see the full HTML in the sample repo.
Client
Our client will do one thing: generate a word cloud from the keywords we receive through the API. We’ll be using d3 and d3-cloud, which offer a rich API for visualizations. Let’s get to it.
The client starts with a function that uses fetch to request a particular channel’s most frequent words from our back-end REST API. When we serve the embedded script with Express, we replace $CHANNEL_NAME with a pre-configured channel. This function returns an array of objects in the format [{ text: 'word', size: 10 }, ... ], which we bind to d3-cloud’s rendering function.
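A minimal sketch of such a request function. The /words/:channel path is a hypothetical route name, and the fetchImpl parameter (defaulting to the browser’s fetch) is an addition that only exists so the function can be exercised without a running server.

```javascript
// Request the top words for a channel and return the parsed JSON body,
// expected to look like [{ text: 'word', size: 10 }, ...].
async function fetchTopWords(channel, fetchImpl = fetch) {
  const response = await fetchImpl(`/words/${encodeURIComponent(channel)}`);
  if (!response.ok) {
    // Surface HTTP errors instead of trying to render an error body.
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// In the embedded script, the server would have substituted the channel:
// fetchTopWords('$CHANNEL_NAME').then((words) => { /* bind to d3-cloud */ });
```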
Next, we set up a 500x500 pixel canvas and bind our word cloud to it. The function mostly contains d3-specific rendering logic to arrange the words and calculate each word’s relative size. Of particular interest is the .fontSize() calculation.
We calculate the highestScore (the score of the most frequent word) by finding the largest size in our data from the API. Then, we scale each word linearly by dividing its size by the highestScore and multiplying the result by a maximum font size (90 in the example). This ensures that the most frequent word is drawn at size 90 and every other word’s font size is proportional to its frequency.
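A sketch of that calculation as plain functions. The maximum of 90 comes from the article; the lower clamp to 1 is an addition of ours, because without it rarely used words can end up with sub-pixel sizes (a word with a score of 1 next to a top score of 200 gets 1/200 × 90 = 0.45). highestScoreOf and fontSizeFor are hypothetical names.

```javascript
const MAX_FONT_SIZE = 90; // maximum size from the article's example

// Score of the most frequent word (assumes a non-empty word list).
function highestScoreOf(words) {
  return Math.max(...words.map((w) => w.size));
}

// Linear scaling: the top word gets maxFontSize, the others are proportional.
// Math.max(1, ...) is our added floor so no word collapses below 1.
function fontSizeFor(size, highestScore, maxFontSize = MAX_FONT_SIZE) {
  return Math.max(1, (size / highestScore) * maxFontSize);
}

// As a d3-cloud accessor, roughly:
//   const top = highestScoreOf(words);
//   layout.fontSize((d) => fontSizeFor(d.size, top));
```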
For testing purposes, I ran the server against the BobRoss channel on Twitch for about 30 minutes while he was drawing a picture of a cabin in the woods.
Here is the result:
Conclusion
In this article, I described a way to use Redis’ sorted set (ZSET) data type to record Twitch.tv chat messages in real time and generate word clouds from them.
The code is very basic and by no means perfect. While coding, I came up with a ton of features that could make it better, but tried to time-box my efforts to one hour.
Overall, this was a fun “can I do it in one hour?” project that came out decent enough.
Some ideas I had for a version 2:
- Let the client choose the channel to monitor (currently embedded by the server)
- Reset functionality to start from scratch
- Interactively re-bind the word cloud data (currently requires a page reload)
Thank you for taking the time to read through my article. If you enjoyed it, please hit the Clap button a few times 👏! If this article was helpful to you, feel free to share it!
For more from me, be sure to follow me on Twitter, here on Medium, or check out my website!