Building a real-time word cloud from Twitch.tv chat with Node.js and Redis

Mario Tacke
Published in DailyJS
5 min read · May 20, 2019
Photo by Michael Jeffrey on Unsplash

A long time ago, I built a “Twitch Plays 2048” server that interacted with Twitch’s IRC protocol to let viewers play the game together. It was a hacky mess: Chromium didn’t support remote control from automated software, the chat protocol was difficult to implement, and the overall experience was sub-par for viewers because of lag.

A few days ago, a friend of mine and I talked about what it would take to build a real-time word cloud from chat messages on Twitch. My goal was to get a prototype working in less than an hour. Here we go 🚀.

Want to jump straight into the code? You can check out the full GitHub repo here.

Architecture

The service consists of a Node.js server which uses tmi.js to connect to the Twitch.tv IRC protocol using a bot account to listen to chat messages. The service also connects to Redis to perform live ranking of words and stores them for retrieval later. Lastly, the service uses Express to host a static HTML file with d3-cloud to generate word clouds.

Implementation

Server

Let’s begin by creating the server component. To recap, the server has two main functions: a) provide endpoints to serve the client HTML and the REST API, and b) connect to Twitch IRC and record keywords in Redis as they come in.

Setting up a basic Express server rendering a template index.html

In this section I set up a basic Express server which reads in a template index.html (more on that later), and serves it on / (root) when the server is running. It also replaces the CHANNEL_NAME in the template to load a particular channel in the browser.
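A minimal sketch of how the template substitution and the root route could look. The helper names `renderIndex` and `registerIndexRoute` are hypothetical, and `app` is assumed to be an Express application created elsewhere with `express()`.

```javascript
// Hypothetical helper: fill in the channel placeholder in the HTML template.
// split/join replaces every occurrence and treats the channel as literal text.
function renderIndex(template, channelName) {
  return template.split('$CHANNEL_NAME').join(channelName);
}

// `app` is assumed to be an Express application, e.g. `const app = express()`.
function registerIndexRoute(app, template, channelName) {
  const html = renderIndex(template, channelName); // render once at startup
  app.get('/', (req, res) => {
    res.send(html);
  });
}
```

In a real server the template string would typically come from something like `fs.readFileSync('index.html', 'utf8')`, and the channel name from configuration.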

Next, let’s create the Twitch chat event handlers and our connection to Redis.

First, I’m using bluebird to promisify the Redis package (it does not support Promises at the time of this writing). This allows us to use the Redis client with async / await. Next, I’m setting up options for the tmi.js package which conveniently wraps communication logic for the Twitch IRC protocol so we don’t have to worry about it. I’ve created a bot account and requested an OAuth token for it which we also pass into the configuration.

The Redis client is configured by simply passing a connection string to the createClient function.
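As a rough sketch, the configuration for both clients could be gathered in one place. The `buildConfig` helper and the environment variable names are assumptions made for illustration; only the shape of the tmi.js options (`identity` and `channels`) and the Redis connection string follow the description above.

```javascript
// Hypothetical helper: derive connection settings from environment variables.
// The variable names are illustrative and not taken from the original repo.
function buildConfig(env) {
  return {
    // Options object in the shape tmi.js expects.
    tmi: {
      identity: {
        username: env.BOT_USERNAME,
        password: env.OAUTH_TOKEN, // OAuth token requested for the bot account
      },
      channels: [env.CHANNEL_NAME],
    },
    // Connection string to hand to redis.createClient(...).
    redisUrl: env.REDIS_URL || 'redis://localhost:6379',
  };
}

// Possible wiring, assuming the tmi.js, redis and bluebird packages are installed:
//   const config = buildConfig(process.env);
//   bluebird.promisifyAll(redis);            // adds *Async methods to the client
//   const redisClient = redis.createClient(config.redisUrl);
//   const twitchClient = new tmi.client(config.tmi);
```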

For tmi.js to work properly, we have to set up event handlers, most interestingly the message event. This event is triggered for every message coming in on a particular streamer’s channel. In our onMessageHandler we take the message, split it into individual words, and call Redis’ ZINCRBY command. This adds the word to the channel’s sorted set (ZSET) and increments the occurrence by one. Over time, this builds a sorted “high-score” of words for a particular channel.
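The handler described above can be sketched roughly as follows. It assumes a Redis client that exposes a promise-returning `zincrbyAsync` method (as a bluebird-promisified client would), and it uses the channel name without its leading `#` as the sorted-set key. The key format and the lowercasing of words are assumptions for this example.

```javascript
// Split a chat message into lowercase words (lowercasing is an assumption).
function tokenize(message) {
  return message.toLowerCase().split(/\s+/).filter(Boolean);
}

// Returns a handler with the argument order of tmi.js's "message" event.
// `redisClient` is assumed to expose a promise-returning zincrbyAsync method.
function createMessageHandler(redisClient) {
  return async function onMessageHandler(channel, userstate, message, self) {
    if (self) return; // ignore messages sent by the bot itself
    const key = channel.replace(/^#/, ''); // hypothetical key: channel without "#"
    // ZINCRBY key 1 word: adds the word if missing, then bumps its score by one.
    await Promise.all(
      tokenize(message).map((word) => redisClient.zincrbyAsync(key, 1, word))
    );
  };
}
```

Registering it would then be a single line such as `twitchClient.on('message', createMessageHandler(redisClient))`, where `twitchClient` is the connected tmi.js client.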

A running tally of all words from a streamer’s chat, sorted by occurrence.

Now that we have the data, let’s add an API route to our Express server to respond with the 50 most frequent words from the chat.

The Rest API endpoint to retrieve the top 50 most frequently used words in chat.

This endpoint relies on Redis’ ZREVRANGE command, which returns the members of our sorted set ordered from highest to lowest score; asking for the range 0 to 49 yields the 50 most frequent words. The WITHSCORES argument includes the scores in the response, which we later use to calculate the relative size of a word when we display it. The route contains a :channel parameter which we use to select the right sorted set from Redis.
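A sketch of such a route, under a few assumptions: the path `/api/words/:channel` is made up for this example, `redisClient` exposes a promisified `zrevrangeAsync`, and the conversion into `{ text, size }` objects happens on the server. With WITHSCORES, Redis replies with a flat list of alternating members and scores, which the hypothetical `toWordList` helper pairs up.

```javascript
// Convert Redis' flat WITHSCORES reply, [member, score, member, score, ...],
// into [{ text, size }, ...]. Scores arrive as strings, so they are parsed.
function toWordList(flatReply) {
  const words = [];
  for (let i = 0; i < flatReply.length; i += 2) {
    words.push({ text: flatReply[i], size: Number(flatReply[i + 1]) });
  }
  return words;
}

// `app` is assumed to be an Express application; the route path is hypothetical.
function registerTopWordsRoute(app, redisClient) {
  app.get('/api/words/:channel', async (req, res) => {
    try {
      // Indexes 0..49 of the reversed ranking are the 50 highest scores.
      const reply = await redisClient.zrevrangeAsync(
        req.params.channel, 0, 49, 'WITHSCORES'
      );
      res.json(toWordList(reply));
    } catch (err) {
      res.status(500).json({ error: 'Could not load words' });
    }
  });
}
```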

This is it for the server implementation; let’s focus on the client next. To keep things simple, I embedded a <script> tag into the HTML. Normally, we would serve it as a separate file 😅. You can see the full HTML in the sample repo.

Client

Our client will do one thing: generate a word cloud from the keywords we receive through the API. We’ll be using d3 and d3-cloud which offer a rich API for visualizations. Let’s get to it.

Hooking up our client to the back-end REST API.

The above is a function that uses fetch to request the most frequent words for a particular channel from our back-end REST API. When we serve the embedded script with Express, we replace $CHANNEL_NAME with a pre-configured channel. This function returns an array of objects in the format of [{ text: 'word', size: 10 }, ... ] which we’ll bind to d3-cloud’s rendering function.
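A client helper along these lines could look like the sketch below. The function name `getWords`, the URL path, and the injectable `fetchFn` parameter are assumptions for illustration; it also assumes the endpoint already responds with `text` and `size` fields. In the served page the channel would come from the replaced `$CHANNEL_NAME` placeholder rather than from a parameter.

```javascript
// Hypothetical client helper: load the most frequent words for a channel.
// fetchFn defaults to the browser's fetch; passing it in makes testing easier.
async function getWords(channelName, fetchFn = fetch) {
  const response = await fetchFn(
    `/api/words/${encodeURIComponent(channelName)}`
  );
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const words = await response.json();
  // Make sure sizes are numbers before they reach the layout code.
  return words.map(({ text, size }) => ({ text, size: Number(size) }));
}
```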

Bringing it all together: drawing pretty word-clouds.

Here, we’re setting up a 500x500 pixel canvas and binding our word cloud to it. The function mostly contains d3-specific rendering logic to arrange the words and calculate a word’s relative size. Of particular interest is the .fontSize() calculation:

We calculate the highestScore (the score of the most frequent word) by finding the largest size in our data from the API. Then, we scale each word linearly by dividing its size by the highestScore and multiplying the result by a maximum font size (90 in the example). This ensures that no word is drawn larger than 90 and that each word’s font size is proportional to its frequency.
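The scaling can be sketched as a small function that returns an accessor suitable for d3-cloud’s `.fontSize()`. The name `createFontSize` is hypothetical, and the guard against an empty word list is an addition for this example.

```javascript
// Build a font-size accessor that scales each word relative to the top score.
function createFontSize(words, maxSize = 90) {
  // Math.max(1, ...) avoids dividing by zero or -Infinity for an empty list.
  const highestScore = Math.max(1, ...words.map((word) => word.size));
  return (word) => (word.size / highestScore) * maxSize;
}

const sampleWords = [
  { text: 'trees', size: 20 },
  { text: 'happy', size: 10 },
  { text: 'cabin', size: 5 },
];
const fontSize = createFontSize(sampleWords);
console.log(sampleWords.map(fontSize)); // [ 90, 45, 22.5 ]
```

With a d3-cloud layout object, the accessor would be passed along the lines of `layout.fontSize(createFontSize(words))`.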

For testing purposes, I ran the server against the BobRoss channel on Twitch for about 30 minutes while he was drawing a picture of a cabin in the woods.

Here is the result:

A word-cloud of Bob Ross painting a cabin in the woods. You can see some interesting trends emerge…

Conclusion

In this article I described a way to use Redis’ sorted set (ZSET) data type to record words from Twitch.tv chat messages in real time and generate word clouds from them.

The code is very basic and by no means perfect. While coding, I came up with a ton of features that could make it better, but tried to time-box my efforts to one hour.

Overall, this was a fun “can I do it in one hour?” project that came out decent enough.

Some ideas I had for a version 2:

  • Let the client initiate the channel to monitor (currently embedded by the server)
  • Reset functionality to start from scratch
  • Interactively re-bind the word cloud data (currently doing a reload)

Thank you for taking the time to read through my article. If you enjoyed it, please hit the Clap button a few times 👏! If this article was helpful to you, feel free to share it!

For more from me, be sure to follow me on Twitter, here on Medium, or check out my website!
