Create a Twitter Politician Bot with Markov Chains, Node.js and StdLib
In the world’s current political climate, propaganda is the name of the game and Twitter is the medium of choice. Automation is king, and if you’re not using Twitter bots to sway the masses, you’re doing it wrong. Here at StdLib, we don’t really have any political motivations, but we sure do enjoy building bots. And, with the launch of StdLib Sourcecode, it’s never been easier for us to share our newest project with you: introducing Jaden Trudeau, the eccentric future Prime Minister of Canada. We’ll teach you all about how we built this wonder of modern engineering, and how you can build your own “Political Terminator” (shoutout to Arnie) in minutes.
With the goal of building a Twitter bot that appeals to the masses, we chose to combine the wisdom of Jaden Smith:
With the wholesomeness of Justin Trudeau:
To create the world’s perfect politician: Jaden Trudeau. More specifically, the goal is to create a bot that occasionally tweets procedurally generated sentences in the style of Jaden Smith and Justin Trudeau. This combination results in wonderful specimens such as:
The tool of choice for this project is a Markov chain. Markov chains have many real-world applications, like Google’s PageRank algorithm, but none are as important as this one. If you want to skip to the working version of the code, you can check out its API page here. From this page you can try the service yourself, and even mix in other people’s Twitter accounts!
What’s the deal with Markov Chains?
We describe a Markov chain as follows: We have a set of states, S = {s_1, s_2, …, s_r}. The process starts in one of these states and moves successively from one state to another. Each move is called a step. If the chain is currently in state s_i, then it moves to state s_j at the next step with a probability denoted by p_ij, and this probability does not depend upon which states the chain was in before the current state. [source]
In short, a Markov chain is a mathematical model that transitions from one state to another by throwing out the history of previous states and only examining the present. While that explanation is still a bit abstract, it becomes clearer within the context of generating sentences. Below is an outline for how you might generate text using a Markov chain.
- Split a body of text (your corpus) into tokens (words and punctuation).
- Build a frequency table. This data structure has a key for every unique token in your corpus. Each key maps to a list of all the tokens that follow it in the corpus, along with how often each one follows it. It also helps to add special keys for the start and end of sentences. This ensures that when sampling from the model you can always start and end sentences with appropriate words.
- Select a starting point (one of those special start words) and then randomly select a token from the list of tokens that follow the key. The probability that a token is chosen should be proportional to how often it appears after the key. This new token is now the state of the Markov chain. Look up the new token in the frequency table and repeat.
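The three steps above can be sketched in a few lines of Node.js. This is a minimal illustration rather than the bot's actual code: the function names (`buildTable`, `weightedPick`, `generate`) and the `maxTokens` cap are assumptions made for the example, and tokens are simply joined with spaces.

```javascript
// Special keys marking sentence boundaries.
const START = '__START';
const END = '__END';

// Build the frequency table from an array of tokenized sentences.
// table.get(token) is a Map of { nextToken -> times it followed token }.
function buildTable(sentences) {
  const table = new Map();
  for (const tokens of sentences) {
    const path = [START, ...tokens, END];
    for (let i = 0; i < path.length - 1; i++) {
      const followers = table.get(path[i]) || new Map();
      followers.set(path[i + 1], (followers.get(path[i + 1]) || 0) + 1);
      table.set(path[i], followers);
    }
  }
  return table;
}

// Pick a token with probability proportional to its count.
// `random` returns a number in [0, 1); it is injectable so runs can be repeated.
function weightedPick(followers, random = Math.random) {
  let total = 0;
  for (const count of followers.values()) total += count;
  let threshold = random() * total;
  let last;
  for (const [token, count] of followers) {
    threshold -= count;
    last = token;
    if (threshold < 0) return token;
  }
  return last; // fallback for floating-point edge cases
}

// Walk the chain from START until END (or until maxTokens words are produced).
function generate(table, random = Math.random, maxTokens = 50) {
  if (!table.has(START)) return '';
  const words = [];
  let state = START;
  while (words.length < maxTokens) {
    state = weightedPick(table.get(state), random);
    if (state === END) break;
    words.push(state);
  }
  return words.join(' ');
}

const table = buildTable([
  ['we', 'are', 'one'],
  ['we', 'are', 'here']
]);
console.log(generate(table));
```

Notice that `generate` only ever looks at the current token when choosing the next one, which is exactly the "throw out the history" property described above.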
Implementation
With a general idea of how to proceed, it’s time to get going. First things first, we need to fetch some tweets. With Twit, that’s no problem.
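Here is a rough sketch of what that fetch could look like, not the bot's actual code. It assumes a Twit client created with your four Twitter credentials and access to Twitter's v1.1 `statuses/user_timeline` endpoint; the helper name `fetchTweetTexts` is made up for illustration.

```javascript
// Fetch the text of recent tweets for one account.
// `client` is assumed to be a Twit instance, created elsewhere like so:
//   const Twit = require('twit');
//   const client = new Twit({
//     consumer_key, consumer_secret, access_token, access_token_secret
//   });
async function fetchTweetTexts(client, screenName, count = 200) {
  // Without a callback, Twit's get() returns a promise that resolves
  // to an object with `data` and `resp` keys.
  const response = await client.get('statuses/user_timeline', {
    screen_name: screenName,
    count,                  // the endpoint returns at most 200 per request
    include_rts: false,     // skip retweets, we only want original wording
    tweet_mode: 'extended'  // ask for untruncated text in `full_text`
  });
  return response.data.map(tweet => tweet.full_text || tweet.text);
}
```

Passing the client in as an argument keeps the helper easy to try out with a stub before you wire in real credentials.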
After receiving the tweets, they need to be tokenized. With tweets, this is not an entirely trivial process. Tweets are full of URLs, emojis and ill-formed sentences. We can turn a string representing a tweet into an array of tokens with the code below:
This function takes in a tweet, strips it of URLs and mentions and splits it into words. These arrays can then be fed into the frequency table.
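A tokenizer along those lines might look like the sketch below. The regular expressions are assumptions chosen for the example, not necessarily the ones the bot uses, and punctuation is left attached to the word it follows.

```javascript
// Turn a raw tweet into an array of word tokens.
function tokenize(tweet) {
  return tweet
    .replace(/https?:\/\/\S+/g, ' ')   // drop URLs
    .replace(/@\w+/g, ' ')             // drop @mentions
    .split(/\s+/)                      // split on runs of whitespace
    .filter(word => word.length > 0);  // discard empty strings
}

console.log(tokenize('Thanks @JustinTrudeau for this! https://t.co/abc123'));
// [ 'Thanks', 'for', 'this!' ]
```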
The code to generate the table is a little long for a Medium post, but you can see it here. After the table is generated, entries look like this:
These entries could be traversed in a few ways. At the beginning, there is a 50/50 chance of selecting ‘our’ or ‘we’ as the starting word. Assuming ‘our’ gets chosen, there is a 2/5 chance each that ‘future’ or ‘differences’ gets chosen next and a 1/5 chance for ‘relationship’. This process keeps repeating until a chain is created such as:
__START -> our -> future -> office -> __END
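To make those odds concrete, here is a small sketch that turns the counts in a table entry into transition probabilities. The counts below are illustrative only: they were picked to match the odds quoted above, and the entries are written as plain objects for readability.

```javascript
// Illustrative counts, chosen only to reproduce the odds described in the text.
const entries = {
  __START: { our: 1, we: 1 },
  our: { future: 2, differences: 2, relationship: 1 }
};

// Convert one entry's follower counts into transition probabilities.
function toProbabilities(followers) {
  const total = Object.values(followers).reduce((sum, n) => sum + n, 0);
  const probabilities = {};
  for (const [token, count] of Object.entries(followers)) {
    probabilities[token] = count / total;
  }
  return probabilities;
}

console.log(toProbabilities(entries.__START)); // { our: 0.5, we: 0.5 }
console.log(toProbabilities(entries.our));     // { future: 0.4, differences: 0.4, relationship: 0.2 }
```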
And that’s pretty much it. If you want to see it in action, follow Jaden Trudeau on Twitter. Of course, there are many tweaks that can be made. For instance, if you want to generate multiple sentences at a time you can add edges from __END to __START, and just make sure that you end with a complete sentence.
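One way to sketch that multi-sentence tweak is shown below. It is an assumption-laden example, not the bot's code: the table is a plain object of `{ token: { nextToken: count } }`, and `generateTweet`, `maxSentences` and `maxLength` are names and defaults invented for the illustration.

```javascript
const START = '__START';
const END = '__END';

// Weighted random choice over an object of { token: count }.
function pick(followers, random) {
  const tokens = Object.keys(followers);
  const total = tokens.reduce((sum, t) => sum + followers[t], 0);
  let threshold = random() * total;
  for (const token of tokens) {
    threshold -= followers[token];
    if (threshold < 0) return token;
  }
  return tokens[tokens.length - 1]; // fallback for floating-point edge cases
}

// Keep walking past sentence boundaries: reaching __END finishes a sentence
// and jumps back to __START, which is the extra edge described above.
function generateTweet(table, maxSentences = 2, maxLength = 280, random = Math.random) {
  const sentences = []; // complete sentences accepted so far
  let words = [];       // words of the sentence in progress
  let state = START;
  let steps = 0;
  while (sentences.length < maxSentences && steps++ < 1000) {
    state = pick(table[state], random);
    if (state !== END) {
      words.push(state);
      continue;
    }
    const candidate = [...sentences, words.join(' ')].join(' ');
    if (candidate.length > maxLength) break; // this sentence does not fit
    sentences.push(words.join(' '));
    words = [];
    state = START; // follow the __END -> __START edge
  }
  // Only finished sentences are returned, so the tweet never ends mid-sentence.
  return sentences.join(' ');
}
```

Because a sentence is only appended once `__END` is reached, anything left unfinished when the loop stops is simply dropped.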
Building Your Own
If you’d like to build your own Twitter bot, you can find the template for the bot on Code on Standard Library. If you click that link, the template will automatically open. If not, navigate to the “Community API Sources” tab and search for “steve/twitter-bot”. Once loaded, open the env.json file and you’ll see four variables.
The variables can be found on your Twitter application management page. Click “Create an App” and fill out the form:
After you click “Create”, you’ll find the four keys on the next page. Copy them into their respective places in the env.json file. Click the green “Run” button in the bottom right corner of the screen (or press cmd/ctrl + r). This will deploy and execute your code, right from the browser.
By default, this bot uses the Markov chain to generate tweets. If you wanted to swap that out for another method, you could open functions/__main__.js and make a small change:
On line 9 there is a call to lib.steve[twitter-markov-chain], which is the Markov chain from earlier. You can play with it directly from the StdLib library docs page. You can create your own function that generates tweets and simply swap it in. Now click “Run” to redeploy your bot.
And that’s it, thanks for reading! Hopefully you were able to learn a little bit about Markov chains, Twitter bots and StdLib. Building a propaganda machine is just one of the many ways you can get started with StdLib. If you have a neat idea you’d like to share, reach out to me directly by e-mail: steven@stdlib.com, or follow me and the StdLib team on Twitter.
As always, we look forward to hearing from you and happy building!
Steve Meyer is a recent graduate of Oberlin College and Software Engineer at StdLib. When he’s not programming you can find him cooking, baking, or playing Breath of the Wild.