Making a ‘Reviewer 2’ Twitter Bot

Bethany Baumann
Beth Blog

--

In the field of biology at least, any manuscript submitted for publication in a scientific journal must run the gauntlet of three peer reviewers. Peer reviewer 1 has some good points and reasonable suggestions. Peer reviewer 3 wants some clarifications and thoughtful validation experiments. But, sandwiched between these two (perhaps an attempt by the journal editors to soften the blow), peer reviewer 2 wants you to repeat everything, add on three more years of work, and, if you’re a student, give up any hope of impending graduation. In short, peer reviewer 2 is just the worst.

I decided to make a text-generating Reviewer 2 Twitter bot as a way to learn more about Python and, hopefully, to entertain my labmates. Reviewer 2 Bot would be one part serious and one part sass, learning from real peer review examples and real googled joke examples.

Selected examples from Reviewer 2 Bot:

“I apologize in advance for this. FoldIndex score and the genomic code. Theorem in the authors thinking? Were Taxonomic Names on page 4. It may be not as bad as people say, you are much, much worse. I believe the importance of this protein.”

“You have diarrhea of the ideas. So…”

“I’d slap you, but that would be an idiot would be an insult to other scholars in turn, induce neurons to produce excess Alzheimer-related proteins H1 in stimulation of neuroinflammation with overexpression of cytokines.”

Tools used:

  • Requests
  • BeautifulSoup
  • Pickle
  • Twython
  • Crontab
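
If you want to follow along, the Python packages can all be installed with pip (a quick note, not from the original post: Pickle ships with Python's standard library, and cron/crontab is built into macOS and Linux, so only the first four need installing; lxml is added for the BeautifulSoup parser used below):

pip install requests beautifulsoup4 lxml twython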

Web Scraping Peer Review Text

First, I used Requests and BeautifulSoup to collect peer review example text from F1000research.com. The code below scrapes the site and parses the text from the paragraphs in the example reviews. Ultimately, I end up with one big block of all the example review text to feed to Reviewer 2 Bot.

import requests
from bs4 import BeautifulSoup

url = "https://f1000research.com/for-referees/peer-reviewing-tips/examples"

# Download the page and parse it with BeautifulSoup
def url_to_transcript(url):
    page = requests.get(url).text
    soup = BeautifulSoup(page, 'lxml')
    return soup

soup = url_to_transcript(url)

# Grab the <p> tags from each example review section
reviews = [i.find_all("p") for i in soup.find_all(class_="detail-report-text expanded-section")]

# Flatten into a single list of paragraph strings
reviews2 = []
for i in range(len(reviews)):
    for j in range(len(reviews[i])):
        reviews2.append(reviews[i][j].get_text())

reviews_together = ' '.join(reviews2)

I also copied and pasted jokes into a second string object. These included sarcastic insults to intelligence, science-specific jokes, and for good measure, a sprinkling of yo mommas.
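
The jokes string itself isn't shown here, but as a minimal sketch (the variable name is real, the joke text below is just a placeholder), it was simply one long string:

# A stand-in for the real jokes string; the actual jokes were pasted in by hand
jokes = (
    "I would explain it to you, but I left my crayons at home. "
    "Your control group needs a control group. "
    "Yo momma's h-index is so low it needs a minus sign. "
)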

Then I combined the reviews with the jokes. Did I say one part serious, one part sass before? Make that 50 parts sass.

all_together = reviews_together + jokes * 50

Training Bot to Make New Reviews

To train Reviewer 2 Bot to generate NEW text based on input text, I took inspiration from onthelambda. The Markov chain is implemented using Python dictionaries and Python’s random module. The keys of the dictionaries are the first word of each bigram (two consecutive words) or the first two words of each trigram (three consecutive words) appearing in the training text. The value for each key is a list of the last words from each bigram or trigram. Those words populate the list according to how often they occur after the preceding word(s), so randomly selecting a word from the list, and moving from bigram to new bigram in the dictionary, creates a Markov chain.
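
As a toy illustration (not from the actual training text), training on the sentence “The data are weak. The data are missing.” would give dictionaries that look something like this:

# Hypothetical bigram and trigram dictionaries built from
# "The data are weak. The data are missing."
chainBI  = {"The": ["data", "data"], "data": ["are", "are"], "are": ["weak.", "missing."]}
chainTRI = {("The", "data"): ["are", "are"], ("data", "are"): ["weak.", "missing."]}

# Starting from ("The", "data"), the chain picks "are", then looks up
# ("data", "are") and picks either "weak." or "missing." at random.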

In the code below I generate the dictionaries described above. I also make a list of all the keys in the trigram dictionary that represent sentence starts, based on capitalization of the first word.

import string
from collections import defaultdict

all_together = all_together.split(" ")

# Yield every pair of consecutive words
def generate_bigram(words):
    if len(words) < 2:
        return
    for i in range(len(words) - 1):
        yield (words[i], words[i+1])

# Yield every triple of consecutive words
def generate_trigram(words):
    if len(words) < 3:
        return
    for i in range(len(words) - 2):
        yield (words[i], words[i+1], words[i+2])

chainBI = defaultdict(list)
chainTRI = defaultdict(list)

# Build the Markov dictionaries; keys containing a period (sentence ends)
# are excluded so sentences can only end via the values
for word1, word2 in generate_bigram(all_together):
    if "." not in word1:
        chainBI[word1].append(word2)

for word1, word2, word3 in generate_trigram(all_together):
    if "." not in word1 and "." not in word2:
        chainTRI[(word1, word2)].append(word3)

# Trigram keys whose first word is capitalized can start a new review
starts = [i for i in chainTRI.keys() if i[0][0] in string.ascii_uppercase]

Then, to save these Python dictionaries for later use, I preserved them by pickling them. I hope no one gets botulism.

import pickle

# Save the trigram dictionary, bigram dictionary, and sentence starts for later
with open('reviewer_chainTri.pickle', 'wb') as write_file:
    pickle.dump(chainTRI, write_file)

with open('reviewer_chainBi.pickle', 'wb') as write_file:
    pickle.dump(chainBI, write_file)

with open('reviewer_starts.pickle', 'wb') as write_file:
    pickle.dump(starts, write_file)
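
The matching loading step isn't shown in the original script, but the tweeting script later on needs these objects back in memory, and something along these lines (using the same file names as above) would do it:

import pickle

# Load the pickled Markov dictionaries and sentence starts back into memory
with open('reviewer_chainTri.pickle', 'rb') as read_file:
    chainTRI = pickle.load(read_file)

with open('reviewer_chainBi.pickle', 'rb') as read_file:
    chainBI = pickle.load(read_file)

with open('reviewer_starts.pickle', 'rb') as read_file:
    starts = pickle.load(read_file)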

This next part was the hardest. I had to create a function that runs the Markov chain, producing new text from my dictionaries. It has to do just the right amount of shuffling to produce text that still sounds like a review, but isn’t just input text regurgitated.

I tried to achieve this balance by having the chain randomly switch from the trigram dictionary to the bigram dictionary about a quarter of the time, and by randomly lowercasing the current pair of words half of the time (a few punctuation characters, like parentheses and quotes, are stripped from every word as it is added).

The final bit of the code keeps the output within Twitter’s character limit (< 280) by trimming the new review back to a period or comma if it gets too long.

import re
import random

def reviewer2(sentences=5):
    new_review = []
    s = 0  # sentences completed so far

    # Start from a random capitalized trigram key
    word1, word2 = random.choice(starts)
    new_review.append(re.sub("[()\";]", "", word1))
    new_review.append(re.sub("[()\";]", "", word2))

    while s < sentences:
        # Take one step in the trigram chain; if the current pair has no
        # followers (defaultdict gives an empty list), jump to a new start
        try:
            word3 = random.choice(chainTRI[(word1, word2)])
        except (KeyError, IndexError):
            word1, word2 = random.choice(starts)
            new_review.append(re.sub("[()\";]", "", word1))
            new_review.append(re.sub("[()\";]", "", word2))
            if "." in word1 or "." in word2:
                s += 1
            continue

        if "." in word3:
            s += 1
        new_review.append(re.sub("[()\";]", "", word3))
        word1 = word2
        word2 = word3

        # Half the time, lowercase the current pair to shake things up
        if random.randint(0, 1) == 1:
            word1 = word1.lower().strip()
            word2 = word2.lower().strip()

        # About a quarter of the time, take one step in the bigram chain instead
        if "." not in word2 and random.randint(0, 3) == 1:
            try:
                word3 = random.choice(chainBI[word2])
            except (KeyError, IndexError):
                word2 = random.choice(list(chainBI.keys()))
                word3 = random.choice(chainBI[word2])
            if "." in word3:
                s += 1
            new_review.append(re.sub("[()\";]", "", word3))
            word1 = word2
            word2 = word3

        # Keep the review under Twitter's 280-character limit by trimming
        # back to a comma or period
        if len(' '.join(new_review)) >= 280:
            new_review = ' '.join(new_review)
            if random.randint(0, 1) == 1:
                new_review = re.sub(r",.*$", "...", new_review)
            else:
                new_review = re.sub(r"\..*$", ".", new_review)
            new_review = new_review.split()
            break

    return ' '.join(new_review)
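
To try it out, you can just call the function a few times (a trivial usage sketch):

# Print a few freshly generated reviews
for _ in range(3):
    print(reviewer2())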

Now Reviewer 2 Bot can talk, and I have created a monster! Here is a gem: “What spaces, online or offline, do they interact in, and do they interact in, and do they share knowledge, for a precise mechanistic study of a hamster and your father smelt of elderberries.” Good combination of dry scientific text and Monty Python.

Tweeting Automatically

The final step is getting Reviewer 2 Bot to tweet automatically. I created a Twitter account for my bot (Reviewer215 … please follow) and applied for that account to be a developer account. They ask some questions about the intent of the apps created by the account, but I literally wrote “a bot that will automatically post ridiculous fake scientific manuscript reviews”, and I was approved.

Once the account is a developer account, Twitter gives you an API key and an API secret key. You’ll also need to generate an access token and access token secret to be able to post without logging in. You can find and regenerate all of these keys and tokens by logging in to your account at the developer portal, going to your dashboard, and clicking the key icon next to your app.
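
A quick aside: hardcoding the keys in the script (as I do below) works, but as a sketch of a slightly safer setup, assuming you export them under hypothetical environment variable names like the ones here, you could read them in at runtime instead:

import os

# Hypothetical environment variable names; set these in your shell or crontab
APP_KEY = os.environ["REVIEWER2_API_KEY"]
APP_SECRET = os.environ["REVIEWER2_API_SECRET"]
OAUTH_TOKEN = os.environ["REVIEWER2_ACCESS_TOKEN"]
OAUTH_TOKEN_SECRET = os.environ["REVIEWER2_ACCESS_TOKEN_SECRET"]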

Once you have your codes and feel like a cool secret agent, you can use the Python module Twython to easily post to your account by adding the following code to a Python script (important: this goes below loading the pickled dictionary objects and defining the reviewer2 function as above). The if __name__ == "__main__" section makes the reviewer2 function run each time the script is run.

from twython import Twython

if __name__ == "__main__":
    review = reviewer2()

    # Fill in the keys and tokens from your developer dashboard
    APP_KEY = "YOUR API KEY HERE"
    APP_SECRET = "YOUR API SECRET KEY HERE"
    OAUTH_TOKEN = "YOUR AUTHORIZATION TOKEN HERE"
    OAUTH_TOKEN_SECRET = "YOUR AUTHORIZATION SECRET TOKEN HERE"

    twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

    # Post the new review to the bot's account
    twitter.update_status(status=review)

Now if I run my Python script, called for example reviewer2_tweets.py, it’ll automatically post a terribly unfunny tweet to the Reviewer215 account. But wouldn’t it be better if I could just have Reviewer 2 tweet by himself without me having to physically do anything? I’ve already done enough busy work for Reviewer 2 in my life.

To run the bot automatically, create a shell script, called for example Reviewer2.sh, using vi from your command line. It should look something like this:

#!/bin/bash
PATH=/opt/anaconda3/bin:/opt/anaconda3/lib/python3.7/site-packages:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/Users/beth/anaconda3:/opt/anaconda3
python3 /Users/beth/Documents/Python/reviewer2_tweets.py

Once you have the shell script in hand (make sure it’s executable, for example by running “chmod +x Reviewer2.sh” once), you can make your Mac run it automatically with Crontab. Crontab runs outside your user shell environment, so even if you have your PATH set in your bash profile, it might not be able to find your Python modules.

Clearly I struggled to find a PATH that included all of my conda Python modules, so I threw in every path I could think of. I left that mess above as an example of some places to look. The last line runs my Python script with python3.

You can access Crontab by typing “crontab -e” in the terminal. There might be nothing there at first. Each line added is a Crontab job, telling Crontab when to run a specific script.

30 */4 * * * /Users/beth/Documents/Python/Reviewer2.sh

The second part of the line is the path to your shell script. The first part is five fields separated by spaces that represent minute, hour, day of month, month, and day of week. If you replace an asterisk with a number X, the job will run when that unit of time equals X; in the example above, when minutes = 30. If you add a forward slash and a number Y after an asterisk, the job will run every Y units of time; in this example, every 4 hours. So overall, it runs every four hours on the 30-minute mark (important: your computer has to be up and running at the given time for Crontab to actually run the job), which in practice adds some randomization to the timing of the tweets. For example, “0 9 * * 1” would run at 9:00 every Monday. See crontab guru to test out crontab schedules.

Conclusion

I learned a lot from making Reviewer 2 Bot and my PI said that it was funny, so that’s a win!
