A GPT-2 WhatsApp Web bot to reply in your stead

Published in

Voice Tech Podcast

11 min read · May 9, 2020

This project is the result of my curiosity about OpenAI’s GPT-2 project. While I was talking about this model with friends in a WhatsApp group chat, someone joked about whether it was really me replying in the group or a GPT-2 bot.

The idea stuck. After looking around the web for similar work, I couldn’t find anything that did this, i.e. read messages from a WhatsApp group chat, take a message as context, and reply with whatever text the GPT-2 model generates.

In this article I will take you through the steps it took me to build this bot and show you how you can build one yourself. So here goes nothing.

The work here is divided into two parts:

Part 1 explores setting up GPT-2’s 345M model.
In Part 2, we set up a bot and let it react when triggered by a keyword.

Before diving into the details, some background: I am not a programmer by profession. This project was built during a period of partial unemployment in the Covid-19 lockdown here in France.

My home-machine specs are as below:
- Intel(R) Core(TM) i5–6600 CPU @ 3.30GHz
- 16.0 GB of RAM
- Windows 10 Home
- PyCharm Community Edition
- Python version 3.7.6
- Google Colab environment to train the GPT-2 model because it was too slow on my home-machine with no GPU (I know, it’s preposterous!).

Also, I have found Python to be unforgiving when it comes to paths, so I suggest you take special care when setting up your directory structure as follows:

workspace/
workspace/gpt-2/
workspace/whatsapp_bot/
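
A quick way to confirm the layout before going further is a small check script. This is only a sketch: the helper name missing_dirs is hypothetical, and the script assumes you run it from the directory that contains workspace/.

```python
from pathlib import Path

# Directory names taken from the layout above.
EXPECTED_DIRS = ("gpt-2", "whatsapp_bot")


def missing_dirs(workspace):
    """Return the expected sub-directories that do not exist under `workspace`."""
    root = Path(workspace)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("workspace")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Directory layout looks fine")
```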

Part 1: Setting up GPT-2

Let me start by saying that this part of the article is less my own work and more inspired by others’ work I found on the web, specifically an article by Ng Wai Foong, which really helped me get everything up and running on the first go. In the spirit of not reinventing the wheel, I encourage readers to go through that article and set everything up. Once that’s done, come back here and follow the rest of the steps.

Now that you’ve played with the GPT-2 model under the guidance of Ng Wai Foong’s work, let’s continue with our project. We need a function that takes a text string as context, runs it through the model (the same way we provide context to interactive_conditional_samples.py) and returns the model-generated text to us. It’s easiest to create this file directly in the gpt-2/src/ directory; we’ll call it “auto_reply_msg.py”. So let’s dive in:

#!/usr/bin/env python3

import fire
import json
import os
import numpy as np
import tensorflow as tf

import model, sample, encoder

def interact_model(
    message="A quick look at the",
    model_name='345M',
    seed=5,
    nsamples=1,
    batch_size=1,
    length=50,
    temperature=0.9,
    top_k=20,
    top_p=0.9,
):
    if batch_size is None:
        batch_size = 1
    assert nsamples % batch_size == 0

    enc = encoder.get_encoder(model_name)
    hparams = model.default_hparams()
    path = os.path.dirname(__file__)
    with open(os.path.join(path, 'models',
                           model_name,
                           'hparams.json')) as f:
        hparams.override_from_dict(json.load(f))

    if length is None:
        length = hparams.n_ctx // 2
    elif length > hparams.n_ctx:
        raise ValueError("Can't get samples longer"
                         " than window size: %s" % hparams.n_ctx)

    with tf.Session(graph=tf.Graph()) as sess:
        context = tf.placeholder(tf.int32, [batch_size, None])
        np.random.seed(seed)
        tf.set_random_seed(seed)
        output = sample.sample_sequence(
            hparams=hparams, length=length,
            context=context,
            batch_size=batch_size,
            temperature=temperature,
            top_k=top_k,
            top_p=top_p
        )

        saver = tf.train.Saver()
        ckpt = tf.train.latest_checkpoint(os.path.join(path, 'models',
                                                       model_name))
        saver.restore(sess, ckpt)

        if message == "":
            return -1
        raw_text = message

        context_tokens = enc.encode(raw_text)
        out = sess.run(output, feed_dict={
            context: [context_tokens for _ in range(batch_size)]
        })[:, len(context_tokens):]
        text = []
        for i in range(batch_size):
            text.append(enc.decode(out[i]))

        return text


if __name__ == '__main__':
    print(fire.Fire(interact_model))

I won’t explain the code above, as I made only marginal changes so that it fits our project. The source code for this file is also available at auto_reply_msg.py. You can test your code with:

python auto_reply_msg.py --message "An apple fell from the tree and "

and the model should give you some output text. Keep in mind, though, that at this stage the text is not yet formatted to pass for a chat message. We will reformat the output into a message in the next part, when we build our bot. For now, just cherish the fact that you have successfully set up GPT-2’s 345M model.

[Note] I personally didn’t use the pre-trained 345M for my project. I actually retrained (also called fine-tuning) the model with my own data, exported from my WhatsApp chats. It’s pretty straightforward to do, should you want to go that route to give the bot a bit of your personality.

Part 2: Building the bot

Again, I took inspiration from many people’s earlier work. I have no source article or tutorial to plug here, though, as I built the bot myself.

So let’s think for a moment about how we want our bot to behave:

It opens a new browser window
It goes to the WhatsApp Web page
It waits for us to log in to the WhatsApp Web page with our mobile device
It opens the chat/group chat window
It reads the incoming messages
It is triggered by some keyword we define
It takes the message with the keyword, runs the model and gets a reply
It writes the AI reply to the chat window
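
The last four steps boil down to a polling loop. Here is a minimal sketch of one pass through such a loop, with the browser and the model replaced by plain Python callables so that the control flow can be tried in isolation. The names poll_once, read_messages, generate_reply and send are hypothetical stand-ins, not part of Selenium or GPT-2.

```python
def poll_once(read_messages, generate_reply, send, keyword, already_replied):
    """Run one pass of the bot loop and return the messages replied to.

    read_messages   -- callable returning the texts currently visible in the chat
    generate_reply  -- callable mapping a prompt string to a reply string
    send            -- callable that writes a reply to the chat
    keyword         -- trigger string; messages without it are ignored
    already_replied -- set of message texts handled in earlier passes (mutated)
    """
    handled = []
    for text in read_messages():
        if keyword not in text or text in already_replied:
            continue
        # Strip the trigger so the model only sees the actual message.
        prompt = text.replace(keyword, "").strip()
        send(generate_reply(prompt))
        already_replied.add(text)
        handled.append(text)
    return handled


if __name__ == "__main__":
    chat = ["hello there", "@bot how are you?"]
    sent = []
    seen = set()
    poll_once(lambda: chat, lambda p: "echo: " + p, sent.append, "@bot", seen)
    print(sent)
```

Keeping the set of handled messages outside the function is what stops the bot from answering the same message again on the next pass.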

To achieve all of this, I started with Selenium with Python. It allows you to manipulate browsers directly from Python and while it’s designed to be used for automated testing, it’s the perfect tool for our project.

We need to start by installing Selenium: pip install selenium should do it. Next, you need to download ChromeDriver, since we are going to be working with the Chrome browser. Make sure to place the downloaded “chromedriver.exe” file in your whatsapp_bot/ directory.

I will provide the source code for the whole file later, but let’s get the basics right first. The following snippet launches the Chrome browser and goes to the URL “https://web.whatsapp.com/”:

from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://web.whatsapp.com")

When you run this piece of code, an instance of Chrome browser is launched, a WhatsApp Web page opens up and asks you to scan the QR code from your mobile device. It’s an inconvenience we are going to live with, but later on I will show you how to reduce it somewhat.

Once you have manually “logged in” by scanning the QR code with your mobile device, we move on to the next step, i.e. opening the chat/group window.

group = "Example group"
elem = driver.find_element_by_xpath(
    '//span[contains(@title, "{}")]'.format(group))
elem.click()

The above code goes through the web page, looks for the title of the group in the pane on the left and emulates a mouse click on it, thus opening the chat window on the right side.
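
Note that the find_element_by_* helpers used in this article belong to the Selenium 3 API; Selenium 4 deprecated them, and later 4.x releases dropped them in favor of find_element(by, value). Below is a hedged sketch of the same click written that way. The helper names group_xpath and open_chat are hypothetical, and the plain string "xpath" is the value of Selenium’s By.XPATH constant, used here so that the snippet needs no import of its own.

```python
def group_xpath(group):
    """Build the XPath that matches the chat title in the left pane."""
    # Assumes the group name contains no double quotes.
    return '//span[contains(@title, "{}")]'.format(group)


def open_chat(driver, group):
    """Click the chat whose title contains `group` (Selenium 4 style call)."""
    # "xpath" is the locator strategy string behind selenium's By.XPATH.
    elem = driver.find_element("xpath", group_xpath(group))
    elem.click()
    return elem
```

With a real driver object this would be called as open_chat(driver, "Example group").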

Next we want to read the messages from the window that’s been opened. Remember, WhatsApp only loads a handful of messages if you don’t scroll up in the chat window, which is good for us as we don’t really want to read old messages in the chat window.

elems = driver.find_elements_by_class_name("Tkt2p")
for elem in elems:
    msg = elem.find_element_by_class_name("_3zb-j")
    tim = elem.find_element_by_class_name("_2f-RV")

The snippet above goes through the opened chat window and gets you all the messages and their corresponding timestamps. Chats can contain media (pics, gifs, audio and video), but none of it is required for our project so I only go after first hand text messages (bah-bye reply quotes).
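
The timestamps matter because the bot should only answer messages that arrived after it was started. Here is a rough sketch of that comparison using only the standard library (the full code further down uses pandas instead). The function names are hypothetical, and the sketch assumes the chat shows 24-hour “HH:MM” timestamps, which depends on your locale.

```python
from datetime import datetime


def parse_chat_time(text, today=None):
    """Turn a chat timestamp such as '14:05' into a datetime on `today`.

    Assumes a 24-hour 'HH:MM' timestamp; other locales need another format.
    """
    today = today or datetime.now()
    clock = datetime.strptime(text.strip(), "%H:%M")
    return today.replace(hour=clock.hour, minute=clock.minute,
                         second=0, microsecond=0)


def received_after(text, start_time):
    """True if the message timestamp is not earlier than the bot's start minute."""
    started = start_time.replace(second=0, microsecond=0)
    return parse_chat_time(text, today=start_time) >= started
```

Because the chat only shows minutes, this sketch treats a message from the same minute as the bot’s start as new; that is a design choice, not something the timestamps can settle.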

import time
from selenium.webdriver.common.keys import Keys

msg = "Any-message-you-want-to-send-to-the-whatsapp-chat"
inp_xpath = '//div[@class="_2S1VP copyable-text selectable-text"][@contenteditable="true"][@data-tab="1"]'

input_box = driver.find_element_by_xpath(inp_xpath)
time.sleep(2)
input_box.send_keys(msg + Keys.ENTER)
time.sleep(2)

The above code goes through the WhatsApp Web page in your Chrome browser, looks for the element defined by inp_xpath (the input box) and writes msg into it before emulating a press of the Enter key.

Now that we have the basic ins and outs, check out the full code below:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
import pandas as pd

import os, sys
from pathlib import Path

gpt_folder_name = "gpt-2"
gpt_path = os.path.join(Path.cwd().parent, "{}/src".format(gpt_folder_name))
sys.path.insert(1, gpt_path)
from auto_reply_msg import interact_model as reply

import logging
logging.basicConfig(format='[%(asctime)s \t%(filename)s \t %(funcName)s] -\t%(message)s', level=logging.INFO)

import time
import argparse

parser = argparse.ArgumentParser(description="Fully functional WhatsApp Web bot with AI text generation.",
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter)

# Arguments
parser.add_argument("--url", type=str, default="https://web.whatsapp.com/", help="WhatsApp Web URL")
parser.add_argument("--group", type=str, default="xxx", help="Name of your WhatsApp 'chat' for the bot to attach to")
parser.add_argument("--identifier", type=str, default="@xxx", help="Identifier found in a message triggers a reply")
parser.add_argument("--periodicity", type=int, default=5, help="Amount of time (in secs) the program waits to go check for new messages")
parser.add_argument("--cred_file", type=str, default="cred.txt", help="Name of the file containing browser credentials")

args = parser.parse_args()


def launchChrome(remote=False,
                 executor_url=None,
                 session_id=None):
    '''
    Launch a new Chrome browser window to get driver object, then
    either switch it to existing window or return new window handle
    '''
    if remote:
        logging.info("Trying to resume an existing browser session")
        chrome_options = Options()
        chrome_options.add_argument("--headless")
        driver = webdriver.Remote(
            command_executor=executor_url,
            desired_capabilities=chrome_options.to_capabilities())
        driver.session_id = session_id
    else:
        logging.info("Opening new browser window")
        driver = webdriver.Chrome()

    return driver


def get_new_credentials():
    '''
    Get new credentials from a newly opened browser window and write them to cred.txt.
    This avoids opening WhatsApp Web in a new instance of the Chrome browser
    every time if a previous browser session is already open.
    '''

    driver = launchChrome()
    logging.info("New browser window opened")
    executor_url = driver.command_executor._url
    session_id = driver.session_id
    logging.info("Setting new credentials in 'cred.txt' file")
    with open(args.cred_file, "w") as f:
        f.write("session_id {}\n".format(session_id))
        f.write("executor_url {}".format(executor_url))

    return (driver)


def get_driver():
    '''
    Get driver, either from an existing Chrome window or a new one
    '''
    try:
        logging.info("Trying to open cred.txt file to fetch existing credentials")
        with open(args.cred_file, "r") as f:
            lines = f.readlines()
        for line in lines:
            if "session_id" in line:
                session_id = line.split()[1]
            if "executor_url" in line:
                executor_url = line.split()[1]
        logging.info("Trying to resume an existing browser session from fetched credentials")
        driver = launchChrome(remote=True, session_id=session_id, executor_url=executor_url)
        logging.info(driver.current_url)
    except:
        logging.info("Didn't work, opening a new browser window and getting new credentials")
        driver = get_new_credentials()

    return (driver)


def read_msgs(data):
    '''
    Go through the WhatsApp Web page in the browser,
    select the chat, read messages from the chat,
    and finally add messages to the [data] with isreplied set to "False"
    '''

    logging.info("Let's find the 'Group' on the web page")
    time.sleep(1)
    try:
        elem = driver.find_element_by_xpath(
            '//span[contains(@title, "{}")]'.format(args.group))
        elem.click()
    except:
        logging.info("Cannot find the Group. Try again")
        return -1

    logging.info("Read messages from the 'group'")
    time.sleep(1)

    elems = driver.find_elements_by_class_name("Tkt2p")
    for elem in elems:
        msg = elem.find_element_by_class_name("_3zb-j")
        tim = elem.find_element_by_class_name("_2f-RV")
        # Only append message if identifier found in the msg, discard otherwise
        if args.identifier in msg.text:
            logging.info("Identifier found. Adding entry to DataFrame")
            data.loc[len(data)] = [pd.to_datetime(tim.text),
                                   msg.text.replace(args.identifier, ""), False]
            # Drop duplicates from the DataFrame
            data = data.drop_duplicates(subset=["time", "message"])

    return data


def get_reply_msg(text):
    '''
    Get auto generated context message from AI model
    '''
    logging.info("Input message: {}".format(text))
    raw_msg = reply(message=text)
    logging.info("Raw output message: {}".format(raw_msg))

    formatted_msg = "not-mirani-bot: {}".format(
        raw_msg[0].split("\n\n")[1].strip())

    logging.info("Formatted output message: {}".format(formatted_msg))

    return formatted_msg


def reply_msg(text):
    '''
    Call the AI model with the input text string [text]
    Put the [msg] in the chat input box and hit enter
    '''
    msg = get_reply_msg(text)

    time.sleep(2)
    logging.info("Selecting the input box")
    inp_xpath = '//div[@class="_2S1VP copyable-text selectable-text"][@contenteditable="true"][@data-tab="1"]'

    # Go forward only if input box is found on the web page. Else return False
    try:
        input_box = driver.find_element_by_xpath(inp_xpath)
    except:
        logging.info("Unable to find the input box of the target group")
        return False
    time.sleep(2)
    logging.info("Sending message: {}".format(msg))

    # Return True if the message is sent to the group. Else return False
    try:
        input_box.send_keys(msg + Keys.ENTER)
        time.sleep(2)
        return True
    except:
        logging.info("Unable to send message to the target group")
        return False


if __name__ == "__main__":
    '''
    Main function to handle the program flow
    Get [driver] element to manipulate browser and call WhatsApp Web URL
    Create a new DataFrame called [data] with three columns:
      -> time
      -> message
      -> isreplied    
    Start an infinite loop to monitor incoming messages on the chat window
    '''

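    # pd.datetime is deprecated and missing from newer pandas releases; pd.Timestamp.now() is equivalent here
    start_time = pd.Timestamp.now()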
    # Get Chrome browser driver, either existing or a new one
    driver = get_driver()

    # Only get "https://web.whatsapp.com" if it's not already set in the existing browser
    if driver.current_url != args.url:
        logging.info("Current URL not 'target', calling 'target' URL")
        driver.get(args.url)
        input("Scan QR Code, and then hit Carriage Return >>")
        print("Logged In")

    # Creating a dummy pandas DataFrame to hold messages
    data = pd.DataFrame(columns=["time", "message", "isreplied"])

    # Create an infinite loop to check web page
    # for new messages periodically
    while (True):
        # Read messages and fill the messages in the pandas DataFrame
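        new_data = read_msgs(data)
        # read_msgs() returns -1 when the group cannot be found; keep the old data and retry
        if isinstance(new_data, int):
            time.sleep(args.periodicity)
            continue
        data = new_data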
        # Go through the dataframe and
        ## if msg is received before start_time, don't reply
        ## if msg is already replied, don't reply
        ## if msg is received after start_time and not replied yet, reply!
        for i, msg in enumerate(data["message"]):
            logging.info("Message: '{}'".format(msg))
            if data["time"][i] < start_time:
                logging.info("Msg was received before the bot started")
                data["isreplied"][i] = True
            else:
                logging.info("Msg was received after the bot started")
                if not data["isreplied"][i]:
                    logging.info("Msg is not yet replied")
                    isreplied = reply_msg(msg)
                    data["isreplied"][i] = isreplied
                    if not isreplied:
                        logging.info("Something wrong with the reply_msg() function. Exiting..")
                        break
                else:
                    logging.info("Msg is already replied")
        time.sleep(args.periodicity)

The source code for the above file can be downloaded directly from wa_bot.py. Before executing it, set the default values in the argument parser at the top of the code for the following fields, so that you don’t need to pass command-line arguments every time you run it:
- group
- identifier
where “group” has to be the name of the chat you want the bot to attach to and “identifier” is a combination of characters that, when found in a message, triggers the bot’s reaction. Let’s say you set the “identifier” to “@not-mirani-bot”. Now anyone (yourself included) can send a message like:

@not-mirani-bot I had a cobbler pie last night

and the bot will react to this message.

Before you can execute it, though, you need to iron out one detail. Since gpt-2/ and whatsapp_bot/ are two separate folders, calling gpt-2/ packages from whatsapp_bot/ doesn’t work out of the box. So let’s take care of that now.

Open the file gpt-2/src/encoder.py and find “def get_encoder(model_name)” towards the end of the file. Once there, change the contents of the function as follows (the modifications are the new path line and its use as the first argument of both os.path.join calls):

def get_encoder(model_name):
    path = os.path.dirname(__file__)

    with open(os.path.join(path, 'models', model_name, 'encoder.json'), 'r') as f:
        encoder = json.load(f)
    with open(os.path.join(path, 'models', model_name, 'vocab.bpe'), 'r', encoding="utf-8") as f:
        bpe_data = f.read()
    bpe_merges = [tuple(merge_str.split()) for merge_str in bpe_data.split('\n')[1:-1]]
    return Encoder(
        encoder=encoder,
        bpe_merges=bpe_merges,
    )

That’s all. We are almost at the end of our journey here. With all of this in place, all you need to do is execute the whatsapp_bot/wa_bot.py file as follows:

python wa_bot.py

You should see a new Chrome browser window firing up, connecting to https://web.whatsapp.com and asking you to log in by scanning the QR code with your mobile device. Once you have done so, hit ENTER at the command prompt to continue the execution.
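
On later runs, get_driver() first tries to reuse the browser session recorded in cred.txt, which is how the QR-code step can be skipped while the earlier browser window is still open. The parsing step can be sketched on its own as follows. Here parse_credentials is a hypothetical helper, and the file format (one session_id line and one executor_url line) is taken from get_new_credentials() above.

```python
def parse_credentials(lines):
    """Extract session_id and executor_url from the lines of cred.txt.

    Returns a dict with both keys, or None if either one is missing,
    in which case a fresh browser window is needed.
    """
    creds = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("session_id", "executor_url"):
            creds[parts[0]] = parts[1]
    if len(creds) == 2:
        return creds
    return None
```

Returning None for an incomplete file makes the "fall back to a new window" decision explicit instead of relying on an exception.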

From there on, the program should open the group/chat you requested and monitor all incoming messages on it. Once it finds the predefined trigger sequence in one of the messages, it takes the message to the GPT-2 model, gets a reaction from it, writes the reaction in the chat input box and sends it.
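
One caveat: the formatting step in get_reply_msg() takes the second blank-line-separated paragraph of the model output, which raises an IndexError whenever the output contains no blank line. A slightly more defensive variant might look like this. The name format_reply is hypothetical, and the fallback to the first non-empty paragraph is an assumption rather than something the code above does.

```python
def format_reply(raw_text, prefix="not-mirani-bot"):
    """Pick one paragraph from the model output and prefix it with the bot name.

    Mirrors the split on blank lines used in get_reply_msg(), but falls back
    to the first non-empty paragraph instead of raising IndexError.
    """
    paragraphs = [p.strip() for p in raw_text.split("\n\n")]
    # Prefer the second paragraph, as the original code does.
    if len(paragraphs) > 1 and paragraphs[1]:
        chosen = paragraphs[1]
    else:
        chosen = next((p for p in paragraphs if p), "")
    return "{}: {}".format(prefix, chosen)
```

In get_reply_msg() this would be called as format_reply(raw_msg[0]).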

Below are some examples of the interactions I got from chatting with my WhatsApp bot:

Trigger -> @not-mirani-bot Do you know about the Indus valley civilization?
Reaction -> not-mirani-bot: There is a large Chinese-Indo-Gulf (India) archaeological site under construction in the Himalayas. Indian researchers have recently discovered hundreds of human remains

Trigger -> @not-mirani-bot How was the lunch yesterday with your boss?
Reaction -> not-mirani-bot: We ate right before he got home from work so I can’t give you a specific date, but I can say it was nice. There was some chicken and waffles on the side with an apple on top

Written by Farhan Mirani