Combining WhatsApp with large language models: prototyping with Twilio and Flask

Karlis Kanders
Discovery at Nesta
9 min read · Aug 15, 2023

Nesta’s Discovery Hub has launched a project to investigate how generative AI can be used for social good. We’re now exploring the potential of large language models (LLMs) for early-years education, and in a series of Medium blogs we discuss the technical aspects of our early prototypes.

In our previous article, we created a web app using Streamlit that allows caregivers and early-years educators to generate personalised activity ideas for toddlers. Now, we explore the idea that a widely-adopted messaging platform like WhatsApp could be a more accessible user interface for caregivers or educators compared to a web app.

We have now built a simple WhatsApp chatbot prototype that uses an LLM. In this blog, we detail our first steps in creating the prototype and hope it will also help you get started with your own WhatsApp bot!

But before we go into details, let’s see it in action.

Demo of a simple WhatsApp chatbot prototype using a large language model to answer questions and generate activity ideas

If you would like to give us some feedback, please get in touch. What follows here is a technical overview of how we made this prototype.

Using Twilio, Flask and OpenAI to build a WhatsApp chatbot

To create this prototype, we used Twilio, a communications service platform, to manage a WhatsApp business account. We made use of the Twilio Sandbox for WhatsApp, which allowed us to prototype immediately without waiting for the account to be approved by WhatsApp.

The Twilio account can receive messages from WhatsApp users and forward them to our custom-made app, using the app’s application programming interface (API), which we created with the Flask Python package.

Our Flask app parses the user message, selects the right prompt from a set of pre-made tailored prompts, depending on the content of the incoming message, and sends the prompt to OpenAI’s LLM. Once the LLM response is generated, the Flask app sends it back to WhatsApp via Twilio and the user receives a message on their phone.

A flowchart with boxes and arrows, starting from “WhatsApp message” and continuing to “Send message to server” (via Twilio) to “Select the right prompt” (via Flask app) to “Selected prompt” to “LLM call” (via OpenAI) to “Send reply to WhatsApp” (via Flask and Twilio) to “Result”
Overview of the WhatsApp chatbot prototype

At the moment, the bot can deal with two types of user messages. The user can start the interaction by saying anything generic, such as “Hi” – this will bring up instructions explaining the two types of messages.

Screenshot of a WhatsApp app screen, with the user typing “Hi” and the chatbot answering with instructions saying “Write ‘Explain <your question>’ to explain a concept to a 3-year-old or ‘Activities <your topic>’ to get activity ideas”

The ‘Explain’ message triggers what we call the ‘Explain like I’m 3’ prompt, which you can read below. It defines the role of the LLM as a helpful and kind educator, who should use simple words and never be offensive or aggressive. Note, however, that without additional guardrails for content moderation, the adherence of the LLM to these instructions is not fully guaranteed.

Before sending the prompt to the OpenAI API, the {input} field in the prompt is replaced by the user question.

###Instructions###
You are a helpful, kind, intelligent and polite early-years educator.
Your task is to explain a concept to a 3 year old child.
You must explain it in simple words that a young kid would understand.
You must also be patient and never offend or be aggressive.
Gendered language and any adjectives about the kid are strictly prohibited.

###Question###
{input}

###Answer###

Screenshot of a WhatsApp app screen, showing an example of the user asking “Explain why is the sky blue” and the chatbot answering “Oh, that’s a great question! The sky is blue because of something called sunlight. You see, the sun gives off light, and that light is made up of different colors. When the sunlight reaches our Earth, it goes through the air. The air is made up of tiny little things called molecules. These molecules are very good at scattering the sunlight…” (and so on)
Example of an LLM-generated response to a question of “why is the sky blue” in a WhatsApp chat
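In code, this substitution can be done with Python’s built-in str.format. Here is a minimal sketch (the prompt is abridged, and the actual helper functions in our repo differ):

```python
# Abridged version of the 'Explain like I'm 3' prompt, with an {input} placeholder
ELI3_PROMPT = (
    "###Instructions###\n"
    "You are a helpful, kind, intelligent and polite early-years educator.\n"
    "Your task is to explain a concept to a 3 year old child.\n"
    "\n"
    "###Question###\n"
    "{input}\n"
    "\n"
    "###Answer###"
)

# Replace the {input} field with the user's question before calling the LLM
filled_prompt = ELI3_PROMPT.format(input="Why is the sky blue?")
```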

The second type of message, ‘Activities’, triggers our personalised activity generation prompt, which we described in greater detail in our previous blog. This prompt is quite long, as it includes information about the different areas of learning defined in England’s early-years foundation stage guidance. The approximately 20 seconds that it takes for the large language model to generate the response is, in fact, longer than Twilio’s maximum timeout (15 seconds) for waiting for a reply. Hence the bot first replies with a generic hold message and then sends the actual response as another message once it’s ready.

Screenshot of a WhatsApp app screen, showing an example of the user asking “Activities about the solar system” and the chatbot answering with ideas for conversations such as “Planet talk” and “Planet Facts”
Example of LLM-generated activity ideas in a WhatsApp chat

The response for this particular prompt was too long for a single message, as WhatsApp’s limit is 1,600 characters. Therefore, we split the message up into chunks and send them in multiple messages.

An alternative approach could be to adjust the prompt to generate shorter responses, which would also bring down the response time.

Screenshot of a WhatsApp app screen, showing two consecutive messages sent by the chatbot.
Chunking the long LLM response into separate messages
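As an illustration of the chunking step, here is a minimal sketch (an illustrative helper, not the repo’s exact implementation) that splits on whitespace where possible, so words are not cut in half between messages:

```python
from typing import List


def chunk_message(text: str, limit: int = 1500) -> List[str]:
    """Split a long reply into chunks below WhatsApp's character limit,
    breaking on whitespace where possible so words stay intact."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind(" ", 0, limit)
        if cut == -1:  # no space found: fall back to a hard split
            cut = limit
        chunks.append(text[:cut])
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```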

This is how the chatbot works, and now we’ll peer behind the scenes at the Flask API code and show a couple of options for deploying the bot and connecting it with Twilio.

Behind the scenes: Flask API

To power this chatbot prototype, we wrote a simple Flask app that has a /text API endpoint and a few helper functions. All the code is available on our GitHub repository, so we highlight just a few main points below.
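The snippets below assume a Flask app object and the request proxy are already in scope; creating them takes only a few lines (the /health endpoint here is illustrative, not part of our repo):

```python
from flask import Flask, request  # 'request' exposes the incoming form data

app = Flask(__name__)


@app.route("/health")
def health() -> str:
    """Simple endpoint to check that the app is up (illustrative)."""
    return "OK"
```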

You will need to install and import the Twilio Python package and set up authorisation tokens associated with your account.

import os
from dotenv import load_dotenv
from twilio.rest import Client
from twilio.twiml.messaging_response import MessagingResponse

# Twilio settings
load_dotenv()
client = Client(
    os.environ["TWILIO_ACCOUNT_SID"],
    os.environ["TWILIO_AUTH_TOKEN"],
)

Once that is done, you can receive and send messages with your Flask app. To reply to incoming messages, we can use Twilio’s MessagingResponse class directly in the /text endpoint.

@app.route("/text", methods=["POST"])
def text_reply() -> str:
    """Respond to incoming messages"""
    reply = generate_reply(
        incoming_message=request.form.get("Body"),
        sender_contact=request.form.get("From"),
        receiver_contact=request.form.get("To"),
    )
    resp = MessagingResponse()
    resp.message(reply)
    return str(resp)

The endpoint calls a generate_reply() helper function that parses the incoming message, decides which prompt to use and calls the OpenAI API. This is implemented as a simple series of if-then statements that check the beginning of the incoming message string.

For a better user experience, one could implement a more flexible way of detecting the user’s intent — for example, by using an OpenAI function to pick the right prompt.
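As a sketch of that idea, each prompt could be described as a function schema, with the model asked to “call” one of them via OpenAI’s function-calling feature; the chosen function name then tells us which prompt to use. The names below are illustrative and the API call itself is omitted:

```python
# Illustrative function schemas for OpenAI function calling (2023-era Chat Completions API).
# The model picks one of these, and the chosen name maps to a pre-made prompt.
INTENT_FUNCTIONS = [
    {
        "name": "explain_concept",
        "description": "Explain a concept to a 3-year-old child",
        "parameters": {
            "type": "object",
            "properties": {
                "question": {"type": "string", "description": "The concept to explain"}
            },
            "required": ["question"],
        },
    },
    {
        "name": "generate_activities",
        "description": "Generate early-years activity ideas about a topic",
        "parameters": {
            "type": "object",
            "properties": {
                "topic": {"type": "string", "description": "The activity topic"}
            },
            "required": ["topic"],
        },
    },
]
```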

def generate_reply(incoming_message: str, sender_contact: str, receiver_contact: str) -> str:
    """Parse message text and return an appropriate response.
    [...]
    """
    text_message = incoming_message.lower()
    # 'explain' response
    if text_message[0:7] == "explain":
        response = ActivityGenerator.generate(
            model=LLM,
            temperature=TEMPERATURE,
            messages=[ELI3_MESSAGES.copy()],
            message_kwargs={"input": text_message[7:].strip()},
        )
        return response["choices"][0]["message"]["content"]
    [...]

The activities prompt, as explained above, takes a while to complete, so the app starts a new Python thread to execute the send_text() function and, in parallel, returns a hold message indicating that the response is coming shortly.

    [...]
    # 'activities' response
    elif "activities" in text_message[0:10]:
        EYFS_PARAMETERS["description"] = text_message
        thread = Thread(
            target=send_text,
            args=[copy.deepcopy(EYFS_MESSAGES), EYFS_PARAMETERS, receiver_contact, sender_contact],
        )
        thread.start()
        return "Thank you for your question. I am thinking..."
    [...]

In its present implementation, the approach of using threads should be seen as a quick hack for prototyping purposes, as this might result in a dangling thread that isn’t closed after sending the reply. A better but more complex solution would be to use a task queue (eg, Celery) to handle the long-running task.
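Stripped of the Twilio and OpenAI details, the hold-message pattern boils down to the following self-contained toy example (the names are illustrative; the real send_text() does the LLM call and messaging):

```python
import threading
import time

results = []


def slow_task(topic: str) -> None:
    """Stand-in for the long-running LLM call and Twilio send."""
    time.sleep(0.1)  # simulate the ~20-second generation time
    results.append(f"Activity ideas about {topic}")


def handle_message(topic: str) -> str:
    """Kick off the slow work in a background thread and reply immediately."""
    thread = threading.Thread(target=slow_task, args=(topic,), daemon=True)
    thread.start()
    return "Thank you for your question. I am thinking..."
```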

The send_text() function generates the response using the OpenAI API, chunks it into shorter strings and sends them as a series of messages to the WhatsApp user using Twilio’s Client class.

def send_text(messages: List[Dict], message_kwargs: Dict, my_contact: str, receiver_contact: str) -> None:
    """Generate text messages and send them to a given contact
    [...]
    """
    # Generate response to the message
    response = ActivityGenerator.generate(
        model=LLM,
        temperature=TEMPERATURE,
        messages=messages,
        message_kwargs=message_kwargs,
    )
    text_body = response["choices"][0]["message"]["content"]
    # Format the text_body for better display on WhatsApp
    text_body = format_activities_text(text_body)
    # Divide output into 1,500-character chunks
    texts = [text_body[i : i + 1500] for i in range(0, len(text_body), 1500)]
    # Send each chunk as a separate message
    for text in texts:
        client.messages.create(body=text, from_=my_contact, to=receiver_contact)
        sleep(0.5)
    return

For more information and advice about using the Twilio Python package to communicate with WhatsApp, consult their documentation.

Running the API locally with ngrok

Now that you have your Flask API, you need to connect it to Twilio and test it. First, start your app (note that we’re using Poetry for managing our Python environment):

poetry run python whatsapp_bot.py

You can then expose the locally-running app to the internet using ngrok. Ngrok is a service that provides you with a URL that you can use to connect to your app, and it has a free tier for developers to test their projects.

Assuming your Flask app is using port 5000, you can start ngrok as follows:

ngrok http 5000

In Twilio you’ll need to set up a Sandbox for WhatsApp and add the URL created by ngrok under “Sandbox settings” in the box “When the message comes in”. Don’t forget to add the app’s endpoint /text to the URL.

Configuring Twilio’s Sandbox for WhatsApp

You and your test users can now go to WhatsApp and send a message to the number provided by the Twilio Sandbox. Note that the first message to the chatbot will need to be a special code provided by Twilio to connect with the sandbox — and then you can start chatting with your bot.

For more details about Twilio, you can look at their documentation as well as consult online tutorials.

Deploying the API on Heroku

Testing your app locally is great, but what if you close your machine? To keep your WhatsApp chatbot running even when you rest, you can deploy it on Heroku using Docker.

Heroku is a platform-as-a-service that enables developers to run applications in the cloud, and they provide affordable offers for testing and running apps that see intermittent use.

Getting started with Heroku and Docker can be somewhat tricky, so for illustration purposes, we walk through our main steps for launching the app. However, these steps might look different for your implementation, depending on how you have organised your project and which deployment approach you take.

First, make sure you’ve set up Heroku on your machine and log in to the container registry:

heroku container:login

Then you can create a new app. This will create an app with a random name — the name can be changed later.

heroku create

Set up your environment variables, so that your app can interact with the OpenAI API and Twilio even when deployed in the cloud.

heroku config:set OPENAI_API_KEY=<your_api_key>
heroku config:set TWILIO_ACCOUNT_SID=<your_account_sid>
heroku config:set TWILIO_AUTH_TOKEN=<your_token>

You will then need to navigate to the location where you store your app’s Dockerfile and build the Docker container. In our case, it is as follows:

cd src/genai/whatsapp_bot
heroku container:push web --app <your_app_name> --context-path ../../..

Finally, we need to release the built Docker container.

heroku container:release web --app <your_app_name>

You might also need to make sure that you have started your app.

heroku ps:scale web=1

You can now check that the app is working by running heroku open. Once the app is running, the final task is to configure Twilio’s WhatsApp Sandbox as described above, using your new Heroku URL instead of the ngrok one.

Now, your chatbot will work even when you have turned off your machine. Depending on the Heroku pricing plan you choose, Heroku can keep the app running all the time, or it can shut down the app to save resources during longer periods of inactivity and restart it when users start sending messages again.

Conclusion

Our WhatsApp chatbot prototype allows users to interact with a large language model on their phones. At the moment, the implementation is quite simple, with the chatbot largely serving as an interface to a couple of pre-made, easy-to-use prompts. Nonetheless, this has been useful to explore the technical feasibility of combining WhatsApp with generative AI. Moreover, we hope that by creating a tangible prototype, we can ignite further interest and explorations in this direction, especially in the context of early-years education.

This prototype has also highlighted differences in user experience between a text messaging app and a web app. The messaging app will be better suited for quicker interactions and shorter responses from the LLM — particularly because we can’t stream the LLM responses character-by-character to WhatsApp. This means that the prompts that we used for our web app prototype would need to be further optimised to provide shorter answers, for example, by sending one early-years activity idea at a time.

Potential future iterations could also explore using WhatsApp messaging history to enable a more natural style of interaction and allow users to ask follow-up questions about the explanations or activity ideas.
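A minimal sketch of what that could look like: keeping a short per-user history in memory and sending it along with each new LLM call (illustrative only; a real deployment would need persistent storage and privacy safeguards):

```python
from collections import defaultdict
from typing import Dict, List

# In-memory chat history, keyed by the sender's WhatsApp contact
HISTORY: Dict[str, List[dict]] = defaultdict(list)


def add_turn(user: str, role: str, content: str, max_turns: int = 10) -> List[dict]:
    """Record a chat turn and return the trimmed history to pass to the LLM."""
    HISTORY[user].append({"role": role, "content": content})
    # Keep only the most recent turns to stay within the model's context window
    HISTORY[user] = HISTORY[user][-max_turns:]
    return HISTORY[user]
```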

We’re also aware that there are other platforms for managing WhatsApp accounts besides Twilio (such as Wati and Messagebird) as well as many other ways to build an API and chatbots. If you’ve built something similar, we’d love to hear about it!
