Create a Voice Chatbot for WhatsApp powered by ChatGPT

Run a business-specific WhatsApp Chatbot that can understand text, images and voice messages, and reply to users with text and voice messages. Powered by GPT-4o, Node.js, and Wassenger API.

Wassenger
11 min read · Nov 11, 2024

By following this tutorial, you can have a fully functional, voice-enabled ChatGPT-like AI chatbot running in minutes on your computer or cloud server. The chatbot is designed to operate as a virtual customer support assistant tailored for specific business purposes, communicating with users through text and voice messages on WhatsApp.

The chatbot understands text, image and voice messages in many languages and replies with text and voice messages. It acts as a customer support virtual assistant driven by predefined instructions and knowledge.

You can easily customize, extend and instruct the AI bot to adjust its behaviour, role, purpose, and knowledge boundaries to your business case. The AI bot is conversation-aware: it uses your previous WhatsApp message history with the user to produce more contextual and accurate responses.

Additionally, you can easily augment domain-specific knowledge about your business and customer in real-time by using function tools, allowing the AI bot to fetch external information from remote APIs, databases or files, and feed the AI with fine-grained, user-specific or context-specific information.

👉 👉 Code for this tutorial is available on GitHub 💻

👉 👉 Video tutorial available on YouTube

🤩 🤖 Wassenger is a complete WhatsApp Team Chat and API solution. Sign up for a 7-day free trial and get started in minutes!

Features

This tutorial provides a complete Voice-Enabled ChatGPT-powered AI chatbot implementation in Node.js that:

  • Provides a fully featured voice-enabled chatbot on your WhatsApp number connected to Wassenger.
  • Automatically processes any incoming user messages (text, image and voice) and replies with text and voice.
  • Understands spoken language in over 90 different human languages and replies accordingly based on specific pre-trained instructions.
  • Allows users to request to talk with a human, in which case the chat will be assigned to a team agent and exit the bot flow.
  • AI chatbot behaviour can be easily adjusted in the configuration file (see config.js).

How it Works

  • Starts a web service that automatically connects to the Wassenger API and your WhatsApp number.
  • Creates a tunnel using Ngrok to receive Webhook events on your computer (or you can use a dedicated Webhook URL if you run the bot program on the cloud).
  • Registers the webhook endpoint automatically to receive incoming messages.
  • Processes incoming text, image and voice messages from WhatsApp and replies with text or voice messages using a GPT-4o (multimodal) model guided by custom business-specific instructions.
  • Automatically assigns chats to available team agents when the user requires it or the chatbot can’t help.

Bot Behavior

The AI bot will reply to inbound messages based on the following criteria:

  • The chat is not assigned to any agent inside Wassenger.
  • The chat does not have any of the blacklisted labels (see config.js).
  • The chat user’s number has not been blacklisted (see config.js).
  • The chat or contact has not been archived or blocked.
  • The chat is not a group or channel.
  • If a chat is unassigned from an agent, the bot will take over it again and automatically reply to new incoming messages.
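
The criteria above can be sketched as a single predicate. This is a minimal illustration, not the code from bot.js: the helper name `shouldBotReply` and the fields of the `chat` object (`assignedAgent`, `labels`, `archived`, `blocked`, `type`, `phone`) are assumptions, while `skipChatWithLabels` and `numbersBlacklist` are the option names used in config.js.

```javascript
// Hypothetical predicate mirroring the reply criteria listed above.
// The `chat` field names are assumptions for illustration only.
function shouldBotReply (chat, config) {
  const labels = chat.labels || []
  const skipLabels = config.skipChatWithLabels || []
  const blacklist = config.numbersBlacklist || []

  if (chat.assignedAgent) return false // chat is assigned to an agent
  if (labels.some(label => skipLabels.includes(label))) return false // blacklisted label
  if (blacklist.includes(chat.phone)) return false // blacklisted number
  if (chat.archived || chat.blocked) return false // archived or blocked
  if (chat.type !== 'user') return false // group or channel
  return true
}

const config = { skipChatWithLabels: ['no-bot'], numbersBlacklist: ['+1234567890'] }
const chat = { phone: '+449876543210', type: 'user', labels: [], assignedAgent: null }

console.log(shouldBotReply(chat, config)) // true
console.log(shouldBotReply({ ...chat, assignedAgent: 'agent-1' }, config)) // false
```

Keeping every rule in one function makes it easy to see why a given chat was skipped, and to add your own rules later.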

Requirements

Project Structure

Browse the source files on GitHub

|– bot.js -> The bot source code in a single file
|– config.js -> Configuration file to customize credentials and bot behavior
|– actions.js -> Functions to perform actions through the Wassenger API
|– server.js -> Initializes the web server to process webhook events
|– main.js -> Initializes the bot server and creates the webhook tunnel (when applicable)
|– speech.js -> Handles voice message transcription and synthesis
|– package.json -> Node.js package manifest required to install dependencies
|– node_modules -> Where the project dependencies will be installed, managed by npm

Installation

If you have Git installed, run the following command from the Terminal:

git clone https://github.com/wassengerhq/whatsapp-chatgpt-bot.git

If you don’t have Git, download the project source files here and unzip them.

Configuration

Open your favourite terminal and change the directory to the project folder where `package.json` is located:

cd whatsapp-chatgpt-bot/

From that folder, install dependencies by running:

npm install

With your preferred code editor, open the `config.js` file and follow the steps below:

Set Your Wassenger API Key

Enter your Wassenger API key.

You can sign up for free here to obtain your Wassenger API key:

// Required. Specify the Wassenger API key to be used
const apiKey = process.env.API_KEY || 'ENTER API KEY HERE'

Set Your OpenAI API Key

Enter your OpenAI API key.

Sign up on OpenAI for free and then obtain the API key here:

// Required. Specify the OpenAI API key to be used
const openaiKey = process.env.OPENAI_API_KEY || 'ENTER OPENAI API KEY HERE'

Set Your Ngrok Token (Optional)

If you need to run the program on your local computer, the program needs to create a tunnel using Ngrok to process webhook events for incoming WhatsApp messages.

Sign up on Ngrok for a free account and then obtain your auth token as explained here.

Set the token in `config.js`:

// Ngrok tunnel authentication token.
// Required if webhook URL is not provided.
const ngrokToken = process.env.NGROK_TOKEN || 'ENTER NGROK TOKEN HERE'

If you run the program on a cloud server that is publicly accessible from the Internet, you don’t need to use Ngrok. Instead, set your server URL in config.js > webhookUrl field.
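
The choice between a public webhook URL and an Ngrok tunnel can be sketched as follows. This is a hedged illustration rather than the logic in main.js: the helper name `resolveWebhookMode` and the shape of the object it returns are assumptions; `webhookUrl` and `ngrokToken` are the config.js fields described above.

```javascript
// Hypothetical helper: decide how the bot will receive webhook events.
// The returned object shape is an assumption for illustration.
function resolveWebhookMode ({ webhookUrl, ngrokToken }) {
  if (webhookUrl) {
    // A public URL was provided, so no tunnel is needed
    return { mode: 'direct', url: webhookUrl }
  }
  if (ngrokToken && !ngrokToken.startsWith('ENTER ')) {
    // No public URL: fall back to an Ngrok tunnel
    return { mode: 'tunnel', token: ngrokToken }
  }
  throw new Error('Set either webhookUrl or ngrokToken in config.js')
}

const settings = {
  webhookUrl: process.env.WEBHOOK_URL,
  ngrokToken: process.env.NGROK_TOKEN || 'ENTER NGROK TOKEN HERE'
}

try {
  console.log(resolveWebhookMode(settings))
} catch (err) {
  console.error(err.message)
}
```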

Enable the audio input and output feature

In config.js, go to the `features` declaration (around line 67) and set audioInput and audioOutput to true, as shown below:


// Chatbot features. Edit as needed.
const features = {
  // Enable or disable knowledge data loading for AI model training (pdfs, docs, csv, etc)
  knowledge: true,
  // Enable or disable audio (voice message) input processing
  audioInput: true,
  // Enable or disable audio voice responses.
  // By default the bot will only reply with an audio message if the user sends an audio message first.
  audioOutput: true,
  // Reply only using audio voice messages instead of text.
  // Requires "features.audioOutput" to be true.
  audioOnly: false,
  // Audio voice to use for the bot responses. Requires "features.audioOutput" to be true.
  // Options: 'alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'
  // More info: https://platform.openai.com/docs/guides/text-to-speech
  voice: 'echo',
  // Audio voice speed from 0.25 to 2. Requires "features.audioOutput" to be true.
  voiceSpeed: 1,
  // Enable or disable image input processing
  // Note: image processing can significantly increase the AI token processing costs compared to text
  imageInput: true
}
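
To show how the `voice` and `voiceSpeed` settings could reach the OpenAI text-to-speech endpoint (`POST /v1/audio/speech`), here is a minimal sketch. Whether speech.js builds its request this way is an assumption, as are the helper names `buildSpeechRequest` and `synthesize` and the choice of the `tts-1` model and `mp3` format.

```javascript
// Sketch: build a request for the OpenAI text-to-speech endpoint.
// Helper names, model and output format are assumptions, not the bot's actual code.
function buildSpeechRequest (text, features, openaiKey) {
  // Clamp the speed to the range documented in config.js (0.25 to 2)
  const speed = Math.min(2, Math.max(0.25, features.voiceSpeed || 1))
  return {
    url: 'https://api.openai.com/v1/audio/speech',
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${openaiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'tts-1',
        input: text,
        voice: features.voice || 'echo',
        speed,
        response_format: 'mp3'
      })
    }
  }
}

// Hypothetical usage (requires Node.js 18+ for the global fetch)
async function synthesize (text, features, openaiKey) {
  const { url, options } = buildSpeechRequest(text, features, openaiKey)
  const res = await fetch(url, options)
  if (!res.ok) throw new Error(`Speech synthesis failed: ${res.status}`)
  return Buffer.from(await res.arrayBuffer()) // raw audio bytes
}
```

Separating the request builder from the network call keeps the settings logic easy to test without spending API credits.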

Enable Audio-only reply mode

If you want the chatbot to reply exclusively with audio messages, set audioOnly to true:

// Chatbot features. Edit as needed.
const features = {
  // Enable or disable knowledge data loading for AI model training (pdfs, docs, csv, etc)
  knowledge: true,
  // Enable or disable audio (voice message) input processing
  audioInput: true,
  // Enable or disable audio voice responses.
  // By default the bot will only reply with an audio message if the user sends an audio message first.
  audioOutput: true,
  // Reply only using audio voice messages instead of text.
  // Requires "features.audioOutput" to be true.
  audioOnly: true,
  // Audio voice to use for the bot responses. Requires "features.audioOutput" to be true.
  // Options: 'alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'
  // More info: https://platform.openai.com/docs/guides/text-to-speech
  voice: 'echo',
  // Audio voice speed from 0.25 to 2. Requires "features.audioOutput" to be true.
  voiceSpeed: 1,
  // Enable or disable image input processing
  // Note: image processing can significantly increase the AI token processing costs compared to text
  imageInput: true
}
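
How the `audioOutput` and `audioOnly` flags interact can be summarised in a few lines. The sketch below encodes one reading of the config.js comments and is not taken from the bot's source: the helper name `chooseReplyFormat` and the `'text'` / `'audio'` return values are assumptions.

```javascript
// Hypothetical helper: pick the reply format from the feature flags.
// This encodes one reading of the config.js comments, not the bot's actual code.
function chooseReplyFormat (features, inboundType) {
  if (!features.audioOutput) return 'text' // voice replies disabled
  if (features.audioOnly) return 'audio' // always reply with voice
  return inboundType === 'audio' ? 'audio' : 'text' // mirror the user's format
}

const defaults = { audioOutput: true, audioOnly: false }

console.log(chooseReplyFormat(defaults, 'text')) // text
console.log(chooseReplyFormat(defaults, 'audio')) // audio
console.log(chooseReplyFormat({ ...defaults, audioOnly: true }, 'text')) // audio
```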

Customization

You can customize the chatbot’s behaviour by defining a set of instructions in natural language that the AI will follow.

Read the comments in the code for further instructions.

Also, you can easily adjust the code to match more specific requirements. The possibilities are nearly endless!

To proceed with the customization, open `config.js` with your preferred code editor and set the bot instructions, welcome, and default messages based on your preferences.
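
To make the role of the bot instructions more concrete, the sketch below shows one way instructions and recent chat history could be combined into the `messages` array used by the OpenAI Chat Completions API. The helper name `buildChatMessages`, the history item shape (`flow`, `body`) and the default limit of 20 messages are assumptions for illustration, not the bot's actual implementation.

```javascript
// Hypothetical helper: turn bot instructions plus recent chat history into
// a `messages` array with system, user and assistant roles.
// The history item shape ({ flow, body }) is an assumption for illustration.
function buildChatMessages (botInstructions, history, limit = 20) {
  const recent = history
    .filter(message => typeof message.body === 'string' && message.body.trim())
    .slice(-limit) // keep only the most recent messages
    .map(message => ({
      role: message.flow === 'inbound' ? 'user' : 'assistant',
      content: message.body
    }))
  return [{ role: 'system', content: botInstructions }, ...recent]
}

const messages = buildChatMessages(
  'You are a customer support assistant for Wassenger. Politely decline unrelated topics.',
  [
    { flow: 'inbound', body: 'Hi, what plans do you offer?' },
    { flow: 'outbound', body: 'We offer Professional, Business and Enterprise plans.' },
    { flow: 'inbound', body: 'Which one includes campaigns?' }
  ]
)

console.log(messages.map(message => message.role).join(', '))
// system, user, assistant, user
```

Limiting the history keeps token usage predictable while still giving the model enough context to follow the conversation.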

Call External Data and APIs (RAG)

With function calls, you can easily feed the AI model with contextual, real-time, and user-specific information to generate better and more accurate responses using Retrieval-Augmented Generation (RAG) techniques. Behind the scenes, it uses the OpenAI Tool Function Calling feature.

When the AI model needs certain information to generate a response, it requests that one or more functions be run to retrieve it. For instance, a function can query an external CRM API or database for information about the customer the AI agent is chatting with, such as email address, username or shipping address, and return it as text or JSON so the model can generate an accurate, user-specific, context-aware response.

Using tool functions is very flexible and enables you to build complex and domain-specific use cases for an AI agent bot.

See functions.js file for multiple examples of how to use and define tool functions for remote AI data loading (RAG):

// Tool functions to be consumed by the AI when needed.
// Edit as needed to cover your business use cases.
// Using them, the AI can ask your code to execute arbitrary functions
// in order to augment information for a specific user query.
// For example, you can call an external CRM in order to retrieve, save or validate
// specific information about the customer, such as email, phone number, user ID, etc.
// Learn more here: https://platform.openai.com/docs/guides/function-calling
const functions = [
  // Sample function to retrieve plan prices of the product
  // Edit as needed to cover your business use cases
  {
    name: 'getPlanPrices',
    description: 'Get available plans and prices information available in Wassenger',
    parameters: { type: 'object', properties: {} },
    // Function implementation that will be executed when the AI requires to call this function
    // The function must return a string with the information to be sent back to the AI for the response generation
    // You can also return a JSON or a prompt message instructing the AI how to respond to a user
    // Functions may be synchronous or asynchronous.
    //
    // The bot will inject the following parameters:
    // - parameters: function parameters provided by the AI when the function has parameters defined
    // - response: AI generated response object, useful to evaluate the AI response and take actions
    // - data: webhook event context, useful to access the last user message, chat and contact information
    // - device: WhatsApp number device information provided by the Wassenger API
    // - messages: a list of previous messages in the same user chat
    run: async ({ parameters, response, data, device, messages }) => {
      // console.log('=> data:', data)
      // console.log('=> response:', response)
      const reply = [
        '*Send & Receive messages + API + Webhooks + Team Chat + Campaigns + CRM + Analytics*',
        '',
        '- Platform Professional: 30,000 messages + unlimited inbound messages + 10 campaigns / month',
        '- Platform Business: 60,000 messages + unlimited inbound messages + 20 campaigns / month',
        '- Platform Enterprise: unlimited messages + 30 campaigns',
        '',
        'Each plan is limited to one WhatsApp number. You can purchase multiple plans if you have multiple numbers.',
        '',
        '*Find more information about the different plan prices and features here:*',
        'https://wassenger.com/#pricing'
      ].join('\n')
      return reply
    }
  },

  // Sample function to load user information from a CRM
  {
    name: 'loadUserInformation',
    description: 'Find user name and email from the CRM',
    parameters: {
      type: 'object',
      properties: {}
    },
    run: async ({ parameters, response, data, device, messages }) => {
      // You may call a remote API or run a database query here
      const reply = 'I am sorry, I am not able to access the CRM at the moment. Please try again later.'
      return reply
    }
  }
]
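
Both samples above take no parameters. As a hedged sketch of a function that does, and of how a tool call requested by the model could be routed to it, consider the following. The `getOrderStatus` function, the `tools` array and the `runToolCall` dispatcher are hypothetical names for illustration; only the overall tool shape (`name`, `description`, `parameters`, `run`) follows the samples above, and the `call` object mirrors the OpenAI tool call format, where `arguments` arrives as a JSON string.

```javascript
// Hypothetical tool function with one parameter, in the same shape as the samples above.
const tools = [
  {
    name: 'getOrderStatus',
    description: 'Get the status of an order by its ID',
    parameters: {
      type: 'object',
      properties: { orderId: { type: 'string', description: 'Order identifier' } },
      required: ['orderId']
    },
    run: async ({ parameters }) => {
      // A real implementation would query your order system here
      return JSON.stringify({ orderId: parameters.orderId, status: 'shipped' })
    }
  }
]

// Hypothetical dispatcher: runs the tool requested by the model.
// `call.function.arguments` is a JSON string, as in OpenAI tool calls.
async function runToolCall (call, context = {}) {
  const tool = tools.find(item => item.name === call.function.name)
  if (!tool) return `Unknown function: ${call.function.name}`
  let parameters = {}
  try {
    parameters = JSON.parse(call.function.arguments || '{}')
  } catch (err) {
    return 'Invalid function arguments: expected a JSON object'
  }
  return tool.run({ parameters, ...context })
}

runToolCall({ function: { name: 'getOrderStatus', arguments: '{"orderId":"A-100"}' } })
  .then(result => console.log(result))
// {"orderId":"A-100","status":"shipped"}
```

Returning a plain message for unknown functions or malformed arguments, instead of throwing, lets the model recover and explain the problem to the user.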

Run the bot

Run the bot program:

node main

Run the bot program on a custom port:

PORT=80 node main

Run the bot program for a specific Wassenger-connected device:

DEVICE=WHATSAPP_DEVICE_ID node main

Run the bot program in production mode:

NODE_ENV=production node main

Run the bot with an existing webhook server without the Ngrok tunnel:

WEBHOOK_URL=https://bot.company.com:8080/webhook node main

Note: `https://bot.company.com:8080` must point to the bot program itself running on your server and must be network reachable using HTTPS for a secure connection.

Questions

Can I train the AI to behave in a customized way?

Yes! You can provide customized instructions to the AI to determine the bot’s behaviour, identity, and more.

To set your customized instructions, enter the text in config.js > botInstructions.

Can I instruct the AI not to reply about unrelated topics?

Yes! By defining a set of clear and explicit instructions, you can teach the AI to stick to the role and politely not answer topics that are unrelated to the relevant subject.

For instance, you can add the following in your instructions:

You are a smart virtual customer support assistant who works for Wassenger.
Be polite, gentle, helpful, and empathetic.
Politely reject any queries that are not related to your customer support role or Wassenger itself.
Strictly stick to your role as a customer support virtual assistant for Wassenger.

Can I customize the chatbot response and behaviour?

For sure! The code is available for free, and you can adapt it as much as you need.

You just need to have some JavaScript/Node.js knowledge, and you can always ask ChatGPT to help you write the code you need.

How to stop the bot from replying to certain chats?

You should simply assign the specific chats to any agent on the Wassenger web chat or using the API.

Alternatively, you can set blacklisted labels in the config.js > skipChatWithLabels field, then add one of these labels to the specific chat you want to be ignored by the bot. You can assign labels to chats using the Wassenger web chat or using the API.

Do I have to use Ngrok?

No, you don’t. Ngrok is only used for development and testing when you run the program on your local computer. If you run the program on a cloud server, you most likely won’t need Ngrok, provided your server is reachable from the Internet through a public domain (e.g., bot.company.com) or a public IP.

In that case, you need to provide your server’s full URL ending with /webhook like this when running the bot program:

WEBHOOK_URL=https://bot.company.com:8080/webhook node main

Note: https://bot.company.com:8080 must point to the bot program itself running on your server and must be network reachable using HTTPS for a secure connection.
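
The two requirements above (HTTPS and a path ending in /webhook) are easy to check before starting the bot. The following is a minimal sketch using the standard `URL` class available in Node.js; the helper name `validateWebhookUrl` and the error messages are assumptions, and the bot itself may validate the URL differently or not at all.

```javascript
// Hypothetical helper: check a webhook URL before starting the bot.
// Returns an error message, or null when no problem is found.
function validateWebhookUrl (value) {
  let url
  try {
    url = new URL(value)
  } catch (err) {
    return 'WEBHOOK_URL is not a valid URL'
  }
  if (url.protocol !== 'https:') return 'WEBHOOK_URL must use HTTPS'
  if (!url.pathname.endsWith('/webhook')) return 'WEBHOOK_URL must end with /webhook'
  return null
}

console.log(validateWebhookUrl('https://bot.company.com:8080/webhook')) // null
console.log(validateWebhookUrl('http://bot.company.com/webhook')) // WEBHOOK_URL must use HTTPS
console.log(validateWebhookUrl('https://bot.company.com:8080/')) // WEBHOOK_URL must end with /webhook
```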

What happens if the program fails?

Please check the error in the terminal and make sure you are running the program with enough permissions to start it on port 8080 on localhost.

How to avoid specific chats being replied to by the bot?

By default, the bot will ignore messages sent in group chats, blocked, and archived chats/contacts.

Besides that, you can blacklist or whitelist specific phone numbers and chats with labels to be handled by the bot.

See numbersBlacklist, numbersWhitelist, and skipChatWithLabels options in config.js for more information.

Can I run this bot on my server?

Absolutely! Deploy or transfer the program source code to your server and run the start command from there.

The requirements are the same, no matter where you run the bot.

Also, remember to define the WEBHOOK_URL environment variable with your server's Internet-accessible public URL as explained before.

Affiliate Program

🤑 💰 Earn 30% commission by referring users to Wassenger. Get your referral link now. Terms apply.
