Build a Chatbot That Cares — Part 1

In 150 Lines of Code

Josh Zheng
IBM watsonx Assistant
11 min read · Dec 5, 2016


You can find the code for this tutorial here.

Part 1 of this tutorial will walk you through the software of the chatbot. Part 2 will go through hardware assembly and Raspberry Pi setup.

Welcome to Episode 2 of Teaching Robots How to Love. If you missed Episode 1, we taught a candy machine the difference between naughty and nice. Now, thanks to IBM Research, we have an adorable robot to work with. His name is TJBot.

Aw, hello to you too, TJBot! Before diving in, a big shoutout to IBM Research, which laid out much of the groundwork for making this robot possible.

Here is the official announcement and the official TJBot Github repository.

Introduction

For this tutorial, we’re going to power TJBot with APIs from Watson Developer Cloud. We’ll start by putting a voice interface onto TJBot, then give it the ability to converse and understand your emotional tones. In part 2 of the tutorial, we’ll transfer the code onto a Raspberry Pi and put the whole thing into the physical TJBot itself.

For the sake of simplicity, we’ll keep the conversation simple. Here are two typical conversations (via voice):

You: “Hello Watson”

TJBot: “Hey, how are you feeling today?”

You: “I’m so glad the Warriors won”

TJBot: “Yay, you’re happy. That makes me happy.”

or

You: “Hello Watson”

TJBot: “Hey, how are you feeling today?”

You: “I’m having nightmares.”

TJBot: “Don’t be scared. Life will get better soon. This too shall pass.”

Here’s a video of TJBot in action:

But you say, “Josh, it’s silly to share my precious feelings with a cardboard robot.” Well, I beg to differ. First of all, I promise that unlike most humans, TJBot won’t judge you no matter what you say. Furthermore, besides the fact that it’s therapeutic to express your feelings, interactions like these have a wide range of applications, including customer support and patient outreach.

Now that you’re super stoked, let’s get started.

Step-by-Step Tutorial

Again, Part 1 focuses only on getting the code working on your laptop. I’m running the latest macOS Sierra on a MacBook Pro. This project requires quite a number of Watson APIs (four, to be exact), so rather than throwing them all together at once, I’ve broken the process down by API before putting them together. Here’s an overview of what we’re doing:

  1. Create a speech recognition feature using Watson Speech to Text and mic.
  2. Get Watson Tone Analyzer working on any text input.
  3. Create a simple conversation using Watson Conversation.
  4. Have TJBot speak text using Watson Text To Speech.
  5. Put the services together to complete the voice conversational interface.

Step 0 — What You Need

Node.js version 4.4.5+ (we use arrow functions)

Code: All the code you need can be found in this repository. Go ahead and clone the repo and I’ll walk you through how to set everything up in the following steps. This will do the trick:

git clone https://github.com/boxcarton/tjbot-raspberrypi-nodejs 

Tutorial on Instructables: Also made by IBM Research. I found this tutorial to be very helpful in the beginning. If you need more detail in some of these earlier steps, you might be able to find additional information there.

Credentials for the four Watson services: Speech to Text, Tone Analyzer, Conversation, and Text to Speech.

If you’ve never used Watson APIs before, see my previous post on how to acquire the credentials. You need to get the credentials through Bluemix, but since the APIs are accessed over the HTTP RESTful interface, you’re not required to stay on Bluemix afterwards. Note: Bluemix is IBM’s PaaS offering that lets you deploy and manage your cloud applications.

Also check here for more detailed steps on creating credentials on Bluemix.

The screenshot below shows where to find your credentials after you’ve created a Watson service. You’ll either create a New Credential or click the View Credential dropdown to get your username and password.

Step 0.5 — Configuration File

It’ll save you some time if you create all four Watson services at once. Once you have the services created and your credentials gathered in one place, it’s time to complete the configuration file.

If you’ve cloned the repository, you’ll see a config.new.js file. I’d recommend filling out the missing variables with your credentials now. But in case you’d like to just read through the tutorial, I’ve taken a screenshot for you so you know what I’m talking about. Don’t forget to rename this file to config.js before running node run.js.

Let’s understand how each service works individually before mashing them together. To do this, I created a separate script for each service in the tutorial folder.

Step 0.75 — Install node_modules

Remember to run npm install in the root folder with package.json to install your node modules.

Step 1 — Speech To Text

Open the tutorial/step1_stt.js file. Notice that we’re using the Node.js SDK for Watson Developer Cloud to interact with the APIs. This will be true throughout the project. After the imports:
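Something like this, assuming the watson-developer-cloud SDK and the mic module (the exact paths and names may differ slightly from the repo):

// Imports for the speech-to-text script
var watson = require('watson-developer-cloud'); // Watson Node.js SDK
var mic = require('mic');                       // stream wrapper around the microphone
var config = require('../config.js');           // your credentials from Step 0.5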

and the Watson Speech To Text instantiation:
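A sketch of that piece (the config field names here are illustrative, not necessarily the repo’s):

var speech_to_text = watson.speech_to_text({
  username: config.STTUsername, // illustrative field names
  password: config.STTPassword,
  version: 'v1'
});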

You’ll find the code to set up the microphone:
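It looks roughly like this:

var micInstance = mic({
  rate: '44100',    // sample rate
  channels: '2',
  debug: false,
  exitOnSilence: 6  // seconds of silence before a 'silence' event
});
var micInputStream = micInstance.getAudioInputStream();
micInstance.start();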

These are the default configurations. You can look through the official mic documentation if you’d like to make any changes. Later on, I’ll show why we actually need to pause the microphone while TJBot is speaking to prevent infinite audio feedback.

Important Note: mic is a simple stream wrapper around arecord (on Linux, including Raspbian) and sox (on macOS/Windows).

You need to have sox installed. The easiest way to do this is through Homebrew. If you don’t have Homebrew, install it! Then simply run brew install sox.

Finally we get to Watson Speech To Text:
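Roughly like so (the content_type has to match the mic settings above; the sketch assumes the 2016-era SDK’s createRecognizeStream):

var textStream = micInputStream.pipe(
  speech_to_text.createRecognizeStream({
    content_type: 'audio/l16; rate=44100; channels=2'
  })
).setEncoding('utf8');

textStream.on('data', function(text) {
  console.log('Watson heard: ' + text);
});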

As you can see, besides setting the parameters, using Watson Speech To Text is literally one line of code (line 31 of the repo’s script). You simply pipe the audio stream to your Speech To Text instance, and it transcribes your audio in real time via WebSocket, returning the transcribed text to textStream.

Run node tutorial/step1_stt.js now and monitor the console to make sure everything works.

Step 2 — Tone Analyzer

Tone Analyzer is one of Watson’s unique offerings. It’s based on psycholinguistics, a field of research focused on the relationship between linguistic behaviors and psychological theories. You can read more about the science behind the API here. The results include emotional tone, social tone, and language tone, but we’ll only be using the emotional tone in this project.

The possible emotions returned are joy, sadness, fear, disgust, and anger, each with a likelihood score between 0 and 1. For simplicity’s sake, TJBot will choose the emotion with the highest likelihood score, but a better heuristic would be worth building in the future, since a sentence can contain a mix of emotions and the most likely emotion isn’t necessarily the dominant one.

The code to call Tone Analyzer is again very simple:
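A sketch of that call, assuming the v3 Tone Analyzer endpoint (again, the config field names are illustrative):

var tone_analyzer = watson.tone_analyzer({
  username: config.ToneUsername, // illustrative field names
  password: config.TonePassword,
  version: 'v3',
  version_date: '2016-05-19'
});

var text = 'I am so glad the Warriors won';
tone_analyzer.tone({ text: text }, function(err, tone) {
  if (err) { return console.error(err); }
  console.log(JSON.stringify(tone, null, 2)); // inspect the full tone breakdown
});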

Make sure you have this code working by running node tutorial/step2_tone.js. Play around with the text variable to see what kind of results the service returns. We’ll later add the code to pick out the emotion with the highest likelihood score.
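When we do, it will look something like this sketch, based on the documented v3 response shape (an emotion_tone category whose tones each carry a tone_id and a score):

function getEmotion(tone) {
  // Find the emotion category in the document-level results
  var emotions = tone.document_tone.tone_categories.filter(function(cat) {
    return cat.category_id === 'emotion_tone';
  })[0].tones;

  // Return the tone_id ('joy', 'sadness', 'fear', 'disgust', or 'anger')
  // with the highest likelihood score
  return emotions.reduce(function(best, t) {
    return t.score > best.score ? t : best;
  }).tone_id;
}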

Step 3 — Conversation

Now it’s time to create the dialog using Watson Conversation. If this is your first time using Watson Conversation service, I’d recommend briefly going over the official tutorial. You can think of Watson Conversation as an easy way to create a conversation state machine with NLP capabilities. If you’re interested in a more advanced discussion of the service, check out my previous post on using Watson Conversation to create a Slack Bot.

Let’s first wire up the dialog using the Watson Conversation Tool. You should see something similar to this when you log in with your Bluemix account:

Click the Create button and follow the instructions on the modal window. You should end up with something similar to my TJBot workspace.

Now let’s get on with the dialog. Since this is an extremely simple interaction, we won’t be using the Intents feature, but do read the documentation on intents if you plan to create more complex conversations. Again, my previous tutorial also dives deeper into many other aspects of this service.

This conversation will have one entity: Watson. Entities are nouns you’d like the service to recognize throughout your conversations, so add that entity to your workspace.

Finally wire up the dialog like this:

Follow the documentation on dialog if you’re having trouble. Your dialog starts by recognizing Watson as an entity, because the conversation always opens with a greeting involving the word “Watson”. Notice the Anything else node below it, which acts as a catch-all for user inputs the bot can’t respond to. The “@watson” node then branches into 5 nodes based on the detected emotion. I’ve uncollapsed the “fear” and “joy” responses in the screenshot above.

Now, if this is your first time using Conversation, you might be wondering what $emotion is. $<variable_name> is how you access your context variables. And what is context? First, read the official documentation on it. But basically, it’s a simple data structure (an object/dictionary) that gets passed between your code and the Conversation service. It’s the mechanism through which your code talks to the service and vice versa. After we determine the emotional tone of the user, we set context.emotion to one of the emotions (“joy”, “sadness”, etc.) in our application code. The Conversation Tool can then access that variable via the $ prefix and branch the dialog accordingly.
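In code, the handoff looks roughly like this (a sketch; getEmotion is the illustrative helper from Step 2):

// Application code: set the detected emotion on the shared context object...
context.emotion = getEmotion(tone);
// ...so a dialog node in the Conversation Tool can branch on it,
// e.g. a child node of @watson with the condition $emotion == 'joy'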

If you don’t want to wire up the dialog yourself, I’ve exported my workspace to workspace.json so that you can upload it into your own workspace.

Either way, after getting your dialog to look like the screenshot above, it’s time to look at tutorial/step3_conversation.js. The following lines include the actual communication between your code and the service.
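It boils down to a single message() call. Here’s a sketch assuming Conversation v1 from the 2016-era SDK (the config field names are illustrative):

var conversation = watson.conversation({
  username: config.ConUsername, // illustrative field names
  password: config.ConPassword,
  version: 'v1',
  version_date: '2016-09-20'
});

var context = {}; // persists the conversation state between turns

conversation.message({
  workspace_id: config.ConWorkspace,
  input: { text: 'Hello Watson' },
  context: context
}, function(err, response) {
  if (err) { return console.error(err); }
  context = response.context;           // carry the state forward
  console.log(response.output.text[0]); // TJBot's reply
});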

Notice how I’ve wrapped the code above in some additional code so the file can run as a standalone script. It also takes in user input via the command line. We’ll get rid of this wrapper code when we put everything together later.

Run node tutorial/step3_conversation.js and play around with the context variable and try to access it via the Conversation Tool dialog.

A few things to watch out for:

  1. Make sure you have the correct workspace_id
  2. Make sure to persist the context variable (initialize it outside of the conversation scope) if you want to maintain the conversation state.
  3. Change context.emotion to the different emotions to make sure you’re getting the appropriate responses.

Step 4 — Text To Speech

Now we’d like TJBot to speak the response. Conveniently, Watson Developer Cloud also provides a Text to Speech service.

Here’s the code to convert text into an audio file using the service.
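Something close to this, assuming the SDK’s synthesize() returns a stream as it did at the time (the voice choice and config names are mine):

var fs = require('fs');
var player = require('play-sound')();

var text_to_speech = watson.text_to_speech({
  username: config.TTSUsername, // illustrative field names
  password: config.TTSPassword,
  version: 'v1'
});

text_to_speech.synthesize({
  text: 'Hey, how are you feeling today?',
  voice: 'en-US_MichaelVoice',  // any supported voice works here
  accept: 'audio/wav'
})
  .pipe(fs.createWriteStream('output.wav')) // write the audio to disk
  .on('close', function() {
    player.play('output.wav');              // then play it through the speakers
  });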

You simply use the synthesize function and pipe the result to a file stream (via fs) to create the file output.wav. We then use the play-sound module to play the file through the speakers.

Important Note: The play-sound module requires that you have one of the supported audio players installed. I installed mplayer by simply running brew install mplayer.

Step 5 — Putting It All Together

Finally, it’s time to get all the APIs playing together. This is done in run.js. At this point, you should be familiar with most of the code in this file. However, there are a few changes I’d like to point out.

  • Only start the conversation upon hearing “Watson” (or whatever attention word you’ve set in your config file)

Since the microphone is on all the time, we don’t want TJBot to respond to every sound it hears. After converting user speech to text, we make sure the word “watson” is actually spoken.

Notice how on line 128, we only enter the conversation portion if start_dialog is set to true.
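The guard looks roughly like this sketch (the attention word could also come from your config file; speakResponse is defined in the next section):

var start_dialog = false;

textStream.on('data', function(text) {
  if (!start_dialog) {
    // Ignore everything until the attention word is heard
    if (text.toLowerCase().indexOf('watson') >= 0) {
      start_dialog = true;
      speakResponse('Hey, how are you feeling today?');
    }
    return;
  }
  // start_dialog is true: run the tone + conversation pipeline on this text
});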

  • Prevent TJBot from talking to himself

Notice that we’ve put the text-to-speech code into the speakResponse function. More importantly, notice on line 108 that we pause the microphone right before we play the file. This prevents the audio from being picked up by our speech-to-text function, which would result in an infinite feedback loop.

But how do we know when to turn the microphone back on? We use the node-ffprobe module to get the duration of output.wav and save it to the variable pauseDuration.

Important Note: node-ffprobe requires ffmpeg to be installed (see the official package homepage). On a Mac, brew install ffmpeg does the trick; on Linux:

sudo apt-get install ffmpeg
npm install node-ffprobe

We also add on another 0.5 seconds to the duration just to be safe.
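Put together, the speaking side looks something like this sketch (I believe node-ffprobe exposes the duration as probeData.format.duration; treat that field as an assumption):

var probe = require('node-ffprobe');
var pauseDuration = 0;

function speakResponse(text) {
  text_to_speech.synthesize({ text: text, voice: 'en-US_MichaelVoice', accept: 'audio/wav' })
    .pipe(fs.createWriteStream('output.wav'))
    .on('close', function() {
      probe('output.wav', function(err, probeData) {
        pauseDuration = probeData.format.duration + 0.5; // extra half second, to be safe
        micInstance.pause();     // mute the mic before the speakers start
        player.play('output.wav');
      });
    });
}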

During the initialization of the microphone, we set it up so that whenever the microphone receives the ‘pauseComplete’ event, via the on(‘pauseComplete’) API (line 53), it stays paused for pauseDuration seconds before resuming.
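That handler is just a few lines (a sketch using the mic module’s pauseComplete event):

micInputStream.on('pauseComplete', function() {
  // Stay muted for the length of the audio, then start listening again
  setTimeout(function() {
    micInstance.resume();
  }, Math.round(pauseDuration * 1000));
});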

  • Using Tone Analyzer requires a promise

While talking to TJBot, we need to call Watson Tone Analyzer before sending the user input to the Conversation service. Since these API calls are asynchronous in Node.js, we need to wrap the Tone Analyzer call in a promise:
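Roughly like so:

function analyzeTone(text) {
  return new Promise(function(resolve, reject) {
    tone_analyzer.tone({ text: text }, function(err, tone) {
      if (err) { reject(err); } else { resolve(tone); }
    });
  });
}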

and resolve the promise using then() (see line 129) to call the Conversation API.

  • Resetting the dialog

Finally we need to reset the dialog after the last response from TJBot so that only the attention word (in this case “Watson”) triggers the conversation again. We added some code at the end to do just that by accessing the context.system.dialog_turn_counter variable:
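A sketch of that check (the turn counter lives on the context object the service returns; resetting the context here is my reading of the flow):

if (context.system && context.system.dialog_turn_counter === 2) {
  start_dialog = false; // require the attention word again
  context = {};         // clear the state for the next conversation
}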

We know that we’ve reached the end of the conversation when the turn counter has reached 2.

That’s it!

Run node run.js on your laptop and you should be able to talk to TJBot!

In Part 2 of this tutorial, I’ll be showing you how to:

  1. Put TJBot together from laser-cut chipboard
  2. Set up your Raspberry Pi environment
  3. Make a few modifications to the code so it runs properly on the Pi.

Special Shoutout To:

  • Mariya Yao: for telling me about TJBot in the first place and helping me with the laser cutting.
  • Maryam Ashoori: the creator of TJBot, for the pointers along the way.

As always, if you have any questions, feel free to reach out at joshzheng@us.ibm.com, connect with me on LinkedIn, or follow me here on Medium.
