Building a chat bot with speech recognition: Naming, Logging, Monitoring & Repetition

Build your own TJ Bot
Me: Hi Buddy, it’s me
… *moves slightly closer* …
Me: Hi Buddy. It’s. Me.
Buddy: Hi Damian, what can I do for you?

Meet Buddy. Buddy is my incarnation of IBM’s TJ Bot. He’s an example of “embodied cognition” and is a cute physical interface to IBM’s Watson.

Disclaimer: This post is not a walkthrough of how to build TJ Bot (there are already some excellent Instructable steps for that, which are clear and concise!). The aim of this post is to highlight some interesting problems I encountered along the way and the overall experience of building my first talking (and listening) robot!

The source code for this project is available here.

So, firstly, having followed the Instructable steps to get TJ Bot up and running, I wanted to give it a name.

Problem 1: Appropriate Naming

TJ Bot answers to “Watson” by default. However, if you’ve bought a £2 USB microphone and listened back to recorded audio on a Raspberry Pi (see useful link here), then you might forgive the Speech-to-Text service for transcribing it as “Whats on”, “What son”, “Washing” etc. At this point it’s definitely worth playing around with the gain settings & even placing TJ Bot away from anything that might hum or buzz (TV speakers, for example).
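For example, the capture gain can be nudged up from the command line with ALSA’s amixer. The card number and the ‘Mic’ control name below are assumptions — they vary by device, and arecord -l will list the capture cards on your Pi:

# list the available capture devices to find the USB microphone's card number
arecord -l

# raise the capture gain on card 1 (the 'Mic' control name is device-specific)
amixer -c 1 sset 'Mic' 75%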

I probably spent an hour or so experimenting with different attention words (names) including “TJ Bot” (sometimes translated as “PJ Bot”, “TeeJay Bot”) and “Buddy” (“Body”, “but they”, “but he”). Eventually I stuck with “Buddy” as this combination of plosives and vowels (along with unfortunate constants such as the tone/accent of my voice etc.) seemed to have the most success.

In an attempt to increase the recognition hit rate even more, I added a list of homophones to the TJBot config:

// set up TJBot's configuration
var tjConfig = {
    verboseLogging: true,
    robot: {
        name: 'Buddy',
        // alternative transcriptions that should also count as the attention word
        homophones: ['Buddy', 'Body', 'but they', 'but he', 'but the'],
        gender: 'male'
    }
};

I then added this list to the attention word check:

// listen for utterances with our attentionWord and send the
// result to the Conversation service
tj.listen(function(msg) {
    // check to see if they are talking to TJBot by matching the
    // attention word or any of its homophones
    if (tjConfig.robot.homophones.some(function(v) {
        return msg.toLowerCase().indexOf(v.toLowerCase()) >= 0;
    })) {
        // do something
    }
});

More often than not, Buddy now responds to his own name!

Problem 2: Buddy? Did you get that? Buddy?!

Problem 2 was very quickly discovered during Problem 1. It’s all very well including logging such as console.log("Understood"); in the code, but why not make use of the fact your program can speak?

Getting Buddy to notify me that he had recognised his own name proved to be a much better user experience than squinting at the logs I was printing to the console.

Playing a simple tone to let me know I didn’t have to repeat myself was all it needed:

var exec = require('child_process').exec;

// play a tone via the ALSA 'aplay' utility
var understood = function() {
    var create_audio = exec('aplay ' + audioDir + 'understood.wav',
        function(error, stdout, stderr) {
            if (error !== null) {
                console.log('exec error: ' + error);
            }
        });
};

This also made me think further about ways of representing state change in a conversational manner. If possible, I wanted Buddy to audibly inform me of as many state changes as he could outside of the Watson Conversation flow (a rough sketch of one way to wire this up follows the list below):

  • Start listening: Speak
  • Heard attention word: Emit Tone
  • Didn’t hear attention word: Emit Tone
  • Exit due to internal error: Speak
  • Stop listening due to user request: Speak
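One way to do this is to funnel every state change through a single notify helper. The sketch below is illustrative only: tj.speak is part of the TJBot library, but the state names and tone file names are my own assumptions (exec and audioDir are as in the earlier snippet):

// map each state change to either speech or a pre-recorded tone
// (state names and .wav files are assumptions for illustration)
var stateSounds = {
    startListening:  { speak: 'I am listening' },
    heardAttention:  { tone: 'understood.wav' },
    missedAttention: { tone: 'not_understood.wav' },
    internalError:   { speak: 'Something went wrong, I have to stop' },
    userStop:        { speak: 'OK, I will stop listening' }
};

function notify(state) {
    var sound = stateSounds[state];
    if (!sound) return;
    if (sound.speak) {
        tj.speak(sound.speak);                  // Watson Text-to-Speech via TJBot
    } else {
        exec('aplay ' + audioDir + sound.tone); // simple tone, no TTS round trip
    }
}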

Problem 3: User Interface

So by this point Buddy could say “Hello” and I’d added a Speaker entity to Watson Conversation so that he would respond with various comments if particular people introduced themselves.

However, I really wanted to avoid starting Buddy up manually, so I added the node startup command to /etc/rc.local so that it would run when the Pi boots up.
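For reference, the relevant line sits just before the final exit 0 in /etc/rc.local — the script path, user and log file here are assumptions about my setup:

# /etc/rc.local — start Buddy at boot, before the final 'exit 0'
su pi -c 'node /home/pi/tjbot/bot.js >> /home/pi/tjbot/bot.log 2>&1 &'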

I also wanted others to be able to easily see if Buddy was listening, or to stop / restart Buddy, without having to touch any plug switches or know what connecting via SSH means.

I wrote a simple Node Express app which exposed the following endpoints:

GET /tjbot/status
POST /tjbot/pause
POST /tjbot/resume

as well as serving up a simple HTML/Javascript dashboard to call these endpoints. This application effectively served as a wrapper for the bot module.
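A minimal sketch of that wrapper is below — the bot module’s isPaused/pause/resume interface is an assumption about my own code rather than part of the TJBot library:

var express = require('express');
var app = express();
var bot = require('./bot'); // the wrapped TJBot module (assumed interface)

// report whether Buddy is currently listening
app.get('/tjbot/status', function(req, res) {
    res.json({ listening: !bot.isPaused() });
});

// stop listening until resumed
app.post('/tjbot/pause', function(req, res) {
    bot.pause();
    res.sendStatus(200);
});

// start listening again
app.post('/tjbot/resume', function(req, res) {
    bot.resume();
    res.sendStatus(200);
});

// serve the HTML/Javascript dashboard
app.use('/tjbot', express.static('public'));

app.listen(3000);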

Using another device on the same network as the Pi, one could now monitor & control Buddy from http://raspberrypi:3000/tjbot
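Or, from the command line:

# check whether Buddy is listening
curl http://raspberrypi:3000/tjbot/status

# pause and resume listening
curl -X POST http://raspberrypi:3000/tjbot/pause
curl -X POST http://raspberrypi:3000/tjbot/resume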

With these problems addressed, I wanted to teach my new robot to do something useful, but also something relatively simple. A great post by Simon Burns makes the point that a bot should at least “Do one thing well”.

I decided on telling the time (reading stringified Javascript Date objects, although getting TJ Bot to ‘find’ an analogue clock in a room and read the hands sounds like an interesting image recognition project!). This has since been extended to tell the time in a given city.
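The core of the skill is just formatting a Date for speech. A sketch of the idea is below; the city-to-timezone mapping is my own assumption about how a Conversation entity could be resolved, and non-default timezones need a Node build with full ICU support:

// speak the current time, optionally in a given IANA timezone
function tellTime(timeZone) {
    var options = { hour: 'numeric', minute: 'numeric' };
    if (timeZone) {
        options.timeZone = timeZone; // e.g. 'Europe/London'
    }
    tj.speak('The time is ' + new Date().toLocaleTimeString('en-GB', options));
}

// e.g. resolving a recognised city entity to a timezone (assumed mapping)
var cityZones = { london: 'Europe/London', tokyo: 'Asia/Tokyo' };
tellTime(cityZones['tokyo']);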

Feel free to take a look at the source and import the Watson Conversation workspace JSON available here.

Hopefully this post is of use to anyone building their own TJ Bot!