Applying variation in AIs — Chatbots
We’ve looked previously at how to directly test the neural network model that might be part of your AI solution. There we talked a lot about variation of input to compare with output.
This touches upon something I’ve talked about previously on my old TestSheepNZ Blogspot account. Fundamentally when we think of testing, we think of a test script, which always has two columns — one for actions and one for expectations.
“Actions” are what we perform, and “expectations” are how we expect the system to respond to that action. Essentially a test is inputs vs outputs on a system.
Much of our thinking in testing is still clustered around functional testing — and thus when we think about varying inputs, we’re very good at thinking about modifying what we type into a text box.
That’s a good start for considering an AI chatbot, but what about when we use AI elements at the heart of systems designed around voice or facial recognition? Over the next few stories we’ll explore them all and share some ideas about testing them: how you can think about variation in the controlled environment of your test lab, and how you can reach out for more!
But for today, we’re going to focus on the first in the list …
Chatbots!
Chatbots were once a bit of a curiosity: an attempt to create an experience where talking to an AI would be difficult to distinguish from chatting online with a human being.
I used to belong to a forum called AlchemyX, which was run by a couple of friends of mine. They ran a chatbot named Silverstar as a member, who’d post stories that related to the general interests of the group, and who you could interact with; it was based on an ALICE AI. Overall, Silverstar felt like “another member of the forum”, although a bit random and off topic at times.
Today, though, chatbots are being seen in more practical terms. Some websites use them as a front line to engage with customers, linking them to stored information about services to support their answers.
Whereas previously you’d ask users to check a pretty comprehensive FAQ before contacting the helpdesk, your chatbot can interact with the customer and bring relevant information to them. This is where they earn their keep over being just a curiosity!
Increasingly over the last few years I’ve worked on more and more systems which allow customers to self-service actions on their account that were traditionally only available to the admins on the helpdesk. One of the biggest is “asking for a password reset”.
If a customer can do something themselves, it keeps them off the phones. Each phone call to admin staff costs money, and just as importantly if you have a lot of such calls it means long waits. Anything which helps the customer and saves money has to be a bonus!
Stephanie, in her talk, ran a wonderful interactive session where we tested out the new Air New Zealand chatbot and asked it a few questions. This chatbot is currently still in beta, and you can find it here.
Mike builds a chatbot!
All this interested me enough to have a go at creating my own, and put it through some testing! I found several helpful articles on “setting up a chatbot”, but this one was the most helpful.
In the end, I decided to go with Chatfuel, which can be hosted on a Facebook page. It’s not the greatest, but was one of the simplest, and fun to set up. And I did learn so much setting it up and experimenting with it (as you shall see).
My goal was to create a chatbot which could answer questions on testing: mainly to give explanations and definitions, but most importantly to lead people to articles or books that I’ve written. In essence, the chatbot is a robotic brand ambassador for me!
Scripted conversations
It will help your understanding if we stop for a moment and think about how call centres work. Call centre operators are trained using scripts: scenarios which make clear what they can and can’t help with. You’ll often find that if you phone a call centre on two different days, the operators will say virtually the same thing.
Ah, the things I found out making friends with the call centre at Kiwibank! I’m always making diverse friendships and listening to people’s stories for later use. [For my money, I loved working on the call centre floor at Kiwibank most of all; so many characters, and such a fun atmosphere!] This slide deck gives you an idea of what those scripts look like.
Chatbot programs like Chatfuel work very similarly — you prepare scripts of conversations like this …
On the left you define all the variations of “if the user says something similar to”; these are the trigger statements which will cause a reply. On the right you define what that reply from the bot will be (which, obviously, is much more fixed).
What you do is build up a whole series of these responses. At which point you’re probably going “well, where’s the AI?”.
The “smarts” of the chatbot come from the fact that it doesn’t need an exact statement like “tell me about metrics” to trigger this response. When someone sends something which is a close enough match, the response fires anyway. Being able to work out that a similar sentence means the same thing is a pretty big deal: computers are typically very literal beasts, annoyingly so at times. [Just ask anyone who played a text-based adventure game in the 80s.]
Notice though, I’ve still had to figure out quite a few ways you might want to ask for metrics and test case counts to make a whole set of potential triggers. And you can bet people will find a different way of asking the same question that I’d not considered.
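To make the idea of “close enough” matching concrete, here is a small Python sketch. Chatfuel’s actual matching logic isn’t public, so this uses `difflib` string similarity purely as a stand-in; the triggers, reply keys, and threshold are all invented for illustration.

```python
import difflib

# Hypothetical trigger set: several phrasings that should all lead to
# the same canned reply. These are illustrative, not Chatfuel data.
TRIGGERS = {
    "tell me about metrics": "METRICS_REPLY",
    "how do i count test cases": "METRICS_REPLY",
    "what is exploratory testing": "EXPLORATORY_REPLY",
}

def match_trigger(user_text, threshold=0.6):
    """Return the reply key for the trigger most similar to user_text,
    or None if nothing is a close enough match."""
    best_reply, best_score = None, 0.0
    for trigger, reply in TRIGGERS.items():
        score = difflib.SequenceMatcher(
            None, user_text.lower(), trigger).ratio()
        if score > best_score:
            best_reply, best_score = reply, score
    return best_reply if best_score >= threshold else None
```

So “please tell me about metrics” still lands on the metrics reply even though it isn’t an exact trigger, while unrelated text falls through to `None` (and would get the default message).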
Welcome message and default message
In Chatfuel, there are two messages you need to define upfront, and they’re pretty standard on any chatbot.
Firstly you have the welcome message. This is the message the chatbot uses when initiating conversation. Chatbots don’t really come with instructions, and certainly not a manual, so this is as close as you’ll get to being able to inform your user what they can use the bot for.
Secondly, there is a default message. When the chatbot can’t work out what a user has said, it replies with this message. And believe me, with your first few versions, it’ll come up a lot.
Your default message needs to make it clear that your bot did not understand, and ideally should offer some help, or room for suggestion. This is mine …
I’ve tried to make the default message a bit of fun, although it took a few goes. Originally it included a picture, and more words. But it was too much, and got tiresome when it came up repeatedly.
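The mechanics of the welcome and default messages can be sketched in a few lines of Python. All the names and message text below are invented for illustration; this is not Chatfuel’s actual API or data, just the shape of the flow.

```python
# Minimal sketch of the welcome/default flow around scripted responses.
WELCOME = ("Hi! I'm a testing bot. Ask me about testing topics like "
           "'metrics' or 'exploratory testing'.")
DEFAULT = ("Sorry, I didn't understand that. Try asking about a "
           "testing topic, or type 'help'.")

# Scripted responses; a real bot would use fuzzy matching here.
RESPONSES = {
    "metrics": "Metrics are measures we use to track testing progress.",
    "help": WELCOME,
}

def reply(user_text):
    """Look up a scripted response; fall back to the default message."""
    return RESPONSES.get(user_text.strip().lower(), DEFAULT)
```

The key point is the fallback: anything the bot doesn’t recognise lands on `DEFAULT`, which is why that message needs to carry its weight.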
Mike tests it … with a little help from his friends
So, I loaded about 20 interactions on testing into my chatbot, and played with it, repeatedly hitting all the right triggers.
Alas, because I’d written it, I couldn’t help but use phrases which were almost exactly the ones I’d used to define it!
So I published my chatbot to a page, and asked some of my friends to have a go. I’m able to monitor the chats — my friend Gabrielle found some testing topics I’d not covered almost straight away …
Unlike me, they had no idea of the structure of responses I’d put in, so they went through as best they could, just like an end user would. They often missed material, or kept getting the crappy default message. Going through the transcripts, I found some topics I needed to add interactions for; it also made me realise how repetitive the default message could get.
Most illuminating of all were some of my friends, friends who are not just testers but damn good testers: when they connected to my chatbot, all they asked were questions like “what’s your name?”, “what do you eat?”, “what’s the weather?”. To my testing chatbot!
Most of the time they got the default message, but quite disturbingly, when Stephanie asked what my chatbot likes to eat, it put up a profile picture of me …
Create a positive, social interaction
This made me realise that a chatbot isn’t just a search engine for a Wiki. People want to interact with it in all kinds of ways. They want it to be fun, but to also be informative.
I realised that along with all the FAQ information I was trying to script up, I needed prepared responses for questions around “social niceties”, including,
Birthday / name / gender of the bot
Favourite TV / book / films
Use of swear words. I think humour is best.
Food / drink [Please don’t eat the author]
Weather
Politics
I also thought it would create a positive user experience to have responses for suggestions, and to provide something if the user asked for help …
Ah … you might notice a certain tone in that message! This was the other thing: to make the interactions a bit more fun, I decided to give the bot a bit of a personality. You’ll remember we considered AI personality last time.
This isn’t a service bot for a customer, it’s my personal bot, and represents my personal brand. And if you ask my family, they’ll tell you I’m a bit cheeky, and as a co-worker once said “incredibly sarcastic at times”. Not really. Yes, that was me being sarcastic.
Now, I’m straying dangerously close to Mr Clippy territory here, but I decided my chatbot’s personality would be that of a completely subservient bot hiding a disturbed mix of passive aggression and sarcasm behind the scenes. This is basically an AI stuck on Facebook with delusions of being the next Skynet.
I also created multiple random answers for the same question, so that it wouldn’t feel too repetitive. For example …
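The “multiple random answers” idea is a one-liner in most languages. A minimal Python sketch, with hypothetical reply text, might look like this:

```python
import random

# A hypothetical pool of replies to one social question, so the bot
# doesn't repeat itself word for word. The wording is invented.
WEATHER_REPLIES = [
    "I live in a server farm. The weather is always 'air conditioned'.",
    "Cloudy. It's always cloudy where I'm hosted.",
    "Why not look out of a window? Oh wait, I can't.",
]

def weather_reply(rng=random):
    """Pick one of the canned replies at random each time."""
    return rng.choice(WEATHER_REPLIES)
```

Even three or four variants per question go a long way towards making repeated interactions feel less mechanical.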
As an indication, I created a list of responses to testing topics, but about two to three times as much content to cover social interactions. People can’t help trying to explore this material. They know it’s a bot, but they’ll love it if the material is funny and presents a consistent personality.
Harnessing the crowd
We’ve talked about the importance of harnessing variation for testing; for a chatbot, that means harnessing the power of the crowd, and you’ll see this being used in varying degrees throughout the AI testing series that follows.
Opening the chatbot up to other users exposes it to people who don’t know the inner workings of the bot, who ask questions differently to you, and who spell things differently. This is the variation of the human experience: we all do things differently.
Thankfully, chatbot platforms like Chatfuel include analysis tools so you can see what questions people are asking, which questions are falling between the cracks, and so on.
I’ve used this analysis tool to help find the weak spots in interactions, and add more responses to my bot to plug them.
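Chatfuel’s analysis tools do this for you, but the underlying idea is simple enough to sketch: tally every user message that didn’t hit a known trigger, most frequent first. The transcript format below is an assumption for illustration.

```python
from collections import Counter

def unmatched_questions(transcript, known_triggers):
    """transcript: a list of raw user messages.
    known_triggers: the set of phrases the bot recognises.
    Returns (question, count) pairs for anything not covered,
    sorted with the most frequent gaps first."""
    misses = Counter(
        msg.strip().lower() for msg in transcript
        if msg.strip().lower() not in known_triggers
    )
    return misses.most_common()
```

Questions that keep appearing at the top of this list are exactly the interactions worth adding next.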
You don’t have to cover everything
Your chatbot doesn’t have to cover every social scenario; that’s what your default message is for.
But you do need to cover the subject matter your chatbot is supposed to be an expert in …
As you saw above, Gabrielle found a few places that deserved better responses than the default one. I expanded topics as I noticed people hitting them.
Use of humour
Humour can delight a user, creating a desire to ask more in order to find other hidden gems. But it has to be appropriate.
A humorous chatbot isn’t appropriate for, say, a “how to report a crime” website or a “check your symptoms for bird flu” service. Although, to be honest, for anything that serious I really don’t think a chatbot is an appropriate channel at all.
When does your chatbot need to tell your user to speak to someone else?
Remember how people could say anything to a chatbot? When is a situation serious enough that you need to tell someone to speak directly to another human?
Obviously if you can’t help a user to self-service their problem, it’s time to flash them the helpdesk number.
But there are also bigger and more human concerns to think about. Last year there was a considerable movement to get Apple to up their game on their voice assistant Siri. When told “Siri, I’ve been raped”, Siri responded with the default “I’m sorry, I don’t know what you mean”.
Now, cynics might say “if you’ve just been attacked or abused, why the hell are you on your phone?” But have a bit of empathy: people who have just experienced something dramatic are in a state of shock, and “oh my god, I don’t know what to do” is a normal reaction. Sara Wachter-Boettcher has a piece on Medium which explores this, and it’s absolutely food for thought.
I know, why would someone come onto my testing blog and say something like that? But you know, maybe I should have a catch-all scenario just in case. And hope it never gets used …
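If you did want such a catch-all, one very rough sketch is to check incoming messages against a set of crisis-related words before any of the jokey responses fire. Everything below is a placeholder: the word list is deliberately tiny and the reply text is illustrative, not real crisis-support wording.

```python
# Crude catch-all: if a message contains any crisis-related word,
# skip the personality and point the user at a human instead.
# Both the word list and the reply are placeholders for illustration.
CRISIS_WORDS = {"raped", "abused", "attacked", "suicide"}

SERIOUS_REPLY = ("I'm just a bot and can't help with this, but please "
                 "talk to someone who can: contact your local emergency "
                 "or support services.")

def crisis_check(user_text):
    """Return the serious reply if any crisis word appears, else None
    (meaning normal handling can continue)."""
    words = set(user_text.lower().split())
    return SERIOUS_REPLY if words & CRISIS_WORDS else None
```

A real implementation would need a far more careful word list, handling of phrasing variants, and properly sourced helpline information, but the shape is this: the serious check runs first, and everything else runs only if it returns nothing.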
Talk to The Test Sheep
Okay — I know you’re dying to have a go! You can message my chatbot from this page on Facebook — you have to message them, not just write on the wall! [No, you don’t have to “like” the page]
I’ve put a deliberate inconsistency in the chatbot; the first person to spot it will get a copy of Software Testing As A Martial Art by David Greenlees bought for them! So get going …