Making Tina Talk

What our first chatbot taught us about language, machine learning, and listening to our audience.

rehab
8 min read · Nov 7, 2016


Tina the T. Rex is our first chatbot. Although she currently lives on the Facebook pages of National Geographic Kids in the UK, Australia, New Zealand, and South Africa, and at talktothetrex.com, Tina started life in January 2016 as an internal hack week project. Our hack weeks are ways for us to explore emerging technology and behavioural insights; in the case of Tina, we wanted to investigate natural language software, and the increasing potential of messaging platforms. So, in a week, we made a ‘talking’ T. Rex which people could write questions to, in plain sentences, and get a response from a limited scripted range. We called her Tina for the alliteration, and in tribute to our then-intern (Tina).

After Nat Geo Kids learned about Tina, they thought she would be an interesting way to engage their Facebook audience. So we revisited the project in a more methodical way, bringing everything we’d learned from the hack week into a formal process. This article is a summary of what we learned (tl;dr: a lot) about natural language and scripting chatbots when making Tina.

Writing

Very early in the creation process we concluded that Tina should really be considered a Q&A bot, not a chatbot. She can answer questions, but won’t keep a human-like conversation going. This is largely a limitation of the time and software that was available to us; it turns out that it’s surprisingly hard to retain the context that a natural conversation requires.

During our hack week we created Tina’s responses using facts from the internet and some funny(?) responses we scripted ourselves. But to become a product, it was clear she would need a consistent tone of voice. Our copywriter, Sophie, wrote a short document setting out the rules for writing Tina’s answers, including:

Tina is a T. Rex that is going to be speaking to children. The words chosen have to be those that a young audience will understand. Tina is dead (sorry, Tina), but she is aware of this. She should speak about her life in the past tense.

She should deliver scientifically accurate responses — she is a learning tool, not a game. That said, she can be fun: her nature should reflect the joy of exploration.

One of the rules dealt specifically with Tina’s domain — that is, the range of her understanding of the world:

Tina has one topic of conversation — herself. She doesn’t discuss other dinosaurs, unless she is thinking about eating them. She doesn’t discuss current events, cavemen or space. She will talk about only one movie — Jurassic Park.

Being strict about this rule is important. It’s better to narrow the domain and clearly set user expectations than try to answer everything and fail. That said, it shouldn’t be completely inflexible; if we saw lots of users asking about cavemen, for example, we should consider writing a response to address that. Following users’ desires is more important than being dogmatic.

Training

Writing a bot is quite different from writing an application. It requires ongoing training, testing and tweaking, so we developed a process of iterative writing and feedback that gave us the best results.

Also, natural language systems require a lot of questions in order to train the systems to understand what people are asking. We started by writing these ourselves, but eventually outsourced the job to workers on Amazon’s Mechanical Turk, asking them to give us the top five questions they would like to ask a T. Rex. We (manually) sorted their answers by frequency, and came up with about 30 common questions that formed the core of Tina’s knowledge.
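That manual frequency sort is simple to sketch. Here is a minimal, illustrative version, assuming the Mechanical Turk answers arrive as a flat list of question strings (the function name and sample data are ours, not our actual pipeline):

```python
from collections import Counter

def top_questions(raw_questions, n=30):
    """Normalise crowd-sourced questions and rank them by frequency."""
    normalised = [q.strip().lower().rstrip("?") for q in raw_questions]
    return [q for q, _ in Counter(normalised).most_common(n)]

answers = [
    "How fast could you run?",
    "how fast could you run",
    "What did you eat?",
    "What did you eat?",
    "Where did you live?",
]
print(top_questions(answers, n=3))
# ['how fast could you run', 'what did you eat', 'where did you live']
```

In practice the normalisation needs to be fuzzier than a lowercase-and-strip (typos, rephrasings), which is why we still sorted by hand.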

Natural language services

The software we used to create Tina’s smarts is based on entity extraction — essentially, you look for keywords in a sentence, and use those to judge the intent of the sentence. So given the question, “how fast could you run?” we might say that the two important words are fast and run, and that they show that the writer’s intent is to know Tina’s maximum speed.

This keyword approach mostly works well, but does raise some problems. For example, a question could be “when did you live?”. From that we could say that “live” is the key word, and that the writer’s intent is to know the era Tina lived in. But another question, “where did you live?” could match “live” to Tina’s geography.
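A toy stand-in for that keyword-to-intent matching might look like this (the intent names and two-keyword threshold are invented for illustration). Note how “live” on its own is ambiguous between two intents, which is exactly the clash described above:

```python
# Map intents to keyword sets; first sufficient match wins.
INTENT_KEYWORDS = {
    "speed": {"fast", "run", "quick"},
    "era": {"when", "live"},
    "location": {"where", "live"},
}

def detect_intent(question):
    """Return the first intent sharing at least two keywords with the question."""
    words = set(question.lower().rstrip("?").split())
    for intent, keywords in INTENT_KEYWORDS.items():
        if len(words & keywords) >= 2:
            return intent
    return None

print(detect_intent("how fast could you run?"))  # speed
print(detect_intent("when did you live?"))       # era
print(detect_intent("where did you live?"))      # location
```

Requiring two matching keywords disambiguates these examples, but real natural language services use statistical matching rather than hard thresholds, so overlaps like “live” still leak through.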

Our solution to this lay in copy rather than technology. We scripted responses that could, as naturally as possible, answer more than one question; for example, the response to when Tina lived:

  • When did you live?
  • I lived in North America, 65 million years ago

… could also naturally answer the question about where she lived:

  • Where did you live?
  • I lived in North America, 65 million years ago

Scripting in this way leads to higher perceived success, as more of the users’ questions are answered. As an added effect, the answers also sculpt the possibility space; given more information than they asked for, writers are more likely to ask a different follow-up question.
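In code terms, the copy-side fix is simply several intents pointing at one shared answer (a sketch with hypothetical intent names; the answer copy is from the examples above):

```python
SHARED_ANSWER = "I lived in North America, 65 million years ago"

# Both intents resolve to the same carefully worded copy,
# so either phrasing of the question feels answered.
RESPONSES = {
    "era": SHARED_ANSWER,       # "When did you live?"
    "location": SHARED_ANSWER,  # "Where did you live?"
}

def answer(intent):
    return RESPONSES.get(intent, "Sorry, I don't understand your question.")

print(answer("era"))
print(answer("location"))
```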

The machine learning software that we used aims to be helpful by learning new ways of asking the same question, but it would occasionally yield strange results. For example, it started to associate any question about time with responses about hunting:

  • What are you doing on Tuesday?
  • It’s possible we hunted in packs

And there’s no way to tell it not to do this. If this becomes a real problem, all you can do is write a new response specifically about time, and retrain all the questions to use that response.

Perhaps the strangest of all, though, was this exchange:

  • When did you live on Earth?
  • What is the extinct?

We never wrote that answer! To this day, we don’t know where it came from.

Keeping the conversation moving

The moment your bot introduces itself is the first and most important opportunity to set user expectations. For example, the first version of Tina would invite users to “ask me anything”, and… well, they did.

  • I’m Tina the T. Rex. Ask me anything!
  • Who would win a fight between a badger and a gibbon?
  • Sorry, I don’t understand your question.

This stops the conversation dead. Tina has failed at her first attempt.

So we rewrote the introduction with three key changes: setting expectations by narrowing the domain to “questions about my life”; introducing some scope for errors by announcing the fact that it’s a work in progress; and giving an example of the type of question that could be asked.

  • I’m Tina the T. Rex. I can answer questions about my life, although I’m still learning. Why not ask me a question about my appearance?
  • What did you look like?

Immediately we saw conversation flow more naturally.

We also noticed during testing that people were keen to take any opportunity to keep the conversation going. For example, an early error response commonly left the conversation going round in circles:

  • Sorry, that’s not my area of expertise.
  • So what is?
  • Sorry, that’s not my area of expertise.

Also, people would use exclamatory or punctuating language, which led to dead ends:

  • I lived around 65 million years ago.
  • Cool!
  • Sorry, I don’t understand your question.

Our way to work around this was to add a catch-all response for statements like these:

  • So what else would you like to ask about me?

We also quickly learned not to use rhetorical questions in our responses:

  • Not bad, huh?
  • Yeah, not bad.
  • Sorry, I don’t understand your question.

And, learning from the success of our introductory text rewrite, we came up with more detailed responses for questions we didn’t recognise, repeating that the bot is in ongoing training and providing an example question:

  • Were you angry?
  • I’m not sure. I’m still learning, and every question you ask helps me. Why not ask how old I am?

We wrote multiple responses that are displayed randomly to provide some variety to the user, but all follow the same principles of explaining and guiding.
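Those varied fallbacks reduce to a random pick from a pool of copy. A minimal sketch, where the first line is from the example above and the second is invented for illustration (as is the helper name):

```python
import random

# Every fallback follows the same pattern: admit uncertainty,
# explain the bot is learning, and suggest a concrete next question.
FALLBACKS = [
    "I'm not sure. I'm still learning, and every question you ask "
    "helps me. Why not ask how old I am?",
    "I don't know that one yet, but asking helps me learn. "
    "Why not ask what I looked like?",
]

def fallback_response(rng=random):
    """Pick a fallback at random so repeated failures don't feel robotic."""
    return rng.choice(FALLBACKS)

print(fallback_response())
```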

The real audience

After training the bot with Mechanical Turk questions, we needed to get Tina into the hands of our real audience: children. We asked all the parents in the studio to test Tina with their children, and give us the results. We were pleasantly surprised with the outcome, as many of the common questions were successfully answered. But there were two question subjects we saw repeated frequently: Jurassic Park, and poo.

It turns out, kids care quite a lot about how dinosaurs poo. A guiding principle of making Tina was to answer the questions that users asked, as long as they didn’t contradict the tone of voice document. So in went an answer about poo.

In the wild

We were now ready for launch. Tina went live in July 2016, and as we shared it out on our social channels we started to see conversation sessions begin. Many of the initial users were curious adults, and one thing we learned quickly was that they were keen to test Tina on current affairs:

  • So, where do you stand on Brexit?
  • Sorry, I don’t understand your question.

Being unaware of current affairs was consistent with Tina’s tone of voice statement, and important in keeping her domain narrow. Answering even one topical query would have set a level of expectation that we could answer more, so we were happy to leave these unanswered.

It became evident fairly quickly that people are very polite! Many conversations started with “hello” and ended with “goodbye”, so we made sure to write answers for those. And we saw bad language less frequently than we’d feared, although we had a response for that ready:

  • You !#^%.
  • Let’s keep things clean.

People really engaged with Tina, keeping conversations going even when they were very one-sided:

  • You’re pretty bad at this.
  • Let’s keep things clean.
  • Say that one more time and we’re done here. I’m LEAVING.
  • What else can I tell you?
  • Were you dropped as a child?

But the internet being what it is, we also saw an attempt by the notorious 4chan to get Tina to repeat the mistakes of Tay. However, we’d decided early on that we wouldn’t use external data sources, and we’d never repeat anything back to the user, so that attempt came to nothing.

Results

Tina was supposed to live on Nat Geo Kids’ Facebook page for a few weeks, but has been a great success and is still live there to date, with more than 15,000 chat sessions under her belt. We’ve had wonderful feedback from the press, including Fast Company’s statement that Tina is “the first truly awesome chatbot”.

But perhaps even more valuable to us has been learning a lot about the way to build — or the way NOT to build — chatbots in the future. I hope this can be just as useful to you too.

Written by Peter Gasston

Peter is our Senior Creative Technologist at +rehabstudio. He has 15 years’ experience working on the web, is the author of two books, supports Arsenal and likes going on day trips to castles.



rehab

We’re an experience design and creative technology agency. We build digital products and services that give our clients an unfair advantage. rehabagency.ai