Being human: my day as a Loebner prize chatbot competition judge

Julian Harris
5 min read · Sep 10, 2018


Created in 1989, the Loebner Prize is an annual socialbot competition that awards prizes for human-likeness.

  • Gold: for a bot that convinces the judges it is a human,
  • Silver: for a restricted variation on that, and
  • Bronze: for the best bot among the bunch that year.

Bronze medals are the only ones that have ever been awarded. Hugh Loebner, quite clearly a colourful character, created the competition for various reasons, the most charitable being that it was a way to track real progress towards “solving the problem of artificial intelligence”.

So Bertie Muller at AISB very kindly invited me to be a judge. I will share how the competition was set up, then some insights.

The structure:

  • 4 confederates: humans posing as bots
  • 4 socialbots (chatbots focused exclusively on chit-chat, rather than taskbots). The intention was that they were anonymised (e.g. Mitsuku identified itself as Millie). These socialbots were not allowed to connect to the internet (see my thoughts on this later), and needed a specific integration with the Loebner infrastructure to present them in a new interface (see below).
  • 4 judges in a separate room. This year it was in Bletchley Park and visitors were invited to watch. We had a presenter switching between judge conversations over time.
  • Side-by-side chats: the judges had the human conversation and the bot conversation side-by-side, with a chat box each, so each judge would have two conversations at once. Judges would have 25 minutes to decide which one was a bot.
Taskbots are like normal software: efficient and effective at completing a task. Socialbots are about empathy, trust and companionship. The Loebner Prize is a socialbot competition. See more at http://bit.ly/cx-chatbots

One piece of guidance I’d offer for future contests is for the confederates (the human pseudobots) to get some basic coaching in how to pretend to be a stupid bot. The very first response immediately gave me confidence that I was talking to a confederate, not a bot.

I had a chat with some of the other judges, and based on their profiles I became most comfortable playing to my forte: finding edge cases and generally messing about.

The community has voiced two interesting challenges to this prize:

  • No internet: the internet today is a utility, like power and water. Even setting aside the practical difficulty of taking cloud-scale software offline, the real question is whether this rule could be relaxed, as long as the infrastructure is properly secured.
  • Being human: my read of the AI community strongly suggests that being transparent about being a bot is actually more ethically sound. Aside from this, it’s also clear that quite often bots are trusted more than people!

The most popular chatbot in the world, Microsoft China’s XiaoIce (“shiaOW-ice”), has more than 120m users, 25% of whom have said at one point or another, “I love you”. This is what success looks like today. And no one thinks XiaoIce is human, nor could XiaoIce likely run offline on a few computers. So there is mounting evidence that bots pretending to be human isn’t a particularly useful objective, at least not for the next 10 years or so.

Great software design: understanding people’s “mental model”

When people interact with things in the world, they form ideas about what those things are. This “mental model” is a key consideration when designing software. Command-line interfaces like MS-DOS or bash, where you type seemingly arbitrary sequences to coax the computer into doing your bidding, involve quite complex mental models: the person adapts to how the computer works, rather than the other way around.

So what kind of mental model do people create when they talk to a chatbot? Studies have shown that a chief complaint is inconsistency: the best chatbots set expectations consistently, so that we, the users of these systems, can start building a picture in our minds of what this thing actually is and what we can reasonably expect it to do. It turns out that starting with “I’m a bot” is a pretty good start.

How did the chatbots respond?

If you don’t set expectations clearly, then people, faced with something sending them natural language, will start from a position of goodwill: they assume it is some form of human-like intelligence. This falls apart almost immediately, however. Let’s look at how chatbots typically respond.

One session. What do you reckon: is green a person or a bot?

The following categories are my own fairly simplistic thoughts on how bots respond:

  • Conversation: handling memory of previous exchanges, including earlier parts of the same conversation. Good: remembering key facts (like my name, or who “she” refers to). Bad: complete amnesia.
  • Response quality: four categories: nonsensical, “generic but plausible”, specific, and over-specific.

“Generic but plausible” is where ELIZA (1966) almost solely operated: a bunch of keywords triggered certain responses, but mostly it would bounce back what you said. “I feel lonely”; “Tell me more about feeling lonely”; and so on. All the socialbots we evaluated used this structure to some degree.
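ELIZA’s keyword-trigger-plus-bounce-back structure is simple enough to sketch in a few lines. Here is a minimal, hypothetical illustration; the rules below are invented for this example and are not from the original ELIZA script:

```python
import re

# A minimal ELIZA-style responder. The rules are illustrative,
# invented for this sketch; they are not the original ELIZA script.
RULES = [
    (re.compile(r"\bi feel (.+?)[.!?]?$", re.I), "Tell me more about feeling {0}."),
    (re.compile(r"\bi am (.+?)[.!?]?$", re.I), "Why do you say you are {0}?"),
    (re.compile(r"\bmy (\w+)", re.I), "Tell me about your {0}."),
]
FALLBACK = "Please go on."  # the generic, "plausible" bounce-back

def respond(utterance: str) -> str:
    """Return the first keyword-triggered response, else the fallback."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return FALLBACK
```

With this, `respond("I feel lonely")` produces “Tell me more about feeling lonely.”, while anything unmatched gets the generic fallback: exactly the “generic but plausible” behaviour described above.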

The problem comes when bots try to be smarter than this. One bot was asked, “tell me the next number in the sequence 1, 2, 4, 8”. It responded “16”. Super impressive. This bot knows maths! Wow. But take a guess how it responded to “next number in sequence 1, -1, 1, -1”: it was confused. Now think about where that leaves your mental model: how much trust do you have that it will answer the next question, at any depth? Does it really “know” maths? It’s confusing that it doesn’t know this even more basic sequence.
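To see why this is so corrosive to trust, here is a hypothetical sketch (not the actual bot’s code, whose internals I don’t know) of a brittle handler that only recognises doubling sequences: it nails the first question and falls over on the second, easier one.

```python
def next_in_sequence(nums):
    """Hypothetical brittle handler: only recognises doubling sequences."""
    if len(nums) >= 2 and all(b == a * 2 for a, b in zip(nums, nums[1:])):
        return nums[-1] * 2
    return None  # confused: the bot falls back to a canned reply

print(next_in_sequence([1, 2, 4, 8]))    # 16: looks like it "knows maths"
print(next_in_sequence([1, -1, 1, -1]))  # None: stumped by an easier pattern
```

The first success invites the user to infer a far more general capability than actually exists, which makes the subsequent failure feel like a betrayal of the mental model rather than a predictable limit.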

Being human is not an important goal for most.

This is a very hard problem, and solving it is still a matter of academic research. However, no one I know of is setting the objective “pretend to be human”; rather, it’s “be a very useful robot”: either designing fast ways to solve problems, or fast ways to build trust and emotional connection, as bots.

I want to thank Bertie and the other AISB organisers for the opportunity to be a judge, and to the 4 socialbot providers, with a special hat-tip to Steve Worswick who has now been awarded the medal 4 times with his bot, Mitsuku.

If you want to learn more about the chatbot landscape and what business success looks like with them, check out the CognitionX “Business of Natural Language Computing: a Primer on Chatbots and Voicebots” at http://bit.ly/cx-chatbots


Julian Harris

Ex-Google Technical Product guy specialising in generative AI (NLP, chatbots, audio, etc). Passionate about the climate crisis.