In our recent blog post, Angela (one of our data scientists) discussed how our AI personal assistants Amy and Andrew make sense of the millions of scheduling-related emails that they process. This represents only one part of our system, the “read” dimension. Amy and Andrew must also respond to these emails in a way that keeps the conversation moving towards its goal: scheduling a meeting. This is the “write” dimension of our system, and the part I work on.
This “write” dimension involves equal parts writing words and writing the code that dynamically generates the text of our AI personal assistants’ responses.
At any point in the conversation, our data science architecture works to make sense of (“read”) the content of a scheduling-related email. Given the particular scenario, variables, and context of the incoming email (the inputs), Amy is then able to output the correct response. Her code turns into plain English. The image below is part of a real template. You can see seven different variables in just this one snippet. And below that, you can see how this template might look to a user (based on one set of variables).
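To make the idea of a variable-driven template concrete, here is a minimal sketch in Python. It is not our production template: the wording, the variable names (`guest_name`, `host_name`, `day`, and so on), and the `render_reply` helper are all assumptions made up for illustration. It only shows how one template plus one set of variables becomes one plain-English message.

```python
from string import Template

# Hypothetical template text and variable names, invented for this sketch.
OFFER_TIME = Template(
    "Hi $guest_name,\n\n"
    "Happy to find a time on $host_name's calendar. "
    "Would $day at $time $timezone work for a $duration-minute meeting?\n\n"
    "$signature"
)


def render_reply(template, variables):
    """Fill every placeholder in the template.

    Template.substitute raises KeyError when a variable is missing,
    so a half-filled message is never produced silently.
    """
    return template.substitute(variables)


if __name__ == "__main__":
    print(render_reply(OFFER_TIME, {
        "guest_name": "Dana",
        "host_name": "Dennis",
        "day": "Tuesday",
        "time": "3pm",
        "timezone": "EST",
        "duration": "30",
        "signature": "Amy",
    }))
```

Failing loudly on a missing variable is a deliberate choice in this sketch: a message with an unfilled placeholder would immediately break the human-sounding voice.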
Humanizing Our AI Personal Assistants
Very early on, x.ai made a deliberate decision to humanize our AI autonomous agents. We wanted our customers to be able to communicate as naturally as possible with Amy and Andrew, without having to learn any kind of special syntax or formatting. On the flip side, we wanted all of Amy and Andrew’s responses to sound just as human-like, to facilitate a natural, smooth flow of conversation.
To achieve this human-like quality in a dialog-based AI autonomous agent, x.ai created an entirely new role — the AI Interaction Designer. As an AI Interaction Designer, I work on building and designing a conversational interface.
Think of an interaction designer for a visual interface: they’re interested in how different arrangements of visual elements (such as typography, color, and motion) may influence users’ actions or behavior. Similarly, I’m concerned with how the precise word choice, punctuation, and tone of Amy’s dialog will impact our users’ behavior.
It’s through these subtle means (whether to use an adverb or not, whether to end a sentence with an exclamation point or a period) that we are able to humanize our autonomous agents. However, the purpose is not to fool our customers or their guests, or to pass some real-time Turing test. We’ve given Amy and Andrew a human-seeming interface (in this case, a textual voice) because that humanness helps our customers achieve their goal: getting a meeting on the calendar. This goal dictates all of our design choices as we write the templates for Amy and Andrew’s responses.
The figure above is a Sankey Diagram, a method we use to visualize and map the conversations between Amy and our users (both hosts and guests of meetings). We write our dialog to trigger a specific response; Sankey Diagrams help us track how well we’re doing that. While our primary goal is simply to get the meeting scheduled, we also want to minimize the back-and-forth with guests.
In the diagram above, the colored rectangles represent “intents,” the specific actions a person means to take, such as setting up a new meeting or accepting a time that Amy has offered (you can read more about intents in our data science blog post). For example, if a customer says “Can you please schedule this for 3pm?”, our system would label this as a Positive_Time intent, with 3pm as the positive time. The grey areas on the figure above represent the percentage of meetings that flow on to each secondary intent (then tertiary, and so on). This helps us visualize how often Positive_Time is followed by Accept_Time.
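The numbers behind a diagram like this are simply counts of which intent follows which. The sketch below shows one way to tally them, assuming each conversation is stored as an ordered list of intent labels. The sample data is made up, and `Propose_New_Time` is a hypothetical label used only for illustration; `Positive_Time` and `Accept_Time` are the intents mentioned above.

```python
from collections import Counter


def count_transitions(conversations):
    """Count how often each intent is directly followed by another."""
    counts = Counter()
    for intents in conversations:
        # Pair every intent with the one that comes right after it.
        for current, following in zip(intents, intents[1:]):
            counts[(current, following)] += 1
    return counts


def next_intent_share(counts, source_intent):
    """Share of conversations flowing from source_intent to each next intent."""
    outgoing = {dst: n for (src, dst), n in counts.items() if src == source_intent}
    total = sum(outgoing.values())
    return {dst: n / total for dst, n in outgoing.items()} if total else {}


# Made-up sample: four conversations, each an ordered list of intents.
sample = [
    ["Positive_Time", "Accept_Time"],
    ["Positive_Time", "Accept_Time"],
    ["Positive_Time", "Accept_Time"],
    ["Positive_Time", "Propose_New_Time", "Accept_Time"],
]

if __name__ == "__main__":
    print(next_intent_share(count_transitions(sample), "Positive_Time"))
```

Each (source, target, count) triple corresponds to the width of one grey band between two colored rectangles.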
The flow at the top of this visual is well defined: a large percentage of first intents (dark blue) move to the same second intent (light blue), and the conversation takes the fewest emails back and forth (ending at purple). This top flow represents an ideal case and is a much better user experience than the flows further down. In those lower flows, the response entropy is much higher, resulting in a wider range of secondary intents and more drawn-out conversations.
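One way to put a number on “response entropy” is the Shannon entropy of the intents that follow a given message. That definition is an assumption made for this sketch, as are the function name and the intent labels; the post does not spell out how the metric is computed. An entropy of 0 means every guest responded with the same intent, and higher values mean the responses were spread across more intents.

```python
import math
from collections import Counter


def response_entropy(next_intents):
    """Shannon entropy (in bits) of the observed next intents.

    Assumption: entropy is measured over the empirical distribution of
    whichever intent follows a given message.
    """
    counts = Counter(next_intents)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Well-defined flow: everyone answers with the same intent.
    print(response_entropy(["Accept_Time"] * 4))
    # Scattered flow: four different (hypothetical) intents, equally likely.
    print(response_entropy(["Intent_A", "Intent_B", "Intent_C", "Intent_D"]))
```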
Why Empathy Is Productive
One surprising thing we’ve learned (or maybe not so surprising to social scientists) is that empathy and a certain level of compassion help Amy achieve her goal (getting the meeting scheduled).
When my colleague Anna Kelsey set out to define our AI autonomous agents’ personality, she focused on imbuing them with four key personality traits:
- Friendliness (but not of the overly cheerful, many exclamation point variety)
To make Amy and Andrew as human as possible, we also worked hard to identify places in their dialog where we might need to apply empathy. For instance, when Andrew is delivering bad news (such as declining a time or having to cancel a meeting), he needs to soften his tone. If Andrew has to deliver the same bad news more than once (“Unfortunately, Dennis is not available at that time”), we need to inject even more empathy. Depending on how many times we’ve declined a time, we add slightly different language. If we’re declining a time for the second time, Andrew will say “I’m sorry…” at the beginning of his message, and if we’re declining for a third time, he begins with “I’m so sorry…”.
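The escalation described above can be sketched as a small function keyed on how many times a time has been declined. The function names, the exact sentence assembly, and the choice to keep “I’m so sorry” for a fourth decline and beyond are assumptions made for illustration, not a description of the production templates.

```python
def empathy_prefix(decline_count):
    """Return the opening phrase for the Nth decline in a conversation."""
    if decline_count < 1:
        raise ValueError("decline_count must be at least 1")
    if decline_count == 1:
        return ""  # first decline: no extra apology
    if decline_count == 2:
        return "I'm sorry. "
    # Assumption: third and later declines share the strongest phrasing.
    return "I'm so sorry. "


def decline_message(host_name, decline_count):
    """Build a decline message with empathy scaled to the decline count."""
    base = f"Unfortunately, {host_name} is not available at that time."
    return empathy_prefix(decline_count) + base


if __name__ == "__main__":
    for attempt in (1, 2, 3):
        print(decline_message("Dennis", attempt))
```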
You may have noticed that I’ve referred to Amy and Andrew interchangeably. In fact, there is no difference whatsoever in their dialog. We’ve been able to write dialog that is convincingly human, irrespective of gender. The proof? People mistake both Amy and Andrew for humans. Both get asked to join meetings, both are often thanked profusely, both are referred to with personal pronouns (he and she rather than it).
And people request a call with one of our AI personal assistants so frequently that we created an intent, “Decline_Robot_Call,” designed specifically for this case. When asked, Amy and Andrew politely inform a guest that they unfortunately cannot communicate over the phone.
With AI Interaction Design in its infancy, no best practices or guidelines yet exist for building conversational interfaces. We’re creating them as we go, and it’s been an incredibly exciting challenge so far.