Have you ever wondered what would happen if you mixed Clippy from Word 97 with the Google Assistant? Probably not; it doesn’t sound like a great idea.
Well, as an experiment for school I wanted to see if I could incorporate a voice assistant into Adobe XD, and Clippy (backed by Google Assistant) chilling over in the corner of XD has been a mental image for about a month now.
Every time I saw Clippy come up in old school Word I looked for the close button as fast as I could. I never wanted help when he thought I could use it, and I became so frustrated that I tried multiple times to uninstall him. He was intrusive and not helpful to someone who already knew exactly what he wanted to do. I could tell Microsoft tried very hard to make him “cute” and fun, but instead he gave off a childish vibe that took away any chance of me taking him seriously. I think the only thing I used Clippy for was procrastinating: if you clicked on him a couple of times, he would cycle through different animations.
So how was I going to create a system that didn’t make the same errors as poor Clippy? After interviewing users of XD and watching people use XD in their normal workflow, I found three aspects to focus on to make the best assistant in XD. The assistant needed to be Context Aware, Non-Intrusive, and Conversational.
To be clear, I created this experiment for a Conversation Design class, so for a whole semester we had been brainstorming, scripting, testing, re-scripting, re-testing, and prototyping. This was the workflow I was used to in Conversation Design, and so after brainstorming I began scripting.
A simple table in Google Docs served as my script. I kept it as bare-bones as I could for a first draft in order to get my ideas out quickly, but I ran into issues visualizing how the system was going to prompt the user both conversationally and visually.
Trying to get ideas, I opened up XD and ran through the tutorial I wanted to create with the thought:
How could I, as the system persona, guide a new user through a tutorial?
This thought process made scripting and eventual prototyping straightforward. I could write what I thought the system could say in a scenario, then match that prompt with an appropriate visual cue when needed. One example of this is when a user is looking for a tool or feature.
User: How do I give this shape rounded corners?
Assistant: That attribute is found here directly above the fill option
This audio prompt gives minimal clues about where the tool is, but by leveraging the screen in front of the user we can make something like this
A context aware system gives the system a look inside the thought process of the user
Now that the system could respond to context-sensitive requests in a multimodal fashion, I needed to make sure it wasn’t becoming a source of procrastination.
Similar to being context aware, it was difficult to design for a non-intrusive interaction without designing both the conversation and visuals at the same time. The most glaring example of this was the start of my tutorial script. The first draft looked like this:
Setting aside the fact that a new user wouldn’t know what to call this system persona, this interaction would be jarring for someone just opening XD. They would think:
Is this going to happen every time I open XD? There is no way I can talk with my computer like this in my open office space.
And like that I would have lost a user.
After creating the visuals for the tutorial I realized it made no sense to use audio prompts at the beginning, but I still wanted to greet the user. The user needed to feel in control, so I gave them two prompts indicating that XD can be a conversational tool. This also set the expectation that the assistant isn’t going to suddenly start talking without being prompted by the user.
Most virtual assistants will reprompt a user when they don’t hear anything after asking a question or after hearing their wake word. This seemed too intrusive for a desktop program focused on getting things done.
The No Input Error normally lengthens conversations by multiple turns as the system attempts to help the user. In this case, though, if the user pulls up the assistant, asks for a command, and then says nothing, the system knows to go quiet. This is also an example of the assistant being context aware, as it can tell whether the user is focusing on the assistant or on other parts of the program.
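The difference between the two no-input policies can be sketched in a few lines of Python. This is only an illustration, not the prototype's implementation: the function names, the action strings, and the `focus_on_assistant` flag are all hypothetical, and the "stay open but silent" branch is an assumption about what happens while the user is still looking at the assistant.

```python
from typing import Optional


def conventional_no_input(user_said: Optional[str]) -> str:
    """Typical voice assistant: silence after a question triggers a reprompt."""
    if user_said and user_said.strip():
        return "respond"
    return "reprompt"


def desktop_no_input(user_said: Optional[str], focus_on_assistant: bool) -> str:
    """Non-intrusive variant: silence never produces another spoken prompt.

    `focus_on_assistant` is a hypothetical signal for whether the user is
    still interacting with the assistant rather than the rest of the program.
    """
    if user_said and user_said.strip():
        return "respond"
    if focus_on_assistant:
        # Assumption: stay open and silent while the user is still looking.
        return "stay_quiet"
    # The user has gone back to their work, so get out of the way.
    return "dismiss"
```

The point of the second function is that no branch leads back to audio: whatever the context, silence from the user is answered with silence.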
Now the user feels in control of the situation and won’t be interrupted. I also wanted to make sure the commands to the virtual assistant weren’t robotic. Who doesn’t want to be Tony Stark talking to Jarvis in the most natural way possible?
A large aspect of this feature was making the interactions with the assistant more than single-turn commands. Granted, commands like “Change this font to 12pt” will likely make up 80% of the commands users give, but I needed to make sure that the other 20% was as perfect as possible (see Pareto Principle).
A danger in creating multi-turn conversations was forcing them where they didn’t belong. My main focus was still getting out of the user’s way so they could get work done. This made me focus on how a user could really use a virtual assistant within XD.
This is where I thought of the Tutorial use case, as a teaching opportunity will allow for many questions and responses. It also got me thinking about how an assistant could think two steps ahead of the designer.
If a designer requests a font weight, size, and color change, they will likely want to create a Character Style for their customized font. This can also open up features designers wouldn’t normally use because going through menus can seem like too much work.
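As a rough sketch of that "two steps ahead" idea, the assistant could watch which text properties a designer has just changed and offer a follow-up once all three are present. The property names and the wording of the suggestion are assumptions made for illustration; they are not XD's API or lines from the actual scripts.

```python
from typing import Iterable, Optional

# Hypothetical property names; a real integration would use whatever the
# host application actually reports.
CHARACTER_STYLE_TRIGGERS = {"font_weight", "font_size", "color"}


def suggest_follow_up(changed_properties: Iterable[str]) -> Optional[str]:
    """Return a follow-up suggestion once weight, size, and color have all changed."""
    if CHARACTER_STYLE_TRIGGERS.issubset(set(changed_properties)):
        return "Would you like to save this as a Character Style?"
    # Not enough evidence that the designer is building a reusable style.
    return None
```

Returning `None` for partial changes keeps the assistant quiet by default, which matches the non-intrusive goal: it only speaks up when the pattern is complete.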
I also wanted to make sure users could talk naturally and the assistant could pick out the applicable information.
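One simple way to picture this is keyword and pattern matching over the transcribed utterance, ignoring the filler words around the useful parts. The sketch below is an assumption-heavy illustration using Python's standard `re` module: the slot names, the short vocabularies, and the function name are all hypothetical, and a real assistant would more likely rely on a trained language-understanding service.

```python
import re
from typing import Dict

# Matches sizes such as "12pt", "12 pt", or "14 points".
SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(?:pt|point|points)\b")

# Tiny illustrative vocabularies, not an exhaustive list.
WEIGHTS = ("light", "regular", "bold")
COLORS = ("red", "blue", "green", "black", "white")


def extract_font_request(utterance: str) -> Dict[str, str]:
    """Pull size, weight, and color out of a loosely phrased request."""
    text = utterance.lower()
    slots: Dict[str, str] = {}

    size = SIZE_PATTERN.search(text)
    if size:
        slots["size_pt"] = size.group(1)

    # Split into plain words so "bold," and "blue?" still match.
    words = re.findall(r"[a-z]+", text)
    for weight in WEIGHTS:
        if weight in words:
            slots["weight"] = weight
            break
    for color in COLORS:
        if color in words:
            slots["color"] = color
            break
    return slots
```

With this sketch, both a terse command like "Change this font to 12pt" and a rambling one like "Hmm, could you make this heading bold and, uh, 12pt in blue?" yield the same kind of structured request.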
These little adjustments make for natural conversations with the assistant, and bring us that much closer to being Tony and Jarvis. Maybe one day we’ll be able to banter with XD and it will tell us we aren’t funny.
This was a very challenging, time-consuming, and at points hopeless project, and I enjoyed every minute of it. I learned how to integrate visuals with a conversation in a way I wouldn’t have if I had stuck with a traditional Google Action or Alexa Skill.
I learned how to expand on ideas to a degree I never had before. When I started this experiment I barely had three scripts, but now at the end I have more than 12 scripts covering all the variations, error cases, and primary use cases.
By creating every visual aspect in XD I finally got to learn that program on a very deep level, and I can see it being a heavy competitor to Sketch, especially for up-and-coming designers.
Finally, this project reinforced the idea that voice interfaces are the future of interfaces. As soon as conversation designers can handle every error case and users gain trust in speaking to a device, computing is going to look a whole lot different.
If you want to do a deep dive into the resources I used in this project, check out these links.