“Chasing J.A.R.V.I.S.”: A Series on the Design of Voice User Interfaces
Case Study One
May 2nd, 2008, was a pivotal day for me.
It was the opening day of the first Marvel Cinematic Universe movie, “Iron Man,” and Robert Downey Jr.’s debut as the billionaire engineer, designer, and all-around inventor Tony Stark.
At the time, that movie was science fiction. But it inspired the minds of many would-be product designers and inventors, including me.
If it weren’t for Iron Man, I would argue that the smart home wouldn’t exist the way it does now, the rapid advances in AR technology would be nonexistent without scenes of Robert Downey Jr. manipulating floating 3D models, and the voice user interface wouldn’t have found its market fit if our imaginations had not been sparked by Paul Bettany’s character, J.A.R.V.I.S.
J.A.R.V.I.S., Tony Stark’s personal virtual assistant, was capable of running Tony’s entire life and business.
He interfaced with all of Tony’s tech and was wicked smart, understanding contextual cues and sporting a dry sense of humor to boot. He could tell Tony when his suit was suffering massive structural damage or when someone was at the door, and crack a quick verbal jab at Tony’s logic.
He wasn’t just a virtual assistant; he was a companion. A viable replacement for human contact. In other words, a really high standard for the voice interface designer.
Imagine a virtual companion that could contextually understand your intent when you talked to it.
Instead of a doorbell chime, imagine a natural-sounding voice of your choosing saying, “Your mother is at the door!” followed by sir, ma’am, master, or any ego-stroking name you want your virtual assistant to address you by.
Imagine that instead of manually entering reminders into an app, your assistant prompted you about potential business opportunities you could capitalize on because it saw that you wrote down a vague goal in a to-do list app.
A reminder is user input; a prompt is the assistant doing something without being asked.
J.A.R.V.I.S. offered a full synthesis of relevant user prompts based on advanced contextual analysis that simply isn’t available in today’s virtual assistants.
Our voice technology is still pretty primitive, but that doesn’t mean we can’t try to reach this standard.
It is only a matter of time.
Our first attempt
Along with two other students in my VUI design class, I received an exercise to design a voice interface based on a prompt. The prompt deliberately didn’t give us a problem to design for; we had to walk around the campus and speak with students about their biggest pain points. From there, we could design a potential solution using a VUI.
We talked to a few students about what they wished was better about their college experience. Our original hunch was that students always had problems finding their classes, but in our interviews we quickly realized that the biggest pain point would always be, to some extent, parking.
The pain point we discovered was how a commuter student could find the best parking spot for their particular situation.
This brought up a bunch of questions within the team about what a potential voice experience could answer:

- What building is your class in?
- What parking lot is next to that building?
- Is that lot full?
- What time is the class?
- What time are you driving to class?
We defined the problem as the inability to predict the best parking option for the user’s needs.
We started playing with the idea that you could access your voice assistant from your car and ask about the best parking options on the way to school.
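To make those questions concrete, here is a minimal sketch of how they might map to the “slots” a parking VUI would need to fill before it could recommend a lot. The slot names and prompts are illustrative, not from our actual script:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical slots derived from our question list; names are illustrative.
@dataclass
class ParkingRequest:
    building: Optional[str] = None      # What building is your class in?
    class_time: Optional[str] = None    # What time is the class?
    arrival_time: Optional[str] = None  # What time are you driving to class?

# The question the assistant asks to fill each missing slot.
PROMPTS = {
    "building": "What building is your class in?",
    "class_time": "What time is the class?",
    "arrival_time": "What time are you driving to campus?",
}

def next_prompt(req: ParkingRequest) -> Optional[str]:
    """Return the next question to ask, or None once every slot is filled."""
    for slot, prompt in PROMPTS.items():
        if getattr(req, slot) is None:
            return prompt
    return None
```

In a real Alexa Skill or Google Action, the platform’s dialog management would handle this slot elicitation for you; the sketch just shows the underlying idea.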
In VUI design, the best way to test out ideas is to verbally talk through potential scripts in a role-play scenario: one person plays the Google Assistant while the other plays the user.
As the user, I started talking to Katie while she personified the Google Assistant. When you talk out loud, you get a feel for how natural the spoken parts of the conversation sound. If you jump straight into writing a script, you might end up with something that sounds awkward when spoken aloud.
Once we came up with a solid MVP use case, we typed the script out in Google Docs. You can see our notes on why we chose certain responses.
We strove to adhere to the cooperative principle: we wanted an underlying sense of cooperation between the system persona and the user. That meant implying understanding through linguistic devices such as implicit confirmations and informal wide-focus questions, to make the conversation as smooth as possible.
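As a rough illustration of the difference, here is a toy sketch (invented wording, not our actual responses) contrasting an explicit confirmation, which halts the flow to verify, with an implicit one, which folds the heard value into the next turn:

```python
def explicit_confirm(heard: str) -> str:
    # Explicit confirmation stops the conversation to verify; it feels robotic.
    return f"Did you say {heard}?"

def implicit_confirm(heard: str, next_question: str) -> str:
    # Implicit confirmation echoes the heard value while moving the
    # conversation forward, keeping the exchange cooperative.
    return f"Got it, {heard}. {next_question}"
```

So instead of “Did you say Smith Hall?”, the assistant says “Got it, Smith Hall. What time is your class?” and the user can simply correct it if the value is wrong.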
Once we nailed down the script together, we used the Storyline tool to quickly prototype a VUI to test on fellow students.
We tested the VUI on a couple of students, and the key takeaways were these:
- The dialogue was pretty natural and each interviewee was able to quickly interface with the prototype.
- “I’d only use this once on the first day of school, but once I had experience I wouldn’t need it anymore”
- “I don’t use voice assistants”
Because voice is a relatively new technology, the ability to activate an Alexa Skill or a Google Action is still something of a hidden feature on those platforms. Mass-market adoption of VUIs isn’t yet at the point where people think of a voice assistant as smart enough to find them parking.
We also found a problem with the lifetime value of the VUI we designed. Even though it was “usable,” it wasn’t desirable. Getting directions to the closest parking spot is a “one and done” deal; once you know, why would you need Alexa, Google, or Siri? What is the point of creating a one-hit-wonder VUI?
So, going back to the drawing board, we repeated the process with our new insights and role-played potential use cases that could be used again and again depending on context.
Our solution was a little bigger: what if you could use the VUI to gauge where the best parking option was based on peak hours, and get real-time info on how parking looked across campus?
In this script, the Google Assistant suggests alternatives because the user was commuting to campus during peak hours.
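The logic behind that suggestion can be sketched in a few lines. This is a toy model under stated assumptions: the peak window, lot names, and fill estimates are all invented for illustration, and a real system would pull live occupancy data:

```python
from datetime import time

# Assumed morning rush on campus; a real system would learn this from data.
PEAK_HOURS = (time(8, 0), time(11, 0))

def suggest_lot(preferred: str, arrival: time, fill_estimates: dict) -> str:
    """Recommend the preferred lot, or the emptiest alternative at peak hours.

    fill_estimates maps lot name -> estimated fraction full (0.0 to 1.0).
    """
    start, end = PEAK_HOURS
    in_peak = start <= arrival <= end
    if not in_peak and fill_estimates.get(preferred, 1.0) < 0.9:
        return f"Lot {preferred} should have spots when you arrive."
    # At peak hours (or if the preferred lot is nearly full),
    # fall back to the least-full alternative.
    alt = min((lot for lot in fill_estimates if lot != preferred),
              key=fill_estimates.get)
    return (f"Lot {preferred} is usually full around then; "
            f"try Lot {alt} instead.")
```

The point isn’t the arithmetic; it’s that the same question (“where should I park?”) yields a different, contextually relevant answer depending on when you ask, which is what makes the VUI worth returning to.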
In my experience, when rapid prototyping a VUI it is important to double down on a “happy path” use case first.
When you test out “happy path” first, you set up the most ideal situation for reality to poke holes in your ideas.
For our first attempt, we created a use case that wasn’t useful after one go: a throwaway that wasn’t worth the investment of development resources.
When coming up with a desirable solution to something as universally reviled as university parking, a better approach is to create something that is contextually smart.
That’s why sci-fi like J.A.R.V.I.S. helps us move the needle on VUIs. It is much more useful to choose the harder path of developing a VUI that shares fresh and relevant information than to waste money on a fixed fact.
If you don’t strive to make a virtual assistant that seems smart and changes with every interaction through good tapering design, there is no reason to make the VUI in the first place.
That’s why J.A.R.V.I.S. is the standard for aspiring voice designers.
Thanks for reading this far. I don’t claim that this article is the best explanation of the current creative dilemma, so I am open to anyone adding to this theory. Let me know what you think!