Designing for non-visual interfaces — Defining a Design Process [Pt.3]
As previously mentioned in parts 1 & 2, sonic interaction is a complex beast — it is therefore necessary that we spend some time developing a design process that supports its many dimensions. How might we go about designing, iterating on and talking about this kind of experience? Furthermore, how might we incorporate key brand-specific characteristics in a product experience that is purely sonic?
Where To Begin?
Defining exactly what kind of product you’re designing and for whom it’s being designed is the first (and most important) port of call for any project. Becoming as informed as possible about a client’s market, relevant technologies and competitor products is vital, especially when it comes to designing something you’re less familiar with.
For the sake of conversation (pun intended), let’s imagine we are designing a Voice Assistant for the home—a product category that has become increasingly popular in the last few years. At the outset, users have numerous practical and emotional expectations of this product category, but how should one go about planning for and successfully satisfying these demands?
One approach, utilised by BBC R&D in their VUI explorations, makes things a little more tangible. It involves dissecting the experience into a number of component parts (user, context, emotional state, programme/task, device and tone of voice) to help us understand and explore the most relevant use cases and functional requirements of any given voice-based project.
The great thing about this approach is that it provides a vast nexus of inputs with which to work. Gradually, by erasing irrelevant user groups or tasks, we’re left with a more manageable set of variables that will inform the way in which we build our experience and anticipate certain interaction or technology specific hurdles.
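To make this filtering idea concrete, here’s a minimal Python sketch. The component names and values below are hypothetical placeholders for illustration, not the BBC’s actual taxonomy: we enumerate every combination of components, then erase the irrelevant ones to arrive at a manageable set of candidate use cases.

```python
from itertools import product

# Hypothetical values for a few of the experience components
# (user, context, device, task) described above.
components = {
    "user": ["18-30", "30-50", "50+"],
    "context": ["at home", "commuting"],
    "device": ["smart speaker", "mobile phone"],
    "task": ["set an alarm", "play music", "check the news"],
}

# Every combination of component values is a candidate use case.
candidates = [dict(zip(components, combo))
              for combo in product(*components.values())]

# Erasing irrelevant combinations leaves a manageable set; here we
# (arbitrarily) drop commuting scenarios for a home assistant.
use_cases = [c for c in candidates if c["context"] == "at home"]

for case in use_cases[:2]:
    print(case)
```

In practice the filters would come from your market and technology research rather than a one-line predicate, but the shape of the exercise — generate broadly, then subtract — is the same.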
Diagrammatic Representation & Feature Definition – A Working Process
Designers, developers, ‘product people’… we absolutely love a diagram. Representing complex problems in visual ways allows us to communicate more effectively and, put simply, solve problems more quickly. Wireframes, flow diagrams and click dummies help us explore granularity and co-dependent interactions or design decisions, just like an electrician’s wiring diagram or an architect’s blueprint. Unfortunately, when it comes to voice-based interaction, being that we’re dealing with something non-visual and with a very limited sense of hierarchy, these methods of representation feel a little redundant (99% of the VUI experience exists outside the boundaries of our screens).
It seems more appropriate then, to first tease out some high level requirements before focussing on the specifics of a user’s journey. Taking the BBC’s ‘component’ approach as a starting point, we can begin to join the dots between seemingly disparate variables and slowly build an understanding of potential features and user journeys.
Tracing the yellow blocks from left to right, we can extrapolate certain needs, or indeed fully formed features based on this (or any other) combination of inputs – in this case;
Use case; an 18–30 year old person at home using their mobile phone to set an alarm before bed.
Product Feature; a voice controlled alarm clock.
The objective here is to land on a set of relevant use cases (shown left); the journeys or objectives that our theoretical home-assistant will satisfy. From here, with reference to the product / tech / market research I mentioned earlier, we can hop, skip and jump our way towards something resembling a fully-formed product concept. How exciting!
This method of subtraction or filtration not only helps us identify the key cornerstones of our product experience, it also helps us identify what our product will not do (which is of equal importance!). In competitive markets, there’s a tendency to overcomplicate experiences by trying to satisfy every possible user group or edge case, or by trying to shoehorn in features that exist in competitor products. The result is, unsurprisingly, very rarely successful.
From a Product and Brand perspective, your audience must understand precisely what problem the product solves. Discombobulating them with an endless list of features, or sounding a bit vague (‘This product will make your dreams come true!’), is perhaps not the way to go, especially when dealing with a method of interaction that, to the user, already feels like a leap of faith. Your VUI experience (or product experience in general) needs to be not a Jack of All Trades, but rather a Master of One.
Anyway, now we have a route to understanding who we’re designing for and what our product will do… perhaps it’s worth considering how we might get from a high level product concept to a working prototype. What kinds of working tools might we require for a fruitful iterative design process and when interfacing with Stakeholders, how might we best communicate?
Prototyping VUI experiences – Going Off Grid
For classic UX designers, making the transition from screen-based interaction to voice can feel a little odd. Luckily, a lot of what we already know and do (designing for your user, quantifying success & failure, testing & iterating on a concept) is relevant. Besides, if Madonna has taught us anything it’s that re-invention (and perhaps a scalpel) is the elixir of life – UX design is now entering its Ray Of Light phase, Kabbalah n’ all — just embrace it!
Mapping a user journey from A to B using language seems, at first glance, a relatively straightforward process. User says A, product does A, right? Well, no, not really. A good voice-based experience must also accommodate a myriad of edge cases: slang, poor sentence structure, mumbling, human error, swearing perhaps? – essentially, our job as designers is to provide systems of logic that accommodate these things rather than tell the user ‘Sorry, I didn’t quite get that’.
This is, in my head, where the question of tooling comes into play. If we are to anticipate and accommodate all manner of inputs, what kinds of tools might help us along the way? Taking inspiration from other pieces of software, or industries in which speech and language are also relevant, what might a VUI design and development tool look like and what do we need it to do?
Thanks to platforms like wit.ai, language to describe the myriad of inputs and outputs does exist. Commonly, conversational experiences are understood and built using what we like to call Entities, Intents and Responses.
Entities are the words you teach your bot to understand (what is ‘Coffee’?), the associated variants of that word or thing (‘Espresso’, ‘Flat White’) plus relevant slang words or alternative spellings of that variant (‘Flatty’ or ‘Flt White’).
Intents are the tasks your user might like to complete (ordering a Coffee) plus the many ways your user could initiate that task (‘gimme an espresso’, ‘flat white please’).
Responses are the multiple ways in which your bot will respond and are attributed to specific Intents (‘Sure thing!’, ‘Would you like sugar?’, ‘Soy, Almond, Oat or Cow?’).
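The Entities / Intents / Responses model can be sketched in a few lines of Python. To be clear, this is an illustrative toy with hypothetical coffee-ordering data and naive substring matching — not the actual wit.ai or Dialogflow API, which use trained language models rather than string lookups — but it shows how entity variants (slang and misspellings included) let a bot cope with inputs beyond its canned trigger phrases.

```python
import random

# Entities: a thing the bot understands, plus its variants,
# slang and alternative spellings.
ENTITIES = {
    "coffee": ["coffee", "espresso", "flat white", "flatty", "flt white"],
}

# Intents: tasks the user might complete, with sample trigger phrases.
INTENTS = {
    "order_coffee": ["gimme an espresso", "flat white please",
                     "can i get a coffee"],
}

# Responses: possible replies, attributed to specific intents.
RESPONSES = {
    "order_coffee": ["Sure thing!", "Would you like sugar?",
                     "Soy, Almond, Oat or Cow?"],
}

def match_intent(utterance):
    """Return the intent matching an utterance, or None."""
    text = utterance.lower().strip()
    # First, try the intent's known trigger phrases...
    for intent, phrases in INTENTS.items():
        if any(phrase in text for phrase in phrases):
            return intent
    # ...then fall back to entity variants, so slang like
    # 'flatty' still resolves to the coffee-ordering intent.
    if any(variant in text for variant in ENTITIES["coffee"]):
        return "order_coffee"
    return None

def respond(utterance):
    """Pick a reply for the matched intent, or admit defeat."""
    intent = match_intent(utterance)
    if intent is None:
        return "Sorry, I didn't quite get that."
    return random.choice(RESPONSES[intent])
```

So `match_intent("a flatty, please")` resolves to `order_coffee` via the slang variant, while an unrecognised utterance falls through to the apologetic default — exactly the failure mode good entity coverage is meant to minimise.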
At times, defining these logics can feel a little abstract … and not necessarily all that exciting — but in order to build a compelling conversational experience, you’ve got to go granular (and perhaps drink a lot of coffee).
One tool that makes life a little easier when prototyping conversational experiences is Dialogflow. Not only does it let you build a very complex set of logics, it also offers in-browser text and voice preview and a number of direct integrations (Facebook Chat, Slack, Alexa, Cortana) and SDK export options (Android/iOS, HTML5, Ruby, JS). Essentially, it lets you build high fidelity prototypes without actually having to know anything about code — ideal!
Most importantly, what tools like Dialogflow provide is real-time, iterative prototyping. Voice interaction can seem abstract at the best of times, so the quicker you and your client can arrive at something that’s test-able, the better.
Having built the basics, next comes the task of making your conversational experience feel brand specific. How does your bot deliver information? How does it address its user? Is it ever completely silent? Tempering these variables so they reflect your client’s Brand or Product Vision is important and likely the best route to market differentiation.
All to be explored in part 4 …