Touch interface guidelines dictate that the simpler and more limited the gestural language used to control a system, the better. Use only these gestures when designing for touch devices: slide, pinch, zoom, tap, double tap. The reasoning goes that, first, there are few to no physical affordances in most user interfaces for discovering new gestures, so users won’t find them. Second, even if users do discover them, this same lack of affordances makes them hard (impossible, some argue) to remember. But don’t we learn abstract systems all the time? As it turns out, humans are actually pretty good at that.
Microsoft has a prescribed gestural / touch design language that advocates this simple linguistic approach. We are encouraged to use this design language in our work for obvious reasons: habituation and consistency. We want similar gestures used across Microsoft products, in the same way that the Spanish government wants all Spaniards to speak the official dialect of Spanish, and that participants in most business settings speak English. Common languages make it easier to communicate and collaborate across cultures and geographies.
I work on the design team for Microsoft Office, as a product designer for Excel. My primary job is designing visual analytics capabilities, defining both visual and interaction design for new ways to visualize data across the Office suite. During my tenure, my colleagues and I have attempted to apply Microsoft’s gestural guidelines to interacting with data, but have found them limiting. For example, a common interaction model for multi-selection of data involves using both hands: a finger from one hand to tap and hold the first item, and then a finger from the other hand to select the subsequent items. However, our touch guidelines advise against timed gestures or the use of both hands in controlling a UI.
These challenges have made me question the whole notion that the gestural language we use when interacting with computers should be limited to eight gestures (no timed gestures allowed).
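To make the two-handed multi-selection model described above concrete, here is a minimal sketch of it as a small state machine. It is an illustration only, not how Excel or any Microsoft product implements selection: the class name `MultiSelect`, the touch ids, the item labels, and the 0.5-second hold threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed hold duration in seconds; not taken from any published guideline.
HOLD_THRESHOLD_S = 0.5


@dataclass
class MultiSelect:
    """Hypothetical hold-then-tap multi-selection across two touch points."""

    anchor_touch: Optional[int] = None  # id of the finger currently holding
    anchor_down_at: float = 0.0         # time the anchor finger went down
    selected: List[str] = field(default_factory=list)

    def touch_down(self, touch_id: int, item: str, t: float) -> None:
        if self.anchor_touch is None:
            # First finger down: it becomes the anchor and starts a new selection.
            self.anchor_touch = touch_id
            self.anchor_down_at = t
            self.selected = [item]
        elif t - self.anchor_down_at >= HOLD_THRESHOLD_S:
            # Another finger taps while the anchor has been held long enough:
            # extend the selection, ignoring items that are already in it.
            if item not in self.selected:
                self.selected.append(item)
        # Taps that arrive before the hold threshold are ignored.

    def touch_up(self, touch_id: int) -> None:
        # Lifting the anchor finger ends the gesture; the selection is kept.
        if touch_id == self.anchor_touch:
            self.anchor_touch = None


ms = MultiSelect()
ms.touch_down(1, "A1", t=0.0)  # one hand: tap and hold the first item
ms.touch_down(2, "B2", t=0.8)  # other hand: tap subsequent items
ms.touch_up(2)
ms.touch_down(2, "C3", t=1.2)
ms.touch_up(2)
ms.touch_up(1)
print(ms.selected)  # ['A1', 'B2', 'C3']
```

Note that the sketch depends on exactly the two things the guidelines advise against: a timing threshold and a second hand.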
A More Complex Language
Elsewhere, Apple filed a patent in 2008 for a “Multitouch Gesture Dictionary,” which advocates a much more complex language system, not unlike American Sign Language:
“Users of these multi-touch interfaces may make use of hand and finger gestures to interact with their computers in ways that a conventional mouse and keyboard cannot easily achieve. A multi-touch gesture can be as simple as using one or two fingers to trace out a particular trajectory or pattern, or as intricate as using all the fingers of both hands in a complex sequence of movements reminiscent of American Sign Language.”
Gestures that, like those found in American Sign Language, embody rich, semantic meaning:
“Each motion of hands and fingers, whether complex or not, conveys a specific meaning or action that is acted upon by the computer or electronic device at the behest of the user. The number of multi-touch gestures can be quite large because of the wide range of possible motions by fingers and hands. It is conceivable that an entirely new gesture language might evolve that would allow users to convey complex meaning and commands to computers and electronic devices by moving their hands and fingers in particular patterns.”
Languages (not just verbal languages, but musical and mathematical ones, too) build communication systems from modular building blocks: letters form words, words form sentences, sentences form paragraphs, and prefixes and suffixes change the meaning of a word. It makes sense to think about the languages we use to talk to computers in the same way.
To that end, designers should not be afraid to use a more robust language for touch. Here’s why:
Humans are wired for language. We have an innate capability for learning language.
Humans are able to learn complex languages without physical affordances, as long as there is a community to learn from, and good feedback (visible, audible, tactile) to reinforce the communication between the sender and the receiver.
Research done by my colleagues at Microsoft Research shows that when given the choice to manipulate data visualizations on touch interfaces, users prefer interacting directly with the data (chart or plot areas) as opposed to manipulating related UI controls (ribbon or context menu), no matter how well mapped. Additionally, we retain more information from things we directly manipulate by touch.
Designing with Complexity
So how might this look applied to interaction design? Let’s look at Steven Pinker’s four principles for language acquisition in his work on Learnability Theory. To learn a language, humans need:
A class of languages. One of them is the “target” language, to be attained by the learner, but the learner does not, of course, know which it is. In the case of children, the class of languages would consist of the existing and possible human languages; the target language is the one spoken in their community.
An environment. This is the information in the world that the learner has to go on in trying to acquire the language. In the case of children, it might include the sentences parents utter, the context in which they utter them, feedback to the child (verbal or nonverbal) in response to the child’s own speech, and so on. Parental utterances can be a random sample of the language, or they might have some special properties: they might be ordered in certain ways, sentences might be repeated or only uttered once, and so on.
A learning strategy. The learner, using information in the environment, tries out “hypotheses” about the target language. The learning strategy is the algorithm that creates the hypotheses and determines whether they are consistent with the input information from the environment. For children, it is the “grammar-forming” mechanism in their brains; their “language acquisition device.”
A success criterion. If we want to say that “learning” occurs, presumably it is because the learners’ hypotheses are not random, but that by some time the hypotheses are related in some systematic way to the target language. Learners may arrive at a hypothesis identical to the target language after some fixed period of time; they may arrive at an approximation to it; they may waver among a set of hypotheses one of which is correct.
Given these principles, how might designers create an environment that facilitates learning of a more robust gestural language for touch UI?
A class of languages.
This means having a unified gestural language that applies across operating systems and devices, or within domains (for example, when interacting with a data visualization, a given gesture always maps to the same action).
An environment.
In this case, what kinds of affordances or help UI can designers employ to make learning and remembering new gestures easier? How might we build user communities that teach new users shortcuts and tricks?
A learning strategy.
We see users do this already. Since the advent of the iPhone, people have come to expect glass to be interactive. We naturally swipe, tap, pinch and zoom our devices to see what happens, applying a trial-and-error strategy to find out what works.
A success criterion.
Designers must use clear and effective visual and audio feedback to confirm or deny that a user’s touch gesture has had the intended effect on the system.
And lastly, my own addition, a governing body.
We need a neutral entity that sets forth definitions and standards for a common touch language, similar to language academies like L’Académie française, the Real Academia Española, or standards creators like the W3C. Both Apple and Microsoft have taken strides in this direction to define gestural languages for touch interaction on their individual platforms, but what we really need is a body of industry and academic professionals to lead the way in standardizing a gestural language across technology platforms so the touch interactions we have with computers can be easier to learn, more robust, and universally extensible.
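As a rough illustration of two of the principles above, a shared vocabulary in which a gesture always means the same thing within a domain, and feedback that confirms or denies each gesture, here is a minimal sketch. Everything in it is hypothetical: the `GestureVocabulary` class, the domain and gesture names, and the feedback strings are invented for the example and do not describe any platform's actual API.

```python
from typing import Callable, Dict, Tuple

# (domain, gesture) -> action; all names below are hypothetical examples.
GestureKey = Tuple[str, str]


class GestureVocabulary:
    """A tiny registry that gives each gesture one meaning per domain."""

    def __init__(self) -> None:
        self._actions: Dict[GestureKey, Callable[[], str]] = {}

    def define(self, domain: str, gesture: str, action: Callable[[], str]) -> None:
        key = (domain, gesture)
        if key in self._actions:
            # One gesture, one meaning per domain: refuse conflicting definitions.
            raise ValueError(f"{gesture!r} is already defined for {domain!r}")
        self._actions[key] = action

    def perform(self, domain: str, gesture: str) -> str:
        action = self._actions.get((domain, gesture))
        if action is None:
            # Negative feedback: the gesture was not understood.
            return f"unrecognized: {gesture}"
        # Positive feedback: confirm what the gesture did.
        return f"ok: {action()}"


vocab = GestureVocabulary()
vocab.define("chart", "pinch", lambda: "zoom out")
vocab.define("chart", "hold + tap", lambda: "add to selection")

print(vocab.perform("chart", "pinch"))       # ok: zoom out
print(vocab.perform("chart", "triple tap"))  # unrecognized: triple tap
```

In a real system the returned strings would be replaced by visual, audio, or tactile feedback; the point of the sketch is only that every gesture gets an explicit answer.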