Beyond Mental Models: Tackling Complexity in Interaction Part I

How we interact with computers is bewilderingly complicated.

A shallow examination of our most basic digital behaviour reveals this complexity.

To open a document, we have to understand that globs of pixels somehow indicate the structure of an invisible filing system. We need to understand how (double-)clicking on a particular bundle of pixels labelled with particular text will move the present state of the system to a different part of that invisible filing system.

Yet using a computer is second nature to us, and so the cocktail of perception and cognitive processing involved is utterly invisible.

Watching an older person interact with computers (especially some time ago, when computers were a newer phenomenon) shines a spotlight on the complexity the digital native takes for granted. Elderly people will trepidatiously approach a computer, carefully examining each element. They misunderstand the conceptual metaphors on screen. They struggle to understand what is interactive. It’s only through usage that we become sufficiently integrated with “the digital” to use it seamlessly. Much like a rock climber able to navigate a seemingly impassable wall of rock by seeing hand- and footholds where the rest of us would see jagged stone, we’re able to understand the meaning behind a wall of pixels, navigating our way through and across it.

How does this happen, this implicit understanding?

Norman’s description of the mental model

Is it a singular, unified cognitive process, rational and disembodied, that we sculpt and adhere to whenever we engage with a system? This is what Don Norman’s notion of mental models suggests. Stating that we formulate our behaviour towards interactive systems by mentally modelling how those systems work, Norman sees us as rational, disembodied actors. While useful for judging whether the broad framework of a basic application is sensible, the model fails to account for numerous other factors:

  • we only mentally model what we perceive; we may not perceive an entire system and thus be unable to model it, especially on webpages, where non-sequentially perceived sections attract our interest
  • the messaging involved, including sales messaging, may affect a user’s view of the system
  • we rarely have the time, inclination or mental state to rationally create a model of a system
  • on most websites, the site’s functionality isn’t the user’s main concern; the task at hand is

Look at this website for the Porsche 911. What would a user’s mental model of it be? Would the user stand back and create a rational mental model of each of the page’s elements before scrolling through? Or would they scan the page for information of interest, not taking the time to form a clear, disembodied structure of what they are looking at?

As another example, take my usage of this very site. When I choose to name an article, I click a button at the top of the page and a menu appears, allowing me to write in the name of the post, its subtitle and description. Medium autosaves posts as you write, so I expect the fields in this menu also to be autosaved when I click away.

What I see when I click the edit post name button

That’s not the case. There’s a Save button, and every single time I edit the fields, I forget about it.

I don’t see the Save button in the menu because I don’t take the time to model how the menu works — I make assumptions, I think about my actions, not about the structure of the system that is being presented to me.

Human-computer interaction, like any type of human action, is to varying degrees not a fully cognitive experience. We act using tools, rather than thinking about the tools.

At the risk of overusing an example from Heidegger many people are probably already aware of, consider a hammer: you don’t think about the hammer when you use it; you just use it to do a task. You see that it affords hammering (i.e. its shape and structure allow for hammering), so you hammer. There is no higher-level cognition there; it’s mere sensory perception coupled with your desire to do something. You don’t need to think about the hammer when you use it — you are thinking about what you are trying to get done. In this way, Heidegger calls the hammer ready-to-hand. Indeed, you only reflect on whether it is a hammer if, upon using it, you realise it isn’t functioning as a hammer should. Heidegger called the hammer, in this mode, present-at-hand.

Our tasks are necessarily bound up with the objects around us and with our world. This is called having intentionality toward something. That is, we act towards something; our thoughts are engaged towards a particular object or activity. When you think about clicking “buy” on a computer screen, you aren’t thinking about the clicking of the button, you are thinking “I’m ordering this package”. Thinking about intentionality is important because it helps us see our actions as inseparable from our goals.

(As a side note, this connects to something else I wrote about, the extended mind thesis: whether the structure you act through is outside your brain or a thought inside it is often irrelevant; the task at hand is what matters.)

What I’ve been discussing is the concept of embodied interaction, formulated by Paul Dourish in the late 1990s. It has a strong philosophical foundation, grounded in the work of philosophers such as Husserl, Heidegger, Gibson and Merleau-Ponty.

Husserl was one of the first to study the nature of our experience

Maintaining and managing what these and other philosophers describe as intentionality is a process in and of itself. We don’t just recognise a particular set of objects at hand and use them for our actions. We need to make them effective, to manage this chain of physicality.

To do this, we engage in what these philosophers have called coupling.

I’ll let Paul Dourish himself describe what coupling is:

“As I move a mouse, the mouse itself is the focus of my attention; sometimes I am directed instead toward the cursor that it controls on the screen; at other times, I am directed toward the button I want to push, the e-mail message I want to send, or the lunch engagement I am trying to make.”

“So, coupling in interactive systems is not simply a matter of mapping a user’s immediate concerns onto the appropriate level of technical description. Coupling is a more complex phenomenon through which, first, users can select, from out of the variety of effective entities offered to them, the ones that are relevant to their immediate activity and, second, can put those together in order to effect action. Coupling allows us to revise and reconfigure our relationship toward the world in use, turning it into a set of tools to accomplish different tasks.”

Coupling, then, is how we continually balance and make use of the physical world in line with our intentionality. We couple with series of objects at varying levels of at-handedness to fulfil our needs.

As noted previously, how we interact with computers is extraordinarily complicated, so modelling coupling would be extraordinarily difficult. It’s nearly impossible to develop a structured calculus that incorporates every relevant variable. You’d have to model the level of perception or cognition of each element of an interaction (mouse, graphics on screen, etc.) and determine whether each conceptualisation was more ready-to-hand or present-at-hand, and that’s all just for a single step in any given task.

At a basic level, we can say with confidence that we couple through an amalgam of ready-to-hand and present-at-hand conceptions, and that this coupling is activated in the triggering of our intents.

I’m going to suggest that it is worth examining whether our ways of coupling align properly with the interactive systems we use.

A basic example illustrates how this is relevant in the most fundamental of tasks: reading a webpage requires you to understand the words, obviously, but it also requires you to be aligned with the structure of how the words are presented (the format, layout, etc.), with how to see more words (e.g. scrolling) and with what the system is trying to tell you (e.g. “you should read this article”).

More on that in Part II.