Implicit observation and study of possible patterns: different ways people think and how behavior might be affected by the design of digital social spaces.
What’s an interesting way to observe, and record my observations, with a ‘probe’ rather than creating a solution to a problem?
My ‘probe’ is an activity that I have designed to enable me to backtrack the particularities of a cognitive quirk, and hopefully take on a new perspective for qualitative design assessment.
I explore and evaluate the use of people’s recall (and lack thereof) under different time pressures for assessing the ‘silence’ of good design, click counts, pain points, and mental associations.
The ultimate goal would be to find a way to gauge the micro-emotions and reactions users aren’t conscious enough of to bring up in think-aloud exercises or other traditional qualitative assessment practices user researchers deploy.
Some of the background inspirations, concepts, principles, and thoughts that cultivated my interest in memory degradation as a potential probe:
Principle: “…memory for events is strengthened at emotional times” (Daniela Schiller). While her work focuses more on how vividly fear connects with retention/recall, it raises interesting points. It led me to further inquiry, and eventually to Elizabeth Phelps’s work; she combines Neisser’s experiential approach with the neuroscience of emotional memory to explore how such memories work, and why they work the way they do.
On a basic neurobiological level, the connection between emotion and memory comes down to the very close relationship between the amygdala and the hippocampus: Phelps has shown the amygdala “tells our eyes to pay closer attention at moments of heightened emotion. So we look carefully, we study, and we stare — giving the hippocampus a richer set of inputs to work with.”
Emotional memories, stored in the amygdala (situated deep in the temporal lobe, behind the eyes), have a close connection to visuals.
A question to take away: does that closer attention translate if I were to ask people to redraw a new UI they just interacted with? What level of ‘heightened’ emotion would it take for differences to manifest clearly? Recall of a visual experience is also very dependent on how it was initially encoded (breaking the UI down into its purpose and functionally driven associations can distort what we think we saw, based on our understanding of what we think it does).
Another interesting point: Davachi (another researcher working with Schiller on a study) found that “It turns out that emotion retroactively enhances memory… Your mind selectively reaches back in time for other, similar things.” This could have interesting implications for how one conducts more qualitative interview assessments. Perhaps an emotional trigger or a more experiential approach could elicit clearer answers?
Example from everyday life of emotional connection to memory recall:
“A new guy starts working at your company. A week goes by, and you have a few uninteresting interactions. He seems nice enough, but you’re busy and not paying particularly close attention. On Friday, in the elevator, he asks you out. Suddenly, the details of all of your prior encounters resurface and consolidate in your memory. They have retroactively gone from unremarkable to important, and your brain has adjusted accordingly.”
Example from design:
It’s commonly said that good design is silent, seamless, and intuitive: it doesn’t make people stop and think, and thus people tend to notice (or remember) only bad design. That makes sense, but how do we verify or disprove it?
If people associate a feature/space/design with a stronger emotion, then they are more likely to remember it. The key underlying hypothesis/assumption — “bad” design creates friction and frustration. What is bad design? I don’t know. I’d like to find out if there are any consistent trends.
For now, I hypothesize that the bad design that catches emotional attention consists of important features/functionalities that are misplaced in the information/visual hierarchy (and thus hard to find), communicated ambiguously, or simply problematic yet important.
Several hypotheses to evaluate:
If the feature/UI element’s interaction aligns with the person’s belief of what the app’s purpose is, people remember where it is (if not what it looks like)
The collectively most-remembered features/elements of the UI align with usage/click counts (if so, this method would work as a quick-and-dirty preliminary method when the actual quantitative data is unavailable)
If the person cannot recall
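The second hypothesis (most-remembered elements track click counts) lends itself to a quick numeric check once both lists exist. Here is a minimal Python sketch of that check, a Spearman rank correlation between how many participants drew each element and its logged click count; all element names and counts are invented placeholders, not study data.

```python
# Minimal sketch for hypothesis 2: do the collectively most-remembered UI
# elements track actual usage? All names and counts are invented.

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: participants who drew each element vs. logged clicks.
recall = {"news feed": 9, "stories bar": 7, "search": 5, "settings": 1}
clicks = {"news feed": 1400, "stories bar": 600, "search": 800, "settings": 40}
elements = list(recall)
rho = spearman([recall[e] for e in elements], [clicks[e] for e in elements])
```

A rank correlation (rather than a raw Pearson) fits here because click counts are heavily skewed while recall counts are small integers; only the ordering matters.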
Anecdotal Think Alouds and Interviews
I ask a series of questions leading up to and following up after a drawing activity.
1. What apps, websites, or software would you say you use the most on a daily basis? Give me your top 5.
2. Ok! Quick: the first 5 words or feelings that come to mind in association with this product.
3. Prompt: Draw two screens/scenes from the application, with as many features and details as you can remember. Feel free to pick whichever device and screen you feel you’d remember best. Continue until you can’t recall anything more.
People drew on the templates I had loaded onto my iPad. Most people could only draw for about 2 minutes 28.8 seconds before they felt they couldn’t recall any more detail/info.
Below is the one (and only) interviewee who went until 8 minutes 11 seconds. She is likely the exception that proves the rule: I found out later that she had been working on a project analyzing Facebook Stories and so had spent the week reviewing the interface.
Most people’s versions of Facebook had a fidelity ranging between the following two:
Immediately after they can no longer recall anything more, I have them color the scene like a ‘heat’ map with their confidence levels: the darkest colors for the placements/features/details they are most sure of, and lighter colors for the ones they are less confident of.
Then I ask them to point out the specific details/features they are most confident in. They rank them from 1 to 3, most to least confident, and I ask them what specifically they were confident about.
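To compare these colorings across participants, each one can be reduced to a coarse grid of confidence scores and averaged cell by cell. A minimal Python sketch follows; the grid size and the 0–3 scoring are my assumptions for illustration, not part of the protocol.

```python
# Hypothetical sketch: compile individual confidence colorings into one group
# "heat" map. Each participant's coloring is reduced to a coarse grid of
# confidence scores (0 = blank, 3 = darkest / most sure); grid size and
# scores here are invented for illustration.

def compile_heatmap(grids):
    """Average confidence cell-wise across participants' grids."""
    rows, cols = len(grids[0]), len(grids[0][0])
    totals = [[0.0] * cols for _ in range(rows)]
    for grid in grids:
        for r in range(rows):
            for c in range(cols):
                totals[r][c] += grid[r][c]
    n = len(grids)
    return [[cell / n for cell in row] for row in totals]

# Two made-up participants, each a 2x3 grid of confidence scores.
p1 = [[3, 1, 0], [2, 2, 0]]
p2 = [[3, 0, 1], [0, 2, 2]]
group_map = compile_heatmap([p1, p2])
```

Cells where the average stays high across participants mark the details everyone is sure of; cells that only light up for one person wash out, which is exactly the aggregation the group heat map is after.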
4. That’s awesome. Ok, can you explain why you think you chose this type of screen?
5. Ok! Thank you. Alright, what specific purpose or feature of this program would you say you use the most?
6. Do you have a feature you especially like? Why?
7. Do you have a feature/mode you don’t particularly like? Why?
In the current iteration of the protocol, each interview + activity took about 18 minutes. I specifically chose drawing in Procreate on the tablet because I could later review the timelapse of everyone’s drawings and analyze the order in which they drew things.
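One way to make those timelapse reviews comparable is to log, per participant, the sequence in which elements appeared, then average each element’s position across participants. A small sketch of that tally; the element labels are invented, not actual study data.

```python
# Hypothetical sketch: turning per-timelapse draw orders into an average
# "drawn how early" score per element. Lower means drawn earlier, which may
# hint at stronger encoding. Labels are invented placeholders.
from collections import defaultdict

def mean_draw_order(sessions):
    """sessions: one list per participant of element labels in draw order."""
    positions = defaultdict(list)
    for sequence in sessions:
        for i, element in enumerate(sequence, start=1):
            positions[element].append(i)
    return {e: sum(p) / len(p) for e, p in positions.items()}

sessions = [
    ["nav bar", "news feed", "stories bar"],
    ["news feed", "nav bar"],
    ["news feed", "stories bar", "nav bar"],
]
order = mean_draw_order(sessions)
```

An element that consistently comes out first across timelapses is a candidate for the most strongly encoded part of the interface, which can then be checked against the confidence rankings from the coloring step.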
Using Mechanical Turk to generate mass recall ‘heat’ maps: a larger data set for more reliable correlation extrapolation
Now how do I replicate the above activity but for masses of people?
Create an online ‘puzzle’ with bits and pieces of the website for people to arrange, as best they can from memory, into what they think the website looks like. They are asked to delete anything they aren’t really sure about or can’t remember where (or whether) it belongs.
Pay Mechanical Turk workers to try out the activity!
They then upload image results.
Here are some of the results:
I had chosen a puzzle version of the drawing activity for the web because drawing with a mouse is incredibly frustrating, but this particular method does make compiling heat maps messier. I will have to rethink it for efficiency, because I currently have to go in and analyze each submission for what it did or did not include, and the relative accuracy of that. This is not uncommon (much like ‘coding’ video results from a lab experiment into an Excel spreadsheet), but it does take longer than I would have liked. I had hoped the visual activity approach would allow researchers to draw insight just by observing the resulting heat maps.
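The bookkeeping side of that manual coding step is mechanical enough to script. A sketch, assuming one hand-scored judgment per participant per element (all names and scores below are placeholders), written out as CSV so it drops straight into a spreadsheet:

```python
# Hypothetical sketch of the "coding" bookkeeping: hand-scored judgments
# (did the submission include the element? was its placement accurate?)
# written to CSV rows for spreadsheet-style tallying. Data is invented.
import csv
import io

def code_results(judgments):
    """judgments: (participant_id, element, included, placement_accurate)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["participant", "element", "included", "placement_accurate"])
    writer.writerows(judgments)
    return buffer.getvalue()

sheet = code_results([
    ("p01", "news feed", 1, 1),
    ("p01", "stories bar", 1, 0),
    ("p02", "news feed", 1, 1),
])
```

The judging itself stays manual; this only standardizes the output, so the per-element inclusion rates can feed the same heat-map compilation used for the drawing version.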
So far, though, I’m quite happy with my general pipeline, and I think it has a lot of potential. The question-and-activity protocol for mass responses needs to be refined with more iterations.
Things that would help validate my probe with a measurable metric:
- Click count on a feature
- Follow through on the usage of a feature
- Usage dip between new and old designs
Things to do to further probe my probe for its ability to infer qualitative experiences of the digital space:
- Emotionally driven recall tests (relying on emotion retroactively enhancing memory/recall)
- More think alouds with highly self-reflective and observant people
- EEG? Except I probably wouldn’t be able to get to the level of detail I want…