Doing With Theory
Conversational Symbiosis/Symbioses Amongst Humans and Artificial Agents in the Context of Intimate Relationships Seminar III Project
For my Seminar III project, I wanted to acquire an understanding of the different models humans create of artificial agents. I believe a better understanding of these models will allow me to create more credible visions for the future.
To do this, I developed a way for participants to generate qualitative descriptions of artificial agents/intelligence within contemporary artifacts (e.g. Alexa and Google Search).
For the purposes of this project, I developed a two-stage study. The first stage consisted of two conversations with artificial agents, while the second stage consisted of a mapping activity.
It took some time to come up with what the actual conversations would be about, but I eventually settled on the idea that the activity should be focused on information retrieval.
Based on that decision, I decided that a participant should try to acquire as much information as possible about the movie The Mighty Ducks. (I initially chose D2: The Mighty Ducks, but Alexa does not understand the phrase "D2.") I chose The Mighty Ducks because it is a movie people are semi-familiar with (at least my participants were) but do not know in depth. This way the conversations with the agents would be more exploratory in nature.
With the subject of the study finalized, I laid out the following protocol for participants.
As a participant, learn as much as you can about The Mighty Ducks in two minutes.
For the first conversation, use the Google web search engine.
For the second conversation, use an Amazon Alexa.
After you complete the conversations, I will construct the initial maps for the second stage of this activity.
Construct two maps, one for your conversation with the web search engine and one for your conversation with Alexa. I will provide you with:
- printed representations of what you typed or said and what you received back
- rationale indicators, a place to provide the basis for your action
- interpretation indicators, a place to explain your understanding
- adjective indicators, a place for quick reflection: write down a couple of adjectives to describe your experience
I explored the idea of adding emotion dots (i.e. placed where you felt a certain emotion), ambiguity cards (i.e. a place for questions that you wish you could ask the interface), and alternative cards (i.e. a place for other actions you considered). However, I eventually decided not to add these cards because I felt the responses they would elicit would most likely already be captured by the rationale, interpretation, and adjective indicators.
Running The Study
I ran the study with five participants over the course of four days and received 10 maps to analyze further.
During the activity, I confronted a number of challenges and obstacles:
- The delay between the two stages. It took me about 30 minutes to generate the initial map for the second stage, and by the time I was done, the participant was often busy doing something else. In most cases, I had to wait until the next day to complete the activity. I am not sure whether this delay negatively affected the study.
- My decision to have participants complete the activity on paper rather than orally through a think-aloud. I hoped that writing would yield richer qualitative responses. Did this occur? I am not so sure.
It would be interesting to see whether either of these factors positively or negatively influenced the quality of reflection I received from participants.
I learned a number of things from this study. These insights included:
- Participants saw Google as having many strong connections and as a source that could easily link them to other sources, while they saw Alexa as having a few weak connections and as largely a sole entity.
- Web searches allowed participants to create their own context, whether through the use of tabs (it would be interesting to understand more about why some users use tabs and others do not) or through Google searches scoped to a specific site. Alexa had no such mechanisms in place.
Both of these factors played a significant role in the models participants created of the two systems. While using Google, a participant's search focused over time; the complete opposite occurred when interacting with an Amazon Alexa, where searches expanded over time.
This ultimately led users to see a Google search as an expansive, logical, and intuitive experience, and an interaction with an Amazon Alexa as a specific, confusing, and frustrating one.
All of these factors ultimately affected a user's:
- conception of speed (they saw Alexa as faster initially, primarily because of voice, but slower over time)
- perceived effort (they saw an experience with Google as instinctual and intuitive, while an experience with Alexa as labored)
- sense of state (users knew if they were getting closer to the answers they wanted with Google, but had no sense of that with Alexa)
- sense of control/patience (users never knew when Alexa was going to finish speaking, which eventually led to a loss of patience)
- testing of boundaries (users felt the need to test the boundaries with Alexa; they did not with Google)
This activity brought up a number of questions that would be worthwhile to explore in future studies.
- How can VUIs capture the needed context to overcome their current drawbacks?
- Will interaction with a VUI ever become familiar? Is that predicated on an intelligence explosion?
After completing this study, I now have a deeper understanding of the significant role both context and voice play in how users perceive an artificial agent. Below you can find a number of questions I plan to consider (over the next couple of months) based on this study.
Specific to Context
- What features can be introduced to create a sense of order?
- How can an agent visualize its confidence in an answer or its uncertainty in a phrase just spoken/written by a user?
Specific to Voice
- Are multiple voices needed for a user to perceive a varied understanding?
- Will knowledge always be shared through multiple voices or can agents overcome thousands of years of history?
- How can users interact with multiple voices through one experience?