The Personality of Information

Investigating the Underlying Social, Institutional and Emotional Frameworks of Human-Computer Friendship Vs. Servitude

Through what frameworks does a human interpret a machine, and through what frameworks does a machine interpret a human?

More specifically, this analysis addresses the social, psychological and physiological drivers of visual, verbal and hybridized interfaces, and how each establishes unique precedents for human-computer dialogue. Through my research, I gathered insights on the characteristic affordances and limitations of text-graphical and voice-driven search engines, and on the conversational frameworks people employ to obtain information. I also performed historical and cultural research on the development of voice-driven interactions and its roots in science fiction and constructed notions of gender and personality.

Whenever we open our phones, surf the World Wide Web or gaze aimlessly into the television set, we are unlocking gateways to infinitely vast and complex aggregates of information: from minute-to-minute updates on celebrity pregnancies, to the weather on Mars, to personal information about our behaviors, communities and lifestyles. Never before have we experienced the kind of autonomy that widespread access to data affords us today, and never before has it felt so easy. That’s because day in and day out, designers and engineers work behind the scenes to craft the critical touch point for these information architectures: the user interface.

When dealing with enormous and complex sets of data, users are tasked not only with interpreting content but also with navigating it. As a general rule of thumb, the simpler and more clear-cut an interface is, the more easily someone can surface and act upon the content inside it. Consequently, the efficacy and durability of an interface carries high stakes when it comes to the intellectual and emotional performance of its user. Haya Levin, a professor at the Centre for Informatics, Beit-Berl College, Israel, states that her research supports the hypothesis that the style in which an interface displays information has “great influence on the learning process” (Levin 64).

“The only ‘intuitive’ interface is the nipple. After that it’s all learned.” – Bruce Ediger

Depending on the context, an interface must be ready to respond to the psychological and physiological conditioning of its audience, just as the audience must behave in accordance with the established paradigms of the interface. This anticipatory methodology of designing informational frameworks for people requires an understanding of the emotional and cognitive frameworks we already possess when interacting with things, both living and digital.

Siri’s app on-boarding suggestions for first-time users

Visual and Verbal Paradigms of Communication

Traditionally, interfaces have consisted of tandem visual and verbal cues, combining graphic and text elements where one alone would not achieve clear communication. In recent years, however, voice-assisted and voice-driven interfaces have become major players in the design of commercial products. In fact, when calling to book a flight or reach customer service, the chances of speaking to an interactive voice response (IVR) system are now far greater than those of speaking to an actual human being.

Text Only

  • Results are all text-based; a clear visual hierarchy is established across information types based on relevance.

Text + Graphics

  • Currently the most common genre of result from an internet query
  • Search by word, image, pin-drop and/or voice
  • Ability to refine results with filters and keywords
  • Text results are often supplemented by images or the images are supplemented by text
Google word search query, Google image search, Google map view and word search query on YouTube
iOS 8 iPhone search surfaces both images and text attributed to your query.
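The keyword-and-filter behavior described in this genre shows up in the structure of a search URL itself. As a minimal sketch (the `tbm=isch` image-search parameter is a historically documented Google URL convention, not an official, stable API), a text versus image query might be composed like this:

```python
from urllib.parse import urlencode

def build_search_url(query, image_search=False):
    """Compose a keyword search URL.

    `q` is the standard query parameter; `tbm=isch` is the (historically
    documented, unofficial) parameter that switches results to images.
    """
    params = {"q": query}
    if image_search:
        params["tbm"] = "isch"  # refine the result genre to images
    return "https://www.google.com/search?" + urlencode(params)

# A keyworded, syntax-free query like Participant A's emoji search:
url = build_search_url("smiling emoji png", image_search=True)
```

The same pattern of refinement by parameters underlies the filters and keywords mentioned above: each filter the user applies is, under the hood, another key-value pair appended to the query.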

Text + Audio

  • Primarily a smartphone experience
  • Most commonly occurs in response to open-ended questions with varied, subjective answers
  • Once a queue of text previews is surfaced, the voice interface ceases to converse; it’s up to the user to select a result or ask another question.

Text + Audio + Graphics

  • Combined functionality more commonly hosted on mobile devices
  • Use both voice and touch to surface and act upon information

Audio Only

  • No visual interface; all information is represented as words and sounds
  • Hands free, aside from a few physical touch points for basic controls

Testing a Hypothesis

In a paper titled Wired for Speech: How Voice Activates and Advances the Human–Computer Relationship, Clifford Nass, a Stanford professor of communication, notes that “spoken dialogue systems have received increased interest because they are potentially much more natural and powerful methods of communicating with machines than are current graphics-based interfaces” (Nass 451). Voice-driven interfaces have made complex technologies and information systems accessible to people across age demographics and physical abilities by making an intuitive, naturally acquired skill (human speech) the means of control.

Formatting an Interview

I sought to test Nass’s hypothesis about the intuitiveness of the verbal medium by observing the ways in which people access information through text-based versus voice-driven media:

  • How did they structure their query (grammatically, stylistically)?
  • At what points in the question-posing timeline was relevant vs. irrelevant information surfaced on each platform?
  • How many points of user-computer information exchange occurred before a satisfying answer was obtained?
  • How did users react emotionally and physiologically to successful vs. unsuccessful query results? How did these reactions interrupt or continue the dialogue?
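The third metric can be made concrete by tallying (query, response) turns per platform: fewer turns to a satisfying answer suggests a more efficient exchange. The session records below are hypothetical, purely to illustrate the tally; no real interview data is encoded here.

```python
# Each session is one attempt to answer a question on one platform.
# "turns" counts the query/response exchanges before the session ended.
# These records are illustrative placeholders, not collected data.
sessions = [
    {"platform": "search_bar", "turns": 1, "satisfied": True},
    {"platform": "siri",       "turns": 1, "satisfied": True},
    {"platform": "siri",       "turns": 5, "satisfied": False},
]

def mean_turns(sessions, platform):
    """Average number of exchanges per session on a given platform."""
    turns = [s["turns"] for s in sessions if s["platform"] == platform]
    return sum(turns) / len(turns)

def success_rate(sessions, platform):
    """Fraction of sessions on a platform that ended in a satisfying answer."""
    outcomes = [s["satisfied"] for s in sessions if s["platform"] == platform]
    return sum(outcomes) / len(outcomes)
```

Comparing `mean_turns` and `success_rate` across platforms is one simple way to quantify the qualitative observations recorded in the interviews below.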

Interview Structure

Platforms Tested

  • Google Search Bar
  • Apple’s Siri

Participants

  • All participants were veteran users of both platforms

Materials Collected

  • Screenshot/photograph of browser + typed search bar query
  • Screenshot of Siri dialogue on iPhone
  • Screenshot of search history

Question Part 1

“What was the last thing you searched for?”

“If you can remember, would you please try searching for it again in the Google search bar?”

“If you can’t remember, would you please ask any question you want an answer to right now?”

Question Part 2

“Please find the same answer, this time using Siri.”

Interview Results

Participant A — Female, Graphic Design Student

Participant A was doing a project on sentiment analysis with emojis. Their stated intent was to google an image of an emoji they would be putting in a video.

Search Bar Observations:

  • A keyword query: it lacks syntax and uses specific terminology to reference an image
  • The query locates a known object in the internet space

Voice Interface Observations:

  • Siri surfaced only the same article links returned by the search bar. At this point, Siri was still “listening” but remained quiet; the only actionable items were the surfaced links.

Participant B —Male, Product Design Student

When asked about the most recent query they made, Participant B asked if a YouTube query would count. Before this interview, I hadn’t considered juxtaposing results from more than the two selected platforms (Google Search and Apple’s Siri), because my preliminary research found them to be the most frequently used platforms of their respective interface genres. I decided it would be worthwhile to investigate YouTube’s branded search as well, since it is the second most frequented search tool, just below Google.

Search bar Observations:

  • Participant B only had to search the name of the artist they wanted to listen to in order to surface the video they wanted.
  • Like Participant A’s, this is another account of a premeditated search, where the user is knowingly locating a piece of information

Voice Interface Observations:

  • Participant B was able to successfully search and find the subject of their query.
  • They were able to identify the video they were looking for based on the video thumbnails Siri provided.

Participant C — Female, Graphic Design Student

Participant C was searching for an artist they really enjoyed but couldn’t remember the name of. This was one of the first scenarios in which the user’s query served as a browsing tool, sifting through lots of information at once to find a specific piece.

Search bar Observations:

  • Participant C typed the URL of a popular artistic network site. The search bar autocompleted the query, and they were able to press enter before completing the URL. They were led directly to the site.

Voice Interface Observations:

  • The first time Participant C queried Siri, there was a miscommunication: surrounding noise and delayed wording meant only the first part of the query was registered.
  • The second time, the full sentence was input, but “Dribbble” was registered incorrectly, spelled with only two “b”s instead of three. Because of this spelling error, Siri interpreted the sentence as a personal request to go “dribble”.
  • The third and fourth times the question was asked, Siri misheard the statements and brought the Participant to the wrong URL. They became visibly frustrated.
  • Only when the Participant spelled out the URL aloud was Siri able to bring them to the correct location.
  • Ultimately, they were unable to find the artist they were looking for and went back to the search bar to investigate further.

Participant D —Male, Environments Design Student

Participant E — Female, Environments Student

Key Insights

  • Despite advances in the design of voice-driven interfaces, there is still a steep learning curve, for both the human and the interface, that must be overcome before emotionally and temporally efficient, fulfilling dialogues are possible.
  • Familiarizing oneself with the norms of conversation on a voice-driven platform can be a matter of trial and error, an effort impaired by speech-interpretation errors and limited visibility of relevant information.

Historical and Modern Cultural Analysis of Talking AI

Screenshot from the film, “Her”

Of course, spoken dialogue with machines is no new concept. C-3PO from Star Wars, Samantha, the romantic lead from Spike Jonze’s Her, Rosie the Robot Maid from The Jetsons and many other fictitious artificial entities have come to represent the desires and fears we bear for future interactions with our personal devices. Science fiction has in this way become a cultural propagator of speculative design in the realm of voice interfaces.

There exist countless parallels between the frivolous conversational machines we envisioned years ago and what has actually emerged in the modern tech market. The robotic butlers and stewardesses of the movies are now inside nearly every smartphone on earth. One of the decade’s most iconic “stewardesses” is Apple’s Siri. Many iPhone users can recall the magic moment they first uttered the phrase “Hey, Siri!” following Siri’s debut on the iPhone 4S in 2011. Siri’s reception was profoundly radical. But it was not only the novelty of this interactive genre that captured users’ attention; it was Siri’s distinct voice and personality. While the core designers and programmers of the Siri team were at liberty to employ virtually any combination of pitches, timbres and accents, they made a deliberate choice to encode Siri with a deep, briskly efficient and unmistakably feminine voice (in the U.S. and four other countries, that is; in the United Kingdom, for instance, Siri debuted as male). While other iterations of gender and nationality exist as programmable options on the iPhone (options absent from the initial releases of Google’s Assistant, Amazon’s Alexa and Samsung’s Bixby), the default Caucasian American female voice is the most commonly recognized and utilized setting among them. Today, users have come to understand and describe Siri almost exclusively using female pronouns. Her gender has even prompted some users to ask Siri sexually explicit questions like “What are you wearing?” (to which Siri unfailingly responds, “Why do people keep asking me this?”).

Why do people keep asking her this? The answer may lie in the ways people establish relationships with digital representations of personality and gender. Nass, who has conducted extensive research on how qualities of the human voice set precedents for human-computer interaction, notes: “Female voices are seen, on average, as less intelligent than male voices. It’s safer in a sense to have a male voice in the sense that you’re not going to disappoint people as much” (Nass). In fact, even the contexts in which female voices appear are limited: male voices are more often chosen to be the stewards of robust and reliable services, such as the Apple Support hotline and the United Airlines hotline. So why did Apple make Siri’s voice female? And what contexts does the female voice most “comfortably” occupy?

A study on social desirability bias in human-computer interaction, conducted by Indiana University professor Karl MacDorman, observed 485 participants (151 men and 334 women) who listened to male and female synthesized voices. Both groups reported that the female voices sounded warmer, provoking ethical quandaries among interface designers about how to manage gendered user impressions. Companies that produce automated voices for their products cast actors for voice recordings and gauge their quality on attributes of warmth, friendliness and competence, boxes the female voice predominantly checks.

One physiological driver for the widespread use of the female voice is tonal contrast. Sonically, female and male voices have distinctive traits that suit them to different environmental contexts. For instance, BART shuttles in the San Francisco area, airplane telecoms and many other public transportation systems use the female voice because its higher pitch registers above the noise of loud passing vehicles.

A third circumstance establishing a cultural preference for the female voice over a male one is the representation of gendered voices in the media. The homicidal HAL from Stanley Kubrick’s 2001: A Space Odyssey spurred so much fear in audiences at the time of the film’s release that tech companies shied away from associating their products with overly assertive and suave male voices. The “autonomous” machine in media, assigned predominantly masculine traits, is a force we are implicitly taught to be wary of (consider Asimov’s laws of robotics). Subservient talking machines, like Plankton’s computer wife Karen from SpongeBob SquarePants or Ava from Ex Machina, are much more palatable, non-threatening, and almost exclusively female.

Where did these cultural attitudes originate?

In order to understand the motivations of designers, developers and strategists behind designing gendered and persona-driven products, we must first take a look at historic interpretations of gender in American design.

Traditional, mid-century gender roles have influenced product development and branding through time.

Image from Women’s Home Journal

Companies have been redesigning the American female cultural ideal for decades. Popular mid-century media like Women’s Home Journal exemplify the widespread commercialization of gendered technology that swept the nation following World War I. Advertisements for washers, dryers, stoves, toaster ovens, vacuums and irons enthusiastically targeted the newly minted housewives of the post-war era, boasting accommodating features that enhanced the efficiency, quality and creativity of their housekeeping.

Image from Women’s Home Journal

These highly tailored product campaigns transformed attitudes surrounding “women’s work” at the time. In “The ‘Industrial Revolution’ in the Home: Household Technology and Social Change in the 20th Century,” American historian Ruth Schwartz Cowan examines women’s ideological relationships with domestic labor before and after World War I. While women prior to WWI typically delegated house chores to hired servants, treating these tasks as mere necessities, women of the post-war era harbored a strong emotional obligation to perform housework themselves. No longer pure necessity, household upkeep became a performance, a labor of love meant to encourage feelings of loyalty and affection among members of the family. The union of these emotional and physical labors brought novel implications for the role of the housewife: if the role were performed badly or not at all, a woman was to be resented by the whole family. The hypercharged, emotional advertising of home technologies observable throughout the 1930s-1950s reflects this shift in gendered labor, which was practically the locus of its pitch. As the role of housewife transformed into one of willful servitude, a growing demand for efficient, high-quality appliances to supplement that role strengthened the electronics market.

If we compare the attitudes fueling gendered product advertising from the post-war era to today… not too much has changed. Surely we see fewer explicit representations of chipper women donning polka-dot aprons and beaming across the television screen, but contemporary advertisers have merely exchanged the apron for other, more culturally temperate motifs. An ad released in 2013 for the Samsung 840 EVO solid-state drive came remarkably close, depicting a housewife in a kitchen who giddily exclaims that she uses her laptop to “look at pictures or videos of my children from family trips, use the internet, and help my children with their homework. And that’s about it.” Her male counterpart is shown managing more “masculine” issues, such as backing up data and transferring office files. The housewife goes on to express her confusion as to what the product even is, or how to use it.

Samsung in particular has run into other gender-related blunders in its ad campaigns, most notably at the release of its voice assistant, Bixby. Listed as tags under “language and speaking style” in the Bixby menu were a series of descriptors for the female and male voices. The descriptors Samsung chose, which were in essence meant to summarize the personality cues of each vocal framework, casually reinforced stereotypical notions of gender expression.

After spurring dissent across social media and online forums, with users expressing concern about the diminutive, stereotypical language used across the platform, Samsung promptly removed the tags and apologized. While this may have been a subconscious blunder, companies such as Samsung clearly borrow from the gender canon to market their products in hopes of establishing a stronger, more naturally occurring bond with the user.

Designing personality frameworks for conversational interfaces is not, in itself, a bad idea. People want to interact with computers and complex systems using models of behavior they already know. Personality is a reference point for judging the quality of a person or entity we are interacting with, so it is natural for a user to anthropomorphize qualities of an interface. A problem emerges, however, when brands and corporations construct product personalities heavily informed by stereotypical notions of gender, reinforcing those stereotypes in the interactions we have with people in our lives. Failing to design a “responsible” personality framework for an interface propagates existing inequalities among genders and undermines the social responsibility designers bear.
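As a loose illustration of what a “responsible” personality framework might look like, the sketch below specifies personality through interaction traits rather than gendered voice defaults. Every field, value and function name here is hypothetical, invented for illustration, and not drawn from any shipping assistant.

```python
# A hypothetical persona configuration for a conversational interface.
# Personality is expressed through interaction traits (tone, verbosity,
# how errors are handled) while the voice itself stays a user choice,
# with no gendered default baked in. All fields are illustrative.
PERSONA = {
    "tone": "warm",             # affect of word choice, not pitch of voice
    "verbosity": "concise",
    "error_style": "own_it",    # admit misrecognition instead of deflecting
    "voice": {"pitch": "user_selected", "default_gender": None},
}

def respond_to_error(persona, heard):
    """Generate a recovery line after a misheard query (cf. Participant C)."""
    if persona["error_style"] == "own_it":
        return f"I heard '{heard}' but I'm not sure that's right. Could you rephrase?"
    return "Please repeat your request."
```

The design choice worth noting is the `None` gender default: by making voice presentation an explicit user decision rather than a branded persona trait, the framework sidesteps the stereotyped defaults discussed above.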

Another issue with designing product personalities lies in the discrepancies that arise when we address inanimate devices as humans.

Works Cited

Passig, David, and Haya Levin. “Gender Preferences for Multimedia Interfaces.” Journal of Computer Assisted Learning 16 (2000): 64–71. citeseerx.ist.psu.edu/viewdoc/download?doi.

Colley, A., F. Hill, J. Hill, and A. Jones. “Gender Effects in the Stereotyping of Those with Different Kinds of Computing Experience.” Journal of Educational Computing Research 12, no. 1 (1995): 19–27.

Lucas, L. “Visually Designing the Computer-Learner Interface.” Educational Technology (July 1991): 56–58.

Nass, Clifford. Wired for Speech: How Voice Activates and Advances the Human–Computer Relationship. www.aclweb.org/anthology/J06-3009.

Cowan, Ruth Schwartz. “The ‘Industrial Revolution’ in the Home: Household Technology and Social Change in the 20th Century.” Technology and Culture 17, no. 1 (1976).

Liberatore, Stacy. “Why AI Assistants Are Usually Women: Researchers Find Both Sexes Find Them Warmer and More Understanding.” Daily Mail Online, Associated Newspapers, 28 Feb. 2017, www.dailymail.co.uk/sciencetech/article-4258122/Experts-reveal-voice-assistants-female-voices.html#comments.

“The Gender of Artificial Intelligence.” CrowdFlower, 30 Nov. 2017, www.crowdflower.com/the-gender-of-ai/.

Samsung 840 EVO Series Solid State Drive. YouTube, https://www.youtube.com/watch?time_continue=1&v=-y3XuhMJQ28.

Deahl, Dani. “Samsung Adds and Swiftly Removes Sexist Bixby Descriptor Tags.” The Verge, The Verge, 19 July 2017, www.theverge.com/2017/7/19/15998668/samsung-adds-removes-sexist-bixby-descriptor-tags.

Mendoza, E., et al. “Differences in Voice Quality between Men and Women: Use of the Long-Term Average Spectrum (LTAS).” Journal of Voice: Official Journal of the Voice Foundation, U.S. National Library of Medicine, Mar. 1996, www.ncbi.nlm.nih.gov/pubmed/8653179.