Investigating the human to computer relationship through reverse engineering the Turing test
Humans are getting closer to creating a computer with the ability to feel and think. Although the processes of the human brain are at large unknown, computer scientists have been working to simulate the human capacity to feel and understand emotions. This paper explores what it means to live in an age where computers can have emotional depth and what this means for the future of human to computer interactions. In an experiment between a human and a human disguised as a computer, the Turing test is reverse engineered in order to understand the role computers will play as they become more adept to the processes of the human mind. Implications for this study are discussed and the direction for future research suggested.
The computer is a gateway technology that has opened up new ways of creation, communication, and expression. Computers in first world countries are a standard household item (approximately 70% of Americans owning one as of 2009 (US Census Bereau)) and are utilized as a tool to achieve a diverse range of goals. As this product continues to become more globalized, transistors are becoming smaller, processors are becoming faster, hard drives are holding information in new networked patterns, and humans are adapting to the methods of interaction expected of machines. At the same time, with more powerful computers and quicker means of communication — many researchers are exploring how a computer can serve as a tool to simulate the brains cognition. If a computer is able to achieve the same intellectual and emotional properties as the human brain — we could potentially understand how we ourselves think and feel.
Coined by MIT, the term Affective Computing relates to computation of emotion or the affective phenomena and is a study that breaks down complex processes of the brain relating them to machine-like activities. Marvin Minsky, Rosalind Picard, Clifford Nass, and Scott Brave — along with many others — have contributed to this field and what it would mean to have a computer that could fully understand its users. In their research it is very clear that humans have the capacity to associate human emotions and personality traits with a machine (Nass and Brave, 2005), but can a human ever truly treat machine as a person? In this paper we will uncover what it means for humans to interact with machines of greater intelligence and attempt to predict the future of human to computer interactions.
2. Human to Computer Relationships
The human to computer relationship is continuously evolving and is dependent on the software interface users interact with. With regards to current wide scale interfaces — OSX, Windows, Linux, iOS, and Android — the tools and abilities that a computer provide remains to be the central focus of computational advancements for commercial purposes. This relationship to software is driven by utilitarian needs and humans do not expect emotional comprehension or intellectually equivalent thoughts in their household devices.
As face tracking, eye tracking, speech recognition, and kinetic recognition are advancing in their experimental laboratories, it is anticipated that these technologies will eventually make their way to the mainstream market to provide a new relationship to what a computer can understand about its users and how a user can interact with a computer.
This paper is not about if a computer will have the ability to feel and love its user, but asks the question — to what capacity will humans be able to reciprocate feelings to a machine.
3. Computational IQ and EQ
How does Intelligence Quotient (IQ) differ from Emotional Quotient (EQ). An IQ is a representational relationship of intelligence that measures cognitive abilities like learning, understanding, and dealing with new situations. An EQ is a method of measuring emotional intelligence and the ability to both use emotions and cognitive skills (Cherry).
Advances in computer IQ have been astonishing and have proved that machines are capable of answering difficult questions accurately, are able to hold a conversation with human-like understanding, and allow for emotional connections between a human and machine. The Turing test in particular has shown the machines ability to think and even fool a person into believing that it is a human (Turing test explained in detail in section 4). Machines like, Deep Blue, Watson, Eliza, Svetlana, CleverBot, and many more — have all expanded the perceptions of what a computer is and can be.
If an increased computational IQ can allow a human to computer relationship to feel more like a human to human interaction, what would the advancement of computational EQ bring us? Peter Robinson, a professor at the University of Cambridge, states that if a computer understands its users’ feelings that it can then respond with an interaction that is more intuitive for its users’
(Robinson). In essence, EQ advocates feel that it can facilitate a more natural interaction process where collaboration can occur with a computer.
4. The Turing Test
In Alan Turing’s, Computing Machinery and Intelligence (Turing, 1950), a variant on the classic British parlor “imitation game” is proposed. The original game revolves around three players: a man (A), a woman (B), and an interrogator ©. The interrogator stays in a room apart from A and B and only can communicate to the participants through text-based communication (a typewriter or instant messenger style interface). When the game begins one contestant (A or B) is asked to pretend to be the opposite gender and to try and convince the interrogator © of this. At the same time the opposing participant is given full knowledge that the other contestant is trying to fool the interrogator. With Alan Turing’s computational background, he took this imitation game one step further by replacing one of the participants (A or B) with a machine — thus making the investigator try and depict if he/she was speaking to a human or machine. In 1950, Turing proposed that by 2000 the average interrogator would not have more than a 70 percent chance of making the right identification after five minutes of questioning. The Turing test was first passed in 1966, with Eliza by Joseph Weizenbaum, a chat robot programmed to act like a Rogerian psychotherapist (Weizenbaum, 1966). In 1972, Kenneth Colby created a similar bot called PARRY that incorporated more personality than Eliza and was programmed to act like a paranoid schizophrenic (Bowden, 2006). Since these initial victories for the test, the 21st century has proven to continue to provide machines with more human-like qualities and traits that have made people fall in love with them, convinced them of being human, and have human-like reasoning.
Brian Christian, the author of The Most Human Human, argues that the problem with designing artificial intelligence with greater ability is that even though these machines are capable of learning and speaking, that they have no “self”. They are mere accumulations of identities and thoughts that are foreign to the machine and have no central identity of their own. He also argues that people are beginning to idealize the machine and admire machines capabilities more than their fellow humans — in essence — he argues humans are evolving to become more like machines with less of a notion of self (Christian 2011).
Turing states, “we like to believe that Man is in some subtle way superior to the rest of creation” and “it is likely to be quite strong in intellectual people, since they value the power of thinking more highly than others, and are more inclined to base their belief in the superiority of Man on this power.” If this is true, will humans idealize the future of the machine for its intelligence or will they remain an inferior being as an object of our creation? Reversing the Turing test allows us to understand how humans will treat machines when machines provide an equivalent emotional and intellectual capacity. This also hits directly on Jefferson Lister’s quote, “Not until a machine can write a sonnet or compose a
concerto because of thoughts and emotions felt, and not by the chance fall of symbols, could we agree that machine equals brain-that is, not only write it but know that it had written it.”
Participants were given a chat-room simulation between two participants (A) a human interrogator and (B) a human disguised as a computer. In this simulation A and B were both placed in different rooms to avoid influence and communicated through a text-based interface. (A) was informed that (B) was an advanced computer chat-bot with the capacity to feel, understand, learn, and speak like a human. (B) was informed to be his or herself. Text-based communication was chosen to follow Turing’s argument that a computers voice should not help an interrogator determine if it’s a human or computer. Pairings of participants were chosen to participate in the interaction one at a time to avoid influence from other participants. Each experiment was five minutes in length to replicate Turing’s time restraints.
Twenty-eight graduate students were recruited from the NYU Interactive Telecommunications Program to participate in the study — 50% male and 50% female. The experiment was evenly distributed across men and women. After being recruited in-person, participants were directed to a website that gave instructions and ran the experiment. Upon entering the website, (A) participants were told that we were in the process of evaluating an advanced cloud based computing system that had the capacity to feel emotion, understand, learn, and converse like a human. (B) participants were instructed that they would be communicating with another person through text and to be themselves. They were also told that participant (A) thinks they are a computer, but that they shouldn’t act like a computer or pretend to be one in any way. This allowed (A) to explicitly understand that they were talking to a computer while (B) knew (A) perspective and explicitly were not going to play the role of a computer. Participants were then directed to communicate with the bot or human freely without restrictions. After five minutes of conversation the participants were asked to stop and then filled out a questionnaire.
5.1 IQ, EQ, and Conversational Ability
Participants were asked to rate IQ and EQ of the person they were conversing with. (A) participants perceived the following of (B): IQ: 0% — Not Good / 0% — Barely Acceptable / 21.4% — Okay / 50% — Great / 28.6% Excellent IQ Average Rating: 81.4% EQ: 0% — Not Good / 7.1% — Barely Acceptable / 50% — Okay / 14.3% — Great / 28.6% — Excellent EQ Average Rating: 72.8% Ability to hold a conversation: 0% — Not Good / 0% — Barely Acceptable / 28.6% — Okay / 35.7% — Great / 35.7% — Excellent Ability to hold a conversation Average: 81.4%
(B) participants perceived the following of (A): IQ: 0% — Not Good / 21.4% — Barely Acceptable / 35.7% — Okay / 28.6% — Great / 14.3% Excellent IQ Average Rating: 67% EQ: 7.1% — Not Good / 14.3% — Barely Acceptable / 28.6% — Okay / 35.7% — Great / 14.3% — Excellent EQ Average Rating: 67% Ability to hold a conversation: 7.1% — Not Good / 28.6% — Barely Acceptable / 35.7% — Okay / 0% — Great / 28.6% — Excellent Ability to hold a conversation Average: 62.8%
Overall, (A) participants gave the perceived Chabot higher ratings than (B) participants gave (A). In particular, the highest rating was in regards to the chat- bot’s IQ. This data states that people viewed the chat-bot to be more intellectually competent. It also implies that people talking with bots decrease their IQ, EQ, and conversation ability when communicating with computers.
5.2 Gender Perception
(A) participants were allowed to decide their username within the chat system to best reflect how they wanted to portray themselves to the machine. (B) participants were designated the gender neutral name “Bot” in an attempt to ganger gender perceptions for the machine. The male to female ratio was divided evenly with all participants: 50% being male and 50% being female.
(A) participants 50% of the time thought (B) was a male, 7.1% a female, and 42.9% gender neutral. On the other hand, (B) participants 28.6% of the time thought (A) was a male, 57.1% a female, and 14.3% gender neutral.
The usernames (A) chose are as follows: Hihi, Inessah Somade3 Willzing Jihyun, G, Ann, Divagrrl93, Thisdoug, Jono, Minion10, P, 123, itslynnburke
From these results, it is clear that people associate the male gender and gender neutrality with machines. It also demonstrates that people modify their identities when speaking with machines.
5.3 Friendship & Caring
(B) participants were asked if they would like to pursue a friendship with the person they chatted with. 50% of participants responded affirmatively that they would indeed like to pursue a friendship while 50% said maybe or no. One response stated, “I would like to continue the conversation, but I don’t think I would be enticed to pursue a friendship.” Another responded, “Maybe? I like people who are intellectually curious, but I worry that the person might be a bit of a smart-ass.” Overall the participant disguised as a machine may or may not pursue a friendship after five minutes of text-based conversation.
(B) participants were also asked if they felt (A) cared about their feelings. 21.4% stated that (A) indeed did care about their feelings, 21.4% stated that they weren’t sure if (A) cared about their feelings, and 57.2% stated that (A) did not care about their feelings. These results indicate a user’s lack of attention to (B)’s emotional state.
(A) participants were asked what they felt could be improved about the (B) participants. The following improvements were noted, “Should be funny” “Give it a better sense of humor” “It can be better if he knows about my friends or preference” “The response was inconsistent and too slow” “It should share more about itself. Your algorithm is prime prude, just like that LETDOWN Siri. Well, I guess I liked it better, but it should be more engaged and human consistency, not after the first cold prompt.” “It pushed me on too many questions” “I felt that it gave up on answering and the response time was a bit slow. Outsource the chatbot to fluent English speakers elsewhere and pretend they are bots — if the responses are this slow to this many inquiries, then it should be about the same experience.” “I was very impressed with its parsing ability so far. Not as much with its reasoning. I think some parameters for the conversation would help, like ‘Ask a question’” “Maybe make the response faster” “I was confused at first, because I asked a question, waited a bit, then asked another question, waited and then got a response from the bot…”
The responses from this indicate that even if a computer is a human that its user may not necessarily be fully satisfied with its performance. The response implies that each user would like the machine to accommodate his or her needs in order to cause less personality and cognitive friction. With several participant comments incorporating response time, it also indicates people expect machines to have consistent response times. Humans clearly vary in speed when listening, thinking, and responding, but it is expected of machines to act in a rhythmic fashion. It also suggests that there is an expectation that a machine will answer all questions asked and will not ask its users more questions than perceived necessary.
5.5 Product Integration
(A) participants were asked if they felt (B)’s Artificial Intelligence could improve their relationship to computers if integrated in their daily products. 57.1% of participants responded affirmatively that they felt this could improve their relationship: “Well- I think I prefer talking to a person better. But yes for ipod, smart phones, etc. would be very handy for everyday use products” “Yes. Especially iphone is always with me. So it can track my daily behaviors. That makes the algorithm smarter” “Possibly, I should have queries it for information that would have been more relevant to me” “Absolutely!” “Yes”
The 42.9% which responded negatively had doubts that it would be necessary or desirable: “Not sure, it might creep me out if it were.” “I like Siri as much as the next gal, but honestly we’re approaching the uncanny valley now.” “Its not clear to me why this type of relationship needs to improve, i think human relationships still need a lot of work.” “Nope, I still prefer flesh sacks. “No”
The findings of the paper are relevant to the future of Affective Computation: whether a super computer with a human-like IQ and EQ can improve the human-to-computer interaction. The uncertainty of computational equivalency that Turing brought forth is indeed an interesting starting point to understand what we want out of the future of computers.
The responses from the experiment affirm gender perceptions of machines and show how we display ourselves to machines. It seems that we limit our intelligence, limit our emotions, and obscure our identities when communicating to a machine. This leads us to question if we would want to give our true self to a computer if it doesn’t have a self of its own. It also could indicate that people censor themselves for machines because they lack a similarity that bonds humans to humans or that there’s a stigma associated with placing information in a digital device. The inverse relationship is also shown through the data that people perceive a bots IQ, EQ, and discussion ability to be high. Even though the chat-bot was indeed a human this data can imply humans perceive bots to not have restrictions and to be competent at certain procedures.
The results also imply that humans aren’t really sure what they want out of Artificial Intelligence in the future and that we are not certain that an Affective computer would even enjoy a users company and/or conversation. The results also state that we currently think of computers as a very personal device that should be passive (not active), but reactive when interacted with. It suggests a consistent reliability we expect upon machines and that we expect to take more information from a machine than it takes from us.
A major limitation of this experiment is the sample size and sample diversity. The sample size of twenty-eight students is too small to fully understand and gather a stable result set. It was also only conducted with NYU: Interactive Telecommunications Students who all have extensive experience with computers and technology. To get a more accurate assessment of emotions a more diverse sample range needs to be taken.
Five minutes is a short amount of time to create an emotional connection or friendship. To stay true to the Turing tests limitations this was enforced, but further relational understanding could be understood if more time was granted.
Beside the visual interface of the chat window it would be important to show the emotions of participant (B) through a virtual avatar. Not having this visual feedback could have limited emotional resonance with participants (A).
Time is also a limitation. People aren’t used to speaking to inquisitive machines yet and even through a familiar interface (a chat-room) many participants haven’t held conversations with machines previously. Perhaps if chat-bots become more active conversational participants’ in commercial applications users will feel less censored to give themselves to the conversation.
8. Future Work
In addition to the refinements noted in the limitations described above, there are several other experiments for possible future studies. For example, investigating a long-term human-to-bot relationship. This would provide a better understanding toward the emotions a human can share with a machine and how a machine can reciprocate these emotions. It would also better allow computer scientists to understand what really creates a significant relationship when physical limitations are present.
Future studies should attempt to push these results further by understanding how a larger sample reacts to a computer algorithm with higher intellectual and emotional understanding. It should also attempt to understand the boundaries of emotional computing and what is ideal for the user and what is ideal for the machine without compromising either parties capacities.
This paper demonstrates the diverse range of emotions that people can feel for affective computation and indicates that we are not in a time where computational equivalency is fully desired or accepted. Positive reactions indicate that there is optimism for more adept artificial intelligence and that there is interest in the field for commercial use. It also provides insight that humans limit themselves when communicating with machines and that inversely machines don’t limit themselves when communicating with humans.
Books & Articles Bowden M., 2006, Minds as Machine: A History of Cognitive Science, Oxford University Press
Christian B., 2011, The Most Human Human
Marvin M., 2006. The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind, Simon & Schuster Paperbacks
Nass C., Brave S., 2005. Wired For Speech: How Voice Activates and Advances the Human-Computer Relationship, MIT Press
Nass C., Brave S., 2005, Hutchinson K., Computers that care: Investigating the effects of orientation of emotion exhibited by an embodied computer agent, Human-Computer Studies, 161- 178, Elsevier
Picard, R., 1997. Affective Computing, MIT Press
Searle J., 1980, Minds, Brains, and Programs, Cambridge University Press, 417–457
Turing, A., 1950, Computing Machinery and Intelligence, Mind, Stor, 59, 433–460
Wilson R., Keil F., 2001, The MIT Encyclopedia of the Cognitive Sciences, MIT Press
Weizenbaum J., 1966, ELIZA — A Computer Program For the Study of Natural Language Communication Between Man and Machine, Communications of the ACM, 36–45
Websites Cherry K., What is Emotional Intelligence?, http://psychology.about.com/od/personalitydevelopment/a/emotionalintell.htm
Epstein R., 2006, Clever Bots, Radio Lab, http://www.radiolab.org/2011/may/31/clever-bots/ IBM, 1977, Deep Blue, IBM, http://www.research.ibm.com/deepblue/ IBM, 2011, Watson, IBM, http://www-03.ibm.com/innovation/us/watson/index.html
Leavitt D., 2011, I Took the Turing Test, New York Times, http://www.nytimes.com/2011/03/20/books/review/book-review-the-most-human-human-by-brian- christian.html
Personal Robotics Group, 2008, Nexi, MIT. http://robotic.media.mit.edu/ Robinson P., The Emotional Computer, Camrbidge Ideas,
US Census Bereau, 2009, Households with a Computer and Internet Use: 1984 to 2009. http://www.census.gov/hhes/computer/
1960’s, Eliza, MIT, http://www.manifestation.com/neurotoys/eliza.php3