Lexical and Syntactic Alignment in Human-Computer and Second Language Dialogue

Abstract

Advances in artificial dialogue systems have made it increasingly important to understand the factors that shape linguistic behavior between humans and computers. This paper reports an experiment using modified picture-matching and -naming tasks, in which participants selected a name for a pictured object and then referred to it again after listening to a paragraph.
The 2x2 study crossed human versus computer voice with Chinese versus English context. No significant difference in alignment effect was found across the four conditions, and neither perceived trustworthiness nor English proficiency showed a distinct relationship with alignment. However, the alignment effect observed in the computer voice in English condition may point to the value of dialogue systems as second language learning tools.


Introduction

With the growth of artificial dialogue systems and chatbots, conversations happen not only between humans, but also between humans and computers. As mainstream products such as Siri, Google Assistant, and Cortana become increasingly popular, the frequency of human-computer dialogue (HCD) grows rapidly. Users issue commands or queries in natural speech and receive text or voice feedback; in this circumstance, human and computer communicate via a dialogue. A dialogue is a collaborative activity in which two or more interlocutors exchange information and develop similar mental states. There have been plenty of studies on how information is processed in dialogue. One interesting question is to what extent beliefs about the mental state of an interlocutor affect a speaker's behavior. People tend to align with their interlocutors' vocabulary, sentence structure, and other features in dialogue. Brennan & Clark (1996) suggested that when interlocutors establish a shared conceptualization, or conceptual pact, they tend to use the same referring expressions, namely lexical alignment. Garrod & Anderson (1987) also found input-output coordination in a series of dialogues: interlocutors reused the lexical and syntactic patterns of their input in their own output. Alignment is thus a strategy that people use to facilitate mutual understanding.

There is much literature on alignment in human-human dialogue (HHD); however, little is known about spoken dialogue between humans and computers. With this in mind, researchers from psycholinguistics and human-computer interaction (HCI) show growing interest in communication between humans and artificial media. Although there have been some studies in this area, most of the research concerned English as a native language. From the perspective of language learning, HCD may more frequently involve non-native (L2) speakers. It is therefore of high practical value to explore L2 speakers' linguistic behavior when communicating with computers, in order to inform the design of human-computer interaction and language learning systems. This study explores the factors that impact lexical and syntactic alignment in HCD in native (Chinese) and second language (English) contexts via a series of modified picture-matching and -naming tasks.

Lexical Alignment in HCD

In the past 20 years, researchers have carried out experiments to investigate how people change their lexical and syntactic output in human-computer dialogue (Branigan, Pickering, Pearson, McLean, & Nass, 2003; Branigan, Pickering, Pearson, McLean, & Brown, 2011; Cowan, Branigan, & Beale, 2012; Cowan, Branigan, Obregón, Bugis, & Beale, 2015). Early research focused on comparing the extent of alignment in HHD and HCD. Branigan et al. (2003) showed that alignment exists in both situations, and that it is even more prominent in HCD than in HHD. They conducted a picture-matching game in which participants communicated with an unseen interlocutor by typing on a computer. The interlocutor described images in text for the participant to match against pictures on the screen, and the participant then produced descriptions of pictures in turn. In one condition participants were led to believe they were interacting with a computer, while in the other they were told they were communicating with a human being. The striking finding was that participants tended to align more strongly with what they believed to be a computer. Subsequent studies also highlighted the role of users' perceptions of system capability. For example, in Branigan et al. (2011), users were told they were interacting with either a "basic" or an "advanced" computer while playing a picture-matching game with a computer interlocutor. Alignment was significantly larger in the basic condition, indicating that an interlocutor's characteristics and perceived communicative capability can affect linguistic behavior.

Later, with the maturation of artificial intelligence and speech recognition, researchers turned to speech-based HCD, expanding the research area beyond text-based dialogue. Bigot et al. (2007) found greater sharing of syntax with the computer in speech-based than in text-based interactions. Stoyanchev & Stent (2009) explored alignment with a newly deployed spoken dialogue system, Let's Go!, in a real-world setting. Cowan et al. (2012) investigated how characteristics of spoken dialogue systems, such as the type of voice, from which users may infer capabilities, affect users' alignment. They used a Wizard-of-Oz design paired with a confederate-scripting paradigm, in which pairs of partners take turns playing a picture-matching and -describing game. Participants interacted with a human partner face-to-face, with a human voice via a computer, or with a computer-generated voice (advanced computer voice), playing the roles of matcher and describer. Voice type had no significant impact on syntactic alignment, but it did affect interaction satisfaction: participants rated satisfaction in the human and advanced computer voice conditions much higher than in the basic computer voice condition. In other words, alignment can be used as an indicator of difficult, unnatural, and unsatisfying interaction. Cowan et al. (2015) reported two controlled experiments using a picture-naming and -matching task with pictures that could be described using either of two grammatical alternations: the dative (e.g., give the apple to the waitress vs. give the waitress the apple) and the noun phrase (e.g., a purple circle vs. a circle that is purple). In the game, the participant's partner used a specific grammatical structure to describe an image (the prime), and the participant then described an image, to test whether the participant would use the same structure in the subsequent description. Users' syntactic choices in speech-based dialogue were influenced by their interlocutor's linguistic behavior but not by the identity of the interlocutor: participants aligned to the same extent with a human interlocutor, an anthropomorphic voice, and a computer interlocutor. Additionally, because people usually have default grammatical preferences in spoken dialogue, the study also showed that even very strong default preferences may be shifted by an interlocutor's linguistic behavior.

The studies above demonstrate the influence of spoken dialogue system behavior on users' lexical and syntactic choices in interaction. Their results showed that dialogue with both human and computer interlocutors leads to alignment, with no significant difference between the two.

However, existing studies were mostly conducted in an English-language context rather than a Chinese one. Culture may strongly shape perceptions of artificial systems. In developed countries, machines and computers are more prevalent and mature than in developing countries, which suggests that people living in industrialized societies may place more trust in computers. Research shows that trust repair via audio-video media is more effective for US participants than for Chinese participants (Pai, 2009). In addition, alignment may also be affected by language proficiency. Costa, Pickering, & Sorace (2008) reported some alignment in second language dialogue. Because of limited linguistic knowledge, L2 speakers tend to find linguistic decisions more effortful; on the other side, L1 speakers also need to accommodate L2 speakers in dialogue. These linguistic differences may impair alignment during communication. This study therefore focuses not only on alignment in HHD and HCD, but also on differences between native and second language contexts.

Research Purpose and Hypothesis

This study examines the following research questions:

1. For Chinese participants, to what extent does lexical alignment occur in speech-based dialogue with human and computer interlocutors?

2. Does the perceived trustworthiness of the voice affect the alignment effect?

3. What factors determine users' linguistic behavior in their native and second language?

Method

Material

The experiment consisted of four modified picture-matching and -naming tasks, crossing two interlocutor conditions with two language environments. Each task had three questions. The first question showed a picture of a common object, e.g., a bus or a potato, and asked participants to select a name that matched the picture. As in Cowan et al. (2012), each object had a favored name and a disfavored name; for example, Tangyuan and Yuanxiao are two terms for the same kind of food. Participants then listened to an audio clip about the object, in either a human voice or a computer-generated voice, and summarized its content by typing. The name of the object used in the audio differed from the participant's earlier answer; the audio was intended to simulate a dialogue situation. In the third question, participants rated the trustworthiness of the voice on a 5-point Likert scale from Strongly Agree (1) to Strongly Disagree (5). The four conditions are presented in Table 1.

Table 1 Four conditions

Condition    Voice       Language
HC           Human       Chinese
CC           Computer    Chinese
HE           Human       English
CE           Computer    English

The three questions in each task are included here for clarity (full details can be found in the questionnaire):

  1. What is this?
  2. Listen to a paragraph, and summarize the content.
  3. How much do you trust the voice?

The last part of the survey consisted of three demographic questions about birthplace, years of learning English, and self-rated English proficiency on a 5-point Likert scale. The survey was created in Google Forms and distributed by link.

The human voices were recorded by native speakers from China and the US. The computer voices were generated by XunFei for Chinese (XunFei Open Platform, 2017) and FromTextToSpeech for English (From Text to Speech, 2017).
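To make the design concrete, the following is a minimal sketch in Python of how the 2x2 conditions and per-trial materials could be represented. The condition labels follow Table 1, while the data structure, field names, and the assignment of the Tangyuan/Yuanxiao pair to the HC condition are assumptions for illustration only.

# A minimal sketch of the 2x2 design (voice x language) and per-trial materials.
# Condition labels follow Table 1; the trial list is illustrative, not the full item set.
from dataclasses import dataclass

@dataclass
class Trial:
    condition: str   # "HC", "CC", "HE", or "CE"
    voice: str       # "human" or "computer"
    language: str    # "Chinese" or "English"
    names: tuple     # the two alternative names for the pictured object

CONDITIONS = {
    "HC": ("human", "Chinese"),
    "CC": ("computer", "Chinese"),
    "HE": ("human", "English"),
    "CE": ("computer", "English"),
}

# Example trials; the audio prime then uses whichever name the participant did not choose.
# The bus/coach pair belongs to the CE task (see Discussion); the Tangyuan/Yuanxiao pair
# is assigned to HC here purely for illustration.
trials = [
    Trial("HC", *CONDITIONS["HC"], names=("Tangyuan", "Yuanxiao")),
    Trial("CE", *CONDITIONS["CE"], names=("bus", "coach")),
]

for t in trials:
    print(t.condition, t.voice, t.language, t.names)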

Participants

Due to limitations of time and resources, only ten participants took part in the experiment. All were college students from a university in Beijing whose first language is Chinese.

Data collection

The survey was distributed by link and completed online. Responses were collected and analyzed using Google Sheets and SPSS for Mac.

Results

The percentages of aligned and unaligned responses in each condition are displayed in Table 2. Some answers across the experiment were classed as Other or invalid responses, namely those that mentioned neither the favored nor the disfavored name. The alignment effect is the proportion of participants who gave an aligned answer out of those who gave a valid answer; only responses categorized as Aligned or Not Aligned were included in the analysis. As can be seen in Table 2, there was no significant difference in alignment between the human and computer conditions. The alignment effect for the computer voice in English was slightly lower than the others.
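As a concrete illustration of this calculation, here is a minimal sketch in Python; the response labels follow the categories above, but the example counts are hypothetical and are not the study's data.

# Alignment effect = aligned responses / valid responses,
# where valid responses are those categorized as Aligned or Not Aligned
# ("Other"/invalid responses are excluded). Example data are hypothetical.
def alignment_effect(responses):
    """responses: list of labels, each 'Aligned', 'Not Aligned', or 'Other'."""
    aligned = sum(1 for r in responses if r == "Aligned")
    valid = sum(1 for r in responses if r in ("Aligned", "Not Aligned"))
    return aligned / valid if valid else float("nan")

example = ["Aligned", "Aligned", "Not Aligned", "Other", "Aligned"]
print(alignment_effect(example))  # 3 aligned out of 4 valid -> 0.75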

Table 2 Alignment effect
Table 3 Trustworthiness

From Table 3, we can see no significant relationship between participants' perception of trustworthiness and the alignment effect. Both computer voices, Chinese and English, received low trustworthiness ratings. Participants considered the computer voice in Chinese the least trustworthy, yet its alignment effect was the most prominent.

At the end of the experiment, participants were asked to evaluate their English fluency, and most considered their English proficient. Table 4 shows the results for the two English conditions, and the corresponding alignment effect is presented in Table 5. As can be seen in Table 5, students both proficient and unskilled in English showed a significant alignment effect.

Table 4 English level and alignment in English context

Discussion

The results of the research suggest that there was no significant effect of voice type on lexical alignment in either human-human or human-computer dialogue, in both Chinese and English. This corresponds with previous findings in English-language settings. According to Garrod & Anderson (1987), the alignment effect may be due to automatic priming mechanisms, which are not specifically tied to the interlocutor in determining speakers' language behavior. People tend to repeat their interlocutors' language choices partly because processing these lexical items automatically facilitates their subsequent re-use (Cowan et al., 2015). On this account, the alignment effect may be affected by the frequency of a given item in the dialogue. For example, in the second task (CC), the term "potato" appeared 10 times, whereas in the other tasks the item appeared only four times; the alignment effect in the CC condition was the most prominent.

Our prediction was that participants would align lexically less in low-trustworthiness conditions. However, the results show that trustworthiness may not be an effective predictor of alignment: even though the trustworthiness of the computer voices was rated low, there was still an evident alignment effect in those two conditions. In Cowan et al. (2012), the researchers used an Interaction Satisfaction Questionnaire (ISQ) to measure participants' satisfaction when interacting with their partners in the game. They found a significant main effect of Partner on ISQ scores but no evident effect of Partner on alignment. The purpose of looking for a relationship between satisfaction and the alignment effect is to find a way to use alignment as a metric of the system. As Cowan et al. suggested, even though there are no significant findings, we still encourage other researchers to look for other factors.

In addition, there was no distinct relationship between English fluency and the alignment effect. However, the significant alignment in the second language computer voice condition may point to the possibility of developing intelligent dialogue systems as second language learning tools. In HHD, native speakers tend to reduce their speaking speed and align to L2 learners in order to keep their mental states similar, which does not help L2 speakers learn new words. Unlike human speakers, an artificial dialogue system need not be influenced by L2 speakers and can provide personalized practice content by generating output the learner is not yet familiar with; the learner's progress could then be evaluated via the alignment effect. Sometimes the lack of an alignment effect may be due to utterance preference. For example, the picture in the third task (CE condition) is a bus, or coach. Learners may be more familiar with the name "bus" and prefer to use it; the artificial dialogue system could keep uttering "coach" until the learner aligns.
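A minimal sketch of this adaptive priming idea follows; the function names, the simple string check for alignment, and the stopping rule are illustrative assumptions rather than a specification of an existing system.

# Sketch of adaptive priming: the system keeps using the target (unfamiliar) term
# in its output until the learner's reply re-uses it, i.e. until alignment is observed.
# All names and the stopping rule are illustrative assumptions.
def priming_session(target: str, get_learner_reply, max_turns: int = 5) -> bool:
    """Return True if the learner aligned to `target` within max_turns."""
    for _ in range(max_turns):
        prompt = f"Please describe the picture of the {target}."
        reply = get_learner_reply(prompt)
        if target.lower() in reply.lower():
            return True  # the learner re-used the primed term
    return False

# Usage example with canned replies standing in for a real learner.
replies = iter(["It is a bus.", "It is a bus.", "It is a coach."])
print(priming_session("coach", lambda prompt: next(replies)))  # True on the third turn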

This experiment still has limitations. First, the number of participants was small: ten college students may share a similar linguistic style and cannot represent most English learners. Further studies could recruit more participants with diverse demographic backgrounds. Moreover, the task requirements should be more precise. The second question asked participants to summarize the paragraph they had just heard, but some of them completed the task without mentioning the specific item, because in some tasks the item was not the theme or protagonist of the story. This produced invalid answers, which were categorized as "Other" and excluded from the data analysis. The survey ought to have set a minimum number of words or required participants to complete it under a researcher's supervision.

Conclusion

This experiment evaluated the lexical alignment effect in speech-based dialogue with human and computer interlocutors in both Chinese and English environments. The alignment effect was similarly significant in the four conditions and was not affected by participants' perception of trustworthiness or their English proficiency. The results corroborate previous research in English-language contexts. Since alignment in the HE and CE conditions differed little, artificial dialogue systems could be developed as language learning tools, using the alignment effect both as a teaching approach and as a metric of learning.

References

Bigot, L. L., Terrier, P., Amiel, V., Poulain, G., Jamet, E., & Rouet, J.-F. (2007). Effect of modality on collaboration with a dialogue system. International Journal of Human-Computer Studies, 65(12), 983–991.

Branigan, H. P., Pickering, M. J., Pearson, J., McLean, J. F., & Brown, A. (2011). The role of beliefs in lexical alignment: Evidence from dialogs with humans and computers. Cognition, 121(1), 41–57. Retrieved from http://www.sciencedirect.com/science/article/pii/S0010027711001363 doi: 10.1016/j.cognition.2011.05.011

Branigan, H. P., Pickering, M. J., Pearson, J., McLean, J. F., & Nass, C. (2003). Syntactic alignment between computers and people: The role of belief about mental states. In Proceedings of the Twenty-Fifth Annual Conference of the Cognitive Science Society (pp. 186–191).

Brennan, S. E., & Clark, H. H. (1996, Nov). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22(6), 1482–1493. Retrieved from http://gateway.proquest.com/openurl?ctx_ver=Z39.88-2003&xri:pqil:res_ver=0.2&res_id=xri:ilcs-us&rft_id=xri:ilcs:rec:abell:R03000086 doi: 10.1037/0278-7393.22.6.1482

Costa, A., Pickering, M. J., & Sorace, A. (2008). Alignment in second language dialogue. Language and Cognitive Processes, 23(4), 528–556.

Cowan, B. R., Branigan, H. P., & Beale, R. (2012). Investigating the impact of interlocutor voice on syntactic alignment in human-computer dialogue. In Proceedings of the 26th Annual BCS Interaction Specialist Group Conference on People and Computers (pp. 39–48). British Computer Society.

Cowan, B. R., Branigan, H. P., Obregón, M., Bugis, E., & Beale, R. (2015, Nov). Voice anthropomorphism, interlocutor modelling and alignment effects on syntactic choices in human-computer dialogue. International Journal of Human-Computer Studies, 83, 27–42. Retrieved from http://www.sciencedirect.com/science/article/pii/S1071581915001020 doi: 10.1016/j.ijhcs.2015.05.008

From text to speech. (2017, June). Retrieved 13 June 2017, from http://www.fromtexttospeech.com/

Garrod, S., & Anderson, A. (1987). Saying what you mean in dialogue: A study in conceptual and semantic coordination. Cognition: International Journal of Cognitive Science, 27(2), 181. Retrieved from http://gateway.proquest.com/openurl?ctx_ver=Z39.88-2003&xri:pqil:res_ver=0.2&res_id=xri:ilcs-us&rft_id=xri:ilcs:rec:abell:R02656327

Pai, S. D. (2009). Effects of cultural differences and computer media on trust reparation (Doctoral dissertation). Retrieved from http://search.proquest.com/docview/304879380

Stoyanchev, S., & Stent, A. (2009). Lexical and syntactic priming and their impact in deployed spoken dialog systems. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers (pp. 189–192). Association for Computational Linguistics.

Xunfei Open Platform. (2017, June). Retrieved from http://www.xfyun.cn/