Modelling User’s Mental Model through Voice User Interface

Shengzhi WU
23 min read · Jan 19, 2018


1. Introduction

Back in the 1960s, when a computer was still a gigantic, cumbersome object that required well-trained scientists to interact with it by punching holes, Stanley Kubrick and Arthur C. Clarke (1968) had already created one of the first voice user interfaces (VUI), HAL 9000, in their science fiction film “2001: A Space Odyssey”. Since then, science fiction has never stopped imagining conversational Artificial Intelligence, and it seems that our mental model already firmly holds the idea that an intelligent machine should be able to speak.

In recent years, with impressive progress in computer science, especially in voice recognition, natural language processing (NLP), and text to speech (TTS), the voice user interface has become a promising field and has already been applied to a wide range of products, such as Google Assistant, Amazon Alexa, and Apple Siri. On the other hand, technological constraints limit how intelligent machines can be. Thus, how can we create a VUI system that makes the machine smarter? How could a VUI system facilitate human-computer interaction and enhance users’ engagement? The answers may lie not only in technical aspects, but also, to a large extent, in design.

In this article, I will focus mainly on the design aspects of the Voice User Interface, especially on how users’ mental models affect VUI design, and conversely how designers can model and modify users’ mental models to make a VUI easy to learn and easy to use. In the first part, I will introduce the concept and history of the mental model, then discuss how mental models relate to Voice User Interface design and why they matter for VUI. Finally, I will propose a few design principles (designing a VUI’s persona for different use cases, coping with errors, and designing for both novice and expert users) that designers should be aware of in order to build a VUI system aligned with users’ mental models.

2. Conceptual Definition of Mental Models

The concept of a mental model can be traced back to 1943, when Kenneth Craik (1943) first discussed the relationship between our internal representations and the external world. Craik argues that the “internal model of reality — this working model — enables us to predict events which have not yet occurred in the physical world” (p.82). He suggests that the process first translates external information into words, numbers, or other symbols, and then, through reasoning, deduction, and inference, retranslates those symbols as a bridge to help people understand the external world (p.82). Even though he never precisely proposed the phrase “mental model”, the term “internal model” he used is still considered one of the earliest formulations of the idea.

After Craik’s work, many researchers began to realize the implications of mental models, but they represent the term “mental model” in different ways, and some even use different terms to refer to similar but evolving meanings. For example, Richard M. Young (1983) argues that people’s ability to use a certain device partly comes from their mental models, but that a user’s conceptual model is still unclear and ambiguous (p.35). Donald A. Norman (1983) also mentioned the conceptual model, but he referred to it as “an appropriate representation” of a system invented by designers, teachers, scientists, and engineers (p.7); in other words, Norman regards the conceptual model as professionals’ precise and correct representation of a system. He also pointed out the constantly evolving nature of our mental models and how our mental models of different systems can affect each other (p.7). Johan de Kleer and John Seely Brown (1982) also discussed the ambiguities of users’ mental models, but attempted to investigate the framework of their structure, which they called mechanistic mental models (p.155).

Although there are many different definitions of a mental model, the widely accepted notion is that mental models are “internal representations of system in a particular knowledge domain. These internal representations are formed through knowledge (instruction) or experience or a combination of the two” (Staggers, 1993, p.601). I will also use this definition as a foundation to discuss the relationship between users’ mental models and VUI design.

The mental model is not only a term in cognitive psychology; it can also be applied in design. Staggers (1993) observed that “users with well-developed mental models are likely to commit fewer errors” (p.595). However, how to define a well-developed mental model remains hazy. Donald A. Norman’s (1983) research may offer an answer: he pointed out the gap between a user’s mental model of a target system, which he named M(t), and an accurate and complete conceptual model of the target system, which he called C(t) (p.11). Therefore, if the gap between our mental model M(t) and the conceptual model C(t) is trifling, we can say a user’s mental model is well developed, and in this case the user is likely to commit few errors when using the system. Norman emphasizes that “we must distinguish between our conceptualization of a mental model, C(t), and the actual mental model that we think a particular person might have, M(t)” (p.11). This distinction is significant for design, because it represents the gap between the outcome designers intend to achieve and what users assume the design to be. If the gap between a designer’s conceptual model and a user’s mental model is tremendous, it is reasonable to assume that users will feel a great dissonance throughout their experience. Therefore, designers are urged to understand users’ existing mental models, so that their designs can be congruent with those models, minimizing learning cost and diminishing these gaps for users in the first place.

3. Why the Mental Model is Important to the Voice User Interface (VUI)

Before diving into the discussion of users’ mental models of VUI, I would like to compare them with the formation of users’ mental models of the Graphical User Interface. Users’ mental model of how to use a graphical user interface (GUI) was established especially after the wide adoption of screen devices. From the 1980s to the 2000s, UI designers put great effort into implementing skeuomorphism in their UI design to make full use of users’ existing mental models of the physical world, and this approach greatly reconciled the abovementioned gap between designers’ conceptual models and users’ mental models. Gradually, users constructed a completely new mental model of screen-based interaction and became able to effectively use finger gestures like tap, slide, and scroll to interact with screen-based systems, even though most of these interactions are entirely learned and can be considered unnatural.

Unlike screen-based GUI interaction, we start with a well-established mental model of how to speak, built through decades of training and practice, and we are extremely good at conversational communication. Ideally, users don’t even need to be taught how to use a VUI system. In this sense, it seems that a VUI system should be much easier to use. But anyone who has used an IVR (Interactive Voice Response) system or an early VUI system probably has far more complaints than they expected. Our pre-existing mental model of speech does not make the design of VUI easier; on the contrary, it makes it extremely difficult.

Unlike the mental model of screen-based interaction, which we acquired later, we instinctively project our existing mental model of speech onto a VUI system. In other words, we expect to interact with a VUI system the same way we interact with other people. Any difference between communicating with a VUI and human-to-human conversation is greatly amplified and significantly degrades the user’s experience. Furthermore, years of conversational experience have given us a strong and concrete mental model of speech, which makes us extremely sensitive to details: we can instantly notice a slightly wrong pronunciation or a mildly inappropriate word in a sentence. Moreover, speech usually follows certain unstated social norms. For instance, when someone asks “How are you doing?”, they don’t expect an honest reply; they expect an answer like “I am doing well.” In addition, the same sentence spoken with different tones and different emphasis usually conveys completely different meanings. A classic case is the sentence “I never said she stole my money”: by stressing different words, a speaker can convey multiple meanings. All these language norms are already sturdily built into our mental models, and by no means should a VUI system violate them. Our pre-existing social and language norms and our sensitivity to speech details and tones all significantly affect our perception of a VUI. Thus, when we interact with a VUI system, these unconscious norms continuously shape our expectations and force us to evaluate the system constantly, making us judge it to be either smart or stupid.

Furthermore, designers must realize that our pre-existing mental model of speech doesn’t encourage users to learn and fit into a VUI system; instead, it to some extent impedes further exploration. For instance, an early IVR system typically asks questions like “To confirm the order, say ‘confirm’; to cancel the order, say ‘cancel’; to go back to the main menu, say ‘back to main menu’.” When users repeat the answer exactly the way the IVR system dictates, they feel they are acting stupid. Being taught how to speak creates an apparent disconnection from their previous experience, rendering the conversation dry and tedious. In this case, the previous mental model of speech doesn’t serve as a catalyst for learning; rather, it raises an emotional resistance that keeps users from fitting further into the system.

On the other hand, we must admit that our technology has constraints; a VUI system, therefore, will not interact with a user the way humans interact with each other. A VUI system combines many technologies, including natural language processing and understanding (NLP & NLU), automated speech recognition (ASR), and text to speech (TTS), and these technologies will never be perfect. As Cathy Pearl (2016) mentioned in her book, although current ASR technology “has greater than 90 percent accuracy, keep in mind this is under ideal condition. Ideal condition typically means an adult male in a quiet room with a good microphone” (p.122). And be mindful that, beyond what Pearl mentions, an ideal condition in the United States also typically requires the speaker to speak standard American English without using too many specialized or rarely used terms. These constraints require a VUI to effectively clarify to users what it is capable of as well as what lies beyond its abilities, thus constructing an understandable and accurate mental model for its users. With this mental model, users would know which tasks the machine can handle, and how they should phrase their questions, without repeatedly issuing futile voice commands.

The consistency of users’ mental models plays an important role in VUI design: a consistent and well-developed mental model should encourage users’ further exploration and guide them to predict unused and hidden functions. After using a VUI system for a while, a user is likely to form a specific mental model of it, which enables them to predict events that have not yet occurred (Craik, 1943, p.82). But such prediction in VUI easily falls into over-generalization or over-mapping. Nancy Staggers (1993) suggests that “Individuals can over-generalize or over-map if they import existing models or use multiple models” (p.603). If the pre-established mental model fails to guide further exploration, great frustration arises. For example, when Anna Abovyan, the UX manager at M*Model, and her team built their VUI system, they allowed users to select the last three words with a voice command. But they observed that many users started to create their own voice commands, such as “underline previous five characters” or “delete next two sentences”. As a result, Anna and her team had to add these related functions to their system to keep their users’ mental models consistent. Guaranteeing the consistency of users’ mental models indeed poses a great challenge to VUI designers, but designers must realize that these over-generalized or over-mapped behaviors are not occasional; the dynamic and evolving nature of a mental model means it is bound to be misapplied by users, and minimizing how much a VUI system misleads users is of pivotal importance for VUI designers.

A VUI’s impact on our mental models may be subtle, but it cannot be ignored. Another interesting story told by Anna Abovyan drew my attention to the potential negative impact of VUI. One day, a father came to Anna and complained that Amazon Echo had taught his daughter to be impolite. His daughter constantly heard her parents giving commands to Alexa without saying “thanks” or “please”. This behavior was quickly built into the little girl’s mental model, and she started giving impolite commands to her parents and friends in the same way. This single example shows the other side of our mental models’ over-generalizing nature: the little girl’s mental model of speaking to a VUI system apparently over-generalized into her mental model of speaking to human beings. As with the earlier M*Model example, multiple mental models can affect each other in unexpected ways.

We want to shorten the distance between a user’s mental model of a VUI and their mental model of speech, and we want users to interact with a VUI system just like they interact with people. But technical constraints impede us from doing so. How can designers deal with these limitations and push the VUI’s user experience forward? If we go back to Norman’s concepts of the conceptual model and the mental model, we find that an ideal mental model of a VUI should be as similar as possible to a human’s mental model of speech, and Staggers (1993) also suggests that “Designers’ conceptual models should be congruent with users’ mental models” (p.602). As such, when we design a conceptual model, we need to minimize the gaps between the designer’s conceptual model, the user’s mental model of the VUI, and their mental model of speech. This also requires us to better understand how people perceive a VUI, as well as how to deal with current technical limitations.

4. How to build an effective VUI congruent with users’ mental models

We want our VUI systems to be smarter, and we would like users to interact with them more casually, just like having a human-to-human conversation. Thus, designers should not only regard the system as a machine, but also treat it as a living character with personality, emotions, and empathy. Furthermore, designers are also urged to consider current technical limitations and deal with all kinds of complicated situations, instead of merely focusing on when things work. In addition, it is also worthwhile to design for both novice and expert users at the same time, so that the VUI can forge deeper connections with users over time.

In this part, I will propose three aspects of VUI design to help designers come up with an effective VUI that is congruent with users’ existing mental models, making the VUI easy to learn and easy to use: “design a VUI’s persona for different use cases”, “cope with errors to build and modify users’ mental models”, and “design for both novice and expert users”.

4.1 Design a VUI’s persona for different use cases

Humans naturally have the ability to anthropomorphize. This ability allows us to evaluate speech in terms of the speaker’s personality. Cohen, Giangola, and Balogh (2004) claimed in their book Voice User Interface Design that “There is no such thing as a voice user interface with no personality”. In fact, back in 1972, the sociolinguist William Labov (1972) had already documented the relationship between speakers’ voices and their personalities, arguing that evaluations of language “are readily and consistently expressed in terms of personality judgements about different speakers” (p.310). The voice-personality connection is so robust that even a brief utterance can create a clear impression of the speaker’s personality. This association does not happen solely in human-to-human interaction; we also unconsciously project such judgements onto VUI systems. As Google’s VUI designer James Giangola (2017) said, “Every voice has an owner, and we naturally form a mental image or composite of the speaker.” This “mental image” is in a sense equivalent to the user’s mental model that governs their emotional responses to the VUI system.

Since a user will forge a mental image of a VUI system anyway, it makes no sense to leave it unplanned. But if UX researchers ask users what kind of personality they expect in a VUI system, they normally get very general answers like friendly, trustworthy, kind, willing to help, or sometimes funny. These traits form only a very basic and abstract image of a VUI’s personality, and they don’t help designers much in diving deep and constructing a more vivid VUI persona; moreover, applying subtle personality differences to different use cases can also make a big difference to the user experience.

In psychology, personality is described through a five-factor model (McCrae & John, 1992, p.175–215) comprising extraversion, agreeableness, conscientiousness, neuroticism, and openness. Schmitz, Kruger, and Schmidt (2007, p.315–316) used the five-factor model to create the personalities of different voices talking about a series of products, and their research showed that a customer’s perception of a product can be significantly affected by the personality of the voice that introduces it. The impact of a voice’s personality can be even more powerful in a VUI system, especially because users form their image of a VUI based only on what they hear, rather than what they see.

Defining a VUI’s persona can be very abstract and challenging. If designers would like to identify a VUI’s persona by interviewing or conducting user research with potential users, proposition is typically more feasible and effective than inquiry. Due to the abstract nature of personality, it is very hard for users to articulate what kind of personality they expect, and words like trustworthy, helpful, or kind alone don’t render a clear persona. Under such circumstances, researchers or designers can propose possibilities to users rather than simply asking. For example, designers can bring up different well-known characters (celebrities), such as Scarlett Johansson, Taylor Swift, or Anne Hathaway, and then ask, “What if your VUI’s personality were like Taylor Swift’s, or Anne Hathaway’s?”; or ask questions like “Would you like your smart home assistant’s personality to be more like Taylor Swift or Scarlett Johansson?” (Of course, designers can give users more options; this is just an example to illustrate the concept.) In this case, users may be able to give more nuanced responses about a VUI’s personality. Moreover, designers can even pre-record some lines or voice samples and provide them to users for feedback. Subsequently, designers can analyze the feedback they receive against the abovementioned five-factor personality model, making it easy to see what kind of personality users are looking for.

Furthermore, designers can also treat a VUI as a character in a movie or comic book. “In the world of voice user interface, the term ‘persona’ is used as a rough equivalent of ‘character’, as in a character in a book or film” (Cohen, Giangola & Balogh, 2004, p.77). A VUI’s persona is similar to a comic book character: by putting the character into context, such as use cases or scenarios, and letting the character have conversations, actions, and even relationships, designers can render its persona more vivid.

Moreover, designers can imagine the VUI character in a certain role, like a doctor, secretary, or butler. As soon as the VUI persona is put into such a specific context, its personality, voice, or even image will emerge naturally.

Additionally, whether the voice of a VUI fits its product’s and brand’s personality also significantly influences users’ perceptions. A marketing study (North, MacKenzie, Hargreaves & Law, 2004, p.1675–1708) conducted in 2004 showed that when an ad’s music and voice fit the advertised brand, knowledge-based and affective responses to the advertising increase. In their research paper, Schmitz, Kruger, and Schmidt (2007, p.315–316) also leveraged the five brand personality dimensions (Aaker, 1997, p.342–352) in their design of voice personality. It is worthwhile for designers to identify their brand’s personality dimensions before designing a VUI’s persona, especially when designing for a well-known brand or company, such as Google or Walmart, because users may already hold a concrete mental model of what the brand is like. Under this circumstance, it is dangerous for designers to come up with a VUI persona that doesn’t align with the existing brand personality. For instance, if a Mercedes-Benz in-car voice assistant sounded very childish or funny, users would feel a great dissonance with their existing mental model of the brand’s personality, yet the same childish and funny personality would probably work well in a JIBO home robot. All in all, consistency between the VUI’s personality and the brand personality greatly enhances the user experience, amplifies users’ memory of the brand, and encourages their engagement with the VUI system.

Our mental models don’t deal only with how a device or system works; they also keenly evaluate the coordination of the system’s different parts. For example, if a deep and sophisticated voice speaks very childish lines, our mental model registers the dissonance instantly. What’s worse, once the discordance is identified, it tends to draw users’ attention constantly and becomes unignorable. Therefore, a persona design must be carefully evaluated for coordination across various possible use cases, ensuring it not only aligns with the brand personality and users’ existing mental models, but also performs consistently and vividly across different use cases.

4.2 Cope with errors, build and modify users’ mental models

“You can’t just design for when things work — you need to design for when things go wrong as well. This is especially true with VUIs, because something will always go wrong” (Pearl, 2016, p.41). This is very true, and due to technological constraints, errors are inevitable. An error can take place either in the automatic speech recognition (ASR) process or in the execution process, when the user’s request is beyond the system’s ability. But encountering an error does not necessarily mean a user must be frustrated; on the contrary, it is a critical moment to modify the user’s mental model and help the user better understand the system’s limitations as well as its potential.

In her book “Designing Voice User Interfaces”, Cathy Pearl (2016) points out four scenarios of VUI mistakes that happen frequently: no speech detected, speech detected but nothing recognized, recognized but not handled, and recognized incorrectly.

When a user’s response is expected but not detected, an ASR system reports a no-speech (NSP) event. Every VUI system should have an NSP timeout: if no speech is detected within a certain amount of time (the timeout length is usually between 5 and 10 seconds), the system should execute some action rather than keep waiting for a command. Likewise, when the ASR does detect speech but is unable to recognize it, the situation usually falls into the same category. Both cases typically happen when the user uses the “hot word” to wake up the system but doesn’t ask anything, or doesn’t ask clearly, or when the system is waiting for an answer but doesn’t receive one. At this point, it is important to let users know explicitly what state the system is in. In human-to-human interaction, if a person doesn’t pick up the other’s voice, they will probably show a confused expression or ask for a repetition. Pearl (2016) suggests two strategies a VUI system can implement in this situation: the system can call out explicitly, like “Sorry, I didn’t hear you. What is your destination?”, or remain silent (p.43). For a VUI system, this is also a crucial moment to shape the user’s mental model of what kind of environment and voice volume works for the ASR system.
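The timeout logic described above can be sketched as a simple policy function. This is only an illustrative sketch, not a real ASR SDK API; the function name, the action strings, and the 7-second value are all assumptions:

```python
# Illustrative NSP (no-speech) timeout policy. The names and the 7-second
# value are assumptions for the example, not part of any real ASR SDK.
NSP_TIMEOUT_SECONDS = 7  # typical NSP timeouts fall between 5 and 10 seconds

def handle_listening_window(speech_detected: bool, seconds_elapsed: float) -> str:
    """Decide what the VUI should do after opening the microphone.

    Returns an action name instead of performing I/O, so the policy is
    easy to test and tune.
    """
    if speech_detected:
        return "recognize"  # hand the audio to the ASR engine
    if seconds_elapsed >= NSP_TIMEOUT_SECONDS:
        # NSP timeout reached: act rather than wait forever, either by
        # re-prompting or by staying silent (the two strategies Pearl describes)
        return "reprompt"
    return "keep_listening"
```

Keeping the decision separate from the audio plumbing makes it easy to experiment with different timeout lengths for different dialog steps.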

In the abovementioned circumstance, it is highly possible that the user didn’t say anything at all, so repeatedly asking “Sorry, I didn’t hear you. Please say it again” makes the conversation feel inhuman and annoying, especially when the answer is unimportant and doesn’t affect the conversation’s progress. Nevertheless, when the user is in the middle of a transaction, the system can re-confirm up to three times, each time with different wording, sometimes even imperative wording, like “Say your flight number out loud now.” If a system can handle various tasks with different strategies and follow human-like interaction rules, users will consider the system trustworthy and reliable, and further connect their mental model of human-to-human speech with their mental model of speaking to the VUI system.
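The escalation strategy above, giving up quickly on unimportant questions but re-asking critical ones with varied wording, could be sketched like this. The prompt texts and thresholds are illustrative assumptions:

```python
from typing import Optional

def escalating_prompt(attempt: int, critical: bool) -> Optional[str]:
    """Pick a re-prompt after repeated no-speech events.

    For unimportant questions the system gives up after one retry instead
    of nagging; for critical steps (e.g. mid-transaction) it re-asks up to
    three times, varying the wording and ending with an imperative.
    All prompt texts here are hypothetical examples.
    """
    casual = ["Sorry, I didn't catch that. What would you like to do?"]
    transactional = [
        "Sorry, I didn't hear you. What's your flight number?",
        "Could you tell me your flight number once more?",
        "Say your flight number out loud now.",
    ]
    prompts = transactional if critical else casual
    if attempt >= len(prompts):
        return None  # stop re-prompting; move on or hand off gracefully
    return prompts[attempt]
```

Returning `None` once the list is exhausted forces the dialog manager to plan an explicit fallback rather than loop forever.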

Furthermore, when no speech is detected, it is reasonable to assume that the user doesn’t know what to ask. As such, giving examples of how to ask would greatly help further develop the user’s mental model. For example, when experiencing an NSP timeout, both Apple Siri and Google Assistant display a couple of examples of what a user can ask, such as “Wake me up in 8 hours” or “Call Peter”. Moreover, if the system can remember the user’s past voice commands, it should offer new options at this point, enticing users to explore more hidden functionality and encouraging them to develop a more holistic mental model of the VUI system.

When a user gives a voice command that is clearly recognized but describes a task the system cannot handle, then besides apologizing and telling the user that the command is out of scope, the system should also recommend other related options the user might be interested in, which efficiently pushes the conversation forward. If a system can move the conversation forward when something goes wrong, it not only makes users feel the system is smart, but also creates a crucial opportunity for them to explore different features of the VUI system.
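One way to keep the conversation moving after an out-of-scope request is a simple mapping from unsupported intents to related, supported suggestions. The intent names and reply texts below are hypothetical examples, not from any real assistant:

```python
def out_of_scope_reply(intent: str) -> str:
    """Respond to a recognized but unsupported request by steering toward
    a related, supported feature instead of dead-ending the conversation.
    The intent names and reply texts are hypothetical examples.
    """
    related = {
        "book_flight": "I can't book flights yet, but I can check flight status. Want to try that?",
        "order_food": "I can't place orders, but I can find nearby restaurants. Shall I?",
    }
    return related.get(
        intent,
        "Sorry, I can't help with that yet. You can ask me about weather, timers, or music.",
    )
```

Even the generic fallback names concrete things the user can try, so the reply teaches the system’s scope rather than merely refusing.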

In addition, constantly iterating based on error data is a very useful strategy for improving a VUI design. If possible, the system should record the data whenever a user’s voice command is beyond the system’s capability, and figure out what happened and why. Designers should pay particular attention to errors that appear repeatedly at certain steps or in certain processes, which are likely the parts that require improvement. Google VUI design lead Abi Jones contends that “when you talk to a human being, there is never an unrecoverable error state.” Neither should there be in a VUI; a designer should always plan backups for error handling. Errors usually appear when the user’s current mental model collides with what Norman called the designer’s conceptual model, but through effective error responses, a VUI system can help users bridge the gap between these two models. Nevertheless, error handling shouldn’t be regarded as a plan B: once designers observe how users interact with a VUI system, they understand how arbitrary and unpredictable users’ questions can be, so error handling should be treated with the same priority as normal functions in VUI design.
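The recording-and-reviewing loop above could be sketched as a minimal error log that surfaces the dialog steps where failures cluster. This is a hypothetical sketch; a real system would persist the data and attach more context:

```python
from collections import Counter

class ErrorLog:
    """Record out-of-scope or failed commands so designers can spot the
    steps where users' mental models diverge from the system's abilities.
    A hypothetical sketch; real systems would persist this data."""

    def __init__(self) -> None:
        self.failures: list = []

    def record(self, step: str, utterance: str) -> None:
        """Log the dialog step and the exact utterance that failed."""
        self.failures.append((step, utterance))

    def hotspots(self, top_n: int = 3):
        """Steps with the most failures: prime candidates for redesign."""
        return Counter(step for step, _ in self.failures).most_common(top_n)
```

Reviewing `hotspots()` regularly is one concrete way to find the repeated errors the paragraph recommends prioritizing.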

4.3 Design for both Novice and Expert Users

It is undeniable that the mental models of a novice user and an expert user are dramatically different, and studies have already shown the variance in problem-solving ability between these two types of users (Staggers, 1993, p.599). Treating novice and expert users the same way will either leave novice users paralyzed or bore expert users. Thus, once users have become accustomed to a system, it is unnecessary to force them to listen to lengthy, detailed instructions every time.

Pearl (2016) gives an example of a healthcare app’s VUI conversation:

Novice user:

AVATAR: Let’s take your blood pressure. Please make sure the cuff is turned on. Wrap the cuff around your arm so that the blue arrow is pointing toward your palm. Be sure to sit down and have your feet flat on the floor. When you’re ready, press continue.

User who has interacted with the app every day for a week:

AVATAR: Time to take your blood pressure. Please put the cuff on and press continue.

(p.49)

If we imagine being the user in this example, using the VUI system every day, it is reasonable that after a while the second, concise prompt replaces the first one.

Staggers & Norcio (1993) also mention an interesting distinction between novice and expert users. They documented in their experiment that “as one expert said during an interaction, ‘It won’t work, but let’s try it anyway’. Novices, on the other hand, had an impoverished repertoire of strategies, and tried few trial-and-error attempts as remedies.” The observation may somewhat contradict common expectations: experts are more willing to give things a try than novice users, which means a well-developed mental model of a particular system also encourages an expert user to explore. This observation also indicates that a VUI design should implement more prompts to encourage novice users and introduce hidden functionality, especially for those just getting started who have no idea how to use the system. This is especially challenging for VUI systems without screens, such as Google Home or Amazon Echo, because unlike GUI, “For voice interface, this visual discovery of features is nonexistent” (Pearl, 2016, p.65).

But how can we distinguish a novice user from an expert user? Theoretically, the VUI system can record how many times a user has interacted with it. But heed Pearl’s warning: “Don’t just count number of times the app has been used”, because a user might use the app once a month or less frequently, so even if they have used the system a couple of times, they may already have forgotten how it works after a long gap. In that case, keeping the same prompt as for a novice user is more reasonable.
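A heuristic that follows Pearl’s warning might combine usage count with recency, only shortening prompts for users who are both experienced and recently active. The thresholds here are illustrative assumptions, to be tuned per product:

```python
from datetime import datetime, timedelta

def is_expert(use_count: int, last_used: datetime, now: datetime,
              min_uses: int = 5,
              max_gap: timedelta = timedelta(days=14)) -> bool:
    """Heuristic for choosing between detailed and concise prompts.

    A raw use count isn't enough (Pearl's warning): a user who hasn't
    returned in weeks may have forgotten the flow, so recent activity is
    also required before the prompts are shortened. The thresholds are
    illustrative and would be tuned per product.
    """
    return use_count >= min_uses and (now - last_used) <= max_gap
```

A user with many sessions but a month-long gap would fall back to the novice prompt, exactly the case the paragraph describes.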

Once a system can identify a user as an expert, it should gradually adapt to their behavior and learn from their past preferences. For example, if a user asks an Amazon Echo to play music, Alexa (the voice assistant in the Echo) will say “Here is a station you might like: XXX” and then start playing. Alexa looks back over the user’s whole play history and makes a recommendation based on it. Such adaptive behavior can take place in any scenario, and over time Alexa could even shorten her prompt, dropping the first part, “Here is a station you might like”, because users’ mental models have been built up and they know the music is chosen based on their preferences and past records. Adaptive behavior is critical to the user experience because it resembles human-to-human interaction: it is just like the jargon you share with an old friend over time.

5. Conclusion

A mental model is a user's pre-existing inner representation of a given system, but designers should be aware that a mental model is not static; rather, it ceaselessly evolves through use. Every moment of interaction with the system is therefore a critical opportunity for designers to modify users' existing mental model of it. In Norman's (1983) terms, the conceptual model is professionals' accurate representation of a target system, denoted C(t), while the mental model is a user's own understanding of that system, denoted M(t) (p. 11); the designer's task is to close the gap between the two.

Additionally, a user's mental models of multiple systems can blend together, which is exactly the case in VUI: a user's mental model of human-to-human conversation significantly shapes their expectations when speaking to a VUI system. VUI designers are therefore urged to understand how people actually speak, so that the VUI aligns with users' language norms and speech habits.

Ultimately, designers should also craft the persona carefully, giving the product a personality appropriate to its brand and use cases. Moments of error are valuable as well, because they are important opportunities to modify users' mental models: handling errors gracefully not only lessens users' frustration but also makes the system seem smarter and more trustworthy.

Distinguishing expert from novice users and treating them differently makes the system feel more intelligent, and over time forges a deeper, more personal connection with users. Designing a VUI is, in essence, modelling and modifying users' mental models, and a good VUI design helps users build an effective mental model, making the system easy to learn and easy to use.

References:

Jones, A. (2017). Voice interfaces are here to stay. https://becominghuman.ai/voice-interfaces-are-here-to-stay-f2d3d206a6c4

North, A. C., MacKenzie, L. C., Hargreaves, D. J., & Law, R. M. (2004). The effects of musical and voice "fit" on responses to advertisements. Journal of Applied Social Psychology, 34(8), 1675–1708. Hoboken, NJ: Wiley-Blackwell.

Craik, K. J. W. (1943). The Nature of Explanation. Oxford, England: University Press, Macmillan.

Pearl, C. (2016). Designing Voice User Interfaces: Principles of Conversational Experiences. Sebastopol, CA: O'Reilly Media.

Norman, D. A. (1983). Some observations on mental models. Hillsdale, NJ: Lawrence Erlbaum Associates.

Giangola, J. Design principles and methodology: The perception of a personality. https://developers.google.com/actions/design/principles

Aaker, J. (1997). Dimensions of brand personality. Journal of Marketing Research, pp. 342–352. American Marketing Association.

de Kleer, J., & Brown, J. S. (1982). Assumptions and ambiguities in mechanistic mental models. Hillsdale, NJ: Lawrence Erlbaum Associates.

Cohen, M. H., Giangola, J. P., & Balogh, J. (2004). Voice User Interface Design. Boston, MA: Addison-Wesley Professional.

Schmitz, M., Krüger, A., & Schmidt, S. (2007). Modelling personality in voices of talking products through prosodic parameters. Proceedings of the 12th International Conference on Intelligent User Interfaces, Honolulu, HI.

Staggers, N., & Norcio, A. F. (1993). Mental models: Concepts for human-computer interaction research. International Journal of Man-Machine Studies.

McCrae, R. R., & John, O. P. (1992). An introduction to the five-factor model and its applications. Journal of Personality, 60. Hoboken, NJ: Wiley-Blackwell.

Young, R. M. (1983). Surrogates and mappings: Two kinds of conceptual models for interactive devices. Hillsdale, NJ: Lawrence Erlbaum Associates.

Labov, W. (1972). Sociolinguistic Patterns. Philadelphia, PA: University of Pennsylvania Press.
