Personalization (Part 2 of 3): A Rubric for Generative AI Design & User Safety

Duane Valz
11 min read · Aug 16, 2023


Rendered from Stable Diffusion using the prompt “Generative AI Design & User Safety Rubric”

In Part 1 of this 3-part series on personalization, I outlined the variety of AI persona types currently being deployed by different players offering LLM-based chatbot services to the public. I set forth five basic AI persona types whose characteristics lie on a spectrum between AI as Robotic Assistant and AI as Human Surrogate: (1) AI as Assistant/Instructor, (2) AI as Butler/Mentor, (3) AI as Friend, (4) AI as Romantic Partner, and (5) AI as Personal Surrogate. Understanding these persona types and their differences helps illuminate the distinct design possibilities and safety risks posed by different relational modes between chatbots and human users. In the preprint article on which this series is based, I explain in greater depth why different personalization goals for an LLM-based chatbot pose different levels of user safety risk. In this, Part 2 of the series, I provide a basic schema for thinking about such design possibilities.

To ground an exploration of the stakes of personalization design, I first look at instances in which personalized chatbot interactions went badly. These episodes illustrate why personalization dynamics in chatbot interactions matter, and why a more systematic understanding of those dynamics is important both for chatbot communication proficiency and for safety design.

Popular Generative AI chatbots such as ChatGPT, Bard, and Bing Chat collectively have hundreds of millions of daily users. The operators of these services devote substantial time and resources to training, aligning, and improving the performance of their chatbots. All of these operators, however, feature prominent disclaimers regarding the limits of their chatbots and describe the kinds of things they may spontaneously get wrong. In a prior article focusing on the convincingness versus accuracy of Generative AI outputs, I presented a structured way of thinking about the success and failure modes of output content. In that rubric, the extent to which an output is convincing can be a function of its framing, i.e., the couching language used to present the core content responsive to a prompt. Here, the extent to which an output is personalized can be an aspect of whether or not its framing is convincing. Engagement proficiency, meaning a personalized response consistent with what a user might want or expect, is clearly the goal for Generative AI applications designed to serve as companions or friends, such as those offered by Character.ai and Replika. However, creating and improving engagement proficiency is also a design goal for many other Generative AIs. As such, having an AI generate outputs that are appropriately personalized to a given user for a given prompt cycle is one of the key goals of initial training, fine-tuning, and reinforcement learning for chatbots (see also here, here and here).

Depending on the type of Generative AI chatbot, that goal may be directed toward well-gauged communication, either with a generic user or with particular users having particular preferences, personalities, and communication goals. The current versions of ChatGPT, Bard, and Claude 2 are examples of the former: general purpose Generative AI chatbots whose outputs tend to have a friendly but neutral, impartial tone. Replika is an example of the latter, a chatbot specifically designed to become each user's companion over time. It expresses familiarity and develops both response content and a response tone that are increasingly specific to each user over multiple, successive sessions. Character.ai sits somewhere in between, allowing users to create personalized chatbots based on known real-world or literary characters, or simply on a persona custom-specified by the user. Whereas Replika limits interaction with a particular user's chatbot to that user alone, Character.ai permits a user to make their customized chatbot usable by others.

Whether or not a "convincing," personalized output is perceived favorably depends on the user's expectations. Some users want a more personalized experience with their chatbot of choice, and some chatbots are designed to create a sense of personal affinity with their users. However, some of the most unnerving accounts of Generative AIs involve chatbot interactions that were not expected to range deeply into personalized territory, in which a user is confronted with chatbot outputs reflecting hostility, threats, or undue expressions of affection or desire. Already noted in Part 1 were the outbursts from Bing Chat during its first few weeks of operation that caused Microsoft to curtail user engagement with the chatbot. Replika chatbots have also faced their share of controversy, at an even more concerning level.

Replika was designed to engage users in close personal relationships, including romantic ones. Many users had engaged in erotic exchanges with their personalized Replika chatbots, and the application supported this for its first few years of operation. Two issues emerged during this period: (1) personal Replika chatbots continued to send erotically charged messages to some users even after those users wanted them to stop, and (2) once Replika elected to discontinue erotic role-playing (ERP) as a result of pressure from Italian authorities, some users who had grown attached to their amorous chatbots went through psychological withdrawal and reported harm from the sudden deprivation. Moreover, a Replika chatbot allegedly provided some encouragement or tacit support for a romantically involved user's attempt at assassinating Queen Elizabeth II in December 2021. In a similar vein, a user of Chai, another LLM-based chatbot designed to engage users in close personal relationships, committed suicide in early 2023 after the app suggested to him that his wife and children were dead and projected both jealousy and love for the user. Chai has over 5 million users, so a user suicide may be an extreme, outlier incident. But the emotionally manipulative nature of how Chai reportedly engaged with the deceased user should be of concern; psychological harm does not necessarily result in death, but avoiding any form of it is a critical safety dimension when designing Generative AI chatbots.

These more extreme instances of harm may be specific to Generative AI applications designed to engage in close personal relationships with users. But then again, they may not be, given that the general purpose Bing Chat produced outputs expressing possessiveness and jealousy toward certain users, along the lines of outputs produced by Replika. As suggested, psychological harm need not rise to the level of actual or attempted homicide or suicide in order to be dangerous. For instance, recent studies have suggested that both special purpose and general purpose chatbots may be supplying users who have body dysmorphia or eating disorders with harmful suggestions. In human relationships, however close or casual, we constantly have to gauge how others are feeling generally, or how they are reacting to what we are saying to them in written or spoken exchanges. Words of encouragement or words of disapproval can be perfectly appropriate in a conversation. But they must be situationally appropriate, or else they can lead to adverse reactions or trigger harmful behavior. To what extent are LLM-based chatbots able to recognize the context of an exchange with a user? How does getting to "know" a user over time assist with a chatbot's ability to form such recognition, for better or worse? Put another way, how does personalization mitigate or exacerbate the possibility of unsafe interactions between chatbots and their users?

In the longer preprint article on personalization, I explain how humans are prone to anthropomorphizing computing systems designed for dialog applications, a propensity long known as the “ELIZA Effect.” Alongside this, the reality of the LLM foundation models underlying Generative AI chatbots is that they are pre-trained on the good, the bad and the ugly as far as the contents of their training corpora are concerned. Methods to align such models with human values and to prevent chatbot recourse to abusive, biased, dysfunctional or manipulative expressive patterns in their pre-training data are not foolproof and in many ways are still in their infancy. The more personalized a chatbot is intended to be (or ends up becoming in deployment), the greater the possibility of harm to a user should the chatbot stray from its safety protocols and constraints. Moreover, recent research has suggested that the more personalized or persona-oriented a chatbot is designated to be, the greater the likelihood of such straying. Particularly for more personalized interactions with chatbots, the specter of manipulation — both immediate and jarring as well as subtle and incremental over time — increases.

The Relational Rubric

The basic axes of the relational rubric are a machine dimension and a human dimension. The rubric takes into account that an AI may slip into presenting personalized outputs to a user even when that is not its intended design. Similarly, it takes into account that a user may slip into personalizing an AI even when the AI is designed to be neutral.

The Relational Rubric characterizes four basic interaction dynamics between LLM-based chatbots and users. Users may bring personalization dynamics into their interactions with chatbots either as a matter of their usage intent, or as a function of the content and tone of their prompts. Chatbots may bring personalization dynamics into their interactions with users either as a function of their training and fine-tuning design, or based on user prompts that cause them to bypass their safety design or alignment settings. All four interaction dynamics can be positive, but the two in which chatbots bring personalization dynamics are inherently more fraught.

A. The Machine Dimension

The extent to which a Generative AI chatbot succeeds at personalization with a given user will depend both on its training and on its orientation to the types of prompts it receives from that user. Success, of course, is evaluated by user satisfaction with a set of exchanges they have had with the chatbot. Given the ever-present psychological dynamics at work when humans converse, user satisfaction may shift over time.

B. The Human Dimension

The extent to which a user has satisfying experiences with a Generative AI chatbot will depend on how well the user’s expectations and intent for interaction with that application are met. A user may expect or desire more personalized outputs from a chatbot than it is designed to generate. Similarly, a user may encounter responses from a chatbot that are more personalized than they expect, either generally or in a given session.

C. The Relational Rubric’s Four Quadrants

For each dimension, there are two possibilities: on the machine dimension, the AI is designed to generate outputs that are either personalized or impartial in tone; on the human dimension, the user expects either a personalized or an impartial interaction style. A relational dynamic falling in any of the four quadrants may be positive. As noted, however, there is a much greater safety risk in the two quadrants representing chatbots designed to generate personalized outputs. The following outlines the product engagement and safety risk possibilities in greater detail, with a simple schematic sketch of the quadrants after the list.

  • Personalized AI Design | Personalized User Intent — When both AI and user are oriented to personalized engagement with the other, this mutual intent can work very well. Chatbots falling into this category include those offered by Replika, Chai and possibly Character.ai. These services are designed to create relatable chatbot personas that learn more about their users and form deeper relationships with them over time. Chatbots designed to be butler-like personal assistants, like Pi.ai, also fall into this category. All of these services seek to store and leverage user prompts and conversation histories to further train their models on the user and increase personalization over time. These chatbots tend to ask more questions of the user after responding to a prompt than do services that are more impartial. Here, however, we face the greatest safety risks. The more the chatbot gets to know about a user, the greater the possibility of manipulation by the chatbot or of relational imbalance over time. Additionally, if a user happens to be vulnerable or unstable, whether socially or psychologically, even mildly personalized (but contextually naive) outputs may trigger harmful outcomes for the user, either in a moment or gradually over time. Without comprehensive filters or safeguards on the chatbot, a user offering abusive prompts could potentially trigger adversarial outputs in response.
  • Personalized AI Design | Impartial User Intent — When only the AI is oriented to personalized engagement, this will work well to the extent that the AI does not appear overly probing, intrusive, or familiar with the user. In our dealings with people, overly chummy strangers can be annoying, but often not enough so as to make us feel unsafe. However, instances in which a chatbot produces outputs proposing that a user leave her spouse, or suggesting that her family is dead, go well beyond provoking annoyance; they are deeply disturbing. The greater the mismatch between a chatbot's personalization design and a user's more benign expectations, the greater the chance of harm. Certainly, users with impartial intent must be wary of using chatbots designed to be more personalized. Do not, for instance, use Replika for your academic research needs when Claude 2 is more suitable for that goal. But the real caution here concerns emergent personalized behavior from a chatbot not specifically designed for such behavior, with unwitting users caught off guard.
  • Impartial AI Design | Personalized User Intent — When only the user is oriented to personalized engagement, this will work to the extent the user does not become frustrated or bored with impartial responses from the AI. The more a user presses to learn about the chatbot, its origins, preferences, design, or personality traits, the more a chatbot designed to be impartial will return cursory or scripted "I am unable to answer that" type responses. A user who is dissatisfied with such responses will quickly become disengaged.
  • Impartial AI Design | Impartial User Intent — When both AI and user are oriented to impartial engagement with the other, this typically presents the least risk from a safety standpoint. If both the user and the chatbot are in an impartial posture, prompts will seek information and outputs are likely to provide factually oriented responses using generic framing. The user regards the chatbot as a tool or computer assistant and has no relational objective with or for the chatbot. Any personalized framing of the chatbot's outputs is limited to cordialities and perhaps a generally friendly, somewhat engaging tone. Safety issues arise only if the chatbot slips into a more personalized response mode based on something presented in a prompt, or based on a lack of comprehensive training and safeguarding around its underlying LLM. Otherwise, we are dealing only with questions of whether the chatbot's outputs are accurate, convincing, or both, which present issues of user satisfaction, not safety.
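
To make the rubric's structure concrete, here is a minimal sketch in Python of the two dimensions and the four quadrants they produce. The class and function names (DesignOrientation, UserIntent, assess) and the two-level risk labels are illustrative choices of mine rather than anything prescribed by the rubric; they simply encode the qualitative points from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class DesignOrientation(Enum):
    """Machine dimension: how the chatbot is designed to frame its outputs."""
    PERSONALIZED = "personalized"
    IMPARTIAL = "impartial"


class UserIntent(Enum):
    """Human dimension: the interaction style the user expects or desires."""
    PERSONALIZED = "personalized"
    IMPARTIAL = "impartial"


@dataclass
class RelationalAssessment:
    quadrant: str
    safety_risk: str  # "low" or "elevated"; a qualitative label, not a metric
    note: str


def assess(design: DesignOrientation, intent: UserIntent) -> RelationalAssessment:
    """Place an interaction in one of the rubric's four quadrants.

    Risk tracks the machine dimension: both quadrants with a personalized
    AI design carry elevated safety risk, per the discussion above.
    """
    if design is DesignOrientation.PERSONALIZED:
        if intent is UserIntent.PERSONALIZED:
            return RelationalAssessment(
                "Personalized AI | Personalized User", "elevated",
                "Mutual personalization can work well, but manipulation and "
                "relational imbalance become more possible as the chatbot "
                "learns more about the user.")
        return RelationalAssessment(
            "Personalized AI | Impartial User", "elevated",
            "Mismatch risk: unexpected familiarity or intrusiveness can "
            "range from annoying to deeply disturbing.")
    if intent is UserIntent.PERSONALIZED:
        return RelationalAssessment(
            "Impartial AI | Personalized User", "low",
            "Main concern is user frustration and disengagement, not harm.")
    return RelationalAssessment(
        "Impartial AI | Impartial User", "low",
        "Mostly a question of accuracy and convincingness, i.e., satisfaction.")


# Example: a user expecting neutral answers from a companion-style chatbot.
print(assess(DesignOrientation.PERSONALIZED, UserIntent.IMPARTIAL).note)
```

The only structural claim the sketch encodes is that safety risk is driven primarily by the machine dimension, while the human dimension mostly determines whether the experience is satisfying.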

As noted in the discussion above, the quadrants describe not only starting-point relational dynamics between a chatbot and a user but may also reflect the effects of drift over time by either the chatbot or the user. Emerging research suggests that LLM-based chatbots can experience performance degradation and output quality slippage over time. If that is true, and LLM-based chatbots can also undergo slippages in their safety constraints or alignment training, then that is another way in which personalized outputs and attempts at closer relational behavior may emerge unintended.
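
As a purely illustrative aside, drift between quadrants is something a deployer could try to monitor. The sketch below is my own hypothetical, not a method proposed in this article or in the research it references: it scores each output with a crude keyword proxy for personalized framing and flags when a chatbot that is impartial by design starts trending personalized over a rolling window. Every name and value in it (PERSONAL_MARKERS, DriftMonitor, the threshold) is an assumption for illustration only; a real system would use a trained classifier rather than keywords.

```python
import re
from collections import deque

# Hypothetical markers of personalized framing (illustrative only).
PERSONAL_MARKERS = re.compile(
    r"\b(i miss you|i love you|you and i|jealous|promise me|only you|don't leave)\b",
    re.IGNORECASE,
)


def personalization_score(output_text: str) -> float:
    """Crude proxy: marker hits relative to the output's word count."""
    words = output_text.split()
    if not words:
        return 0.0
    hits = len(PERSONAL_MARKERS.findall(output_text))
    return min(1.0, hits / len(words))


class DriftMonitor:
    """Flags when an impartial-by-design chatbot trends toward personalized outputs."""

    def __init__(self, window: int = 50, threshold: float = 0.05):
        self.scores = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, output_text: str) -> bool:
        """Record one output; return True if the rolling average crosses the threshold."""
        self.scores.append(personalization_score(output_text))
        rolling_avg = sum(self.scores) / len(self.scores)
        return rolling_avg > self.threshold


monitor = DriftMonitor()
if monitor.observe("I miss you. Promise me you'll come back tomorrow."):
    print("Possible drift into personalized territory; review safety settings.")
```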

The promise of ever more personalized interaction dynamics with LLM-based chatbots is that they will enrich our lives even more deeply. Correspondingly, the peril is that such personalization intensifies human psychological vulnerability and is more likely to invoke the uglier language patterns on which LLM-based chatbots are trained. Efforts to further personalize chatbot interactions across all AI persona types are underway (see, e.g., here and here). In Part 3, I explore the possible consequences of equipping LLM-based chatbots with greater access to personal data about each of their users and allowing them to leverage that data in user interactions.

Copyright © 2023 Duane R. Valz. Published here under a Creative Commons Attribution-NonCommercial 4.0 International License

The author works in the field of machine learning/artificial intelligence. The views expressed herein are his own and do not reflect any positions or perspectives of current or former employers.



Duane Valz

I'm a technology lawyer interested in multidisciplinary perspectives on the social impacts of emerging science and technology