AI Transparency in the Age of Large Language Models: A Human-Centered Research Roadmap

by Q. Vera Liao (Microsoft Research, Canada)

Sep 20, 2023


I thank the Editors of HCAI for inviting me to write a post about our position paper (co-written with my fantastic colleague Jenn Wortman Vaughan): AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap. Instead of trying to summarize this (long) paper into a blog post, I thought it would be interesting to share some thoughts on why we wanted to write it. Hopefully, these thoughts will intrigue some readers to check out the paper, and we would love to hear your feedback!

We began writing this paper in the spring of 2023, right after the rise of ChatGPT and other large language models (LLMs). While capturing intense public and academic enthusiasm, LLMs remain “mysterious” technologies: even today, we do not have clear answers to some basic questions, such as: What can LLMs do? How do they work? How well can they do what they can do? Because the most powerful LLMs are developed by large tech companies and non-profit organizations (although efforts around open-source LLMs are growing), many details about the models are often proprietary, making it difficult to answer questions such as: How were the LLMs developed? What went into the training data? What are the architectural details? In other words, there is a huge lack of transparency with current LLMs.

Why should we care about transparency? We can argue from different disciplinary perspectives. In Human-Computer Interaction (HCI), transparency is tied to a fundamental aspect of interface and interaction design: helping people form an appropriate mental model of the system, knowing what it can do and how it works, so that they can interact with it effectively. In Responsible AI (RAI), transparency is one of the principles for mitigating the risks of AI technology by ensuring human understanding and oversight. Ultimately, it comes down to the importance of “human understanding of AI”, which we take as a broad definition of what transparency is about.

Developers of LLMs must understand the models to debug them, assess whether they are ready to launch, and enforce safe usage policies. Business decision-makers, designers, and developers building LLM-infused applications must be able to understand the model’s capabilities and limitations in order to make responsible decisions about whether, where, and how to use the model. End-users must be able to form a sufficiently accurate understanding of how the model works to control the application’s behavior and achieve appropriate levels of trust and reliance. People impacted by LLMs (e.g., a group frequently misrepresented by LLM-generated content) should be able to understand their options for recourse. Additionally, we should expect to see an increasing demand for transparency from policymakers and third-party auditors aiming to regulate and oversee the development and use of LLMs.

As LLMs enter public awareness, together with their known risks such as encoded biases, “hallucination” of misinformation, toxicity, and the frightening danger of misuse, we can only expect the public demand for transparency of LLMs to rise. But transparency is not only a normative issue; it also involves tremendous technical challenges and human-centered questions (after all, it is about “human understanding”). While some academic explorations of technical approaches to LLM transparency are already underway (e.g., holistic evaluation, mechanistic interpretability), without putting at the forefront what people want to understand about LLMs and how they form an understanding from the information they are given, we may end up with ill-informed research efforts as well as misguided adoption of transparency approaches.

That’s why we wanted to write this paper. Our work outlines a research roadmap that we hope will help research communities focus their future efforts. This paper does not prescribe foolproof ways to achieve transparency for LLMs (spoiler alert: we don’t know them yet either!), but it highlights that research and practice do not have to start from scratch, because the HCI and RAI communities have wrestled with human-centered questions about the transparency of AI models and systems for years, long before this wave of LLMs. We believe many of these questions will remain the same, but they call for new answers that we can begin to investigate by inquiring into “what’s new” or “what’s different” about LLMs. We hope this offers a useful way of thinking about transparency in the LLM era, especially at a time when change feels imminent and inevitable. Below, I will elaborate on what lessons we can learn from the past, what human-centered questions we should continue asking, and where we may begin to think about what is new or different.

Learning from the past: Embracing diverse transparency approaches

There are many ways to enable people to achieve understanding, which is in itself a multi-faceted and context-dependent goal. To begin with, we can differentiate between a functional understanding — knowing what a model or a system can do, and a mechanistic understanding — knowing how it works. However, in the research community and industry at large, there is currently a tendency to gravitate towards, and even equate “transparency” with, explainability, which focuses more on the facet of mechanistic understanding. Within the field of explainability, there is also often a fixation on a few popular techniques and limited forms of explanation (e.g., feature-importance explanations). Corbett and Denton offer a comprehensive historical perspective and critique on “transparency,” and researchers such as Upol Ehsan take a broad and human-centered view on explainability.

This fixation not only results in missed opportunities to leverage diverse approaches but also dangerously insinuates the assumption that there are “one-size-fits-all” solutions to transparency (more on why this is dangerous below). In our position paper, we strove to provide a comprehensive overview of the landscape of existing transparency approaches, and posed open questions for how to extend them to LLMs. We focus on four categories:

Transparent model reporting. Often implemented in practice as “documentation” features, this category draws on established frameworks for reporting information about the model or the training data, such as Model Cards, FactSheets, and Datasheets. For example, the Model Cards framework specifies reporting information categories including: model inputs and outputs, the algorithm used to train the model, the training data, additional development background, evaluation results, the model’s intended and unintended uses, and ethical considerations.
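
To make this concrete, here is a minimal, hypothetical sketch of how such a report might be captured as structured data. The field names loosely mirror the categories above rather than the official Model Cards schema, and the model described is made up.

```python
# A minimal, hypothetical model card captured as structured data.
# Field names loosely mirror the Model Cards categories described above;
# they are illustrative, not an official schema.
model_card = {
    "model_details": {
        "name": "example-sentiment-classifier",  # made-up model
        "version": "1.0",
        "algorithm": "fine-tuned transformer encoder",
    },
    "inputs_and_outputs": {
        "inputs": "English product-review text",
        "outputs": "sentiment label: positive / negative / neutral",
    },
    "training_data": "public product-review corpus (description and known gaps)",
    "evaluation_results": {
        "metrics": ["accuracy", "disaggregated accuracy by demographic group"],
        "report": "<link to full evaluation report>",
    },
    "intended_uses": ["triaging customer feedback"],
    "unintended_uses": ["medical or legal decision-making"],
    "ethical_considerations": "may underperform on dialects underrepresented in the training data",
}

if __name__ == "__main__":
    for section, content in model_card.items():
        print(f"{section}: {content}")
```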

Publishing evaluation results. It is common practice to evaluate a model with a set of accuracy metrics. However, to understand “what the model or system can do” and “how well it does what it can do,” people may care about a broader range of criteria, including fairness, robustness, efficiency, and other context-specific impacts the model can have. Each of these criteria opens up complex questions and choices regarding “how to evaluate” a model. For example, fairness can be assessed through a diverse set of metrics or a disaggregated evaluation of performance across individual cultural or demographic groups, or domain-relevant conditions, as well as intersections of multiple groups or conditions.
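
To illustrate the disaggregated style of evaluation, below is a minimal sketch in Python, assuming each labeled example carries a group annotation; the data, groups, and labels are entirely made up.

```python
from collections import defaultdict

# Toy evaluation records: (group annotation, model prediction, true label).
# Groups, predictions, and labels are made up for illustration only.
records = [
    ("group_a", "positive", "positive"),
    ("group_a", "negative", "positive"),
    ("group_b", "positive", "positive"),
    ("group_b", "negative", "negative"),
    ("group_b", "neutral", "negative"),
]

# Tally correct / total counts per group.
correct = defaultdict(int)
total = defaultdict(int)
for group, prediction, label in records:
    total[group] += 1
    correct[group] += int(prediction == label)

# An aggregate metric can hide large gaps between groups,
# so accuracy is also reported disaggregated by group.
overall = sum(correct.values()) / sum(total.values())
print(f"overall accuracy: {overall:.2f}")
for group in sorted(total):
    print(f"{group}: accuracy {correct[group] / total[group]:.2f} (n={total[group]})")
```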

Explainability. While some models can provide “intrinsic explanations” directly, because their decision processes are relatively intuitive for people to follow, other types of model architectures (e.g., modern deep neural networks) are too complex and opaque for people to understand their decisions or predictions directly, and they are sometimes complete “black boxes” if the model internals are kept proprietary. In these cases, one can leverage a large and still-growing set of “post-hoc” explainability techniques to generate explanations. Importantly, there are diverse forms of explanations, including global explanations about the model’s overall logic, local explanations about a particular model output for a given input, and counterfactual explanations about how the input can be changed to obtain a different (often the desired or expected) output.
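
As a toy illustration of the post-hoc, local flavor of explanation, the sketch below treats a scoring function as a black box and estimates feature importance for one input by perturbing each feature in turn and measuring how much the output changes. The “model” and its features are made up, and real techniques such as LIME or SHAP are considerably more sophisticated.

```python
# Perturbation-based local explanation for a black-box scorer (toy example).
def black_box_score(features):
    # Stand-in for an opaque model's prediction function that we can only query.
    weights = {"word_count": 0.001, "exclamation_marks": 0.05, "positive_words": 0.10}
    return sum(weights[name] * value for name, value in features.items())

def local_importance(score_fn, features, delta=1.0):
    """Estimate each feature's local importance for one input as the change
    in the model's output when that feature alone is nudged by `delta`."""
    baseline = score_fn(features)
    importance = {}
    for name in features:
        perturbed = dict(features)
        perturbed[name] += delta
        importance[name] = score_fn(perturbed) - baseline
    return importance

example = {"word_count": 120, "exclamation_marks": 2, "positive_words": 1}
effects = local_importance(black_box_score, example)
for name, effect in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {effect:+.3f}")
```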

Communicating uncertainty. While some models can provide an estimate of the uncertainty of their output (e.g., the likelihood of having produced an incorrect output), other types of models either cannot do so intrinsically, or can only provide estimates that are not well-calibrated (e.g., failing to reflect the true likelihood of having produced an incorrect output). Hence, a rich set of techniques has been developed to generate, evaluate, and re-calibrate uncertainty estimates. Further complicating this issue is that uncertainty can arise from different sources (e.g., inherent randomness in the data or a lack of knowledge about the best possible model), it can take different forms depending on the output (e.g., a confidence score for a classification model vs. a distribution over possible outcomes for a regression model), and, importantly, it can be communicated in different ways involving design decisions such as precision (e.g., showing a precise quantity or a coarse category) and modality (e.g., text or visualization).
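
As one concrete example of evaluating calibration, the sketch below computes a simple expected calibration error (ECE): predictions are binned by reported confidence, and each bin’s average confidence is compared with its actual accuracy. The predictions are made up and the binning choices are arbitrary.

```python
# Minimal expected calibration error (ECE) sketch with made-up predictions.
# Each record is (reported confidence, whether the prediction was correct).
predictions = [
    (0.95, True), (0.90, True), (0.92, False),
    (0.70, True), (0.65, False), (0.60, False),
    (0.30, False), (0.35, True),
]

def expected_calibration_error(preds, num_bins=5):
    """Bin predictions by confidence and compare per-bin confidence vs. accuracy.
    A well-calibrated model's 0.9-confidence predictions are right about 90% of the time."""
    bins = [[] for _ in range(num_bins)]
    for confidence, correct in preds:
        index = min(int(confidence * num_bins), num_bins - 1)
        bins[index].append((confidence, correct))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_confidence = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        ece += (len(bucket) / len(preds)) * abs(avg_confidence - accuracy)
    return ece

print(f"ECE: {expected_calibration_error(predictions):.3f}")
```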

Learning from the past: Centering on the people

We now turn to a few key lessons from HCI and RAI research around human-centered approaches to AI transparency — studying how people cognitively process and use transparency information, including how the social contexts impact these processes. Hopefully, these lessons provide useful starting points to consider what people will want LLM transparency for, what challenges they may face, and hence, where the community should direct its efforts.

Transparency is a means to many ends. There is a goal-oriented perspective underlying studies of how people use transparency features. As the purpose of human understanding is to serve downstream cognitive tasks such as learning, decision-making, and attitude formation, people want transparency for many different goals. For example, Suresh et al. lay out common goals for why people seek AI explanations, including improving a model, ensuring regulatory compliance, taking actions based on model output, justifying actions influenced by the model, understanding data usage, learning about a domain, and contesting model decisions. This goal-oriented perspective requires articulating people’s end goals for understanding AI, and then choosing, developing, and evaluating transparency features according to those goals. Given the new types of stakeholders emerging with novel types of LLM-infused applications, such as “prompt engineers” doing model adaptation, or writers and readers as stakeholders of LLM-infused writing support systems, we must empirically investigate: What are the new transparency goals people have with LLMs?

Transparency should support appropriate trust. While transparency is often embraced by the tech industry as a mechanism for “building trust,” transparency does not warrant trust in itself; rather, it should enable people to form an appropriate level of trust — enhancing trust when a model or system is trustworthy, and reducing trust when it is not. However, empirical studies on the relationship between different transparency features and user trust have painted a complex picture. For example, HCI studies (e.g., Zhang et al.) have repeatedly shown that feature-importance explanations can lead to overreliance — increasing people’s tendency to mistakenly follow AI outputs when they are wrong — which can be attributed to the complexity of current explanation techniques and their incompatibility with the reasoning processes people use to detect model errors. Given the anticipated wide adoption of LLMs and the uncertainty surrounding their capabilities, it becomes more critical for end-users, regulators, and other parties to assess the trustworthiness of LLMs for given contexts, and for the research community to ask: Which approaches to transparency can best support appropriate trust of LLMs, and how?

Transparency and control go hand-in-hand. Many of the end goals people have with transparency, such as improving or contesting the model, cannot be achieved without also having control mechanisms through which to take action. Indeed, transparency and control have long been studied together in HCI as intertwined design goals for effective user experiences, simultaneously asking what information about a model should be presented to users and what forms of input or feedback users should be able to give in order to steer the model. While safety and control have become central topics in research and practices around LLMs, we encourage the community to consider the role of transparency in enabling diverse stakeholders to understand and steer LLM behavior, asking: How can different approaches to transparency contribute to better control mechanisms for LLMs?

The importance of people’s mental models. As discussed, transparency is about appropriately shaping people’s mental models. Developing effective transparency features should also take into account people’s existing mental models of the AI in order to avoid redundant information and, more importantly, to correct existing flaws. However, it is known that a mental model, once built, is often difficult to shift even if people are aware of contradictory evidence, which highlights the importance of responsible communication (e.g., in marketing material and media coverage) to accurately shape public perception of new technologies like LLMs. Moreover, the complexity of LLMs and their ecosystem, such as the blurred boundaries between the base model (and the brand behind it), the adapted model (created by fine-tuning or prompting), and LLM-infused applications, raises a fundamental question: How can we unpack people’s mental models of LLMs and support forming useful mental models?

How information is communicated matters. HCI research in the area of AI transparency is not only concerned with what information to communicate about a model, but how to communicate it. For example, the output of an uncertainty estimation or explainability technique can be communicated through different modalities (e.g., by visualization or in natural language), at different levels of precision or abstraction, framed using different language, supplemented with different information to close any gaps in understanding, and through various other visual and interactive interface designs. These choices of communication design can significantly impact how people perceive, interpret, and act on the information provided. The natural language modality and highly interactive nature of LLMs, together with their new training and control mechanisms, pose the question: What are the new opportunities and challenges for communicating information during interactions with LLMs?
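
As a small illustration of how much these design choices matter, the sketch below renders the same confidence estimate at different levels of precision and with different verbal framings; the thresholds and wording are arbitrary choices of exactly the kind the paper highlights.

```python
# The same uncertainty estimate, rendered with different communication choices.
# Thresholds and phrasings are arbitrary illustrations of design decisions.
def render_confidence(confidence, style):
    if style == "precise":
        return f"The model is {confidence:.0%} confident in this answer."
    if style == "coarse":
        label = "high" if confidence >= 0.8 else "moderate" if confidence >= 0.5 else "low"
        return f"Confidence: {label}"
    if style == "hedged_language":
        return ("This answer is likely correct, but please double-check."
                if confidence >= 0.8
                else "This answer is uncertain; treat it as a starting point only.")
    raise ValueError(f"unknown style: {style}")

for style in ("precise", "coarse", "hedged_language"):
    print(render_confidence(0.72, style))
```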

Beware of the limits of transparency. Lastly, we draw attention to some critical perspectives on the limits of transparency offered by FATE (Fairness, Accountability, Transparency and Ethics) and STS (Science and Technology Studies) scholars. First of all, model-centric transparency without ensuring meaningful effects on people’s end goals loses its purpose, and worse, can create a false sense of power and agency. Second, transparency can be misused to shift accountability and place burdens on users, and can even be used to intentionally occlude information. This is a warning to pay attention to the consumability of transparency features and also to seek alternative paths to ensure accountability. Lastly, transparency approaches can lead to harms if used maliciously. In addition to the risk of exploiting user trust and reliance, they can also threaten privacy and security by revealing details of model architecture or training data.

To contemplate the new

Figure 1: Summary figure from the paper, listing unique challenges (in purple italics) that arise from the technology and human perspectives of LLMs, and the open questions they raise for transparency in the age of LLMs.

To arrive at open questions about how these lessons learned about transparency approaches and human-centered perspectives may apply to LLMs, we spend a large portion of our paper discussing the unique challenges to achieving transparency for LLMs. Figure 1 provides a summary of these unique challenges (in purple italics) and the open questions they raise. I encourage you to read the paper to learn more, but acknowledge that these are not meant to be exhaustive and may be limited by our own knowledge, interests, and biases. I hope our work highlights that a human-centered research roadmap to transparency should not only consider challenges that arise from the new technical properties of LLMs (e.g., complex capabilities as “general-purpose” models, uncertain behaviors that cannot be fully anticipated and specified by model developers, massive and opaque architectures), but also pay attention to the following aspects:

The new ecosystem of LLMs. LLMs are poised to change how people leverage AI capabilities to create applications and services. Not only can they lower the barriers to doing so, but they could also bring fundamental changes to the ecosystem of AI system development and deployment. One change is an overall leap toward homogenization — heterogeneous applications could be powered by a single LLM. For transparency, this requires a set of approaches that enable people of diverse backgrounds with diverse end goals to understand the model, and it means grappling with questions such as how to characterize the capabilities and limitations of the model to effectively serve heterogeneous downstream use cases. Another change is to the development process. Rather than developing their own domain-specific models, practitioners may instead engage in varying degrees of model adaptation (e.g., through fine-tuning and/or prompting) to steer the behaviors of the base LLM, and may further augment the LLM with input and output filters, plug-ins, and other technical components. These new development processes raise interesting questions, such as: At which level should transparency approaches be applied (the base model, the adapted model, or the LLM-infused system)? And what new approaches may be required for new types of LLM-infused systems?
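
To make this layering concrete, here is a hypothetical sketch of an LLM-infused application pipeline, with the base model, an adapted model (a prompt template wrapped around it), and application-level input and output filters as distinct layers; each layer is a potential site for transparency information. The base model call is a stand-in, not a real API.

```python
# Hypothetical layering of an LLM-infused application. Each layer is a
# potential site for transparency information (model card, adaptation
# notes, application-level documentation).

def base_model(prompt: str) -> str:
    """Stand-in for a proprietary base LLM; in practice, an API call."""
    return f"<base model completion for: {prompt!r}>"

def adapted_model(user_text: str) -> str:
    """Adaptation layer: steers the base model with a prompt template
    (fine-tuning would be another form of adaptation)."""
    template = ("You are a polite customer-support assistant. Answer briefly.\n\n"
                "User: {msg}\nAssistant:")
    return base_model(template.format(msg=user_text))

def input_filter(user_text: str) -> str:
    """Application-level input filter, e.g., stripping disallowed content."""
    return user_text.strip()

def output_filter(model_text: str) -> str:
    """Application-level output filter, e.g., blocking unsafe completions."""
    return model_text  # pass-through in this sketch

def llm_infused_app(user_text: str) -> str:
    """The full application: should transparency attach to the base model,
    the adapted model, or here at the application level?"""
    return output_filter(adapted_model(input_filter(user_text)))

print(llm_infused_app("Where is my order?"))
```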

The diverse and new types of stakeholders of LLMs. Ultimately, taking a human-centered perspective means focusing on people first, which involves empirically investigating their transparency needs, goals, and behaviors. We should consider stakeholders at all levels, including people who develop, deploy, audit, use, and may be impacted by LLMs. On the development and deployment side, we must note that there may be different parties involved in training the base model, adapting the model for specific domains, and building LLM-infused applications. Each of these parties may have distinct transparency needs and may include diverse types of people. For example, adaptation can be done by developers, designers, entrepreneurs, or even end-users. Many new types of end-users and impacted groups are also emerging with the rapidly growing number of novel LLM-infused applications. While some common goals of transparency discussed above, such as having appropriate trust and control of the model, will remain important, the community should investigate the transparency goals and needs of stakeholders grounded in popular LLM-infused applications (e.g., search, writing support, coding assistants, other specialized chatbots) to inform transparency approaches that can make a real-world impact. Prior HCI research also provides useful methodologies to do so. For example, in my own work with collaborators, we developed a question-driven design process to probe people’s explainability needs early on, and we recently applied a similar process to studying what kinds of explanations may be useful for LLM-powered coding assistants.

The social, organizational, and societal contexts. Human-centered approaches implicitly come with a sociotechnical perspective: people do not develop, deploy, or use LLMs in isolation, but are embedded in specific socio-organizational contexts, and have their needs and behaviors shaped by social interactions with other people, culture, norms, policy, and the broader ecosystem and societal trends. When considering challenges to the transparency of LLMs and exploring solutions, we cannot ignore barriers posed by social, organizational, and societal factors. For example, effective transparency requires taking into account people’s existing mental models, but public perception of LLMs may be evolving, unstable, flawed, and shaped by complex mechanisms including mass media, marketing campaigns, ongoing events, and design choices of popular LLM-infused applications. Another elephant in the room is the proprietary nature of many current LLMs, and the lack of incentives for for-profit organizations to provide transparency, which could also be at odds with organizational pressure to “move fast and deploy at scale.” Addressing these fundamental challenges may not be possible without policy and regulatory efforts that enforce transparency requirements on LLM creators and providers.

I hope you will find these perspectives on transparency in LLMs useful and inspirational in your research. Please don’t hesitate to contact me with any questions and I encourage you to check out the full position paper AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap.

References

  • Ananny, M., & Crawford, K. (2018). Seeing without knowing: Limitations of the transparency ideal and its application to algorithmic accountability. new media & society, 20(3), 973–989.
  • Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., … & Liang, P. (2021). On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.
  • Corbett, E., & Denton, E. (2023, June). Interrogating the T in FAccT. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (pp. 1624–1634).
  • Chen, V., Liao, Q. V., Vaughan, J. W., & Bansal, G. (2023). Understanding the role of human intuition on reliance in human-AI decision-making with explanations. To appear in Proceedings of the 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2023).
  • Ehsan, U., & Riedl, M. O. (2020). Human-centered explainable AI: Towards a reflective sociotechnical approach. In HCI International 2020-Late Breaking Papers: Multimodality and Intelligence: 22nd HCI International Conference, HCII 2020, Copenhagen, Denmark, July 19–24, 2020, Proceedings 22 (pp. 449–466). Springer International Publishing.
  • Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Iii, H. D., & Crawford, K. (2021). Datasheets for datasets. Communications of the ACM, 64(12), 86–92.
  • Liang, P., Bommasani, R., Lee, T., Tsipras, D., Soylu, D., Yasunaga, M., … & Koreeda, Y. (2022). Holistic evaluation of language models. arXiv preprint arXiv:2211.09110.
  • Liao, Q. V., Gruen, D., & Miller, S. (2020, April). Questioning the AI: Informing design practices for explainable AI user experiences. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (pp. 1–15).
  • Liao, Q. V., Pribić, M., Han, J., Miller, S., & Sow, D. (2021). Question-driven design process for explainable AI user experiences. arXiv preprint arXiv:2104.03483.
  • Liao, Q. V., & Varshney, K. R. (2021). Human-centered explainable AI (XAI): From algorithms to user experiences. arXiv preprint arXiv:2110.10790.
  • Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., … & Gebru, T. (2019, January). Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 220–229).
  • Nanda, N., Chan, L., Lieberum, T., Smith, J., & Steinhardt, J. (2023). Progress measures for grokking via mechanistic interpretability. arXiv preprint arXiv:2301.05217.
  • Norman, D. A. (2014). Some observations on mental models. In Mental models (pp. 15–22). Psychology Press.
  • Sun, J., Liao, Q. V., Muller, M., Agarwal, M., Houde, S., Talamadupula, K., & Weisz, J. D. (2022, March). Investigating explainability of generative AI for code through scenario-based design. In 27th International Conference on Intelligent User Interfaces (pp. 212–228).
  • Suresh, H., Gomez, S. R., Nam, K. K., & Satyanarayan, A. (2021, May). Beyond expertise and roles: A framework to characterize the stakeholders of interpretable machine learning and their needs. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (pp. 1–16).
  • Tullio, J., Dey, A. K., Chalecki, J., & Fogarty, J. (2007, April). How it works: a field study of non-technical users interacting with an intelligent system. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 31–40).
  • Vaughan, J. W., & Wallach, H. (2020). A human-centered agenda for intelligible machine learning. Machines We Trust: Getting Along with Artificial Intelligence.
  • Zhang, Y., Liao, Q. V., & Bellamy, R. K. (2020, January). Effect of confidence and explanation on accuracy and trust calibration in AI-assisted decision making. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 295–305).
