Harnessing AI for qualitative interviews: Key insights from our evidence review

Natalie Lai
Published in Discovery at Nesta · Aug 1, 2024 · 6 min read
Source: Generated using Microsoft Designer AI

This article was co-written with Camille Stengel.

Generative Artificial Intelligence (AI) offers considerable opportunities in qualitative research. While much attention has been given to generative AI’s role in automated text and sentiment analysis, literature reviews, and coding, its use in interview settings has been less explored.

If interviews can be conducted by chatbots rather than people, there is a huge opportunity for scaling, potentially mechanising what is currently a ‘cottage industry’ and blurring the boundaries between qualitative and quantitative research. Of course, this new technology does not come without risks and ethical questions. Chatbots can be unpredictable, exhibit bias, and ‘hallucinate’ by giving incorrect but plausible-sounding responses. There is also the question of whether, just because we can have chatbots conduct interviews, we should.

Rather than replacing human-led interviews, how can this technology complement and enhance existing research methods? Our aim is to understand the potential use cases for impact in qualitative research and in Nesta’s work, as well as to identify the pitfalls to avoid when engaging with this tool.

To better understand how generative AI might be harnessed for interviews as part of qualitative research, Nesta and the Behavioural Insights Team (BIT) have conducted a review of the evidence, which is summarised in this blog.

This is the first in a planned series of blogs sharing our reflections and learning as we explore this area further. We have pulled out several key insights for those considering AI-powered chatbots for qualitative interviews:

  1. AI-powered interviews enable large-scale qualitative data collection
  2. AI chatbots struggle with depth and flexibility in interviews
  3. Mixed participant preferences for AI vs human interviewers
  4. Limited social presence in AI interviews impacts participant comfort
  5. Balancing human control and automation in AI system design to maintain researcher agency

What we did

As a starting point to assess the potential benefits and risks of generative AI-powered approaches to qualitative data collection, we conducted a rapid evidence scan of peer-reviewed empirical studies published in English. We found six empirical studies that explicitly focused on generative AI and qualitative interviews (selecting only from peer-reviewed journals and reports from reputable research institutions; studies focused on the general use of AI in research were excluded).

Key insight 1: AI-powered interviews enable large-scale qualitative data collection

One of the most exciting opportunities AI-powered interviews offer is the ability to scale qualitative research. Scaling AI interviews effectively requires managing both broad topic coverage and deep, nuanced inquiry. Undeniably, AI chatbots excel at gathering large volumes of qualitative data quickly: in one of the studies we reviewed, the chatbot collected over 7,000 open-ended responses from 395 adults within one month. Some argue that integrating this extensive data with quantitative analysis could bridge the gap between qualitative and quantitative methods, leading to a richer and more comprehensive understanding of the issues studied.

This capability to scale up data collection provides researchers with the opportunity to engage a larger and more diverse participant pool, thereby enriching the insights obtained. The speed and efficiency of AI in processing large volumes of data facilitate a broader exploration of topics, making it possible to collect comprehensive feedback on various issues in a fraction of the time required for traditional methods.

Key insight 2: AI chatbots struggle with depth and flexibility in interviews

AI-powered chatbots face challenges in maintaining depth and adaptability. Their responses are often limited by human-defined frameworks, which restrict their ability to explore unexpected themes, handle nuanced follow-up questions and conduct more in-depth interactions. For example, the use of pre-defined reaction guides and strategies led to inappropriate responses in one of the studies we reviewed. These limitations highlight the need for more sophisticated AI-driven interview designs that improve depth and flexibility.

Additionally, we found that if respondents deviate from expected paths, AI chatbots struggle to redirect the conversation effectively, further limiting the depth and adaptability of the interview. Our findings suggest that these constraints affect the consistency and relevance of the follow-up questions chatbots generate, reducing their overall effectiveness in conducting in-depth interviews.

Key insight 3: Mixed participant preferences for AI vs human interviewers

Our review uncovered that participants have mixed preferences when it comes to AI versus human interviewers, largely depending on the context and nature of the interview questions. Participants preferred AI interviews for sensitive topics, such as financial decisions, where the absence of judgement and the anonymity reduced concerns about personal disclosure. The perceived neutrality of AI in these scenarios offered a significant benefit.

Conversely, human interviewers were preferred in situations requiring empathy and emotional intelligence, such as job interviews or evaluations. Their ability to offer empathetic responses and interpret nuances in emotional context made them more effective in these scenarios. This contrast highlights the different ways in which AI and human interviewers are perceived, underscoring the need for careful consideration of context.

Key insight 4: Limited social presence in AI interviews impacts participant comfort

It was clear that social presence, the extent to which one feels they are in the presence of and interacting with a ‘real person’, is a crucial factor influencing participants’ experiences with AI interviews. In one of the studies, participants reported feeling greater uncertainty and perceived lower social presence during AI interviews compared to interviews conducted by a human.

The unfamiliarity and lack of personal interaction with the AI system contributed to these feelings, leading participants to speak more quickly and with fewer pauses. The use of a human-like avatar somewhat improved the sense of connection but did not sufficiently ameliorate the issue. This discomfort stemmed from a perceived lack of personal interaction and empathy, highlighting the challenge AI chatbots face in fostering a supportive and engaging interview dynamic.

Key insight 5: Balancing human control and automation in AI system design to maintain researcher agency

Whilst most researchers are keen to make the most of AI’s capabilities, many are worried that automating interviews could undermine their agency and result in a loss of nuance.

We decided this was worth looking into specifically, so we reviewed the existing studies on the role of human researchers’ control versus AI automation and found varying degrees of integration. To make sense of these findings we grouped them using the two-dimensional Human-Centered AI (HCAI) framework proposed by Ben Shneiderman, which plots human control against AI automation:

HCAI Framework extracted from Shneiderman, B. (2020). Human-centered artificial intelligence: Reliable, safe & trustworthy. International Journal of Human–Computer Interaction, 36(6), 495–504.
  • high human control + high computer automation: In these instances human researchers created a tight framework within which the AI could operate with a high degree of automation, dynamically adapting its responses based on predefined rules. For instance, one study developed a decision gate process featuring 29 question variations and rule-based probing (this pattern, and its low-control counterpart, are illustrated in the sketch after this list). Similarly, another study utilised a decision tree that directed the AI to use models that generate follow-up questions, manage topic transitions and summarise conversations, all governed by rules established by human researchers.
  • high human control + low computer automation: Here the overall structure and key interactions were carefully designed and controlled by human researchers, with only a low level of automation. For example, one study used a fixed set of primary questions and a predetermined warm-up question designed by human researchers. Follow-up questions were either pre-generated by researchers or generated by AI, depending on the condition.
  • low human control + high computer automation: These examples saw a significant portion of the research process being automated. For example, one of the studies used five seedling questions to form a basic structure for each interview. Generative AI was then allowed to determine whether to ask follow-up questions, and what those questions should be, without relying on human-designed guidelines. This flexibility allows AI interviewers to delve deeper into unexpected themes.
  • low human control + low computer automation: This combination would not be a feasible qualitative research scenario and therefore there is no evidence base for this.
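
To make the contrast between the first and third quadrants concrete, here is a minimal Python sketch of how the two probing styles differ. This is our own illustration rather than code from any of the reviewed studies; `call_llm`, `rule_based_probe` and `autonomous_probe` are hypothetical names, and the canned `call_llm` stands in for whatever generative model an interview platform would actually call.

```python
# Illustrative sketch contrasting two HCAI quadrants for AI-led interviews.
# Nothing here comes from the reviewed studies; no real model API is assumed.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a generative-model call.

    A real system would send `prompt` to its model of choice; here we
    return a canned reply so the sketch runs end to end.
    """
    return "Could you tell me a little more about that?"


# --- High human control + high automation ---------------------------------
# The researcher fixes the probing rules; the model only fills a tightly
# framed slot when a rule fires (a 'decision gate').
PROBE_RULES = {
    "short_answer": "The answer was brief. Ask the participant to elaborate, "
                    "staying strictly on the current topic.",
    "no_reasoning": "The answer gives no reasons. Ask a 'why' question about "
                    "the current topic only.",
}

def rule_based_probe(question: str, answer: str) -> str | None:
    """Researcher-authored rules decide IF and WHICH probe fires."""
    if len(answer.split()) < 10:
        rule = PROBE_RULES["short_answer"]
    elif "because" not in answer.lower():
        rule = PROBE_RULES["no_reasoning"]
    else:
        return None  # no rule fired: move on to the next scripted question
    return call_llm(f"{rule}\nQuestion: {question}\nAnswer: {answer}\nProbe:")


# --- Low human control + high automation ------------------------------------
# Only the seedling questions are fixed; the model itself decides whether to
# follow up and what to ask, with no researcher-designed probing rules.
def autonomous_probe(transcript: list[tuple[str, str]]) -> str | None:
    """The model decides both WHETHER to follow up and WHAT to ask."""
    history = "\n".join(f"Q: {q}\nA: {a}" for q, a in transcript)
    reply = call_llm(
        "You are conducting a qualitative research interview. Given the "
        "transcript below, either write one follow-up question or reply "
        "DONE if the topic is exhausted.\n" + history
    )
    return None if reply.strip() == "DONE" else reply


if __name__ == "__main__":
    seedling = "How do you decide how much to save each month?"
    print(rule_based_probe(seedling, "I don't really."))
    print(autonomous_probe([(seedling, "I don't really.")]))
```

The design choice is where the judgement sits: in the first function the researcher’s rules decide whether a probe happens at all, while in the second that judgement is delegated to the model, which is what gives the low-control setups their flexibility, and also their unpredictability.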

What’s next?

We’re building on our key takeaways from the evidence scan as we test out different AI-powered interview bots within the context of Nesta’s missions. We’re evaluating the bots’ effectiveness as interviewers across different platforms and against a range of criteria, including their ability to streamline data collection, enhance participant engagement, and maintain safeguarding and ethical standards.

Watch this space for subsequent blogs where we’ll share our reflections on this assessment. If you’re currently using or considering AI-powered interview platforms, we’d love to hear from you, so please get in touch!

We thank Celia Hannon, Laurie Smith and Faizal Farook for reviewing the article and providing insightful feedback.
