Can dynamic capabilities in city governments be analysed through AI-augmented Qualitative Evidence Synthesis?
By Kwame Baafi, Rainer Kattel and Ruth Puttick
The Public Sector Capabilities Index establishes how best to assess dynamic capabilities. As the potential grows for artificial intelligence (AI) augmented research, what can the Public Sector Capabilities Index learn to effectively collect and analyse data about city governments around the world?
One of the most challenging questions in researching public sector capabilities is how we can go beyond case studies. Virtually all relevant research on public sector capabilities rests on in-depth case studies, which in turn rely on interviews and desk research. This approach has two challenges: first, such research is backward-looking, seeking to understand past failures or successes; and second, producing such case studies takes a lot of time. Thus, the question is: can we speed up the process and analysis of qualitative research, and can we make the results at least somewhat future-looking and predictive?
With these questions in mind, we are developing the Public Sector Capabilities Index, which assesses how effectively different city governments respond to problems and opportunities to deliver resident-level impacts. In our recent report, we described how we plan to assess dynamic capabilities. The assessment approach consists of three components: measurement, contextualisation and comparison.
Making sense of data
The Public Sector Capabilities Index will generate a large amount of qualitative data. Our elite structured interviews will generate hours of audio and pages and pages of qualitative interview transcripts. We will also collect secondary data from the city governments themselves and from other sources, in both quantitative and text-based forms, such as academic studies and government reports. With so much data, how do we best collate, organise and analyse it?
Qualitative evidence synthesis (QES) is a good option. QES brings together qualitative research findings in a systematic way. Acting as an umbrella term that also includes qualitative systematic reviews, qualitative meta-synthesis, and qualitative research synthesis, QES can help understand complex phenomena and provide a rich interpretation of individuals’ and groups’ experiences.
The potential for artificial intelligence to innovate research methods
The rise of artificial intelligence (AI) has started to transform many research methods, often making research more efficient by reducing the time and staffing required. For example, Paul Glasziou and his team at the Institute for Evidence-Based Healthcare (IEBH) have pioneered the “2-week systematic review” methodology, demonstrating that full, rigorous systematic reviews can be completed in a fortnight using automation, compared with the months that traditional systematic reviews take. Not only is the 2-week systematic review more efficient, but it also improves decision makers’ access to timely evidence.
For Qualitative Evidence Synthesis (QES), LLMs can help by automating and improving various stages of the research process, from study selection and data extraction to thematic analysis (generating codes and themes) and producing summaries. There are resources for using AI with search strategies and data extraction, and tools such as Cassandre, Notably.ai and the one we are currently trialling, Atlas.ti, to assist with qualitative analysis.
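To illustrate what one of these stages might look like in practice, the sketch below asks an LLM to assign codes from a fixed codebook to a transcript segment. It is a minimal, hypothetical example: the codebook entries are invented for illustration, and `complete` stands in for whatever function sends a prompt to the chosen model (for instance, a thin wrapper around a commercial API), not any specific product.

```python
import json

# Minimal sketch of LLM-assisted qualitative coding. The codebook below is
# illustrative, not an actual coding frame; `complete` is any function that
# sends a prompt string to a model and returns its text reply.
CODEBOOK = ["sense-making", "seizing opportunities", "reconfiguring resources"]

def build_prompt(segment: str) -> str:
    # Ask for machine-readable output so replies can be parsed and audited.
    return (
        "You are assisting with qualitative coding of city-government "
        "interview transcripts.\n"
        f"Codebook: {', '.join(CODEBOOK)}.\n"
        "Reply with only a JSON list of the codes that apply.\n"
        f'Segment: "{segment}"'
    )

def code_segment(segment: str, complete) -> list[str]:
    raw = complete(build_prompt(segment))
    codes = json.loads(raw)
    # Discard any label not in the codebook, so a hallucinated code
    # cannot slip into the analysis unchecked.
    return [c for c in codes if c in CODEBOOK]
```

One advantage of structuring the step this way is auditability: the raw model reply can be logged alongside the parsed codes for a researcher to review, and the same segments can be given to human coders for comparison.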
Limitations and risks
Despite this potential, AI-assisted methods, like all research, need to maintain high quality and ethical standards. Researchers have studied LLMs such as BERT, GPT-4 and Switch-C to explore their risks and possible environmental and financial costs. There is also the critique that LLMs can generate responses that sound correct because they mimic human-like phrasing, but they do not actually understand language or concepts in the way humans do.
Or, as Emily M. Bender and colleagues put it in their paper on the risks of large language models, the term “stochastic parrot” is a metaphor for the inability of LLMs to understand the meaning underpinning the language they process.
Testing the potential of AI-QES for assessing dynamic capabilities
We want to explore the potential of using LLMs in QES to analyse dynamic capabilities in city governments. We also want to ensure that we maintain high standards of quality in our research, and that AI-driven QES is truly faster, more efficient and as accurate as non-AI QES. To test this, we are setting up an experiment.
Ultimately, LLMs could help us show how dynamic capabilities lead to changes in organisational routines and in the use of resources, which in turn enable cities to provide better public services. These variables are predominantly qualitative in nature, so engineering prompts that direct LLMs to analyse large amounts of qualitative data within our conceptual framework should enable us to analyse a large number of city governments and rank them comparatively. This model is depicted in the figure below.
Over the coming months, we are conducting workshops with city governments to assess their dynamic capabilities. These workshops will be recorded, and the written transcripts will be analysed using LLMs. We will select appropriate tools and models, such as Atlas.ti, OpenAI’s o3-mini and GPT-4, and Anthropic’s Claude Haiku 3.5 and Sonnet 3.7, and we will develop effective prompts for the LLMs to perform tasks such as abstract screening, data extraction and thematic synthesis.
Alongside this, a team of (human) colleagues will also screen, extract and synthesise the same data.
We will then compare the LLMs’ performance against human experts, with the goal of assessing inter-rater reliability. This can be done using Krippendorff’s Alpha, a statistical measure of the extent of agreement between different coders or raters. We will evaluate comparative effectiveness by asking:
- How accurate, sensitive and timely is each approach?
- What human element is required in the LLM-driven QES?
- What approach should we use? Human? LLM? A hybrid where some elements of assessment are human, and some are LLM? Or something else?
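As a minimal sketch of the agreement measure mentioned above, Krippendorff’s Alpha for nominal codes can be computed as follows. This is a simplified implementation assuming every coded unit has complete ratings (Krippendorff’s full formulation also handles missing ratings and other data levels), and the labels in the test data are invented for illustration.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal codes.

    `units` is a list of lists: one inner list per coded unit (e.g. a
    transcript segment), holding the label each rater assigned to it.
    Units coded by fewer than two raters are ignored.
    """
    # Coincidence matrix: within each unit, every ordered pair of values
    # contributes 1/(m - 1), where m is the number of raters for that unit.
    coincidences = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue
        for a, b in permutations(range(m), 2):
            coincidences[(values[a], values[b])] += 1.0 / (m - 1)

    n_c = Counter()  # marginal totals per category
    for (c, _), w in coincidences.items():
        n_c[c] += w
    n = sum(n_c.values())

    # Observed disagreement: off-diagonal coincidences (nominal metric).
    d_o = sum(w for (c, k), w in coincidences.items() if c != k)
    # Expected disagreement under chance pairing of values.
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n - 1)
    # If only one category ever occurs, agreement is trivially perfect.
    return 1.0 if d_e == 0 else 1.0 - d_o / d_e
```

Values near 1 indicate strong agreement between the human and LLM coders; a common rule of thumb attributed to Krippendorff treats alpha of at least 0.8 as acceptable reliability.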
Get involved
We want to innovate methods to explore the potential of AI-QES to efficiently and effectively assess city government dynamic capabilities. We hope this approach will be faster, cheaper and more accurate than traditional methods. But of course, we need to test this out in practice first. Over the coming months, we will update on our progress through blogs and working papers. In the meantime, we are very keen to hear any feedback and comments. If you have ideas to share, please contact Ruth Puttick, r.puttick@ucl.ac.uk