Using AI as a Systems Change Tool: Preparing to Facilitate

Jewlya Lynn
AI & Systems Change
12 min read · Mar 18, 2024

Learning with and from an AI platform has helped me prepare to facilitate dialogues about systemic change. I can gain an initial understanding of the complex systemic dynamics in a given setting and challenge my own preconceived assumptions. AI can also help me understand a niche I don’t know much about within a system that is otherwise familiar to me.

[Image: ChatGPT-generated visual]

In this blog, I will use a specific real-world example, share the prompts I used, and explore how three very different AI platforms responded. The example focuses on building my understanding of Iran’s nuclear weapons program. Before facilitating a dialogue that included this issue within a larger conversation about political and economic dynamics in the Middle East, I asked questions of ChatGPT 4.0, Perplexity, and Microsoft’s Copilot. I wanted to test some of my assumptions about the context, its history, and different points of view on the issue. With many of the Copilot answers, I also took a next step and reviewed the resources it identified to go deeper.

I did not emerge from my conversations as an expert. Rather, I learned enough to let go of some of my long-held and, frankly, not well-informed biases about the region and issue. I also discovered related and important efforts that were not represented in the room, helping me place the group’s conversations into context. Notably, I had to work hard to have the AI partner also place its point of view in context. More on that later.

Let’s be clear: prior to generative AI, I could still do this work. My approach in the past included reading reputable websites, reports, and academic articles and, when possible, talking with content and systems experts to build my understanding more gradually.

One of the big shifts generative AI makes possible is not just faster access to information, but the ability to ask questions that test my assumptions directly and lay different points of view side by side. Some of my own assumptions were visible to me from the beginning of the AI conversation. Others took time to discover as I watched how I was responding to the information offered to me.

What prompts help me explore a systemic issue?

Sometimes I like to start with very specific and deep prompts. On an issue I’m just beginning to wrap my head around, I tend to start at a very high level, careful not to narrow too quickly. However, I always start by asking for a systems change perspective. Sometimes I specifically request a system dynamics, complexity, and/or social-ecological systems change perspective. Sometimes I’ll ask for the analysis to be undertaken using a specific framework (e.g., Causal Layered Analysis).

In this case, I was more general in my approach. Here are some of the prompts I used to open the conversation. For the sake of testing their effectiveness, I used the same prompts with the three AI platforms.

Opening prompts

  • Thinking from a systems change perspective, what are the five biggest influences on the progression of Iran’s nuclear weapons program?
  • These are great. Give me five more that are distinct and different.

Examples of digging in (responding to what it offered)

  • What about the state of diplomatic relations with global superpowers more broadly, beyond just related to nuclear? How does that have an impact on Iran’s nuclear program? Which superpower seems to have the most influence?
  • Within Iran today, who are the politically powerful leaders who are against a nuclear weapons program? For it?
  • Thank you, this is helpful context. Are there non-Western drivers that are pushing the system in similar directions? Pushing it in other directions?

How do I deal with the hidden biases that the AI brings to its answers?

In the first blog, I talked about generative AI as another stakeholder at the table. In this use case, generative AI serves as an expert partner who can answer my questions and discuss the issue. Just like any expert, the generative AI brings a point of view, biases, and incomplete information. In this example, I realized I was taking a risk engaging AI on an issue that carries so many strongly held beliefs and biases about the people and context.

To combat this, I used these prompts to make the points of view on the issue more visible, including the AI’s point of view:

  • Thank you for the helpful information about Iran’s nuclear weapons program. As you look at your own answers, please identify what legal, social and ideological points of view your answers are most aligned with related to the issue. Please also identify the legal, social, and ideological points of view that your answers are not aligned with.

All three AI platforms informed me that they had a “balanced” point of view and then offered other points of view. I followed up on new nuggets of information that surfaced in these responses. For example:

  • Please offer me an analysis of the Iran nuclear weapons program specifically through the lens of someone who entirely rejects any form of nuclear development due to pacifist or environmental concerns. How would this analysis differ from the “balanced” point of view you have already offered?
  • This is very helpful. This is the first point in the analysis where you brought up humanitarian impact. How do those who center humanitarian impact differ in their point of view on the development of the Iranian nuclear weapons program, the current state of the program, their desired future for it, and how to get there?
  • How do those who center deterrence differ in their point of view on the development of the Iranian nuclear weapons program, the current state of the program, their desired future for it, and how to get there? Please also indicate how your original analysis of the issue aligned with or differed from this point of view.

The prompts above are very issue specific, which is critical when exploring biases with an AI partner. More generally, my strategies for dealing with bias and misinformation with my AI “stakeholder” at the table include:

  • Reading the sources (when offered)
  • Tracking down sources that explain an issue more fully (when not offered automatically)
  • Looking for triggering or signaling language (e.g., descriptions of an issue that imply a moral judgement or black/white thinking on a topic) and challenging those in follow-up prompts
  • Looking for evidence of a thematic pattern — what keeps coming up regardless of prompts offered — and challenging the AI through my follow-up prompts
  • Asking questions that challenge previous statements or asking for distinctly different answers

In this specific example, after using the prompts above and follow-up steps to examine sources, I asked each AI to generate a spreadsheet comparing the different viewpoints. I told them exactly which viewpoints from their previous answers I wanted to compare and what information I wanted in each column. They generated spreadsheets with basic information that gave a quick comparison. Copilot and Perplexity were also able to give me sources, but ChatGPT generated errors whenever it tried to produce a spreadsheet with sources or references to follow up on.
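If you would rather script this comparison step than rebuild it in a chat window each time, here is a minimal sketch using the OpenAI Python client. To be clear, this is an assumption about how the step could be automated, not what I did: I worked entirely in the web interfaces, and the model name, viewpoint labels, column names, and prompt wording below are illustrative placeholders.

```python
# Minimal sketch: ask a model to compare viewpoints as a CSV table.
# Assumes the openai Python package (v1+) and an OPENAI_API_KEY in the
# environment. The viewpoints and columns are hypothetical stand-ins for
# the ones you would pull out of your own earlier conversation turns.
from openai import OpenAI

client = OpenAI()

viewpoints = ["balanced", "humanitarian-centered", "deterrence-centered"]
columns = ["origins of the program", "current state", "desired future", "path to get there"]

prompt = (
    "For Iran's nuclear weapons program, compare these viewpoints: "
    + ", ".join(viewpoints)
    + ". Return only a CSV table with one row per viewpoint and these columns: "
    + ", ".join(columns)
    + ". Where possible, name a source in each cell."
)

response = client.chat.completions.create(
    model="gpt-4",  # placeholder; any available chat model works
    messages=[{"role": "user", "content": prompt}],
)

# The reply arrives as CSV text; saving it lets it open as a spreadsheet.
with open("viewpoint_comparison.csv", "w") as f:
    f.write(response.choices[0].message.content)
```

Scripting it this way would also make it easy to rerun an identical prompt across platforms, which is exactly the kind of side-by-side testing this blog describes doing by hand.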

Which platform was most helpful?

My ChatGPT and yours will not be the same. I have been using ChatGPT to explore systemic change issues for over six months and, by doing so, steadily training it on systems thinking and the level of depth I prefer to go into. However, using a fresh account and ChatGPT 3.5, I found the answers similar, though not always quite as in-depth. My use of Copilot and Perplexity, in contrast, is much more recent.

With this caveat in mind, here is how the three platforms performed. For those who don’t want the blow-by-blow, here’s a quick summary:

  • ChatGPT was the best conversational partner, but lacked sourcing, which made it hard to follow up and test its thinking.
  • Copilot provided the least depth and least useful answers.
  • Perplexity demonstrated some very worrisome biases, enough so that I question continuing to use it for this type of learning.

Overall, none of these platforms is sufficient on its own. I’ll keep testing others, but my learning was best served by looking across them.

ChatGPT 4.0 (leveraging the training I’ve already done with it)

Strengths:

  • ChatGPT offered interesting insights after the bulleted lists that responded directly to my prompts. For example, I asked it, “Within Iran today, who are the politically powerful leaders who are against a nuclear weapons program? For it?” In response, I got the expected list of people and the points of view held by many. Then it offered a summary of important considerations, including the dynamic and fluid nature of these stances and the complexity of public statements versus private views. In contrast, Copilot often ends its answers with a summary that offers no additional insights.
  • ChatGPT offered a nuanced range of points of view on the issue and showed where its “balanced” point of view sat in relation to the others. It was able to report that its own point of view was a distinct perspective and to explain how it differed from others.
  • The answers tend to be longer and more in-depth than Copilot’s. As a learner, I felt more knowledgeable using ChatGPT.

Challenge:

  • It does not automatically or easily offer up its sources! I can ask ChatGPT to tell me where to go for more information on a specific answer, but it does not do so automatically, and it can freeze or generate an error message when asked for sources. Most often, it offers a generic answer (e.g., suggesting a non-profit, a media source, or an academic journal that has relevant information) without citing a specific article or report. This is a massive challenge and significantly undermines the value.
  • Side note: this blog does a good job of walking you through how to make ChatGPT give you high-quality sources (and some of the limitations you will experience in doing so).

Copilot (set to its “More Balanced” temperature level)

Strengths (with some accompanying limitations):

  • Sometimes the information responds more directly to the question than ChatGPT’s does, which is helpful. But this is also a limitation: Copilot does not take the “conversation” to the next step with me, relying on me to do so entirely, which can end up centering my point of view.
  • The citations you can link to directly are very helpful. Yet Copilot just as often doesn’t cite information, particularly when asked to share and compare different points of view. Also, it might use one citation for a long answer with many bullet points, where you would ideally want multiple citations to more fully explore the question.

Challenges:

  • Copilot did not do a good job when asked how its perspective differs from others. Like ChatGPT, it focused on its own “balanced” perspective, but it offered a much less sophisticated understanding of other perspectives. When asked about a specific alternative perspective (e.g., humanitarian, anti-nuclear weapons, or deterrence), it initially offered a decent amount of information. However, that depth disappeared when I asked it to compare its original answers to other perspectives it had surfaced.
  • Copilot doesn’t go deep, even when you’re begging it to! Even when asked to offer 3–4 sentences instead of one, it still gives one short bullet or a couple of short phrases. This means the pathway to deeper understanding requires far more prompts, tied to specific bullet points in previous answers, which can distract from the larger conversation.

Overall, ChatGPT would have been the big winner if only it would provide its sources more easily! It felt more like a conversation, where we each pushed back and forth on each other, generating insights that prompted me to think about new questions I hadn’t planned on asking.

Copilot felt like it was giving me the bullet points to help me write my research paper, while making sure I wouldn’t just lift the narrative right into the paper. But in doing so, its answers were shallow and harder to place in context.

Perplexity (basic level, not a pro account)

Strengths:

  1. Citations! Perplexity offered sources for every response, though on some responses the sources were listed but not linked to specific information.
  2. Perplexity offered the best chart comparing viewpoints. Unlike ChatGPT, it provided sources. Unlike Copilot, it was able to deepen its information when asked for more detail.

Challenges:

  1. Perplexity did not follow its own thinking across the full conversation. For example, when asked to compare its original analysis of Iran’s nuclear weapons program with the deterrence perspective, it instead made the comparison against the analysis from the prior question (where it was asked to explain the humanitarian point of view). In contrast, ChatGPT clearly referenced back to its original “balanced” analysis.
  2. Perplexity offered a lot of text, but its answers were often weaker and less nuanced than ChatGPT’s. For example:
  • When I asked it to tell me about Western and non-Western current and historical influences on the issue, Perplexity was vague and sounded a bit like talking points. In contrast, ChatGPT offered far more specific historical context on Western/global power influences.
  • Perplexity provided some very incomplete answers. For example, it failed to name the humanitarian impact of sanctions when asked to explain this point of view. From prior knowledge, I was aware of the significant humanitarian impact sanctions have had on the Iranian people. However, Perplexity only identified the potential risks of nuclear weapons use, escalation of conflict, and proliferation as the humanitarian issues.

Significant challenge:

Finally, a challenge that trumps all others: The bias was glaring and disturbing. I’ll give two examples, but there were more:

  1. When asked about regional influences, the Perplexity answer was embarrassingly short, lacked good citations, and only cited resources generated by Western and global organizations. It also ended by emphasizing global and Western nation dynamics as major influencers, despite the prompt being about regional dynamics.
  2. When I asked Perplexity to explain which legal, social, and ideological points of view it was not aligned with, here is how Perplexity answered in comparison to ChatGPT on the legal points of view specifically:
  • Perplexity: Legal Perspective: The answers may not fully align with viewpoints that prioritize unilateral actions or disregard international legal frameworks in addressing Iran’s nuclear program. Perspectives that undermine the significance of international agreements or advocate for aggressive measures without legal justification may not be entirely aligned with the responses provided.
  • ChatGPT: Legal: My answers may not fully align with perspectives that challenge the current international legal framework as being biased or inadequate in addressing nuclear proliferation fairly. Critics might argue that the NPT and related agreements perpetuate a nuclear imbalance, favoring existing nuclear-armed states while restricting others.

Notice the judgement implied in Perplexity’s answer — “disregard” and “undermine” are words that imply the lack of legitimacy of these perspectives, while ChatGPT offers an explanation for why some people would reject these legal frameworks, making these points of view sound reasonable and worth understanding. Referencing back to my earlier tips, the language Perplexity used was a clear example of “signaling” language — in this case, signaling that it was making a judgement call on the legitimacy of these points of view instead of simply reporting on them.

Will you use AI to help you enter into new, dynamic and complex systems?

The comparison above highlights the messiness of using AI for foundational learning work. Perplexity’s answers also signal the danger of AI misleading users and perpetuating biases.

Yet, at the same time, I find this use of AI helpful and a little addictive. The learning I can do on any given system is so broad, and digging deep when something catches my interest is so easy. I see this as a risk: I need to remind myself to keep seeking out the sources, learning from real people, and actively looking for the information the AI didn’t include.

For me, despite this risk, the advantage is clear. Provided I work across platforms, actively look for bias, and investigate beyond what the AI offers up, I can be more prepared when I come into dialogue with other systems stakeholders. With this background knowledge, I am less likely to ask questions that don’t help the dialogue move forward, more able to notice when participants in a meeting are holding competing points of view, and more likely to realize when we’re talking about a symptom of the problem rather than a systemic driver. All of these help me be a less ignorant and more helpful facilitator. The limitations of AI and of my own depth of learning also keep me humble: I know I’m not heading into these rooms as an expert.

I would love to hear how others are using AI to help with their systems knowledge when entering new spaces. Which platforms work best? What types of prompts are you using? How do you handle biases?


Jewlya Lynn is a facilitator, advisor, and researcher who works with leaders dedicated to making a difference in the world by solving complex problems.