Analysis | Harnessing AI for diplomacy
Five tools to make your work easier
This article is part one of three pieces on the practical relationship between advances in artificial intelligence and diplomacy.
It is part of ISD’s ongoing series, “A better diplomacy,” which highlights innovators and their big ideas for how to make diplomacy more effective, resilient, and adaptive in the twenty-first century.
As I’ve written previously, advances in large language models (LLMs) and other generative AI have put new tools into the hands of everyone with a device and an internet connection. And while many have toyed with ChatGPT, often with mixed results, many new tools can unlock productivity even in the idiosyncratic field of diplomacy. After months of personal testing, I’ve whittled my long list down to five tools that offer users a unique advantage and have outsized potential to improve efficiency in diplomatic work. That said, AI tools carry risks: any task that requires 100 percent accuracy is likely a bad fit for generative AI. At their core, LLMs are stochastic; that is, they rely on a degree of randomness to produce new material. Without it, they would return the same, most probable output for a given prompt every time. That randomness means that when you ask an LLM the exact same question twice, the response will likely vary.
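For readers who want to see that randomness in miniature, here is a toy sketch in Python. It is not how any particular model is implemented: the candidate words and their scores are made up for illustration, and the "temperature" knob is a simplified stand-in for the sampling setting that many LLM services expose.

```python
import math
import random


def sample_next_word(scores: dict, temperature: float, rng: random.Random) -> str:
    """Pick one word from `scores` (word -> made-up score).

    A temperature of 0 always returns the highest-scoring word.
    Higher temperatures spread the choice across lower-scoring words.
    """
    if temperature <= 0:
        return max(scores, key=scores.get)
    words = list(scores)
    top = max(scores.values())
    # Softmax-style weights; subtracting `top` keeps the exponentials stable.
    weights = [math.exp((scores[w] - top) / temperature) for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Hypothetical scores for the word that follows "The two sides signed a ..."
scores = {"treaty": 2.0, "accord": 1.5, "pact": 0.5}
rng = random.Random()

print([sample_next_word(scores, 0.0, rng) for _ in range(5)])  # same word every time
print([sample_next_word(scores, 1.0, rng) for _ in range(5)])  # usually a mix of words
```

With the temperature at zero the sketch repeats itself; raise it and the same "question" starts producing different answers, which is the behavior described above.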
And another word of caution: refrain from posting sensitive or proprietary information unless you are hosting your own instance of GPT-4 or another model. According to TechRadar, organizations share highly sensitive information, including source code, with GPT-4. While OpenAI may not intend to expose user data, high-profile data breaches have taught us that even the most sophisticated vaults can be breached by a determined foe. Fortunately, plenty of use cases don’t require you to share proprietary information.
1. Research
I’ve been testing UpWord.ai, a simple platform that lets users keep track of sources while helping them summarize larger texts into more manageable lengths. The company’s founder and CEO, Roee Barak, told me his team is focused on building a tool that puts people, not the AI, in the driving seat. “We’re trying to provide everyone with an AI assistant to help people with research, but we believe that AI cannot do the research for you — it has to be led by you.” This is critical: LLMs are great at crunching large volumes of text and manipulating that text, but they’re not a replacement for human judgment (not yet at least). “AI should work for you, not the other way around,” Barak told me, noting that the team at UpWord is focused on creating a tool that speeds up old manual processes and helps tame the information overload we’re all experiencing.
One major limitation of current LLMs is the context window, that is, how much text a model can handle in a single request. When I first tried using an LLM to summarize a White House policy document, I mainly came away with gibberish. I was frustrated enough that I created my own small program (you can find it on GitHub) that chunks PDFs and sends them to GPT-3.5 one section at a time. This worked well, but it was a bit clumsy. That clumsiness compounds over larger texts, because most models need to chunk a few pages’ worth of text at a time, summarize each chunk, and then summarize the aggregate. The end result can be inaccurate: a human would do a much better job of teasing out the essence of something like Shakespeare’s Hamlet. Claude 2, the latest model from Anthropic, can ingest much larger volumes of text, though with mixed results. UpWord uses a chunking method that lets users tweak the end result, so you get a useful summary without losing too much in the process. And the platform is model agnostic: the team is constantly seeking out the best tool for the job so that you don’t have to.
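The chunk-then-summarize approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the program mentioned above and not UpWord’s method. The function names and the 800-word chunk size are assumptions, and the `summarize` argument stands in for whatever model call you would actually use; `first_sentence` is a deliberately crude placeholder so the sketch runs without any AI service.

```python
from typing import Callable, List


def chunk_text(text: str, max_words: int = 800) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(text: str, summarize: Callable[[str], str],
                        max_words: int = 800) -> str:
    """Summarize each chunk, then summarize the combined chunk summaries."""
    chunks = chunk_text(text, max_words)
    if not chunks:
        return ""
    if len(chunks) == 1:
        return summarize(chunks[0])
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Second pass: condense the partial summaries into one final summary.
    return summarize("\n".join(partial_summaries))


def first_sentence(passage: str) -> str:
    """Placeholder summarizer (an assumption): keep only the first sentence."""
    head = passage.split(". ")[0].strip()
    return head if head.endswith(".") else head + "."
```

Calling `summarize_long_text(document, first_sentence)` runs the whole pipeline on a string. Swapping in a real model call for `first_sentence` is where the trade-off appears: each pass discards detail, and for very long documents the combined partial summaries may themselves need to be chunked again.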
2. Translations powered by GPT-4
As dull as it may seem, the humble translation tool can unlock massive potential for cross-border collaboration when language might otherwise be a barrier (though my earlier warning on sending sensitive information to cloud-based models still stands). Nothing replaces a skilled linguist (not yet), but ChatGPT running on GPT-4 makes a powerful sidekick. It outperforms Google Translate in most language pairs and can be made more precise through simple prompting. You wouldn’t want to use this for a high-stakes application like an international treaty, but GPT-4 performs adequately for most applications. Beyond translation, the large language model can analyze sentiment and score words on their intended tone. In my testing, GPT-4 could pick up highly context-dependent subtleties and accurately label a text as negative, neutral, or positive. Going beyond a literal translation and placing the material in context is a powerful capability for anyone trying to understand a foreign environment. For example, when I fed GPT-4 a number of Twitter user responses written in Urdu that criticized U.S. policy, it was surprisingly accurate in labeling comments by sentiment score. That is no small feat given the context in which the tweets were made and the colloquial language employed.
3. Transcription at machine speed
Not every government press briefing is handily transcribed and ready for consumption, leaving the task of sifting through large volumes of audio and video to weary journalists and diplomats, lest a critical nugget of new information be lost in a sea of mundane updates. Thanks to new AI models, we can now transcribe everything from YouTube videos to audio recordings in a fraction of the time it would take a human and at acceptable levels of accuracy. OpenAI’s Whisper model is open source, meaning anyone can download the software and run it offline. That makes it a good fit for more sensitive applications like internal meetings, where uploading a file to an unknown server would be out of the question. For less sensitive applications or public material, a host of services, such as AssemblyAI, can quickly produce transcripts from a simple link or upload. Coupled with LLMs, what would have taken half a day of watching and summarizing press briefings can now be done in less than 30 minutes.
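For those comfortable with a little Python, running Whisper locally might look something like the sketch below. It assumes the open-source `openai-whisper` package and the `ffmpeg` utility are installed; the file name `briefing.mp3`, the choice of the `base` model, and the helper function names are illustrative assumptions. The model weights are downloaded the first time a model is loaded, after which transcription can run without an internet connection.

```python
def format_timestamp(seconds: float) -> str:
    """Turn a number of seconds into an HH:MM:SS label."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments) -> str:
    """Build a timestamped transcript from Whisper-style segments.

    Each segment is a dict with at least "start" (seconds) and "text".
    """
    return "\n".join(
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    )


def transcribe_locally(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file on your own machine with Whisper."""
    import whisper  # imported here so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return format_segments(result["segments"])


# Example call (hypothetical file name):
# print(transcribe_locally("briefing.mp3"))
```

The timestamps make it easier to jump back to the original recording and check a quote, which matters given that the transcript is accurate only to an acceptable level, not a perfect one.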
4. Personal AI
At its best, diplomacy is the intersection between the personal and the professional: the subtle art of finding common ground and creating partnerships even when they seem impossible. The latest large language model from Inflection seeks to mimic human conversation at a level beyond any other chatbot. Its chatbot, Pi, is designed to be a conversational partner, albeit one that can draw on a vast store of human knowledge. Still in beta testing, the chatbot is available on iOS and Android and via a web browser. Its standout feature is voice interaction, so you can say goodbye to typing and reading text. In my testing, I’ve found Pi to be an excellent coach and sparring partner, letting me spell out my arguments and giving constructive feedback.
5. Writing
As much as I hate to admit it, Grammarly, the LLM-powered editor, improves my overall prose. I like to think that years of deliberate practice and learning have rid me of lazy writing habits, but we all need a good editor to help us with our blind spots. In my usage, Grammarly does a fair job of going beyond the basics and spotting text that can be streamlined. Similarly, giving GPT-4 specific instructions about a piece of written text can help improve the overall flow of the prose.
We seem to be at an inflection point with AI systems. The zeitgeist seems to incongruously underestimate the power of new systems while hyping their potential. Yet with an understanding of how generative AI systems work, we can see that they have the ability to streamline workflows and refine existing systems, especially where the written word is involved.
Nonetheless, AI tools carry near-term risks if deployed irresponsibly. In the next piece in this series, we’ll cover what risks we face and how best to mitigate them.
Zed Tarar advises startups and holds an MBA from London Business School, where he specializes in the intersection of technology and policy. He has worked in five countries as a U.S. diplomat.
Disclaimer: While Zed Tarar is a U.S. diplomat, the views expressed here are his own and do not necessarily reflect those of the Department of State or the U.S. government.
For more on AI and diplomacy, check out some of Tarar’s previous articles: