Human-AI Interactions Study: World Chess Championships
GG Ashbrook 2023.04.27
The main idea here is to look at the 2023 FIDE World Chess Championships as a rich resource for analyzing aspects of human-AI interaction, specifically how the AI tool is used by, and how it affects, the commentators.
Yet again chess is a wonderful sandbox for studying AI. 2023 was the first year that an AI-engine evaluation-bar (or ‘eval-bar’) was available in real time for the Chess.com panel of chess grandmaster commentators who cover the FIDE World Chess Championship. We can use this AI + Human analyzed FIDE World Chess Championship as a case study for looking at factors and issues in how AI and H.sapiens-humans collaborate.
The length of each game can vary, but there are generally hours of commentary per game involving human use of AI tools. This includes remarks by the commentators about the AI tools they are using, discussing what has been helpful or confusing. There is also peripheral material, including a traditional press conference and question session after each game, wonderfully managed in 2023 by Woman Grandmaster Keti Tsatsalashvili. It is largely from these after-game Q&A sessions that we hear from the players themselves.
Having comments from the players can be important, as the players do not have any input from the AI models, and the divergence between the no-AI-input point of view (players) and the with-AI-input-and-influence point of view (commentators) is a key topic: How does the use of AI influence human perception and action (for better, worse, or arbitrarily)?
There is also commentary from other Grandmasters available online, such as Hikaru Nakamura (who was next in line to be in the finals after Ian and Ding), where he gives yet more commentary and analysis of moves, possible moves, and the performance of the AI ‘eval-bar.’ One of the excellent services that Mr. Nakamura provides is on-board analysis of comments made by the players, as there is as yet no board to show the moves (and possible moves) that the players discuss.
Up or Down
In this case we are looking at a very minimal interface between AI and H.sapiens-humans. There is a single linear black-and-white bar along the left-hand side of the chess-board. To liken this to something most people have experience with: it is like a progress bar. The bar can be read as the white side’s white-progress bar towards winning the game, or from the black side’s point of view: a black progress bar filling with black towards winning the game for black. At the beginning of the game the bar starts out half black, half white: equal chance of either player winning.
This minimal AI interface can be useful (perhaps too powerful in some cases), but the ‘lack of dimensionality’ and the lack of information for interpreting what the AI is saying can be problematic, confusing, and stressful when the H.sapiens-human does not know how to interpret exactly what the AI is saying.
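The progress-bar reading described above can be sketched in a few lines of Python. This is only an illustration: the article does not say how the broadcast bar is computed, so the engine score in centipawns, the logistic squashing, the `scale` value, and the function names are all assumptions made for the sketch.

```python
import math

def white_fill_fraction(centipawns: float, scale: float = 400.0) -> float:
    """Map an engine score (positive = better for White) to a 0..1 bar fill.

    ASSUMPTION: a logistic curve with an arbitrary scale; this is not
    claimed to be the formula used by any real broadcast or engine.
    """
    # Clamp extreme scores so math.exp cannot overflow.
    clamped = max(-10_000.0, min(10_000.0, centipawns))
    return 1.0 / (1.0 + math.exp(-clamped / scale))

def render_bar(fraction: float, height: int = 10) -> str:
    """Draw the bar as text: 'B' cells for Black, then 'W' cells for White."""
    white_cells = round(fraction * height)
    return "B" * (height - white_cells) + "W" * white_cells

# An equal position gives a half-black, half-white bar.
print(render_bar(white_fill_fraction(0)))
```

Note how much is thrown away: whatever the engine knows about the position is squeezed into one number between 0 and 1, which is the ‘lack of dimensionality’ at issue here.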
Dimensionality of Interface
Dimensionality is a huge set of sets of topics in AI, Machine Learning, and Data Science, but here our focus is not dimensionality in the modeling process, but “dimensionality” for the interface (UX/UI) between the AI and H.sapiens-humans. For example, the ‘eval-bar’ (the evaluation-bar, the AI-interface) moves; the white-progress-bar gets longer: What does this mean? Presumably it means something good about white’s position. But what does it mean more specifically? Is it always clear? Is it always right? Is it always verifiable?
Very frequently, probably a dozen times per game, one or more commentators will say something to the effect of: “The bar says white is stronger, but I don’t see that at all.” or “The eval-bar says black’s position is weak, but if I were just looking at it I would say it looks obviously stronger. I’d much rather be playing black here. I have no idea what that eval bar is talking about.”
There are a few situations where the commentators try, sometimes with a humorous lack of luck and subsequent surprised bafflement, but fail to find the move combination that the AI eval-bar says is so strong. As a made-up example: let’s say Black makes a move and the eval-bar (win-o-meter) swings strongly toward Black winning. The commentators excitedly say: “Ah, yes, this was a great move on black’s part, because if they move the…” And they proceed to test out next moves…but everything they test reverses the progress. Eventually the commentators give up and move on with the ongoing game. It’s possible it was a bug in the model, but likely sometimes the AI model is finding some obscure, counterintuitive set of moves, or perhaps moves too dangerous for a human to want to risk. It would be interesting to do a more detailed study of this and of its effects on the user.
Though the empiricism of chess, and the concrete falsifiability of bad chess claims, may temper it, there may still be some notable bubble/echo-chamber effect, especially in games where the players too can see the eval-bar (AI-interface). It is likely that the AI-interface (either the medium or the message…if there is a difference…) influences the shape of the human narrative in the commentary, even though the commentators are grandmasters who know their way around chess details. When the progress-bar is low, the story is about the underdog. When the progress-bar is high for one color, the story is about that color’s inexorable momentum towards victory! Or when the progress-bar is dead-center during the whole game…the narrative is about how neither player can pull ahead! But how much is that human story being influenced by a few pixels, which the commentators often say they disagree with anyway? If this affects only the commentary and not the players, that is one thing. But what if a game is influenced by the players seeing what the bar says and believing it? (Or by ‘gaming the bar’: playing positions known not to move the bar so the other player won’t suspect a strategy?)
The Grand Canyon Edge Walk Effect
If a person is walking along the edge of the Grand Canyon, and all the AI is looking at is how stable the rock under their feet is, the person can walk right up to the edge of the canyon and until the last step the AI will say there is zero chance of falling, which then jumps to 100% as the person steps over the edge. The chess AI is not looking at:
1. general body language
2. physiological signs of problems
3. scheduling: how much time is left to play a position, or left per move for future moves
4. each player’s strategy
5. the player’s style of play
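To make one of these missing factors concrete, here is a minimal sketch of how the clock (item 3) could be surfaced next to the bar. Everything in it is hypothetical: the `ClockState` fields, the 60-seconds-per-move comfort threshold, and the idea of a separate warning flag are illustrative assumptions, not features of any real broadcast tool.

```python
from dataclasses import dataclass

@dataclass
class ClockState:
    """Hypothetical snapshot of one player's clock."""
    seconds_left: float
    moves_to_time_control: int

def seconds_per_move(clock: ClockState) -> float:
    """Average time available for each move before the time control."""
    if clock.moves_to_time_control <= 0:
        return float("inf")  # time control already reached
    return clock.seconds_left / clock.moves_to_time_control

def time_pressure_flag(clock: ClockState,
                       comfortable_seconds_per_move: float = 60.0) -> bool:
    """True when the player has less time per move than the chosen threshold.

    ASSUMPTION: the threshold is arbitrary and only for illustration.
    """
    return seconds_per_move(clock) < comfortable_seconds_per_move

# The bar may say 'winning' while this flag says 'close to the edge'.
clock = ClockState(seconds_left=120, moves_to_time_control=10)
print(seconds_per_move(clock), time_pressure_flag(clock))
```

The point of keeping the flag separate from the bar is that the position evaluation stays untouched; the viewer simply gets a second signal about how close the player is to the edge.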
Examples (if only for story-illustration value)
Example 1. In game 12, Ian was making fast, reckless moves and, like walking along the edge of the Grand Canyon, everything was fine as long as he didn’t make a mistake. But as soon as he made a mistake, the ‘eval bar’, which up until then had said ‘Ian will win!’, suddenly dropped to ‘Ian will lose!’ Then at the last minute he resigned.
Example 2. In game ~7, Ding was playing well but running out of time. Everything is fine until you run out of time (like walking close to the edge of the Grand Canyon). So the eval bar for most of the game said ‘Ding is winning!’ until he ran out of time and asked for a draw.
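The ‘cliff’ in examples like these could be located after the fact by scanning the bar’s history for its largest single-move change. The sketch below assumes the bar is recorded as a fill fraction between 0 and 1 after each move; the function name and the sample numbers are invented for illustration and are not taken from the actual games.

```python
from typing import Sequence, Tuple

def largest_swing(bar_history: Sequence[float]) -> Tuple[int, float]:
    """Return (index, change) for the biggest one-step change in the bar.

    `index` is the position in `bar_history` where the new value appears;
    `change` is new value minus previous value (negative = bar dropped).
    """
    if len(bar_history) < 2:
        raise ValueError("need at least two bar readings")
    best_index = 1
    best_change = bar_history[1] - bar_history[0]
    for i in range(2, len(bar_history)):
        change = bar_history[i] - bar_history[i - 1]
        if abs(change) > abs(best_change):
            best_index, best_change = i, change
    return best_index, best_change

# Made-up readings: steady 'winning' values, then one step over the edge.
history = [0.50, 0.55, 0.60, 0.80, 0.82, 0.10]
index, change = largest_swing(history)
print(index, round(change, 2))
```

A study of the commentary could then look at what the commentators were saying just before and just after that index.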
Parroting the AI
In past World Chess Championship games there was an idea-sharing aspect: people all over the world instant-messaged ideas and comments into a ‘live chat’ alongside the championship commentary (probably possible since…the 1990's). But as GM Fabiano Caruana mentioned in game 12, paraphrasing here: “We all know where these suggestions in live chat are coming from this year [people are just suggesting moves that the chess AI says are good].”
Chess-AI as unusual and single-purpose Idea
On the one hand, chess is a perennial example of “a special case”: chess-AI tends to be useless for anything else. A fascinating twist in AI is that from the 1940’s up until Deep Blue, people assumed it would take a GPT-4-type AI with world knowledge and common sense to play chess well, yet Deep Blue is (and likely other chess ‘engines’ are) so different from the standard categories of AI that it barely even fits alongside later standard AI types. (And there are many interesting lessons to be learned from Deep Blue that do apply to AI more generally, such as portability and integrating vision and motor control etc.)
On the other hand, there will likely perpetually be two different areas or directions of AI and AI-group/team or AI-H.sapiens-human interactions (which may become more extremely polarized over time as technologies improve):
1. big, more general (non-specialized) AI models (such as Generative Pre-trained Transformer large language models, with which OpenAI has done such pioneering work).
2. narrow-specialist AI that produces very, very context-specific output.
In other words, chess-AI (or chess-AI-interfaces) may be a good example of the general category of portable, single-purpose, project-specific AI that teams are likely to use as part of “smaller” tools. The reasons may vary: resource-efficiency needs, or that the tools were made recently and locally for one project (not over many years by huge organizations), or perhaps the function is just so specific that there is no obvious need for a model that tries to do more than one thing well. There is also perhaps the standard “generalization vs. production-development” context, where having a small, predictable, efficient, fast, easily maintained, reliable tool that does what it needs to do can be far better than a bloated, unstable, expensive, unreliable system that tries to do many extraneous and unnecessary tasks (and raises other issues such as security etc.).
To summarize: many groups will likely be using very-narrow-AI tools like the ultra-minimal ‘eval-bar’ seen in the 2023 FIDE WCC + Stockfish model, and there are interesting issues, and likely training and best practices, around how to do that.
There may or may not be general (or branching into discipline-specific) workflows and best practice for what features, factors, contexts, and ‘dimensions’ an AI interface should have.
False Positives and False Negatives
It would be interesting to do a more detailed analysis: comparing models, experts, and clear examples of what good or bad things could happen in different board configurations (where the chess pieces are), and comparing that with the performance of the Stockfish-Linear-AI black-box ‘win-o-meter.’ Specifically, count the false positives and false negatives and note what was happening in those situations.
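As a rough sketch of what that counting could look like, the snippet below tallies bar readings against outcomes. It assumes each record pairs a bar reading for White (0 to 1) with whether White went on to win, and it treats ‘bar above 0.5’ as a positive prediction. Both the record format and the threshold are simplifying assumptions (draws, for instance, are lumped in with ‘White did not win’), and the sample records are invented.

```python
from collections import Counter
from typing import Iterable, Tuple

def confusion_counts(records: Iterable[Tuple[float, bool]],
                     threshold: float = 0.5) -> Counter:
    """Count true/false positives/negatives for 'bar favours White'.

    Each record is (bar_value_for_white, white_actually_won).
    """
    counts: Counter = Counter()
    for bar_value, white_won in records:
        predicted_win = bar_value > threshold
        if predicted_win and white_won:
            counts["true_positive"] += 1
        elif predicted_win and not white_won:
            counts["false_positive"] += 1   # bar said White, White did not win
        elif not predicted_win and white_won:
            counts["false_negative"] += 1   # bar doubted White, White won
        else:
            counts["true_negative"] += 1
    return counts

# Invented sample data, for illustration only.
sample = [(0.9, True), (0.8, False), (0.3, True), (0.2, False), (0.7, False)]
print(confusion_counts(sample))
```

The interesting part of the proposed study is then qualitative: going back to each false positive or false negative and looking at what was happening on the board and in the commentary.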
Sometimes Alone, Sometimes Integrated
Even though chess is notoriously not-applicable (or not ‘generalizable’?) to other situations, here we are looking at the use and interaction of the AI-Human collaboration of the expert commentary panelists, which is likely very generalizable to many teams working on projects using AI (or where AI is a participant on that project).
A possible side-branch topic, maybe more specific to chess but perhaps with broader uses, is how AI-interfaces are used in training by top chess players, and more generally by novices on platforms such as chess.com, which provides analysis tools that people studying chess in the past would rarely have had access to. Either for gamified learning, or the effects of tools on learning, or uses of tools during projects, chess likely
And to close with a beautiful turn of phrase attributed to Ding Liren reminiscent of the astounding depths of high-dimensional meaning: “It’s still some dark ocean kind of position, so I didn’t go further into it.”
Possible AI-UI Dimensions
1. Confidence in outcomes (false positives, false negatives)
2. Fragility of Situation
3. Dependency on delicate tactics
4. A lack of shared assumptions and ‘common sense’
5. Depth: The unknown Kasparov-Event-Horizon of how far the AI is looking strategically, not just tactically
6. Player style
7. Schedule factors (remaining time, time per move, etc.)
8. Specific Assumptions
9. Density of Option Forking
10. Unpredictability Index
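One way to picture an interface with more than one dimension is as a small record that carries the bar value plus a few of the items listed above. This is a hypothetical data structure for discussion only: the field names, the 0-to-1 scales, and the choice of which dimensions to include are assumptions, and nothing here describes an existing tool.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvalReport:
    """Hypothetical multi-dimensional companion to the one-number eval-bar."""
    white_fill: float                      # the existing bar, 0..1
    confidence: Optional[float] = None     # item 1, assumed 0..1
    fragility: Optional[float] = None      # item 2, assumed 0..1
    search_depth: Optional[int] = None     # item 5, how far ahead was searched
    seconds_left: Optional[float] = None   # item 7, clock for the side to move

    def summary(self) -> str:
        """One-line text summary that skips dimensions with no value."""
        parts = [f"bar={self.white_fill:.2f}"]
        for name in ("confidence", "fragility", "search_depth", "seconds_left"):
            value = getattr(self, name)
            if value is not None:
                parts.append(f"{name}={value}")
        return ", ".join(parts)

# A 'winning' bar that also admits the position is fragile.
print(EvalReport(white_fill=0.8, fragility=0.9).summary())
```

Leaving every extra dimension optional keeps the sketch compatible with the current single-bar interface: a report with only `white_fill` set is exactly today’s eval-bar.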
Chess.com panel commentary on matches: https://www.youtube.com/@chess/streams
Note: This topic has some connection to gamification in AI frameworks.
About The Series
This mini-article is part of a series (where chess is a theme) to support clear discussions about Artificial Intelligence (AI-ML). A more in-depth discussion and framework proposal is available in this github repo: