Build a chess app with Gemini 2.5 Pro
… and then play against it!
What is this about?
Last week, Google released Gemini 2.5 Pro. It’s topping benchmarks everywhere and is reportedly particularly good at coding and reasoning. So, what better way to put those claims to the test than by having it build a chess app, and then using that very app to play against the AI?
In this blog post, I’ll share my process of building a chess application using Gemini 2.5 Pro. While Gemini can also be integrated as a code assistant within an IDE, I primarily used Google’s AI Studio for this project to show how straightforward the workflow can be. The core idea was to create a functional chess app that connects to an LLM, allowing a human user to play against the model.
I’ve published the resulting application on Github here: https://github.com/heiko-hotz/gemini-chess
Why should we care?
Since their inception, Large Language Models (LLMs) have been applied to coding tasks. It’s no secret that this is one of their primary use cases; I see it frequently with organizations I work with. Almost every company seems to have some form of LLM-integrated code development workflow these days. Testing Gemini 2.5 Pro on a practical coding project like this provides insights that go beyond standard benchmarks.
Furthermore, playing chess against LLMs has become a popular way to probe their reasoning capabilities, and even a source of entertainment, as seen in videos like this one, where a YouTuber pitted LLMs against each other in a tournament (Gemini didn’t fare too well back then ☹️).
This project touches upon the future of software development assisted by AI and offers a window into the reasoning abilities of current LLMs. That combination makes it a relevant exploration worth caring about.
Making a plan
First, I wanted to get a feel for what developing an application like this with Gemini would actually be like. My background is primarily in Python, and my frontend development knowledge is limited. Therefore, keeping the architecture simple was crucial so I could understand, read, and potentially debug the code myself.
My initial prompt aimed to collaborate with Gemini on a plan:
I want to build a chess app from scratch that connects with an LLM.
The user can use the mouse to play their moves and when it is the
computer's move a call to the LLM is made to get the next move.
I dont have much experience in frontend development and would want
to keep it as simple as possible. i'm very familiar with AI engineering
and the LLM i want to use has a python sdk which i know well how to use.
But if it's easier/simpler, we can also just use the REST API.
Suggest an architecture and tech stacks to build this.
I want to play it on the browser and simplicity and fewest line of code
are the most important to me.
Gemini suggested the following:
We'll use a Python backend (Flask) to handle game logic, state management,
and LLM communication. The frontend will be primarily HTML, enhanced with
a minimal amount of JavaScript for the chessboard interface (chessboard.js)
and htmx to handle communication with the backend without writing complex
frontend JavaScript logic.
This sounded like a sensible plan. Since the frontend involved technologies I was less familiar with, I wanted to tackle that first to ensure it was viable before building the backend logic.
Below is the suggested architecture:
Designing the UI
My next step was to ask Gemini to build a very basic, standalone web app. The goal was a functional chessboard UI where a user could make moves for both sides, without any backend or LLM connection yet.
Got it, lets stick with js + html.
Ok, let's get started. can you build the frontend/UI first?
I want to see it and also use it (without LLM, just every
move done by the user) to see how it works.
Tell me how to run it as well. I use vs code as my IDE.
Gemini responded with the necessary HTML, CSS, and JavaScript code, along with the following explanation and setup instructions:
Okay, let's build the static frontend first. This version will allow you
to play chess locally in your browser, making moves for both White and
Black, using the chessboard.js library for the UI and the chess.js library
for basic move validation on the frontend.
No backend or LLM interaction is included in this step.
Project Structure:
Create a folder for your project (e.g., chess-llm-app).
Inside that folder, create these three files:
1. index.html
2. style.css
3. script.js
It also suggested installing the VS Code Live Server Extension for easy local previewing in the browser.
Gemini explained the mechanics of this initial UI:
chessboard.js handles drawing the board, placing the pieces, and managing the drag-and-drop interactions. When a piece is dropped (onDrop function):
script.js captures the move (source and target squares).
It uses the chess.js library (running client-side) to validate if the move is legal (game.move(…)).
If legal, chess.js updates its internal game state. script.js then updates the displayed status (FEN, PGN).
chessboard.js visually updates the board position (board.position(game.fen())) to reflect the new state.
If illegal, chess.js returns null, and the piece snaps back to its original square.
Because the onDragStart function simply checks whose turn it is according to the local chess.js state, you can manually make moves for both White and Black in this version.
The first hiccup
Running the code generated by Gemini, the app loaded, but the pieces were missing:
Gemini suggested the cause could be temporary network issues, CDN problems, or browser extensions blocking external resources — all plausible explanations. It recommended downloading the piece images and serving them locally. This felt like a more robust solution anyway, especially for a deployable application.
However, the provided GitHub link for the images was incorrect (a 404 error). It pointed to …/img/chesspieces/wikipedia when the correct path was …/website/img/chesspieces/wikipedia. A quick manual correction and download later, the pieces appeared!
The backend
With the basic UI working, it was time to build the backend — the Flask application that would manage the game state and communicate with Gemini.
First, I wanted to confirm that the frontend structure could accommodate moves generated by an LLM, not just user drags.
Question: Will this UI work if the LLM responds back with a move instead of
dragging it? e.g. if I play e4 and the LLM responds with e5, how will we
make it so that the chessboard will execute e5?
Gemini’s explanation was clear and reassuring:
It confirmed we could programmatically update the board state based on the LLM’s response.
Next, Gemini guided me on structuring the project for the Flask backend, suggesting a static folder for the frontend assets and a main app.py file.
It also proposed an important architectural shift: Instead of the frontend’s script.js validating moves using chess.js, the frontend should send the user’s move to the Flask backend. The backend (app.py) would then become the single source of truth for the game state, validating the move, updating the board, and eventually, triggering the LLM call. This aligned perfectly with my goal of keeping the frontend simple and centralizing the logic.
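To make that shift concrete, here is a minimal sketch of what such a server-side move endpoint might look like, assuming Flask and python-chess. The route name and response shape are illustrative, not the repo’s exact code:

# app.py -- minimal sketch of a server-side move endpoint (illustrative, not the repo's exact code)
from flask import Flask, jsonify, request
import chess

app = Flask(__name__, static_folder="static", static_url_path="")
board = chess.Board()  # the single source of truth for the game state

@app.route("/")
def index():
    return app.send_static_file("index.html")

@app.route("/move", methods=["POST"])
def move():
    data = request.get_json(silent=True) or {}
    uci = data.get("move", "")                      # e.g. "e2e4" sent by the frontend
    try:
        user_move = chess.Move.from_uci(uci)
    except ValueError:
        return jsonify(error="Malformed move"), 400
    if user_move not in board.legal_moves:
        return jsonify(error="Illegal move"), 400
    board.push(user_move)                           # accept and record the user's move
    # ... this is where the backend would later ask the LLM for Black's reply ...
    return jsonify(fen=board.fen())                 # the frontend redraws the board from this FEN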
The second hiccup
Before adding the LLM integration, I tested the connection between the updated UI and the basic Flask backend (which now handled move validation). Immediately, I hit a bug: after White made a move, trying to move a Black piece resulted in a “Not your turn” error, even though it was Black’s turn.
I fed the error message and a description back to Gemini. Initially, its suggestions didn’t resolve the issue. At one point, it generated a response that perfectly mirrored my own debugging frustrations:
“This is bizarre” indeed! Debugging can feel like that sometimes 😅
Gemini remained helpful, though:
After adding more logging based on its suggestions, Gemini finally pinpointed the issue and provided the correct fix.
Honestly, I didn’t dig too deep into the root cause at this stage. We were close to the finish line, and I was eager to connect the final piece: the Gemini integration for generating moves.
Connecting to Gemini
Integrating the Gemini call turned out to be relatively straightforward. The key was crafting a good prompt. To minimize the chance of the LLM hallucinating illegal moves, I decided to provide not only the current board state (in FEN notation) and move history but also an explicit list of all legal moves.
legal_moves_uci = [move.uci() for move in board.legal_moves]
legal_moves_str = " ".join(legal_moves_uci)

prompt = (
    "You are a chess engine playing as Black.\n"
    "The current board state in FEN notation is:\n"
    f"{current_fen}\n"
    "History of moves in UCI format: " + " ".join([m.uci() for m in board.move_stack]) + "\n"
    f"LEGAL MOVES available: {legal_moves_str}\n"  # Explicit legal moves to curb hallucinated moves
    "Your task is to select the best legal move for Black from the list provided.\n"
    "Respond *only* with the chosen move in UCI notation (e.g., 'g8f6', 'e7e5'). Do not add any other text."
)
print(f"LLM Prompt (example):\n{prompt}")

# Use the globally initialized client
if not client:
    print("LLM Client is not available. Cannot generate move.")
    llm_response_text = None  # Indicate failure
else:
    try:
        response = client.models.generate_content(
            model=model_id, contents=prompt
        )
        print(response.text)
        llm_response_text = response.text  # Raw model reply, expected to be a single UCI move
    except Exception as e:
        print(f"Error during LLM API call: {e}")
        llm_response_text = None
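Once the response comes back, it still needs to be checked before it touches the board. The repo handles this in app.py; below is a hedged sketch (the helper name is mine) of how the reply could be validated with python-chess and applied only if it is actually legal:

from typing import Optional
import chess

def apply_llm_move(board: chess.Board, llm_response_text: Optional[str]) -> Optional[chess.Move]:
    """Parse the model's reply and play it only if it is a legal move."""
    if not llm_response_text:
        return None                                  # the LLM call failed or returned nothing
    candidate = llm_response_text.strip().lower()    # expected to look like "e7e5"
    try:
        move = chess.Move.from_uci(candidate)
    except ValueError:
        return None                                  # reply was not valid UCI at all
    if move not in board.legal_moves:
        return None                                  # syntactically valid but illegal in this position
    board.push(move)                                 # update the authoritative game state
    return move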
And that was essentially it for the core LLM integration. I initially used Gemini 2.0 Flash as the default model but later added a dropdown menu to the UI, allowing selection between different models, including Gemini 2.5 Pro.
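The dropdown itself is plain HTML; on the backend, the selected model name just needs to be read from the request and passed through to the API call. A tiny hedged sketch, with illustrative model IDs:

ALLOWED_MODELS = {"gemini-2.0-flash", "gemini-2.5-pro-exp-03-25"}  # illustrative model IDs
DEFAULT_MODEL = "gemini-2.0-flash"

def resolve_model(requested: str) -> str:
    """Fall back to the default if the dropdown sends an unknown value."""
    return requested if requested in ALLOWED_MODELS else DEFAULT_MODEL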
Game time
With the app fully functional, it was time to play! I started a game against Gemini 2.0 Flash. As a beginner-to-intermediate player (around 1200 Rapid on chess.com), I felt I had a decent chance.
The game against 2.0 Flash was relatively easy. The model often missed simple threats, leaving pieces hanging and overlooking obvious checkmates.
Next, I switched the model to Gemini 2.5 Pro. The difference was immediately noticeable — the game felt much more challenging.
It wasn’t until the middlegame that Gemini blundered a piece, and I was glad it did — I don’t think I had a great position at that point 😅
Note: I stopped the video after Gemini blundered a few more pieces because response times got longer in the afternoon — presumably because of increased usage in the US (“Help, our TPUs are melting” 🤣). So you’ll just have to believe me that I converted this endgame 😉
One more feature
To get more insight into the LLM’s “thinking” process, I added one final feature: I modified the prompt to ask Gemini not just for a move, but also for its reasoning or thoughts about the current game state before providing the move. This proved very insightful, and occasionally, Gemini displayed a bit of personality:
You can find this version here: https://github.com/heiko-hotz/gemini-chess/tree/feat/thoughts
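That branch contains the actual prompt; as a hedged sketch, the change amounts to asking for a short assessment before the move and parsing the two parts out of the reply. The THOUGHTS:/MOVE: format below is an assumption, not the repo’s exact contract:

# Sketch of the "thoughts" variant -- the real prompt lives on the feat/thoughts branch
prompt_with_thoughts = (
    "You are a chess engine playing as Black.\n"
    f"Current position (FEN): {current_fen}\n"
    f"Legal moves: {legal_moves_str}\n"
    "First, briefly assess the position. Then give your move.\n"
    "Answer in exactly this format:\n"
    "THOUGHTS: <one or two sentences>\n"
    "MOVE: <one legal move in UCI notation, e.g. e7e5>"
)

def parse_thoughts_and_move(text: str) -> tuple[str, str]:
    """Split the model's reply into its commentary and the UCI move."""
    thoughts, move = "", ""
    for line in text.splitlines():
        if line.startswith("THOUGHTS:"):
            thoughts = line[len("THOUGHTS:"):].strip()
        elif line.startswith("MOVE:"):
            move = line[len("MOVE:"):].strip()
    return thoughts, move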
Conclusion
Building this chess app with Gemini 2.5 Pro was a revealing experiment in AI-assisted development. Despite starting with limited frontend expertise, I could rely heavily on Gemini for architectural suggestions, code generation, and even tricky debugging sessions — complete with shared “bizarre” moments!
The resulting application, which allows direct play against the LLM, is a concrete outcome. Perhaps more importantly, the process highlighted how AI tools like Gemini can significantly lower barriers, empowering developers to tackle projects outside their immediate comfort zone and potentially bridging skill gaps. This experience offers a glimpse into how software creation might evolve in the near future.