Project Reports: The Royal Game of Ur, Program Design (2/2)
This is my attempt at simulating the Royal Game of Ur and holding an AI tournament for the game, done before my freshman year at UC Berkeley.
In Project Reports, I share things I have DIYed as a UC Berkeley CS student, to show what Computer Science students do and how we learn to design programs, in a way that people can understand even if they are not CS majors.
See the first part here.
How the Computer Runs My Game
When coding, programmers can model and simulate real-life objects in their programs by telling the computer the rules and properties of those objects. The computer receives special code written for simulating an object and builds these simulated objects for the programmer. This principle of programming a system with custom objects is called “Object Oriented Programming” (OOP), and it has long been popular across the applications you use daily. In this case, I will simulate in-game aspects like “pieces”, “game board”, “players”, and “on-board cells” as custom objects in my program, which makes sense because each of these objects has its own behaviors to simulate (moving across the board, allowing rerolls, etc.).
Using objects in every single situation, however, comes with a cost: memory. At minimum, simulated objects in a computer carry the properties that represent them; for example, dog objects will have “names” and “ages” in their computer versions, just as in real life. In a computer, though, words like names take a fair amount of memory to store, and numbers like ages take some too. The dog object itself costs at least the memory of both a list of characters representing its name and a number representing its age. This implies that, fundamentally, objects are a kind of container for the values that characterize them.
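For instance, a dog object in Python might look like this (a minimal sketch; the class and its fields are illustrative, not code from my project):

class Dog:
    """A simulated dog: a container for the values that characterize it."""

    def __init__(self, name, age):
        self.name = name  # a string: a list of characters, each taking memory
        self.age = age    # a number, which also takes memory to store

buddy = Dog("Buddy", 3)
print(buddy.name, buddy.age)  # prints: Buddy 3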
A dog object sounds reasonable to implement, but there are also things that could be structured simply, without objects at all. Think of this through the following real-life scenario:
I am making a big bag of my favorite snack, deep-fried onions. Either packing the fried onions into numerous small packs and then putting those into the big pack, or pouring them directly into the big pack, accomplishes the purpose of giving the onions a container. Ultimately, though, the small-bag approach costs more. Which packaging is more effective, then?
Back to the computer science world! To make the location of each object's code clear and keep files short, I wrote the code simulating each type of in-game object in its own file, with the relationships between the objects as follows:
Some simpler objects to describe first:
The Player object: simply contains a name for the player and a strategy they carry.
The Piece object: has a numeric ID identifying which piece it is among all the pieces its owner owns. It also contains the ID of the player that owns it.
The Cell object: may or may not contain a piece, and carries a code identifying its position on the game board, as well as whether it grants special abilities to the pieces on it.
The Board object is slightly more complicated. It is a container of Cell objects (because boards contain cells). In the larger picture, then, it is a “container of containers of pieces.”
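To make this concrete, here is a minimal Python sketch of how these four objects might be declared (the class and attribute names are illustrative stand-ins, not the actual code in my repository):

class Player:
    def __init__(self, name, strategy):
        self.name = name          # the player's name
        self.strategy = strategy  # the strategy this player carries

class Piece:
    def __init__(self, piece_id, owner_id):
        self.piece_id = piece_id  # which piece this is among its owner's pieces
        self.owner_id = owner_id  # the ID of the owning player

class Cell:
    def __init__(self, position, is_special=False):
        self.position = position      # a code locating this cell on the board
        self.is_special = is_special  # whether this cell grants special abilities
        self.piece = None             # the Piece currently here, if any

class Board:
    def __init__(self, cells):
        self.cells = cells  # a container of Cells: a container of containers of pieces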
While my computer can track all the containers that occur in a game, this design seems a bit redundant. Here is where we go back to the bags-of-fried-onions analogy: cells are the small bags, game pieces are the onions, and the board is the big bag. Why not find a way to let my Board contain my game pieces directly, instead of building a container of cells which are themselves containers of pieces? We can at least see one improvable spot in my imperfect code now! Nevertheless, an object-oriented design like this is a bit more intuitive for the developer than the aforementioned alternative, since each cell also has its own special effects (rerolling, peace mode…).
The Game object is responsible for manipulating data on the board and across its cells, and for computing heuristic values for each move made by the players.
Last but not least, our in-game AI algorithm, inspired by expectiminimax, is approximately structured as follows:
Heuristic:
f(board) = (distance gained by the piece being moved) + (distance the captured enemy piece, if any, had already traveled from its start)
Algorithm Pseudocode:
util = 0
for i from 1 to n:
    assume the most optimistic board after the current predicting player's move
    if the current predicting player is myself:
        util += f(board)
        // by making a move, I expand my own advantage
    else if the current predicting player is my opponent:
        util -= f(board)
        // my opponent makes a move that reduces my advantage
    alternate the current predicting player to the opposite player
return util
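Translated into Python, that pseudocode might look roughly like this (a sketch: possible_boards, opponent_of, and the heuristic f are assumed helper functions, not the project's actual names):

def predict_util(board, my_id, depth):
    """Estimate the utility of a board by looking `depth` moves ahead.

    At each ply we optimistically assume the board with the highest
    heuristic value f(board); this is the flaw discussed later in Q1.
    """
    util = 0
    current_player = my_id
    for _ in range(depth):
        candidates = possible_boards(board, current_player)
        if not candidates:
            break
        board = max(candidates, key=f)  # assume the best reachable board
        if current_player == my_id:
            util += f(board)  # my move expands my own advantage
        else:
            util -= f(board)  # my opponent's move reduces my advantage
        current_player = opponent_of(current_player)
    return util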
Or, in plain English: [flowchart of the algorithm's decision procedure]
Synthesis
Now we can simulate a game by providing the Game object with two Player objects and a Board object carrying its Cells and Pieces. A game actually finishes quite fast, and if a game exceeds 500 turns, it is automatically declared a stalemate. The delegation of responsibilities across the objects that maintain a game is also smooth.
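As a rough sketch of that simulation loop (the method and attribute names here are hypothetical, not the project's actual API):

MAX_TURNS = 500  # beyond this, the game is declared a stalemate

def simulate_game(game, player_a, player_b):
    players = [player_a, player_b]
    for turn in range(MAX_TURNS):
        current = players[turn % 2]            # players alternate turns
        roll = game.roll_dice()
        move = current.strategy.choose_move(game.board, roll)
        game.apply_move(move)
        if game.winner() is not None:
            return game.winner()
    return None  # stalemate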
Then, we just have to decide on a procedure for testing different strategies against each other. For strategies A and B, carried by virtual players A and B respectively, we will let A and B each take a turn going first, and record the win-rate of each strategy matchup. Each distinct pair of strategies will thus have one matchup per possible player order; if a strategy goes against itself, only one order is tested for that matchup. We will test two different strategies: one simplistic, and the other the predictive algorithm we discussed before.
The strategies we have in place are a simple strategy that just moves the piece furthest from the start, which we will call Simple, and the predictive algorithm we built along the way, called Predict, with versions using thinking depths from 1 to 6.
For each strategy, we will match it against every strategy we prepared, playing 200 games with it going first. That way, we can observe each strategy's performance both when going first and when going second.
So the tournament would flow like this:
Matchup 1: Simple vs Simple
Matchup 2: Simple vs Predict Depth 1
…
Matchup 8: Predict Depth 1 vs Simple
…
Matchup 49: Predict Depth 6 vs Predict Depth 6
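In code, that pairing schedule amounts to a double loop over the strategies, something like this sketch (run_matchup is a hypothetical helper that plays the 200 games with the first-listed strategy going first):

strategies = ["Simple"] + [f"Predict Depth {d}" for d in range(1, 7)]

matchups = [(first, second) for first in strategies for second in strategies]
# 7 strategies in each role gives 7 * 7 = 49 matchups;
# self-matchups like Simple vs Simple appear exactly once.

for i, (first, second) in enumerate(matchups, start=1):
    print(f"Matchup {i}: {first} vs {second}")
    # run_matchup(first, second, games=200)  # record each strategy's win-rate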
Results
We can then roughly draw two conclusions from these results:
- In the long run, going first or second makes a very slight difference for the strategies designed in this experiment.
- For the predictive strategy, the higher the designated depth of prediction (that is, the more complex the thinking), the worse it performs, even against the simpler strategy in this experiment.
Possible Revisions
Here are a few major questions that I’d like to discuss regarding improvements of the project.
Q1: Why did the strategy work worse and worse as the depth of prediction grew? In other words, which parts of the strategy function are the “bad thinking”?
The devil is in the details. In the flowchart of the algorithm's decision procedure, you might have noticed that when simulating the outcome of a board after a dice roll, we assumed the best outcome of the board, the one that yields the most util, for both us and the enemy. This made the algorithm too optimistic, and the higher the depth of thinking, the more each such error compounds. It is unreasonable to decide the next move solely based on what might be the best-case situation for the game board.
Furthermore, although the algorithm took inspiration from Expectiminimax, essential aspects of that algorithm were never adopted. One is alpha-beta pruning, which decisively speeds up the game-tree search for the best move. Another is the ability to evaluate a dice roll by the average util it yields, rather than by the most optimistic case.
Beyond that, my heuristic function might have been too simplistic, since it evaluates the advantage of a game board solely by the difference in steps a move can create. To refine this heuristic, we could further reward other beneficial actions, such as arriving at the end, arriving at a cell with a special effect, or capturing an enemy piece.
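For example, such a weighted heuristic could be sketched like this (the weights and the move attributes are illustrative assumptions, not tuned values from the project):

# Illustrative reward weights for a richer heuristic (not tuned values).
W_DISTANCE = 1.0   # steps of progress the move creates
W_FINISH   = 10.0  # a piece arriving at the end
W_SPECIAL  = 3.0   # landing on a special-effect cell (e.g. a reroll cell)
W_CAPTURE  = 5.0   # capturing an enemy piece

def richer_f(move):
    score = W_DISTANCE * move.distance_gained
    if move.reaches_end:
        score += W_FINISH
    if move.lands_on_special_cell:
        score += W_SPECIAL
    if move.captures_enemy:
        score += W_CAPTURE
    return score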
Q2: Why did the simulations cost so many resources on my hardware?
Every single move of the game creates a very big game tree that considers many possible boards, and each board used for simulation also copies every single cell and piece it carries. This means the algorithm needs to make many, many simulations to finish a game. As a simple approximation, for a game that lasts 150 turns, with an algorithm that thinks 3 steps ahead, and with 5 possible outcomes per turn (because there are 5 possible dice values), that already indicates a minimum of over 2,000 simulations per game (150 turns × 5 outcomes × 3 steps = 2,250, and the true game tree branches far more than that).
In a simulation, while imagining board outcomes, we need to duplicate every single thing on them, and these simulations persist through at most 500 turns in the worst case. The object-oriented structure of the game board makes simulation costly, calling for an improvement in its design.
This again calls back to the fried-onion snack analogy: bad designs incur extra cost, while concise designs avoid it.
Q3: How can we make the strategy function perform better?
As mentioned before, the strategy function should be more conservative in its guesses and perform a more careful evaluation of each move. That hints at the following directions:
- Start considering the expected value of a dice roll’s outcome rather than the most optimistic or pessimistic outcome.
- Start considering rewarding some actions more than others.
These adjustments to the heuristic function would require us, at times, to look at the development of individual games and identify key features for improvement by human eye. The expected-value fix, meanwhile, is simple to carry out and probably only requires changing a few lines of code, roughly as sketched below.
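Here is a minimal sketch of that expected-value fix, assuming the standard four binary dice of Ur, whose five totals 0 through 4 follow binomial weights; boards_after, opponent_of, and the heuristic f are hypothetical helpers:

# Probability of each dice total with four binary dice: a binomial distribution.
ROLL_PROBS = {0: 1/16, 1: 4/16, 2: 6/16, 3: 4/16, 4: 1/16}

def expected_util(board, player, my_id, depth):
    """Expectiminimax-style value: average over dice, then max/min over moves."""
    if depth == 0:
        return f(board)
    value = 0.0
    for roll, prob in ROLL_PROBS.items():  # chance node: weight each roll
        outcomes = [expected_util(b, opponent_of(player), my_id, depth - 1)
                    for b in boards_after(board, player, roll)]
        if not outcomes:  # no legal move with this roll: the turn passes
            outcomes = [expected_util(board, opponent_of(player), my_id, depth - 1)]
        # Decision node: I maximize my utility; my opponent minimizes it.
        best = max(outcomes) if player == my_id else min(outcomes)
        value += prob * best
    return value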
There is also an interesting direction where we write some sort of algorithm for the program to learn which combination of rewards works best in gameplay. To do that, we need to decide on an initial combination of rewards for a hand-selected set of in-game actions, as well as an algorithm by which the program assesses different reward combinations and decides whether each action's reward should increase or decrease. Now that does sound like something we imagine AIs doing in sci-fi pieces or news articles:
the use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyze and draw inferences from patterns in data.
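A very simple version of that reward-learning idea could be sketched as a hill-climbing loop over the reward weights (everything here is an assumption: the win_rate evaluator, the step size, and the number of rounds):

import random

def tune_rewards(initial_weights, rounds=50, step=0.5):
    """Hill-climb over reward weights, keeping changes that raise the win-rate.

    win_rate(weights) is a hypothetical evaluator that plays a batch of games
    with a heuristic parameterized by `weights` and returns its win-rate.
    """
    weights = dict(initial_weights)
    best_score = win_rate(weights)
    for _ in range(rounds):
        candidate = dict(weights)
        action = random.choice(list(candidate))            # pick one reward to nudge
        candidate[action] += random.choice([-step, step])  # increase or decrease it
        score = win_rate(candidate)
        if score > best_score:                             # keep improvements only
            weights, best_score = candidate, score
    return weights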
Unfortunately, due to time constraints and the workload of the coming semester, the project's development was halted here.
Non-Technical Afternotes
Overall, the project might not be too bad an attempt for someone with only roughly the background of an AP CS course, and even where it falls short, it was a great indicator of the room for improvement.
Although this first side project is a rather unimpressive first step, it ignited my interest in extracurricular coding, and taught me that programming is a genuinely curious process, full of intentional decisions, much like performing a classical piece. Furthermore, for any artist, the first step toward any great work is to recognize its earlier defects and transform them into something more elegant. So it is for beginning programmers.
If you have any thoughts on this project, corrections, or questions about this report, feel free to comment below!
Github repository for code: https://github.com/Bransthre/project-ur