Abusing my computer science knowledge to cheat at Catan

8 min readDec 25, 2020

I have recently been playing a lot of Settlers of Catan over on colonist.io. However, when playing competitions at a high level, you need to keep track of a lot of information. What cards do people have in their hands? What dice numbers have not been rolled much? What dev cards are left in the deck? You can calculate all these by hand with a piece of paper, but I’m lazy and it’s easy to make mistakes.

So I wrote a script to keep track of my hand cards. I use a Greasemonkey script to send a copy of every websocket message to a local python server. The server finds events that lead to a card entering or leaving a players hand, then display continuously what cards a player has. It looks like this:

The communication (at that time) was plain JSON with descriptive names, so it wasn’t that hard to reverse engineer all the messages that involved the transfer of cards. I played several games against computers and recorded all communication, and later analyzed all the messages to find the ID’s of the significant types.

Handling stealing with the robber

While simply adding and subtracting a players active cards worked for most types of transactions, there is one significant exception: Stealing a card with the robber. In this case, the only information sent over the wire is that a card is stolen, not which one.

In my initial version, I just had a 6th card type for “unknown”, shown as a card back, and simply add and subtract as usual, so it looked a bit like this:

However, this made it a bit hard to reason. What does it mean to have -1 unknown cards? If red then draws a card from green it would show as 0 unknown cards, but that is inaccurate. This possibility made it very difficult to make good assumptions during games unless you had been paying careful attention. However, the entire point of the script was to not have to pay careful attention.

Second attempt: Guarantees

In order to play the game, the tool needs to have accurate actionable information. I decided that less data is well worth more reliable guarantees. Thus, I developed a new algorithm. The logic for stealing a card looked a bit like this:

when stealing a card from player a:
let n be the total number of unique resources player a has at least one of.
For every resource player a has at least one:
Subtract one from that resource.
Add n-1 "unknown" type resources to a

The player that steals receives one unknown resource like usual. Additionally, when a player discards a resource he does not have, he discards a unknown resource instead, since at least one unknown resource must be the one discarded.

This system worked well and I was able to make many more informed decisions while playing, though I still rarely won. I knew that if my second screen showed a player had one wheat, he must have at least one wheat. No more negative unknown resources. However, I was throwing away a lot of potentially useful information, particularly in the end game where this information is the most valuable.

Third attempt: Extracting more information

Consider this case: Green has one wheat one brick. Blue starts with no cards, takes one card from green, then discards a wheat. In this case, we can know for sure that green must have the brick, since the wheat was discarded. Would it be possible to write a algorithm to intelligently “make known” the unknowns when future evidence comes to light.

I tried to google this problem, but could not find any search terms that could give me a decent algorithm. I would have to find my own.

I spent many weeks thinking about this problem before even attempting a implementation. First, I thought I might for each card keep track of a set of places it could possibly be, then if that set was ever reduced to one player, I could definitively assign the resource, reducing my uncertainty. However, I found that calculating these sets in was quite hard in non-trivial cases, and lead to a explosion of possibilities. Suppose you have one wheat that could belong to red or green and another that could belong the green and blue, then pink takes one card from green, now both cards could also be pink’s. In practice, the sets tended to grow rather than shrink over time, never reducing enough to draw a conclusion.

After many more failed attempts, I had a new idea. I would store the following data:

For every player: How many of each resource they definitely have, and how many they might have.
For every resource: How many are floating, as in they don’t belong to the “definite” set of any player.

I can visualize this data like this:

In this case, I know that red has a wood and 2 wheat, and one more card that can either be a what and an ore. We also know from the extra floating resources column that there is only one wheat in total “missing.”, which might belong to red, blue, or orange.

When adding a resource, we can always add the resource to the definite columns. When subtracting, if the resource exists in the definite columns, we can subtract it. Else, we can remove one from the possible values of the resource, and subtract one from the floating resources. Thus this reduces the uncertainty by one.

When transferring a resource, the logic is similar. Simply make one of every resource the giving player has to a unknown resource, while adding the same to the receiving players unknown resources. Also add one of each the giving players unknown resources to the receiving players unknown resources.

Based on this data, I can run a number of simplifications:

If the total amount of spaces a resource can be equals the amount of floating resources of that type, you can know for sure all these spaces are definitely the resource.
If a player has a space where only one resource fits, it must definitely be that resource.
If a player has at most N resources of a given type, but only M total unknown resources, set N to min(M,N)
If a player has at most N resources of a given type, but there re only M total floating resources of that type, set N to min(M,N)

I will explain these in more detail. Note the examples are not all actually reachable from the initial state, and in some cases other rules besides the main one may also be applied.

Rule 1: When the total amount of spaces a resource can be equals the amount of floating resources of that type

In this case, we can see there are two slots where wood can be, and two wood missing a place. In this case we reduce it to two real wood.

Rule 2: When only one type of resource fits in a spot, that resource must be filled in the spot

This case is very simple, red’s missing resource can only be wood. Thus it must be wood.

Rule 3: You can’t have more slots than the total number of floating resources of that type

This is the state we would get after applying rule 2. However, now you notice that blue “could” have one lumber according to our table. However, there is no lumber in the floating resources. Thus blue cannot have any wood.

Rule 4: You can’t have more slots for a specific resource than total slots

In this case, blue can have up to 2 brick, but has only one total unknown cards. Thus, he may have only one brick and sheep.

These rules can usually reduce the problem in cases where I can do it manually to. However, I managed to find some more rare reductions. Suppose if red has 3 unknown resources, all of which could be brick or sheep, green has the same, but there is 4 floating brick, I know both red and green have one. I can do this because there is a limited amount of holes, or places the resource could not be. Holes are defined as the difference between the sum of all places the resource can be; and the total amount of variables of that type that float around. If the amount of holes is less than the amount of resources of that type that may belong to a given player, that player must have max_amount - holes resources of that type, since there are not enough holes where the resource could not be.

Conclusion

I am sure that many other cases where you could simplify the graph and learn some information. However, this set if 4 reductions seems to be enough to reduce unknowns sufficiently fast in real-world games. I learned a lot in this project, both in writing user scripts, servers, reverse engineering, and mostly writing algorithms for never-before solved problems. I would encourage everybody to give this problem, or a similar one, a go.

The code of the logic algorithm can be found here. Disclaimer: It’s not neat at all, and badly documented. If you don’t understand the algorithms from this text you will understand the implementation even less. I’m also keeping the rest of the code to extract the card transfer events from the data stream secret, if anyone else wants to cheat, they should at least put in the effort to reverse engineer the protocol themselves.