My AI is live — and it’s about as good as I am

This post is part of a challenge to design an online strategy game and an AI to master it. Play the latest version of the game here.

I’m proud to say that my AI is now live, and it plays a pretty capable game of Churr-Purr — if I do say so myself.

Its gameplay is far from perfect, and a person who understands how it plays could likely exploit its tendencies over time. That said, it’s also a far cry from random play in what is a relatively complex game.

I should caveat, however: This AI is not the end-state system I’ve begun building with deep reinforcement learning. Instead, I’ve built a baseline AI that later iterations can play against as a yardstick of their progress.

Allow me to explain …

For my yardstick AI, I’ve taken the approach of defining relevant rules of thumb for the AI to follow.

These people seem pretty excited for the launch of Churr-Purr’s AI.

For instance, when the AI detects an immediate opportunity to create a 3-in-a-row, it should do so. Similarly, it should prevent its opponent’s 3-in-a-rows, because those have the power to undo whatever move our AI might play instead.

Accordingly, the AI plays about as well as I do, since its gameplay is built around the same rules of thumb I follow when deciding what move to make.

The AI still ‘knows’ about the state of the game, what actions are permissible, and so on, but its behaviors are driven by a set of features that I’ve defined, rather than features it’s discovered on its own.

What does that mean? Let’s consider the two systems:

AI feature definition

I can teach my AI about how to test the game for certain features and tailor its behavior accordingly — for instance, how to detect duos threatening to become trios, and then block them. This approach comes with some pros and cons:

  • Pros: The AI is fairly interpretable; by walking through the code, you can trace what decisions happen and why. The AI can also start off with a relatively robust amount of ‘knowledge’, rather than needing to accrue it through experience.
  • Cons: The AI is limited by the quality of the programmer — it won’t learn over time or discover novel methods on its own. If the programmer’s rules of thumb are misguided, the AI will be stuck at a bad baseline without means of improving. Additionally, defining important features for an AI quickly becomes unwieldy — and it isn’t very fun to write a truly comprehensive feature set.
I wouldn’t have much faith in an AI that came from my attempts to define features of particular dog breeds … Might be best to leave it to the experts (or the AIs themselves).
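To make the feature-definition approach concrete, here is a minimal sketch of rule-based duo detection. The board representation and function names are hypothetical illustrations, not Churr-Purr’s actual code; I’m using a simple 3x3 grid for familiarity.

```python
# Hypothetical sketch: detect an opponent 'duo' (two-in-a-row with an open
# third cell) on a 3x3 grid and return the cell that blocks it.
# Cells hold 'X', 'O', or None.

LINES = [
    [(0, 0), (0, 1), (0, 2)],  # rows
    [(1, 0), (1, 1), (1, 2)],
    [(2, 0), (2, 1), (2, 2)],
    [(0, 0), (1, 0), (2, 0)],  # columns
    [(0, 1), (1, 1), (2, 1)],
    [(0, 2), (1, 2), (2, 2)],
    [(0, 0), (1, 1), (2, 2)],  # diagonals
    [(0, 2), (1, 1), (2, 0)],
]

def find_duo_block(board, opponent):
    """Return the empty cell that blocks an opponent duo, or None."""
    for line in LINES:
        values = [board[r][c] for r, c in line]
        if values.count(opponent) == 2 and values.count(None) == 1:
            # Opponent threatens 3-in-a-row; play the remaining empty cell.
            return line[values.index(None)]
    return None
```

A rule-based agent would check hand-written detectors like this one in priority order (win if possible, block if threatened, and so on) and play the first move that fires.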

AI feature discovery

An AI without my direct input can also ‘learn’ about duos threatening to become trios, but in a different manner.

For instance, a reinforcement learning agent may take 1,000s of iterations to discover what this pattern looks like, how to counter it, that not countering it leads to loss of points, and so on. (This is all information I could endow my feature-defined AI with from the outset.)

Some examples of duos that the AI may eventually discover — or that a programmer can help an AI to detect upfront.
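For contrast, here is a minimal sketch of the kind of value update a tabular reinforcement-learning agent performs thousands of times to discover such patterns on its own. The states, actions, and reward values are illustrative placeholders, not from Churr-Purr.

```python
# Minimal tabular Q-learning update: nudge the estimated value of a
# state-action pair toward the observed reward plus discounted future value.
from collections import defaultdict

ALPHA = 0.1   # learning rate: how far each update moves the estimate
GAMMA = 0.9   # discount factor: how much future value counts

Q = defaultdict(float)  # maps (state, action) -> estimated value

def q_update(state, action, reward, next_state, next_actions):
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Over many episodes, actions that let a duo become a trio accumulate low
# values -- without anyone ever naming the 'duo' feature explicitly.
```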

For AIs using neural networks (NNs), this learning happens through a series of value updates to a set of mathematical weights. At the end of this process, there most likely won’t be a specific weight that corresponds to ‘yes, the NN detects a duo’ vs. ‘no, it does not’ — but in aggregate, the decisions will be informed by this detection, albeit in hard-to-pin-down ways (this is what researchers are alluding to when they call neural networks black boxes). (For a good introduction to neural networks, albeit with some technical complexity, check out this article.)

This is a fairly standard image of a small neural network. To simplify things, imagine each circle in the input layer is a number; each of these numbers could be passed through several mathematical transformations such that they in aggregate produce an output. For instance, each number on the left might be multiplied by 3 (to get to HL1) and then squared (to get to HL2). An output rule could, for instance, sum all the numbers at HL2 and produce a certain decision if the sum is odd vs. if it is even. Generally, however, transformations and output rules will be a bit more opaque and complex.
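The toy transformation described above can be written out directly. This is only an illustration of the arithmetic, not a real trained network:

```python
# The toy 'network' from the description: each input is multiplied by 3
# (hidden layer 1), then squared (hidden layer 2); the output rule sums
# the results and decides based on whether the sum is odd or even.

def toy_network(inputs):
    hl1 = [x * 3 for x in inputs]   # first transformation
    hl2 = [h ** 2 for h in hl1]     # second transformation
    total = sum(hl2)
    return "decision A" if total % 2 == 1 else "decision B"
```

In a real network, the multiplications are learned weights, the squaring is replaced by nonlinear activation functions, and there are far more of both, which is exactly why the aggregate behavior becomes hard to interpret.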

The feature-discovery approach similarly has pros and cons:

  • Pros: The AI can discover novel associations unknown to humans and reach higher ceilings of learning; it can also buck convention, such as AlphaGo’s notorious “move 37” that puzzled commentators and expert Go players. Additionally, the AI can learn complex features that are difficult to articulate directly.
  • Cons: The AI requires time to learn; in a poorly-designed learning environment, it may never fully learn. Additionally, when learning is complete, developers will not necessarily be able to explain reasons for the AI’s actions.
Google tells me that this is an image of AlphaGo’s infamous match following move 37. I’d love to explain to you the brilliance behind AlphaGo’s deviation, but if I could do that, I might have a world championship to go win.

So, which method is better? And does my take on the first method, in which rules of thumb are defined, even count as AI? Good questions.

To the first question — it largely depends on what your objectives are.

For instance, if my AI needs to operate under tight time constraints, the feature-defined method will get to competency far more quickly than the feature-discovery method.

Additionally, if you care about knocking out a large percentage of a problem but not necessarily becoming state-of-the-art, feature definition can go a long way: Spam filters that treat certain notorious words with suspicion aren’t perfect, and there’s certainly risk of false positives — but they also do a pretty good job of rooting out spam. For my baseline AI, which isn’t meant to become Churr-Purr world champion, I think its rules of thumb are enough.
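Such a keyword-based filter is a classic defined-feature system, and a minimal one fits in a few lines. The word list and threshold below are made up for illustration:

```python
# Naive defined-feature spam check: flag a message if it contains enough
# words from a hand-picked suspicious list. Crude, but it catches a lot.

SUSPICIOUS = {"winner", "lottery", "prince", "urgent", "unclaimed"}

def looks_like_spam(message, threshold=2):
    words = {w.strip(".,!?:;").lower() for w in message.split()}
    return len(words & SUSPICIOUS) >= threshold
```

The weakness is visible right in the code: anyone who knows the word list can dodge it, which is the game-playing problem mentioned below.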

Another challenge of defined feature AIs: They can be prone to game-playing by those who want to avoid certain classifications …

To the second question, of whether this is AI — I think the answer is a bit more complicated, but ultimately, I lean yes.

I see a few possible objections to whether this method is AI:

Its moves are somewhat pre-determined — True, the AI certainly isn’t as freeform as one starting with no conception of the game, but it’s also a large improvement over “good old-fashioned AI”. Instead of spelling out state-action pairs in painstaking detail, we can greatly reduce the directions needed by training the AI to look for themes.

It doesn’t really ‘know’ about the game / hasn’t ‘mastered’ it — This is an interesting objection but may also be anthropomorphizing the AI a bit. I think a big component of AI is whether the software achieves its intended goals in a way that requires knowledge.

For instance, software that automates 98% of routine accounting tasks might not be said to have ‘mastered’ the art of accounting, but it’s also pretty clearly software capable of doing accounting. Meanwhile, if I were betting on which of us could beat 100 people in simultaneous play, I’d pick my AI over myself; it’s less prone to misapplying rules, forgetting the state of the game, and many other mistakes I would surely make.

It doesn’t figure anything out on its own / it doesn’t learn over time — I think this is the strongest of the objections, but ultimately, I’m not convinced these are essential components of an AI. Instead, I believe these arguments somewhat fault the AI for not being conscious.

Consider, for instance, that humans also get knowledge from external sources (books, other people). Certainly most people update their beliefs in light of new feedback, but would we stop considering someone as an intelligent agent if they instead stuck with the same habits and routines? I doubt it. This AI would not be generalizable to other problems or particularly ‘strong’, but it does have some domain expertise. It cannot ‘know’ in the sense of a human, but when judged by its actions, the AI does seem to behave ‘intelligently’.

Underneath many of these objections is a sort of Chinese room problem in which one gets into very philosophical discussions of what it means to know something, and whether action without understanding truly counts as knowledge.

While these considerations are fun — and also potentially impactful, as they could be one day part of defining AI’s status in society — they don’t need to be resolved to enjoy the updated functionality of Churr-Purr’s AI.

I encourage you to give some thought to how you define intelligence and knowledge in your own life, and I also encourage you to check out the AI firsthand. Let me know what you think of its gameplay; I hear it’s got a few tricks up its sleeve …

Read the previous post.

Steven Adler is a strategy consultant focused across AI, technology, and healthcare. Learn more on LinkedIn.

If you want to follow along with Steven’s project, make sure to follow this Medium account.