Wading into the AI: Key questions to investigate

This post is part of ChurrPurr.ai, a challenge to design an online strategy game and the AI to master it. Play the latest version of the game here.

Steven Adler
3 min read · Oct 28, 2017

After a day or two off from the actual programming of Churr-Purr, I’m now ready to wade into the next phase of my challenge: Designing an AI to master the game.

It’s been a few years since my “Applied AI and Machine Learning” course at Brown, so I’m excited to get back into the weeds. [Recently I’ve had some forays back into the world of programming (see: Chess Personal Trainer), but those were more challenges in cobbling together pieces I hadn’t used before (e.g., my first time programming in JavaScript) than in conceptual design.]

Peter Norvig, a leading thinker in AI and co-author of the standard AI textbook (Artificial Intelligence: A Modern Approach), was a guest speaker for my class. Seemed like a very nice guy — thanks, Peter!

My strong first inclination for Churr-Purr is to use reinforcement learning to train the AI on which moves tend to lead to good future states (and under what conditions), and which moves should be avoided. By letting my AI ‘explore’ its ‘environment’ and see which sequences of moves culminate in good outcomes (e.g., wins), I hope the AI can generalize to scenarios it hasn’t previously encountered.
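To make that concrete, below is a rough sketch of the kind of self-play training loop I have in mind, written in TypeScript since the game lives on the web. To be clear, everything in it is hypothetical: GameState, legalMoves, applyMove, and winner are placeholder interfaces that the game doesn’t actually expose yet, and it glosses over two-player wrinkles like flipping the reward’s sign for the opposing side.

```typescript
// Hypothetical sketch: tabular value learning via self-play.
// None of these interfaces exist in Churr-Purr yet.

type Move = string;

interface GameState {
  key(): string;                 // serialized board, indexes the value table
  legalMoves(): Move[];
  applyMove(m: Move): GameState;
  winner(): 1 | -1 | 0 | null;   // null = game still in progress
}

const values = new Map<string, number>(); // state key -> estimated value
const ALPHA = 0.1;   // learning rate
const EPSILON = 0.2; // exploration rate

function chooseMove(state: GameState): Move {
  const moves = state.legalMoves();
  if (Math.random() < EPSILON) {
    // Explore: try a random move to discover new parts of the state space.
    return moves[Math.floor(Math.random() * moves.length)];
  }
  // Exploit: pick the move whose successor state we currently value most.
  let best = moves[0];
  let bestValue = -Infinity;
  for (const m of moves) {
    const v = values.get(state.applyMove(m).key()) ?? 0;
    if (v > bestValue) {
      bestValue = v;
      best = m;
    }
  }
  return best;
}

function playTrainingGame(start: GameState): void {
  const visited: string[] = [];
  let state = start;
  while (state.winner() === null) {
    state = state.applyMove(chooseMove(state));
    visited.push(state.key());
  }
  // Nudge every visited state's value toward the final reward
  // (+1 win / -1 loss / 0 draw). A real two-player version would
  // flip the reward's sign on alternating plies.
  const reward = state.winner()!;
  for (const k of visited) {
    const old = values.get(k) ?? 0;
    values.set(k, old + ALPHA * (reward - old));
  }
}
```

Run playTrainingGame in a loop and, in theory, the value table gradually comes to reflect which positions tend to precede wins.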

My game is not yet ready to take inputs directly from a computer, and instead is still player-vs.-player. I’m going to set that aside for the time being, since the underlying mechanics are less interesting to me than the AI meat itself.

With that said, there are a few questions for me to investigate in making this AI system a reality. I’ve documented them below, and I’ll soon be off to think through their answers.

  • What good open-source reinforcement learning systems are out there? Would using one make my goal trivial (i.e., the AI wouldn’t really be built by me), in which case I should modify the challenge?
  • For Churr-Purr’s complexity (possibly on the order of 10⁵⁰ branches), should I begin the training from somewhere deeper in the game, like stage 2 with each side having only 5 pieces, to seed an understanding of end-game dynamics? Will this artificially limit the AI to converging upon those particular end-games it understands best?
  • What data structures and interfaces for passing inputs/outputs to the AI will I need so this runs reliably on the web? (e.g., should I be mapping each state of the game to a best next-move in a hash table? That seems like a lot for a game this complex …)
  • Meanwhile, I’ll also need to sort through training the agent and storing its results, so that I’m not re-training the agent from scratch with each new game. (I’ve sketched one possible approach just after this list.)
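On those last two questions, one option I’m weighing: rather than a hash table mapping every state to a best next-move, store a table of state values keyed by a serialized board string, derive the best move on the fly by looking one ply ahead (as in the loop above), and persist the table between sessions. A minimal sketch, assuming Node’s fs module and the hypothetical values map from earlier; the file name and JSON format are placeholders:

```typescript
import * as fs from "fs";

// Hypothetical persistence layer: save the learned value table to disk
// so training accumulates across sessions instead of restarting per game.

const VALUES_PATH = "churrpurr-values.json"; // placeholder file name

function saveValues(values: Map<string, number>): void {
  // A Map doesn't serialize directly, so spread its entries into an array.
  fs.writeFileSync(VALUES_PATH, JSON.stringify([...values.entries()]));
}

function loadValues(): Map<string, number> {
  if (!fs.existsSync(VALUES_PATH)) return new Map(); // no training yet
  return new Map(JSON.parse(fs.readFileSync(VALUES_PATH, "utf8")));
}
```

A nice side effect: storing values only for states the agent has actually visited sidesteps (somewhat) the problem of enumerating all ~10⁵⁰ branches up front.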

Perhaps at some point I’ll dig up the old code I wrote for my AI class, in which we implemented Q-learning and used it to help a simulated robot navigate a maze-like video game. At the end of the day, this challenge isn’t so different from the maze: in both, you’re looking for an efficient, unblocked route to the exit (a.k.a. a victory).

Not the exact maze challenge we tackled in class, but the same underlying concept: how can the robot learn to navigate walls and eventually end up at the good outcomes?
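For the curious, here’s a toy version of that idea, reconstructed from memory rather than from the actual class code: tabular Q-learning on a tiny grid maze, where the agent learns which action to take in each cell by propagating reward backward from the goal.

```typescript
// Toy tabular Q-learning on a grid maze (illustrative, from memory).
// 'S' = start, 'G' = goal, '#' = wall, '.' = open floor.

const maze = ["S..#", ".#.#", ".#..", "...G"];
const ROWS = maze.length;
const COLS = maze[0].length;
const ACTIONS = [[-1, 0], [1, 0], [0, -1], [0, 1]]; // up, down, left, right

const ALPHA = 0.5;   // learning rate
const GAMMA = 0.9;   // discount factor
const EPSILON = 0.1; // exploration rate

// Q[cell][action] = expected discounted reward for taking that action there.
const Q: number[][] = Array.from({ length: ROWS * COLS }, () => [0, 0, 0, 0]);

const idx = (r: number, c: number) => r * COLS + c;
const blocked = (r: number, c: number) =>
  r < 0 || r >= ROWS || c < 0 || c >= COLS || maze[r][c] === "#";

function episode(): void {
  let r = 0, c = 0; // start at 'S'
  while (maze[r][c] !== "G") {
    const a = Math.random() < EPSILON
      ? Math.floor(Math.random() * 4)                    // explore
      : Q[idx(r, c)].indexOf(Math.max(...Q[idx(r, c)])); // exploit
    const [dr, dc] = ACTIONS[a];
    // Bumping into a wall leaves the robot where it is.
    const [nr, nc] = blocked(r + dr, c + dc) ? [r, c] : [r + dr, c + dc];
    const reward = maze[nr][nc] === "G" ? 1 : -0.01; // small cost per step
    // The Q-learning update:
    // Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
    Q[idx(r, c)][a] +=
      ALPHA * (reward + GAMMA * Math.max(...Q[idx(nr, nc)]) - Q[idx(r, c)][a]);
    r = nr;
    c = nc;
  }
}

for (let i = 0; i < 5000; i++) episode();
```

After a few thousand episodes, reading off the highest-valued action in each cell traces an efficient path from S to G.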

And that’s one of the cool things about reinforcement learning: how well it can generalize across problems, once you’ve framed yours in the terms it requires (states, actions, and rewards).

Read the previous post. Read the next post.

Steven Adler is a former strategy consultant focused on AI, technology, and ethics.

If you want to follow along with Steven’s projects and writings, make sure to follow this Medium account. Learn more on LinkedIn.
