Going backwards to go forwards

This post is part of ChurrPurr.ai, a challenge to design an online strategy game and the AI to master it. Play the latest version of the game here.

Steven Adler
4 min read · Nov 7, 2017

Over the weekend, I dug in earnestly on my AI’s implementation. The good news? I’m pretty sure I’m getting somewhere. The less-good news? It required taking a few steps back in the process. Let me explain …

After reading through various online tutorials, parsing example AI libraries on GitHub (a big online code-hosting site), and successfully installing both Google’s TensorFlow and OpenAI’s Gym on my computer (two major components of mainstream AI programming today), I finally (re)arrived at a critical part of my solution: REINFORCEjs, a JavaScript library created by Tesla/Stanford/OpenAI’s Andrej Karpathy.

Andrej has developed a library of functions that help AIs navigate ‘grid-worlds’ like the one pictured above, among other AI tasks

I’m fairly sure I came across REINFORCEjs earlier in my search (and for some reason overlooked it), 1) because it accounts for two of the top three Google results for “reinforcement learning javascript”, and 2) because Andrej is something of a celebrity in the AI field.

Whoops. But glad to have found my way back. (Prior to re-finding REINFORCEjs, I spent a significant amount of time reading through Arthur Juliani’s excellent tutorials on TensorFlow and various forms of reinforcement learning — thanks, Arthur!)

Going forwards

A library is a collection of functions, variables, and other pieces of code that other programs can borrow. For instance, programmers can often pull basic math functions from a Math library rather than defining them themselves.
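
As a throwaway JavaScript illustration, the built-in Math library means nobody has to hand-roll a square root or a max:

```javascript
// Borrowing from the built-in Math library rather than writing these routines myself
var distance = Math.sqrt(3 * 3 + 4 * 4); // 5
var bestScore = Math.max(0.2, 0.7, 0.4); // 0.7
```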

Having REINFORCEjs in hand will let me focus on the particulars of my challenge, like knowing which actions are valid in a given state and the rewards associated with them, rather than having to implement the mathematical fundamentals myself.
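
To give a sense of that division of labor, here is roughly the shape of the setup REINFORCEjs documents for its DQN agent: I describe the game in a small environment object (how long the state array is, how many actions exist) and handle the game-specific stepping, while the library does the learning math. The sizes, learning rate, and example board below are placeholders, not my game’s real values.

```javascript
// Environment object the agent queries; the sizes here are placeholders.
var env = {};
env.getNumStates = function() { return 10; };    // length of the state array
env.getMaxNumActions = function() { return 4; }; // number of distinct actions

// The library supplies the agent and the reinforcement-learning math.
var spec = { alpha: 0.01 }; // learning rate; other options are available
var agent = new RL.DQNAgent(env, spec);

// My side is the game-specific part: supply states, apply actions, score rewards.
var state = [1, 0, 0, 1, 2, 0, 1, 2, 0, 0]; // a board described as a numeric array
var action = agent.act(state);              // agent picks one of the possible actions
var reward = 0;                             // ... apply the action, score the result ...
agent.learn(reward);                        // agent updates itself from the reward
```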

In some sense, implementing the underlying math would have been useful for making sure I truly understand the AI’s process. But I’m already going through the library’s code fairly line by line, trying to get it to play nicely with my existing code, so I don’t mind too much not having to translate arrays to matrices to columns, and so on.

Going backwards

While REINFORCEjs is in many ways a huge boon to my project, it also interacts with inputs/outputs a bit differently than I’d anticipated in designing my game, and accordingly I’ve had to reconfigure a few elements:

  • Whereas descriptions of states were strings before (e.g., ‘1001201200 … ‘), they now need to be arrays (e.g., [1,0,0,1,2,0,1,2,0,0]).
  • Action sets are also arrays of possible individual actions, which makes sense, but didn’t sit well with how I’d (probably unwisely) designed my actions initially — a duo of [actionToTake, actionOrigin (if relevant)].

The first change was relatively simple to implement, but also makes my code less clean than I’d like, and it’s probably a lever for future efficiency gains. Straightforward enough though.
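
The conversion itself is nearly a one-liner; a hypothetical helper along these lines would do it (assuming the portion being converted is all digits, since any non-digit characters would need their own numeric encoding):

```javascript
// Hypothetical helper: turn the old string-style description into the numeric
// array REINFORCEjs expects, e.g. '1001201200' -> [1, 0, 0, 1, 2, 0, 1, 2, 0, 0].
function stateStringToArray(stateString) {
  return stateString.split('').map(Number);
}
```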

The second change was more involved, and unfortunately it required backtracking on some design choices I’d made earlier on (namely, that I wanted the computer to be able to skip over ‘activation’ and go directly to ‘sliding’ — since it doesn’t make sense to activate a cell you won’t then immediately slide).

Accordingly, I had to rework my state descriptions to include a new move status (‘a’ for activation, joining ‘m’, ‘r’, and ‘s’) and to add an additional value at the string’s end to represent the activated square, if any. This also required reworking the functions that generate states, determine which actions are valid at a state, and produce the next state from a given action.

It also changed up the form of actions and consequently how they were passed to different functions, with some knock-on effects. All in all, this was a lot of refactoring, but the systems now work again.
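
To make the reshaping concrete, here is a purely illustrative sketch, shown in the string form before conversion to arrays; the board is made up and the exact encoding of the action pair is my guess, not the game’s real code:

```javascript
// Illustrative only: a reworked state description carrying a move status
// ('a' for activation, alongside the existing 'm', 'r', and 's') plus the
// activated square, if any, appended at the end.
var exampleState = '1001201200' + 'a' + '4';

// Actions remain a pair of [actionToTake, actionOrigin (if relevant)]:
var activateAction = ['a', 4]; // activate cell 4
var slideAction = ['s', 4];    // then slide from that cell
```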

Going forwards again

This was also an instructive lesson in being more mindful about design, and in how technical debt accumulates in companies (i.e., the short-term vs. long-term tradeoffs of adding new functionality onto legacy systems).

I probably could have found some hacky solutions to avoid refactoring my code, but in the long run that would have made it more spaghetti-like. Instead I made good on my mistakes and paid back the debt with a few hours’ work, and now I’m wiser for it.

With those changes made to my program, and with REINFORCEjs up and running, I’m now in a position to have the AI do some heavy lifting.

The early results aren’t promising just yet, so I need to do a bit more legwork to understand what exactly is happening in the code and why I’m not getting the expected results (e.g., even when I put the AI on the doorstep of victory, it isn’t finding its way there as quickly as I’d expect).

The trial and error here should be interesting to say the least.

Read the previous post. Read the next post.

Steven Adler is a former strategy consultant focused across AI, technology, and ethics.

If you want to follow along with Steven’s projects and writings, make sure to follow this Medium account. Learn more on LinkedIn.
