Going backwards to go forwards
This post is part of ChurrPurr.ai, a challenge to design an online strategy game and the AI to master it. Play the latest version of the game here.
Over the weekend, I dug in earnestly on my AI’s implementation. The good news? I’m pretty sure I’m getting somewhere. The less-good news? It required taking a few steps back in the process. Let me explain …
After reading through various online tutorials, parsing example AI libraries on GitHub (a big online repository of code), and successfully installing both Google’s TensorFlow and OpenAI’s Gym on my computer (two major components of mainstream AI programming today), I finally (re)arrived at a critical part of my solution: REINFORCEjs, a JavaScript library created by Tesla/Stanford/OpenAI’s Andrej Karpathy.
I’m fairly sure I came across REINFORCEjs earlier in my search (and for some reason overlooked it), 1) because it accounts for two of the top three results for “reinforcement learning javascript” on Google, and 2) because Andrej is something of a celebrity in the AI field.
Whoops. But glad to have found my way back. (Prior to re-finding REINFORCEjs, I spent a significant amount of time reading through Arthur Juliani’s excellent tutorials on TensorFlow and various forms of reinforcement learning — thanks, Arthur!)
Going forwards
A library is a compilation of different functions, variables, etc., that can be borrowed from for other programs. For instance, programmers can often borrow from a Math library to implement basic math functions rather than defining the functions themselves.
Having REINFORCEjs in-hand will allow me to focus on the particulars of my challenge and aspects like knowing which actions are valid, the rewards associated with them, etc., rather than having to implement the basic mathematical fundamentals.
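To make that division of labor concrete, here is a minimal sketch of the act/learn loop. The `getNumStates` and `getMaxNumActions` methods and the `act`/`learn` calls follow the REINFORCEjs README; everything else (the sizes, the toy reward rule, and the `makeRandomAgent` stand-in that lets the snippet run without the library) is a placeholder I made up for illustration, not ChurrPurr’s real setup.

```javascript
// Environment description in the shape REINFORCEjs's DQNAgent reads when constructed.
// The sizes are made-up placeholders, not the game's real dimensions.
const env = {
  getNumStates: () => 10,    // length of the state array passed to agent.act()
  getMaxNumActions: () => 4, // actions are the integers 0..3
};

// Stand-in agent exposing the same act/learn method names as RL.DQNAgent,
// so this sketch runs without the library. It picks random actions and
// only tallies reward; it does not learn anything.
function makeRandomAgent(environment) {
  let totalReward = 0;
  return {
    act: (stateArray) => {
      if (stateArray.length !== environment.getNumStates()) {
        throw new Error('State array has the wrong length');
      }
      return Math.floor(Math.random() * environment.getMaxNumActions());
    },
    learn: (reward) => {
      totalReward += reward;
    },
    getTotalReward: () => totalReward,
  };
}

// With the library loaded, this line would instead be:
//   const agent = new RL.DQNAgent(env, { alpha: 0.01 });
const agent = makeRandomAgent(env);

const state = [1, 0, 0, 1, 2, 0, 1, 2, 0, 0];
const action = agent.act(state);     // the agent chooses an action index
const reward = action === 0 ? 1 : 0; // toy reward rule, purely illustrative
agent.learn(reward);                 // the agent is told how that action went
```

The part that stays my job is everything around those two calls: producing the state array, deciding which actions are valid, and working out the reward.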
In some sense, implementing the underlying math myself would have been a useful way to make sure I truly understand the AI’s process. But I’m already going through the library’s code fairly line by line as is, trying to get it to play nicely with my existing code, so I don’t mind too much not having to translate arrays to matrices to columns, and so on.
Going backwards
While REINFORCEjs is in many ways a huge boon to my project, it also interacts with inputs/outputs a bit differently than I’d anticipated in designing my game, and accordingly I’ve had to reconfigure a few elements:
- Whereas descriptions of states were strings before (e.g., ‘1001201200 …’), they now need to be arrays (e.g., [1,0,0,1,2,0,1,2,0,0]).
- Action sets are also arrays of possible individual actions, which makes sense, but didn’t sit well with how I’d (probably unwisely) designed my actions initially: as a pair of [actionToTake, actionOrigin (if relevant)].
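As a rough sketch of both conversions: the first function turns a digit string into the array form, and the other two show one possible way to fold an [actionToTake, actionOrigin] pair into a single action index and back. The function names, the `numCells` parameter, and the encoding itself are my own assumptions for illustration, not necessarily how the game’s code does it.

```javascript
// Convert a state string of digits, e.g. '1001201200', into an array of numbers.
function stateStringToArray(stateString) {
  return Array.from(stateString, (ch) => Number(ch));
}

// Hypothetical encoding: fold an [actionToTake, actionOrigin] pair into one integer.
// Assumes both values are cell indices in 0..numCells-1 and that actionOrigin
// is null when the action has no origin.
function encodeAction([actionToTake, actionOrigin], numCells) {
  if (actionOrigin === null) {
    return actionToTake; // indices 0..numCells-1: actions without an origin
  }
  // indices numCells and above: one slot per (origin, target) combination
  return numCells + actionOrigin * numCells + actionToTake;
}

// Inverse of encodeAction: recover the original pair from the index.
function decodeAction(index, numCells) {
  if (index < numCells) {
    return [index, null];
  }
  const rest = index - numCells;
  return [rest % numCells, Math.floor(rest / numCells)];
}
```

With an encoding like this, the set of valid actions at a state becomes a plain array of integers, which is the shape the library works with.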
The first change was relatively simple to implement, but it makes my code less clean than I’d like, and the conversion is probably somewhere I can find efficiency gains later. Straightforward enough, though.
The second change was more involved, and unfortunately it required backtracking on some design choices I’d made earlier on (namely, that I wanted the computer to be able to skip over ‘activation’ and go directly to ‘sliding’ — since it doesn’t make sense to activate a cell you won’t then immediately slide).
Accordingly, I had to rework my state descriptions to include a new move status (‘a’ for activation, joining ‘m’, ‘r’, and ‘s’) and to add an additional value at the end of the string to represent the activated square, if any. This also required reworking the functions that generate states, that determine which actions are valid at a state, and that produce the next state from an action.
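To give a feel for the reworked state description, here is a small sketch that parses such a string and flattens it into a numeric array. The exact layout is an assumption on my part for this example: board digits first, then one of the four status letters, then the activated square’s index (or ‘-’ when there is none). The names `parseState` and `toInputArray` are likewise hypothetical.

```javascript
// The four move-status letters mentioned in the post.
const MOVE_STATUSES = ['a', 'm', 'r', 's'];

// Parse a state string in the assumed layout:
//   <board digits><status letter><activated square index, or '-' if none>
function parseState(stateString) {
  const match = /^(\d+)([amrs])(\d+|-)$/.exec(stateString);
  if (match === null) {
    throw new Error('Unrecognized state string: ' + stateString);
  }
  const [, boardDigits, status, activated] = match;
  return {
    board: Array.from(boardDigits, (ch) => Number(ch)),
    status,
    activatedSquare: activated === '-' ? null : Number(activated),
  };
}

// Flatten a parsed state into a purely numeric array:
// board cells, then the status as an index, then the activated square (-1 if none).
function toInputArray(state) {
  const statusIndex = MOVE_STATUSES.indexOf(state.status);
  const activated = state.activatedSquare === null ? -1 : state.activatedSquare;
  return [...state.board, statusIndex, activated];
}
```

Keeping the parsing in one place like this means the functions for generating states, listing valid actions, and computing next states can all share the same reading of the string.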
It also changed up the form of actions and consequently how they were passed to different functions, with some knock-on effects. All in all, this was a lot of refactoring, but the systems now work again.
Going forwards again
This was also a useful lesson in being more mindful about design, and in how technical debt builds up in companies (i.e., the short-term vs. long-term tradeoffs of adding functionality onto legacy systems).
I probably could have found some hacky solutions to avoid refactoring my code, but in the long run that would have made it more spaghetti-like. Instead I made good on my mistakes and paid back the debt with a few hours’ work, and now I’m wiser for it.
With those changes made to my program, and with REINFORCEjs up and running, I’m now in a position to have the AI do some heavy lifting.
The early results aren’t promising just yet, so I need to do a bit more legwork to understand what exactly is happening in the code and why I’m not getting the expected results (e.g., even when I put the AI on the doorstep of victory, it isn’t finding its way there as quickly as I’d expect).
The trial and error here should be interesting to say the least.