The Terminus of our Terminal Strategy

Griffin Keglevich
Terminal Player Strategies
Feb 21, 2019

Preamble

Having spent the better portion of my life playing video games, I found the idea that I could get compensated rather than grounded for my efforts compelling enough to relapse. After hearing of Terminal through a promotional advertisement for a live event held at UWaterloo, the group known as Team 6, consisting of Ruslan, Tilman, and myself, officially assembled to take up the challenge.

Below is our retrospective on the final version of our strategy, along with some closing thoughts.

Technical Stack & Lessons Learned

When starting the competition, we were ecstatic about the prospect of using policy-gradient-based reinforcement learning methods in practice. We spent about a week learning about the trade-offs between Q-learning and policy-based methods, and settled on a proximal policy optimization model to choose between different actions. We built this using Python3 along with some external libraries (Tensorflow, Numpy). Since all of us come from a software engineering background, there was a tremendous amount of learning involved, and building a working model gave us immense satisfaction.

However, the more we played the game, the more we realized that the action space was far too vast to train our network in a reasonable amount of time, so we tried compressing it down to a few heuristic-based strategies from which our model could choose. The realization we had was that our model was only as good as our heuristics, so we chose to pivot to an entirely heuristic-based strategy without a model, also implemented in Python3.

Strategy Development

Strategy Evolution & Overcoming Obstacles

Our ideation process consisted of the three of us discussing strategies to beat the default bosses, implementing a basic strategy, and watching replays to see where it broke. From there we iterated incrementally, either fixing what broke or gaining an edge against better opponents.

After playing the default bosses, we uploaded our algorithm to the global competition to observe how it performed against other top algorithms; this gave us access to a swathe of new strategies from which we could theorize how to beat a generic opponent. One of the interesting things we noticed while playing was that some algorithms could “intuitively” send a large number of pings (high-damage, low-health units) and calculate that it would break through the enemy’s defence; we reckoned that they were running some sort of simulation, and since there was no built-in simulator, we would have to make one ourselves.

To overcome this, we modified algo_strategy.py to contain a method, “simulate”, which plays out different combinations of units and spawn locations, calculating frame by frame the expected damage dealt and taken; it returns the total core damage dealt as well as the number of units that successfully made it to the enemy’s base. From this, we apply a value function that returns 3.5 * number_units_made_to_their_base + num_cores_damage_dealt. We take the maximum value over all of the simulations and choose that as our attacking strategy for the turn.
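
To make that selection step concrete, here is a minimal sketch of it in Python; the candidate generation and the signature of simulate are assumptions for illustration, while the 3.5-weighted value function and the max-over-simulations choice are as described above.

def attack_value(units_through, cores_damage):
    # Value function from above: units that reach the enemy base are
    # weighted 3.5x relative to each core of structural damage dealt.
    return 3.5 * units_through + cores_damage

def choose_attack(candidate_attacks, simulate):
    # Simulate every candidate (unit type, count, spawn location) and
    # return the highest-scoring one; simulate is assumed to return
    # (units_through, cores_damage) for a candidate.
    best_attack, best_score = None, float("-inf")
    for attack in candidate_attacks:
        units_through, cores_damage = simulate(attack)
        score = attack_value(units_through, cores_damage)
        if score > best_score:
            best_attack, best_score = attack, score
    return best_attack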

Final Algorithm Logic

Originally, our algorithm was designed to play a war of attrition against the enemy, constructing a 2-layered maze and utilizing the long range of the EMP to gradually whittle away at the enemy defences. This algorithm performed pretty well; however, it was not very adaptive: it consistently failed against opponents rushing specific corners, and it didn’t know how to capitalize on chinks in our opponents’ armour, such as an open hole through which we could attack.

Aside from the pivot from reinforcement-learning-based methods to a heuristic strategy, the major change we implemented was turning our strategy from a static one into an adaptive one. Originally, our algorithm spawned as many EMPs as it could at a specific location and conformed to the same build order every time. We decided to make the shift to an adaptive algorithm to address our failure to capitalize on opportunities, as well as to mitigate against rush-based strategies.

Reactive defences will build more destructors at attacked locations

We addressed these issues by simulating all of our possible attacks and choosing the best one from our simulations (implementation described in the previous section). Additionally, we mitigated against the rush strategies we had seen previously by implementing reactive defence instead of just filling out our defences in the same order every time. This ensures that the most important parts of our defence get built first, and in the case that we are getting rushed (which we detect by noticing patterns in the game state) we build extra defences.

In pseudocode, our algorithm can be best described as follows:

if getting rushed:
    mitigate against specific tactics depending on the pattern
    reactive_defense()
if we have cores left over:
    keep building out our template
scores, strategies = simulate_attack_strategies()
choose_best_strategy(scores, strategies)
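
For reference, a simplified sketch of what reactive_defense() boils down to is shown below; the one-tile-behind offset and the data structures are placeholders rather than our exact implementation.

def reactive_defense(recent_breach_locations, existing_defenses):
    # Turn the locations we were hit at last turn into a prioritized list
    # of destructor build spots, instead of following the static template.
    build_queue = []
    for x, y in recent_breach_locations:
        # Reinforce one tile behind the breach (placeholder offset).
        candidate = (x, y - 1)
        if candidate not in existing_defenses and candidate not in build_queue:
            build_queue.append(candidate)
    return build_queue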

Engineering Design

Although it was quite disappointing to scrap our reinforcement learning code, the heavy componentization of that earlier work motivated us to think in an object-oriented way going forward.

Any time one of us wanted to augment the current strategy, we would clone it and label it with the new version number, as well as create a new git branch to push our changes to. To keep our code as modular as possible, we created different folders and classes to bundle related pieces of logic together.

For instance, everything related to our defensive strategy was kept in Defences.py inside a Defences class. With this, we were able to abstract away any computations related to defences, allowing us to divide and conquer the work among us: one person could work on Defences while the other two wrote code assuming that functionality would be provided. Similarly, we bundled our strategies into a Strategies module that we could import into different iterations of our bot. As with the previous point, this allowed us to parallelize our work streams and made for a much easier time debugging, since functionality with high cohesion was bundled into its respective module, reducing duplication of effort and increasing code clarity.
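
As an illustration of that layout, the skeleton below shows the kind of interface our Defences class exposed; the method names and the core-cost constant are illustrative rather than our exact code.

class Defences:
    # Bundles all defence-related computation behind one interface so the
    # rest of the bot can treat defence as a black box.

    DESTRUCTOR_COST = 3  # illustrative cost per destructor, in cores

    def __init__(self, template_build_order):
        self.template = list(template_build_order)  # static fallback order
        self.breaches = []                          # locations hit last turn

    def record_breach(self, location):
        self.breaches.append(location)

    def next_builds(self, cores_available):
        # Reactive spots first, then the remaining template, limited by how
        # many destructors the current core budget can pay for.
        budget = cores_available // self.DESTRUCTOR_COST
        prioritized = self.breaches + self.template
        self.breaches = []
        return prioritized[:budget]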

Post-Mortem Thoughts

In the first iteration of our algorithm, we noticed that although we performed well against other “slow-playing” strategies, we were losing to a lot of rush-based strategies. After a few iterations, we were consistently able to beat rush-based strategies, but found a new kryptonite: EMP spammers.

In this case, we don’t want to rebuild the attacked wall, since it will inevitably fall without mitigating an attack

We noticed that when our opponent spawns EMPs, they cut through our defences on the side from which they spawn, while our current strategy tries to rebuild defences in a static build-order priority. To account for this, we are going to modify our reactive defence to check whether an EMP spawned in the corner and whether we took significant damage on our front lines. If that is the case, and no path exists for the enemy to shoot pings into our undefended corner, we will forgo rebuilding that defence so that we don’t give them free cores of damage.
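
A rough sketch of that planned check might look like the following; the inputs (whether an EMP spawned in the corner, whether a ping path into the corner exists) are placeholders for queries against the game state, and the damage threshold is an arbitrary illustrative value.

def should_rebuild_wall(emp_spawned_in_corner, frontline_damage_taken,
                        ping_path_to_corner_exists, damage_threshold=4):
    # Skip the rebuild when an EMP line would just knock the wall down again,
    # unless leaving it open would expose a ping path into our corner.
    emp_pressure = emp_spawned_in_corner and frontline_damage_taken >= damage_threshold
    if emp_pressure and not ping_path_to_corner_exists:
        return False  # rebuilding would only donate free cores of damage
    return True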

As for what we would do differently next time, we would start even earlier! We started a week or so ahead, but didn’t really do much work together, mainly theory-crafting about what we would do once we actually met up to work. We believe the most productive time we spent together was when we were sitting down and coding, as it gave us the chance to bounce testable ideas off each other and to iterate on our work as a group.

Closing Thoughts

Although we ported our strategy over to a heuristic one for this competition, we are planning on participating in the global competition with a Deep Q-Network-based strategy.

Huge thanks to Correlation One and Citadel for putting on such a well-organized and entertaining event; you’ll be sure to see us at the next competition!
