Automating Victory: Beating browser games with accessible Python (Part 1)

Published in

henngeblog

9 min readApr 10, 2023

A procrastinator’s guide to project management

A gif of a minesweeper-like game being automatically played. The moves are very quick and proceed from the top of the game to the bottom.

Here’s how I automated beating one of my favorite games using Python, how it feels to be an unofficial tool-assisted speedrunning champion and some outlines for how you can join me at the top.

There goes my free time, and maybe yours!

Unless you were following releases for Flash games in 2010, android games in 2015, or HTML5 games in 2016, you probably don’t know the indie title Mamono Sweeper by Hojamaka Games.

A still from an in-progress game of Mamono Sweeper. There is a grid of numbers around multicolored pixel art monsters. — I’ve spent hours staring at this screen

Actually, unless you are my coworkers who saw my presentation during MTS 101 you probably don’t know this game, but I was hooked on it. The mobile game thankfully maxes out its in-game timer at 9999 seconds, so I don’t really know how much time I’ve spent playing… But usually, I’d see the timer get that high while trying to beat the one-hit-means-game-over mode, shortly followed by the game-over screen.

The most challenging part of that mode isn’t the math or the strategies, but user error. I’ve had a lot of games go for an hour or more only to be cut short by a careless mistake or worse, guessing wrong on a 50/50 shot. That hurts. And like all good programmers, I decided to soothe my pain with cold, unfeeling technology. Let’s look at what I was up against!

Meet the monsters:

The game is a cross between Minesweeper and an RPG where the goal is to reveal all tiles on the board. But instead of mines, you have to watch out for monsters.

Pixel-art monsters labeled as levels one through five. — Like these guys!

Unlike Minesweeper, where clicking a mine is an instant loss, you can click any unrevealed monster at or below your level to beat it. You’ll reveal the tile, gain some experience, and eventually level up so that you can click on higher-level monsters. If you click on a monster that’s too high-level, you’ll lose hit points and eventually lose the game. If you click on every monster, you’ll beat the game.

Like Minesweeper, the number on an empty tile gives you a clue about its neighbors. Unlike Minesweeper, the number is the sum of the levels of the monsters around the tile. Here, the middle number is three because it has one level 1 neighbor and one level 2 neighbor.

A three by three grid of numbers. A level one monster is in the top left corner, a level two monster is in the bottom right. The middle number is 3.

In regular Minesweeper, there are 56 possible ways to arrange the mines around a “3” tile. But here, it could be made up of three level 1 monster, a level 1 and a level 2 monster, or a single level 3 monster. The math there works out to be any of 120 possibilities. That’s way more ambiguity to deal with!

A visual elaborating on the above description of arrangements. — 8C3 + 2 * 8C2 + 8C1 for my fellow math nerds.

On the other hand, because you can clear out monsters as you level up, the ambiguity on the board shrinks over time. As such, the difficulty comes down to board size and monster density. Difficulties range from Easy to Huge x Extreme (Dev’s description: やらないほうがいい / You’re better off not playing…). That mode is hard, but I had an idea on how I could get through it with much less work. All it would take is a bunch of work…

Minimum Viable Pain-in-the-neck:

My initial idea for this script was to do what I do only better. I’d run a loop of “look → think → click” until I either won or lost. This rough draft idea had a lot going for it:

Look at the board: Pixel art is easy to parse and compress
Build a virtual model: The game is a grid of numbers. Computers love grids of numbers
Identify safe moves: This step is math. Computers love math
Make those moves: This isn’t a shooter, just click on stationary squares
Repeat: Crucially, there’s almost no information that needs to be recorded across rounds. Looking at the board always gives all the information on the state of the game

The biggest challenge was that the developer designed the game to be played by humans and not computers. This brings us to the first takeaway that helped me actually finish this project: rush to the minimum viable product, or MVP for short.

An overall MVP for the script might be something that automatically clicks random spots on the board but that represents successes in every single bullet point above. That’s a lot of work for something that won’t even win the game! Skipping straight to that is painful and discouraging in the sort of way that kills personal projects.

The top row describes building a pyramid from the bottom up. A frowning face is only happy at the very end when it resembles a pyramid. The bottom row describes building a pyramid by starting with a small pyramid and adding layers on the side to make it larger. The face is happy throughout. — Better to start small and improve than to hold out for perfection

Each step, however, also has an MVP that builds off of previous successes. For the first “look” step, I leveraged the MSS and OpenCV libraries to take screenshots and identify a reference point to find the game window. How long does that take to run? Who cares! It saves an image that I can work with.

Next up: building a virtual model. Terrible step name, way too broad of a scope. I’ll just figure out what level I am for now. Optical Character Recognition is a well-studied field with very approachable libraries and machine learning models, and I’m not going to use any of that. This is just pixel art! I’ll check if the ~40 pixels match a reference image. Figuring out what we’ll need to parse we’ll need besides our level is a job for Future Jon.

Bad news: it’s immediately time to be Future Jon. We have to figure out the grid’s location to click on it, but because we’ve focused on building a working solution up to this point we can start with a script that’s already running. Even if the solution here is brittle, we can fix it down the line if needed (as Future Future Jon). From there, I leveraged the PyAutoGUI library to click the grid at any coordinates I wanted.

Elaboration on the above, the “LV:” reference point is pointed out, as are the brittle other references at the corner of the play grid.

Finally, for repetition we can just loop through the steps we’ve already taken and click randomly each time. Unsurprisingly, this strategy fails a lot. But it fails in ways that we can gradually refine as a working product rather than trying to pre-refine an ambiguous goal into perfect code.

A few revisions later and I was ready to start directly translating human strategies into code.

Moving Very Promptly to a Moderately Viable Product

Remember earlier when I said to rush for an MVP? Well, that was useful for the grunt work, but for the actual thinking parts of the project, my second takeaway was the opposite: optimize the easy stuff to understand the tricky stuff.

Follow your passions to figure out when to push through to an MVP and when to search for something elegant. I wanted to make a script that could beat the game. The image parsing and automation work it took was just a means to an end.

Meanwhile, I really cared about implementing the strategies I used to beat the game. The simplest strategy you could use to play is “if a revealed tile’s number is less than or equal to your level, all of its unrevealed neighbors can be clicked.”

A red box is around the three unrevealed squares next to a revealed empty tile with the number “1”. — The squares in the red box cannot contain a monster over level 1

Translated to pseudocode, that might look something like:

for row in grid:
  for column in row:
    if grid[row, column] <= level:
      for neighbor of (row, column):
        if neighbor is unclicked:
          safe_to_click.add(neighbor)

Well it would work, but it’s indentation hell and Python’s terrible at nested for-loops. Luckily, I know just enough NumPy and scikit-learn to get me into trouble. Thanks to some research and experimenting, I was able to do the exact same thing without any for-loops, netting a 20x speedup!

Anyway that doesn’t translate to anything noticeable in performance. The script spends 80% of its time clicking, 15% of its time taking screenshots, and only 5% of its time executing these strategies. Even if I could eliminate the time the script spends calculating its next move, it can only get 5% faster. Oops. Wasted effort, should have taken my own advice and stuck to the MVP, right?

The “Not so fast!” graphic from the Ace Attorney game series — Graphic from the Ace Attorney series

Figuring out the more sophisticated version of the trivial strategy was worth the extra time investment because the more advanced strategies build off of this simple one. The next strategy I implemented does something similar as the first step in a multi-step operation and runs recursively. Spending the effort to where it didn’t matter taught me lessons I applied in much more important contexts.

Ultimately, when it’s something you care more about, it’s easy to spend the effort learning and getting better. And really, isn’t that why we’re here?

Hold on, wait, why are we here?

Glory, pure and simple

Most games list records for how quickly they’ve been beaten on speedrun.com and Mamono Sweeper is no exception. I wanted those top spots. I watched those videos of the most skilled players beating this game in minutes but only saw the lackluster failings of pitiful human bodies. I could do better. Rather, my script could do better. Here’s a link to the video of the record-holder of easy mode beating it in a mere 19 seconds:

And here’s my script beating easy mode.

Caution: flashing lights and blinding speed

Records: Difficulty / Best human time (MM:SS) / Best computer time (MM:SS)

Easy / 0:19 / 0:01
Normal / 1:44 / 0:06
Huge / 6:23 / 0:14
Extreme / 3:43 / 0:08
Blind / 2:06 / 0:04
Huge x Extreme / 12:54 / 0:21
Huge x Blind / 9:59 / 0:13

One of the biggest takeaways for me was that when shortcuts and the joy of learning aren’t enough, having an end goal will keep your project going. Even though there’s shockingly no leaderboard on speedrun.com for tool-assisted Mamono Sweeper records, getting these times as low as possible kept me opening my code editor. Beating the “blind” difficulties, where clicking a single monster means game over, required some really clever code, but I only wrote it because I defined success as beating all seven difficulties.

All in all, I took every record down to mere seconds, took my abilities to their limits, and took a look at some of the cool things that Python can do. I’m proud of the outcome. You might even wonder whether I’ve made this entire blog post to brag about my good script.

Of course not.

This script isn’t good, it’s the best.

The best that exists, anyway, but it’s not perfect. It’s got a couple lingering bugs, there’s another second or two that could be squeezed out of the records, and its win/loss ratio could be improved, but I’m happy for now.

If you’ve read through this and gotten fired up to automate the hard-to-automate in your life, stay tuned for the part 2 follow-up to this blog post. I’ll provide hackable code samples, go in-depth on the nerdier details of translating my strategies to code, and talk about some of the unsettling security implications of this.

A 25-second animated gif of one of the hardest modes being solved.

Until then, thank you for reading. Order my picture book at afraidalot.com.