Extracting Secret Test Cases From Google Foobar Challenge

5 min readSep 9, 2018

I’m not a computer science person, but I had briefly heard of the Google foobar challenge in passing. Recently a friend of mine triggered the invite by searching for, presumably, esoteric programming questions through Google search. I have never seen the invite, since most of my Google searches are the programming equivalent of “how do I breathe through my nose if my mouth is underwater .” After passing the questions in level 2 you get a one-time link and my friend was nice enough to send me his link. I have taken one official C/C++ programming course 8 years ago, and other than that I am self-taught. So I went in treating it as a fun logic problem.

Level 1 was some straightforward problem that was more a tutorial than an actual question. Level 2 was two questions, one where I was asked to sort a list of strings, by some hierarchical condition. The second one was a simple number theory one-liner about counting positions in a triangular shape.

Ok, so some interesting stuff, but only if you make it interesting for yourself. Let’s see what level 3 looks like.

I don’t feel bad, since you can just google these anyway.

So it’s a graph problem where you have to navigate a maze, with the added complication that you can go through a wall if necessary. “Fine,” I think, “I’ll just run a Dijkstra’s algorithm with a flag for how many walls I’ve passed through, easy enough.” It’s not the most efficient solution, but the 20x20 box limit makes me not care so much. I’ve never done any graphing/optimal path problems, but boy can I implement an algorithm that was written on Wikipedia with the best of ’em. So I’m sitting there running my test cases, which are all passing on my machine. Here’s the output of a plotting function that I wrote for visualization purposes.

It all looks good. Let’s try to verify it.

So there is obviously something I am not considering in my tests, but what? It may not surprise you to realize that I went to bed late and woke up bothered by this. I’m no stranger to troubleshooting weird stuff in the lab, so I started fiddling around to get some hints out of the failures. I have a single binary up/down return value for each test case. Being sleep-addled, I was willing to return static numbers for a little and see when I got one of the tests (4 or 5) to register as passed. Starting at the minimum for a 20x20 matrix (just a guess) and increasing the guess I only had to do it for a little, before I tripped test case 4 at path length 47. The same trick gets me the shape of the matrix at 16x20, and finally a binary tree search tells me there are 117 walls ~36% filling. So whatever is wrong with my algorithm is happening during a larger matrix with a high filling rate. With this new info in hand I make a new test case (copied from the internet, yes I’m lazy), trigger my issue, and solve it during my lunch break.

I live by the motto: “If you can’t make it better, you can at least make it worse.”

So what was happening is that the algorithm couldn’t backtrack more than one step when encountering walls. The simple cases didn’t trigger this behavior since it never needed to backtrack twice. The base Dijkstra’s algorithm doesn’t remember how to get to the mandatory wall in a way that doesn’t require knocking down a wall first. So all I had to do was to add a second “shortest-path” memory to the vertex object that keep a memory of the path to the node that involved passing through fewer walls. Pretty straightforward change to my solution and…

So we’re all done right? Well, when I was messing around with binary responses from the algorithm, I made a few typos in the web editor which threw specific exceptions. Like so:

It looks like I can raise any exception I want from inside the code, but I don’t know which test case raised it. But I can encode data obtained during the test case into an exception and use that exception on the output to get the result. It turns out there are 48 exceptions, which means I can get 5 bits of information out of a single test run that raises a coded exception, a factor of 5 improvement (custom exceptions or chaining unfortunately show as the base exception class). The height and width are now each a single test call (20< 32), so it’s trivial to get the shape of test cases 3 and 5, (20, 15) and (16, 16). I could extract the actual map data, but there is no scripting allowed in foobar so it would take ~50-60 manual file edits, file runs, and copy pastes for each matrix. Oh well, I am kind of curious, but its 3 AM and… GUI AUTOMATION TIME!

So after learning pyautogui at 3 AM to scrape coded exceptions from the foobar terminal, processing the outputs I can present the hidden test cases and solutions…

So now that I know the test cases and answers, I can submit an O(n) “hashing solution” to the problem for the known 5 test case subset, after checking the elements match the input, which will take O(n) operations. The technique will work to extract any data from a test set in foobar, and could be improved by using line numbers to increase the information yield, and adding lossless compression like Asymmetric numeral systems (both suggestions from a friend). Google could clearly fix the information leak, but it would become more inconvenient for people to submit so I don’t know. I hope I don’t ruin it for everybody!

Extracting Secret Test Cases From Google Foobar Challenge

Written by Matthew Ebert