The third puzzle this year featured a checkout page and had three smaller parts that needed to be solved. This walk-through covers all three parts and touches on some reverse engineering techniques, basic HTTP, and REST APIs. Prior knowledge is useful but not required; there are plenty of links to supplementary information throughout the text. If you are interested, there’s also a list of the other puzzles from 2018.
In this post a development version of the puzzle is used with much easier requirements. The actual puzzle had a requirement of 2000 promo codes entered (half of which could be entered without solving a CAPTCHA). Also, the steps described aren’t the only way to go about solving the challenge, but are instead the way we were thinking of going about it.
Solving the puzzle
In the shopping cart there’s an item called “Puzzle unlock” but sadly there is a big “Service unavailable” banner over the entire page. Since this is the web and we have control over the browser, we can inspect what’s up with the banner. You can do this on any web page by right-clicking and selecting “inspect element” or a similar option.
Huh, there are two divs overlaying the entire page, both of which start with puzzle2018_. This is probably important. You can either just delete the elements from the page (right-click > delete or press delete on your keyboard) or make life a bit easier and use something like uBlock to hide both of them. If you delete the nodes, you’ll need to redo that on every page refresh.
This will just fill in some junk starting with S and submit it every 10ms. Let it sit for a while and at some point the page refreshes and we get our next challenge:
Whelp, looks like it didn’t like that. I guess we were a bit suspicious, but a CAPTCHA? What a pain, those are unbeatable, right? Fun fact, did you know this is basically Google using everyone as free labor for their self-driving car training? Luckily in this case your answers are checked against already known classifications and your brain cycles are simply wasted instead of training some neural network…
Back to the challenge at hand: we are fortunate that this is not a reCAPTCHA but just a haCKTXA. That must mean that it is a bit easier or that there is some way around it!
After clicking the box we have a challenge and it asks us to either select all cats or dogs from a set of nine images. Easy to do manually, but entering a promo code and filling out a CAPTCHA a thousand times is kind of tedious. Also we’re lazy so there has to be a better way to do this.
If we inspect one of the square images in the CAPTCHA, we can see that each of the images is loaded from the server from a URL like /images/captcha/7c6a0378fbb279d1a78655a8079f3a6b.jpg which we can open in a new tab. It loads as expected. Looking at the file path we can see that all of the images are in a folder called captcha inside of another one called images. It would be pretty funny if all the images were just dumped there, right? Let’s remove the filename to go to the folder itself.
It turns out all of the CAPTCHA images are in this folder! So we can just get a dump of all possible CAPTCHA images. Now there are three ways to go about sorting them into the cat and dog categories:
Option 1: Annotate manually or using AI
We could grab an off-the-shelf classifier, download all the images, and select the ones that match the expected classification. But that’s hard and AI is lame anyways. Let’s look for something else.
Option 2: Find a pattern
Maybe the filenames mean something? They kind of look like hashes (a hash is the result of a one-way function that maps arbitrary input to a fixed-size output), but we all know we can’t undo hashing, right? Well, you kind of can by looking up values in something called a rainbow table. After a quick Google search for “reverse hash” we stumble upon a website that allows us to “crack password hashes” (this is what these tables are really intended for and why salting passwords is a thing). Let’s try one of the filenames:
And it looks like we have a match! Trying a couple more, it seems like each filename is just md5(category + padWithZeros(n)) + ".jpg". We can now just enumerate all possible filenames for each category.
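A minimal Python sketch of that enumeration. The category strings ("cat" and "dog"), the padding width, and the number of images per category are assumptions made for illustration; the real values are whatever the cracked hashes reveal.

```python
import hashlib


def captcha_filename(category: str, n: int, width: int = 3) -> str:
    """Return md5(category + zero-padded n) + ".jpg"."""
    plain = f"{category}{n:0{width}d}"
    return hashlib.md5(plain.encode("utf-8")).hexdigest() + ".jpg"


def enumerate_filenames(categories, count, width=3):
    """Map every candidate filename to the category it was derived from."""
    return {
        captcha_filename(category, n, width): category
        for category in categories
        for n in range(count)
    }


# Hypothetical values: two categories, 100 images each, three-digit padding.
lookup = enumerate_filenames(["cat", "dog"], count=100, width=3)
```

With a table like this, classifying an image from a challenge is a dictionary lookup on its filename.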
Option 3: Dig some more
If you go up another folder you’ll be greeted with the contents of the /images folder:
Two more folders: cats and dogs. Not suspicious at all, right? Let’s explore the cats folder.
There are a bunch more images here and some of them correspond to the images in /images/captcha. We now have the categorization and just need to match the images with each other. We can’t use filenames since they differ, but we could either take a hash of the file contents (SHA1 or MD5 should do) or compare something like the file size.
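One way to do the matching is sketched below in Python, assuming the three folders have already been downloaded to local directories (the local paths and the category names in the usage note are hypothetical). It hashes the contents of every labelled image with SHA-1 and then looks up each CAPTCHA image by its digest.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-1 of the file contents, used only to detect identical files."""
    return hashlib.sha1(path.read_bytes()).hexdigest()


def label_captcha_images(captcha_dir, labelled_dirs):
    """Return {captcha filename: category} for images with a byte-identical twin.

    labelled_dirs maps a category name to the directory holding its images.
    """
    digest_to_category = {}
    for category, directory in labelled_dirs.items():
        for image in Path(directory).iterdir():
            if image.is_file():
                digest_to_category[file_digest(image)] = category

    labels = {}
    for image in Path(captcha_dir).iterdir():
        if image.is_file():
            category = digest_to_category.get(file_digest(image))
            if category is not None:
                labels[image.name] = category
    return labels
```

A call might look like label_captcha_images("images/captcha", {"cat": "images/cats", "dog": "images/dogs"}). This only works if the server stores exact copies; if the CAPTCHA images were re-encoded, the digests would differ and a fuzzier comparison would be needed.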
Submitting back the results
You could automate the act of clicking elements on the page using a headless browser (something like puppeteer) or automate your mouse and keyboard movements (maybe using PyAutoGUI). Another way is to emulate the actual requests. Let’s explore that.
I’ll assume we have a list of all of the possible filenames for cats and dogs. Now we just need to fill out the CAPTCHA and submit those promo codes back to the server. Let’s open the developer console, switch to the network tab, refresh the page, and fill out one CAPTCHA. If you want to try this on some other page, it helps to check “preserve logs” so the log isn’t cleared when the page refreshes. A lot of requests are being exchanged. It might seem overwhelming at first, but upon closer inspection most of them (the ones with the random filenames) are just the images. The really interesting ones are highlighted below.
Once we click the “I’m not a robot” box, the page makes a request for a challenge. That’s the first outlined GET request to /captcha/get with no special parameters. The response looks something like this:
The script on the page uses the returned JSON to display the CAPTCHA, and after selecting images another request is sent to validate the response. That’s the POST request to /captcha/verify with two parameters: one called challenge that is set to jWghexp85X3IcFLbHs3hUJbImET5mZNT (same as the ID we got above) and an array called selected that contains the images we selected (four in our case).
If successful we get something like the following back from the server:
It’s the same challenge ID along with some other verification string. Up until now we clicked the box, filled out the CAPTCHA, and pressed verify. After filling out the promo code box and pressing apply we get another request: POST /applypromo. This actually submits the form with three parameters: challenge, code, and verification. The challenge and verification parameters are the same ones we got back after succeeding on the CAPTCHA. The code is what we entered into the box.
So what we need to do is emulate the requests that the legitimate page made and automate the entire thing. Our program should do something like the following:
- request a challenge (GET
- look at the category and check which of the possible images match (check the label field and select all IDs that match)
- send back a response with our selection (POST /captcha/verify with the challenge ID and array of matched IDs)
- save the returned verification token and generate a random promo code (same pattern as before, just use your letter and random numbers)
- send the promo code, challenge ID, and verification token (POST
- repeat another 999 times
A couple of things to note: each verification token is unique to a challenge, and neither verification tokens nor challenge IDs can be reused. Each request should also have your session cookie (called shopping-session) attached so that the server can match the request to your user. You are basically impersonating yourself so you don’t need to go through the browser.
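The steps above can be sketched in Python roughly as follows. This is an illustration under several assumptions, not the exact client we used: the response field names id and images, the shape of images as a list of IDs, the idea that those IDs are the keys of our cat/dog lookup table, and the promo code length are all guesses. The http argument is a hypothetical callable that sends one request with the shopping-session cookie attached and returns the decoded JSON.

```python
import random


def random_promo_code(letter: str, digits: int = 8) -> str:
    """Your letter followed by random digits (the length is an assumption)."""
    return letter + "".join(random.choice("0123456789") for _ in range(digits))


def select_matching(challenge: dict, labels: dict) -> list:
    """IDs of the challenge images whose known category equals the label."""
    wanted = challenge["label"]
    return [
        image_id
        for image_id in challenge["images"]  # assumed: a list of image IDs
        if labels.get(image_id) == wanted
    ]


def submit_one_code(http, labels: dict, letter: str) -> dict:
    """Run one challenge -> verify -> apply-promo round trip."""
    # 1. Request a fresh challenge.
    challenge = http("GET", "/captcha/get", None)

    # 2. Answer it with every image we know belongs to the requested label.
    verdict = http("POST", "/captcha/verify", {
        "challenge": challenge["id"],  # assumed field name for the challenge ID
        "selected": select_matching(challenge, labels),
    })

    # 3. Spend the single-use verification token on one promo code.
    return http("POST", "/applypromo", {
        "challenge": challenge["id"],
        "code": random_promo_code(letter),
        "verification": verdict["verification"],  # assumed field name
    })
```

Because challenge IDs and verification tokens are single-use, the whole function is called once per promo code, in a loop, rather than caching anything between rounds. Keeping the transport behind http also makes it easy to swap between a real HTTP client and a fake one for testing.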
Things that went wrong
Of course not everything went perfectly while running the puzzle in production. Two big issues to note for the next time puzzles roll around:
- The application was running out of disk space more frequently than desired. It turns out it isn’t a good idea to log every request that goes through the application to disk. Truncating regularly/disabling logs fixed this. Sorry!
- The application ran in development mode and exposed stack traces whenever an error occurred. This was distracting because it seemed like a hint for some part of the puzzle when this wasn’t the case.
There is one last puzzle after this one: ready to read how to become a VIP?