During the last NorthSec CTF in Montreal, there was a fun little challenge related to “computer vision”. I put that in quotes, because we ended up writing a quick and dirty solution that had very little to do with computer vision. Here’s the description of the challenge…
When you followed the link, you ended up on a page with a random image in the middle, along with two choices you could select to categorize the image: Animal or Thing. The page was blurry, due to a CSS filter on the whole document. This was in line with the neuro-chip implant theme of the CTF and was not a hindrance in solving the challenge.
A bit of recon…
First, we had to understand how the challenge worked, so we manually selected an answer and submitted the form, to see what it did…
Notice that at the top left of the page, a message with a score was added, telling us how many attempts we had to get right to get our hands on the flag. Obviously, we didn’t want to do more than a few manually; we’re lazy hackers, after all.
Another thing we wanted to know in this recon phase was what happened if we submitted a wrong answer. The result? We were sent back to square one. That meant we had to guess 15,000 images right without making a single mistake… yikes!
Next recon step was to look at the source code of the page, to see how the images were fetched.
As you can see in the source code (apart from the table layout!), a PHP script was used to generate the image. Nothing changed in the source from one request to the next; the files were generated dynamically by the backend.
Our follow-up question was: how did they keep track of which image was shown, to compare against our answer? Our guess was that it was stored in the session, and that each time we submitted the form, the server compared our answer against the last image it had shown us. So if we made a request to /catchat.php in another browser tab, we risked changing the answer we needed to POST.
As far as we could tell, the only identifying information we had was our session cookie, so we’d need to keep that value for later.
The last thing we needed before we could get started, was to peek at how the request was sent to the server. We’d then have enough information to replicate the communication between the browser and the server in a Python script of our own.
Not bad, just a plain and simple form POST request. With this knowledge, we had enough to get scripting. Just before we closed the browser to open our favorite code editor, we copied the POST request as a cURL command (in the right-click menu), and used this nifty online tool to convert the request to a Python script using the requests library. That would be the base of our submission script later on.
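The converted script isn’t reproduced here, but it was roughly shaped like this (the host, cookie value, and form field name below are placeholders, not the real challenge values):

```python
# Placeholder values -- the real URL, session cookie, and form field
# came from the copied cURL command.
URL = "http://challenge.example/catchat.php"

HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Referer": URL,
}
COOKIES = {"PHPSESSID": "<session id copied from the browser>"}

def build_payload(answer):
    """Form body for the POST; the field name is a guess."""
    return {"answer": answer}

def submit(session, answer):
    """POST one categorization, reusing the headers and session cookie."""
    return session.post(URL, headers=HEADERS, cookies=COOKIES,
                        data=build_payload(answer))
```

A `requests.Session` passed in as `session` keeps the PHP session cookie alive between requests, which matters here since the server tracks the current image per session.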
The vision script
Now we had everything we needed to start coding. One thing I didn’t mention is that while testing the form manually, we noticed that there was a pretty limited set of images, a couple dozen at most. Definitely not enough to train a machine learning algorithm, so we figured that wasn’t how they wanted us to solve the challenge.
We instead decided to brute-force the dataset and help the script categorize the images when it was not able to do so automatically. The plan was to make a script that would fetch the image, fingerprint it with MD5, prompt us for the right answer and then store that in a database. It didn’t take long before we hacked together a script that would do just that.
The script downloads an image, generates a hash with the content, prints the image to the terminal, prompts the user for an answer and then stores that answer in a makeshift database. The plan was to slowly categorize the images by hand until our database had an answer for every image.
As you answer, it fills up a flat file with image hashes and the associated answer, just like this:
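The real contents aren’t reproduced here, but each line paired a hash with an answer, along these lines (hashes illustrative):

```
8f14e45fceea167a5a36dedd4bea2543 Animal
45c48cce2e2d7fbdea1afc51c7c6ad26 Thing
```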
The trouble was, our script never exited the infinite loop. It just kept adding more and more hashes to the database, even though we started to see the same images come up repeatedly in the terminal, something just wasn’t right…
Why do we get the same image twice?!
To understand why, we had to download a few images locally and compare them with a diff tool…
As you can see in the screenshot, the server was inserting a randomly colored pixel somewhere in the image, to make sure we couldn’t hash the picture as easily. We could be shown what looked like an image we had seen before, but the checksum would be completely different. It was back to the drawing board…
We brainstormed a few solutions to that hiccup. We basically wanted a checksum algorithm we could use to fingerprint files quickly, but that didn’t have one of the defining properties of most hashing algorithms: the fact that any small change in the input value results in a completely different output value.
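That avalanche property is easy to demonstrate: flip a single byte out of a hundred, and the MD5 digest bears no resemblance to the original.

```python
import hashlib

original = b"\x00" * 100
tampered = b"\x00" * 99 + b"\x01"  # one "pixel" flipped

digest_a = hashlib.md5(original).hexdigest()
digest_b = hashlib.md5(tampered).hexdigest()

# The digests share no useful similarity, so near-identical images
# can't be matched by comparing their cryptographic hashes.
```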
One solution we explored was converting the PNG image to a bitmap, hashing each of the 166 lines individually and comparing the lists of row hashes with a ~2% threshold, since up to two of the rows could end up being different (one random pixel per image).
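We never finished that version, but the plan would have looked something like this: hash each scanline, then treat two images as the same when the fraction of differing row hashes stays within the threshold.

```python
import hashlib

def row_hashes(rows):
    """rows: one bytes object per scanline of the bitmap."""
    return [hashlib.md5(row).hexdigest() for row in rows]

def probably_same(hashes_a, hashes_b, threshold=0.02):
    """True when at most ~2% of the row hashes differ -- enough slack
    for one random pixel in each of the two images being compared."""
    if len(hashes_a) != len(hashes_b):
        return False
    diffs = sum(a != b for a, b in zip(hashes_a, hashes_b))
    return diffs / len(hashes_a) <= threshold
```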
As we painfully updated the script, we stumbled on a package that seemed promising. It basically does image hashing, but way better than what we intended to do. The hashing strategy that was particularly interesting to us was the average hash, which could be used to fingerprint images and compare them in a way that ignores minute differences (a.k.a. high frequencies).
In pictures, high frequencies give you detail, while low frequencies give you structure. Think of pixel level details and colors versus overall shapes and luminosity.
The trick behind the algorithm is simply to scale the image down enough that you lose those high frequencies, e.g. down to an 8x8 grayscale square; each pixel brighter than the average then becomes a 1 bit and the rest become 0, giving a 64-bit fingerprint. You can then compute the Hamming distance between two fingerprints to tell how similar the images are: at a distance of 5 or less, it’s probably the same image, save a few minor artefacts; at 10 or more, the images are probably very different.
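To make the idea concrete, here is a minimal sketch of the average-hash scheme on raw grayscale pixels (real image-hashing libraries also handle the scaling and grayscale conversion; here we assume an 8x8 grid is already given):

```python
def average_hash(pixels):
    """pixels: 8x8 grid of grayscale values (0-255). Returns a 64-bit int
    where each bit says whether that pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

A single tampered pixel barely moves the mean, so the two fingerprints end up nearly identical, which is exactly the robustness we were missing with MD5.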
With that new weapon in our arsenal, we updated our script!
The distance comparison happens in the is_already_in_db() function:
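Our actual function isn’t reproduced here, but it behaved roughly like this hypothetical sketch, with `db` mapping stored hash values to answers and the threshold of 5 following the rule of thumb above:

```python
THRESHOLD = 5  # Hamming distance at or below this counts as a match

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def is_already_in_db(db, image_hash):
    """Return the stored answer for a near-duplicate hash, or None."""
    for known_hash, answer in db.items():
        if hamming(known_hash, image_hash) <= THRESHOLD:
            return answer
    return None
```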
We knew our new strategy worked when the script started classifying images all by itself, never asking for user input 👌
Here’s what our database looked like after letting the script run for a while to make sure it prompted us for all the possible images. All the hashes, and all the associated answers.
With that database, we could write a final script to submit our answers.
Our “solver” script
We started our solver script with the Python code that was generated from the POST request. This was to ensure we had all the headers we needed and the PHP session cookie.
We started by loading our database in memory, then looped through the images from 1 to 14995, thinking we might want to do the last few manually, just so we wouldn’t miss the flag.
The rest of our script looked pretty similar to the vision script: download the image, hash it, compare it to the hashes we have in our database and when a match is found, submit the answer for that image.
Because we’re a paranoid bunch, we decided to leave in the code that would prompt us for an answer, in case a random image was slipped into the lot by an evil challenge designer. We also left a bunch of debug statements in, just in case we had to figure out a problem.
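Condensed, the solver loop looked roughly like this; `session` stands in for a requests.Session carrying our PHP session cookie, `lookup` wraps the database search described earlier, `hash_image` computes the average hash, and the URL and form field name are placeholders rather than the real challenge values:

```python
URL = "http://challenge.example/catchat.php"  # placeholder URL

def solve(session, lookup, hash_image, rounds=14995):
    """Fetch, hash, look up, submit -- with a manual fallback prompt."""
    for i in range(1, rounds + 1):
        image_bytes = session.get(URL).content
        answer = lookup(hash_image(image_bytes))
        if answer is None:
            # Paranoia: fall back to a human if the image is unknown.
            answer = input(f"[{i}] Animal or Thing? ")
        print(f"[{i}] submitting: {answer}")  # debug output we left in
        session.post(URL, data={"answer": answer})
```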
We let the script run for a while, gingerly submitted the last few answers manually and finally got our flag! 🍾
It was a fun coding challenge, and while our code was arguably a lot more complicated than in other write-ups I saw, we learned a ton about the different ways you can compare images.
It was really satisfying to see the script classifying images all by itself when we finally got our image hashing right.