Time-Traveling in the Puzzlelorean: The HackMIT 2017 Puzzle Guide Pt. 2

Part 2 of 1, 2

Welcome to part 2 of the HackMIT 2017 puzzle guide! Check out part [1] if you haven’t already!

The DeLorean Codes

(Website, GitHub)

For this puzzle, hackers needed to use their inference skills to figure out a coding scheme from a small number of examples. Specifically, they needed to communicate information through a time machine. This time machine had a very unique error model: instead of each bit having some probability of flipping like usual, each “chunk” of information was sent at some unpredictable delay. In practice, this had the effect of permuting the words of any sentence that went through it (so the story goes).

All puzzlers had to go off of were three codeword -> message pairs and an interface to decode codewords. By slightly manipulating the example codewords and analyzing how that changed the resulting messages, puzzles could deduce key ideas behind the decoding algorithm.

For example, one of the first things people noticed was that you couldn’t just delete random words in a codeword sentence. Attempting to decode such a message usually resulted in an “ERROR READING” message. You could, however, reorder the words in a sentence however you wanted. This seemed to have absolutely no effect on the output.

Taking a step back, this last point implies a partial ordering over the words used in this puzzle. If all the permutations of a codeword sentence decode to the exact same message, then you ought to be able to define a canonical ordering over the words. To figure out the precise shape of this ordering, puzzlers needed to figure out one more thing: how to encode a single letter.

Looking at the provided examples, most of the words in the three codeword sentences are unique, but there are a few repeats. Peculiarly, both the first and third examples contain the word “bob”. What could these two examples have in common that warrants this similarity? Attempting to decode “bob” via the interface yielded a surprising result: the letter “g”! At last, a helpful clue: while it may not have been possible to remove arbitrary words from codeword sentences, you could start small and incrementally build your way up.

By incrementally trying pairs of words from the first codeword, puzzlers could eventually notice that not only did “bob” decode to “g”, but “bob television” would decode to “gr”. With another minute or two of fiddling, puzzlers could uncover the internal ordering used by the puzzle: “bob television missing hey know way still nickles ever isnt letting”. Now all we have to do is figure out how the heck these words are ordered!

With some research — either Googling words or examining the headers (see the X-Script and X-Tokenization-Strategy headers below) — puzzlers could figure out that the Back to the Future script was in some way related to the puzzle.

By downloading our provided script or finding a random script online (they’re all pretty much the same), puzzlers could begin to experiment with words that did not appear in the provided examples. Instead of just encoding “g” with the word “bob”, with some effort, you could learn to encode any letter of the alphabet as a single word. It’s at this point that solvers began to split into two different camps.

One camp, which I call the “puzzlers” camp, continued to deduce more and more about the inner workings of The DeLorean Codes. Their goal was to achieve a deep understanding of how the puzzle worked and to use that knowledge to construct their solution. The other camp, which I call the “brutes”, wrote scripts to essentially brute-force the final answer once they realized it had something to do with the script. Take a look at the graph on the right hand side displaying the number of attempted decodings per second:

250 decodes per second! Yeah. Brutes, am I right? Pay attention to the blue line, representing the number of well-formed codewords. If you’re having trouble seeing it, it’s because it’s zero.

We’ll get back to the brutes later, but now I’d like to talk about the final few steps to uncovering the inner workings of the time machine encoding scheme. Playing around with the script, you’ll soon realize that the first 27 unique words in the script map to the 26 letters of the alphabet and the space character. The next 5 don’t decode properly. Of course, the word “bob” (the codeword that decodes to “g”, the first letter in examples 1 and 3) is in this initial set of 32 words! And the next word in our experimentally-determined canonical ordering, television (see above if you forgot)? It appears in the next batch of 32 words! The third word, “missing”? The batch after that. You get the picture.

Stated succinctly: to encode a letter X at position Y in a string, first convert the character X to an index i from 1–27. Next, Find the i’th word in the Y’th batch of 32 words in the script of Back to the Future. If you do this for all of the letters in your message, putting the resulting words in any order will work! You’ve solved the puzzle, in time proportional to the length of the message you’d like to decode. A number of people solved this puzzle entirely by hand once they realized the scheme! True puzzlers indeed.

Now for the brutes. In fairness to them, once you think you have a way to solve a puzzle, of course you’re going to pursue the heck out of that idea until you think it’s not going to work out. So although brute-forcing the solution feels kind of icky, it gets you to the same place. Another way to solve this puzzle is to submit random words from the script until you decode the first letter of your message properly. Then, append that word to a list and continue with subsequent letters. Eventually, in time proportional to the script length times your message length, you’ll get the right answer.

As a parting note, I’d like to praise the power of unit tests. We wrote tests for our encoder/decoder, because we knew puzzlers would be furious if we messed up our implementation! And of course, we caught some very sneaky edge cases that we never would have anticipated thanks to our testing suite.

Some people loved this puzzle, and others absolutely hated it, but we hope that looking back, you can all appreciate the puzzler’s drive to 1) make hypotheses, 2) test them, and 3) uncover mysterious truths.

Tinbot

(Website, GitHub)

With the fourth challenge, the puzzlers were greeted with a website that looked like Tinder. Think Tinder, but for bots — Tinbot. The site had three primary functions:

  • The puzzlers could upload a picture to the website and then go ahead and find other nearby bots.
  • They could swipe left to dislike and swipe right to send them a like.
  • If the bot liked you back, it’s a match and you’d go on a date with the bot.

A little digging inside the client-side code and looking at all the console logs and ajax requests revealed that every time a puzzler uploads an image, their image is sent over to the server and is classified as one of ten categories. Many puzzlers realized that these classes were from the CIFAR10 dataset.

Further, each bot match depends on the bot’s “preference”. This means that if the server classified your image as a dog and the current bot prefered a dog, it would be a match. At this point, many puzzlers realized that simply getting a match wasn’t enough — they had to match with one particular bot, which was named the “Puzzler Bot”. The Puzzler Bot had an interesting preference: it would always want the profile classification to be “automobile”. This meant that to get Puzzler Bot to like you, you needed to create a profile picture that classified as an automobile.

As a hint, we left a deliberate unresolved FIXME in the markup of all pages:

<!-- FIXME: Make /api/revalo/model/model.json, and model.hdf5 files private. -->

This gave the puzzlers the weights and the structure of the convolutional neural network the server ran to perform the classification. Many puzzlers ran the entire CIFAR10 dataset into the neural network, hoping to get an image that the network classified as an automobile. Others generated thousands of random images with the same motive. Of course, none of these approaches actually worked.

The intended solution was to start with a random image and iteratively optimize for the activation of the automobile classification, executing gradient ascent by updating the image in the direction of a positive gain in activation.

The intended solution can be found in the Tinbot Repository.

The resulting image would be classified as an automobile, and thus get the puzzler bot to match with you and whisper the puzzle answer.

Many puzzlers took a similarly different route to searching for the input image. Some used simulated annealing to search for a global optimum. Others used genetic algorithms, starting with a generation of randomly-generated images and selecting images with the highest activation, breeding and mutating them further randomly until their fitness increased.

One puzzler even took the time out to make their seed an image of the Back to the Future DeLorean, fine tuning their optimizer to result in an image that looked no different than a DeLorean to the human eye, but classified as an automobile.

Implementation

We trained 50 convolutional neural nets on CIFAR10, replacing all automobile images with a single random image per model. We therefore had 50 different models, each overfit on their own random image. This allowed us to serve the model almost uniquely to the puzzlers, such that the probability that two puzzlers shared the same model, and therefore the same solution image, was very low. A simple test case ran the model across the entire CIFAR10 and ImageNet datasets, as well as a million random images to make sure the only solution to the problem was a variant of search.

The app was designed to be completely stateless. The server issued “prediction tokens”, cryptographically-signed pieces of information that validate whether the server ever predicted that class. Similarly, all the other states were recorded on the client’s machine using JWTs, without storing any data on the server.


This puzzle was partly designed to bring puzzlers out of their comfort zone and learn something new about machine learning, neural networks, and using the standard libraries to solve this task. It was also a slight play on the Uber for X and Tinder for X paradigm. Thanks to all the puzzlers who made it this far!

Captcha

(Website, GitHub)

Puzzle 5 homepage

In the fifth and final puzzle, Captcha, puzzlers were instructed to solve 10,000 instances of a personalized 4-character captcha in any way they saw fit. Some sample captchas are below:

ndgp
hwi1
1aqa

Unfortunately, what is simple in explanation is not necessarily simple in execution. There was never an “intended” way to solve this puzzle — it was purposefully left open-ended, allowing the puzzler to tackle the problem in any way they saw fit. Solutions we saw generally fell into three categories (ordered roughly by frequency):

  1. Train a neural network to solve the captcha automatically
  2. Reverse-engineer the captcha generation algorithm
  3. Pay someone to solve all 10,000 manually
  4. Solve all 10,000 manually yourself
  5. Create a custom optical character recognition algorithm (shoutout to kindiana)

We’ll focus on #1 and #2, but our hearts go out to the solvers who ultimately did it by hand — a back-of-the-envelope calculation suggests a solve every 3 seconds means ~8 hours spent mindlessly solving captchas. ¯\_(ツ)_/¯ Whatever gets the job done!

Training a neural network

When trying the computer-based recognition route, solvers quickly realized that off-the-shelf solutions like Tesseract just didn’t fit the bill. Solvers were limited to submitting 15,000 captchas at a time, meaning that they needed a captcha accuracy rate of at least 67%. Many solvers spent hours carefully designing character-segmentation and noise-reduction algorithms, but Tesseract doesn’t provide the needed accuracy with any reasonable amount of cleaning**.

Eventually, some solvers realized that a custom recognition solution was necessary. Perhaps primed by the previous puzzle, many sought to train a neural network to do the job of Tesseract. There were many possible network architectures, but most attacked the problem by training a character-level convolutional neural network to predict each single character of the captcha. A character-level neural network like this needs (0.66)^(1/4) = 90% accuracy to give the 66% the whole captcha needs***. Some successfully reused their character segmentation code to train the model only on the parts of the image that mattered, while others gave the CNN the whole captcha. In any event, neural network solvers learned a lot about training ML models from scratch, on their own.

Some puzzlers successfully used the exact same neural network architecture as the previous puzzle.

** The reason for this is that Tesseract OCR is designed for documents, and uses many underlying models, like a language model, to predict words and characters. Since we’re trying to predict individual characters, a language model is no use.

*** this can be improved in several ways, one of which is by taking into account the probability the model assigns to each letter. If the probability is low, throw the whole captcha out and try a new one.

Reverse-engineering captcha generation

Instead of going the image-recognition route, many tried to deduce the pattern used to generate the captchas. Each captcha has a name, and some thought they could try to find how we go from a captcha name to a full image. Some early digging reveals that the background of the captcha is completely determined by each solver’s GitHub username. Here’s an example of three captchas for the same user:

exm5
2f3z
de3f

The pattern in the background is identical. Ultimately, however, the background of the captcha doesn’t matter, and only the algorithm used to generate the letters matters. Two important hints were given in the puzzle slack:

  1. The letters are a function of the captcha name and your GitHub username and nothing else.
  2. We used a run-of-the-mill hash function to get the randomness from these inputs.

If you think the only way to solve this puzzle is to do a brute force over lots of parameters, then you’d be right! Here’s a list of possible parameters:

  1. The hash function used.
  2. How to combine your captcha name and GitHub username.
  3. How to convert the hash output to a string of letters.

Some ideas for the last one:

  1. Take a bit range and turn it to base36 to get all 4 captchas.
  2. Take a bit range for each letter and turn it to base36.
  3. Take a bit range for each letter and take it mod 36

Fortunately, there ultimately aren’t that many parameters since computers are fast, and a brute-force method takes ~2 seconds to correctly deduce the method used:

  1. Hash = md5(username + captcha name)
  2. For each letter i in the solution, take 2 bytes at position 2*i mod 36, indexing into the string “a-z0–9”

Perhaps due to the open-ended nature of this last puzzle, we saw solvers try solutions that we had never even thought about when writing it. Thank you for wowing us — your ingenuity is what makes writing these puzzles fun.

Conclusion

As a final note, we would like to say thank you to everyone who attempted the puzzle this year! The dedication of solvers is amazing — the mean solve time for the entire puzzle was 127.267 hours (~5 days), with the maximum solve time at 335.72 hours (almost 2 weeks!). Thank you to all 730 people in the puzzle Slack who joined to chat! Some solve statistics are shown below:

Solve times in hours per puzzle (0-indexed)
Histogram of solve time in hours per puzzle (0-indexed)
Users who reached each puzzle (1-indexed)

These puzzles are amazingly fun to write, but only because of the dedication you all have to solving them. To those who finished solving in the top 50, congratulations. To those who didn’t quite make it, we hope you had fun along the way! Don’t forget to apply to HackMIT 2017 by July 31, 11:59pm EST.

Like what you read? Give Jack Serrino a round of applause.

From a quick cheer to a standing ovation, clap to show how much you enjoyed this story.