PACTF 2018 Writeup: Skywriting

This is a holdover until I can make a more detailed post on my main site. Until then, here’s the solution to Skywriting, along with why I thought it was a reasonable problem to post.

Some background: Skywriting was the last problem in PACTF 2018. It had no solutions from eligible teams, which by that measure makes it the most difficult problem in PACTF 2018 (or, as far as I know, in any PACTF).

I’m going to do this in two parts. The first part is the basic, spoiler-ific, direct pathway of solving it. I’ll then go over how a team could reasonably think of and execute each step.

1 Solution

The problem gives you a ZIP of 20 WAV files. The files differ only in the last bit of each sample: the audio equivalent of LSB steganography. The program I used for this steganography is here, and it will also decode the data. (It would not be hard to write a decoder yourself once you realize what’s going on.)
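To make the LSB idea concrete, here is a minimal sketch of a decoder in Python. It is not the program I actually used: the bit order (most significant bit first), the one-bit-per-sample layout, and the helper names `extract_lsb_bytes` and `make_demo_wav` are all assumptions for illustration, and a real tool may pack bits differently.

```python
import io
import struct
import wave


def extract_lsb_bytes(wav_file):
    """Pack the least significant bit of each sample into bytes (MSB first, assumed)."""
    with wave.open(wav_file, "rb") as wav:
        width = wav.getsampwidth()
        frames = wav.readframes(wav.getnframes())
    # WAV PCM samples are little-endian, so each sample's low byte comes first.
    bits = [frames[i] & 1 for i in range(0, len(frames), width)]
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def make_demo_wav(message):
    """Build a tiny in-memory 16-bit mono WAV hiding `message` in the sample LSBs."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    # Arbitrary carrier samples; only their lowest bit is overwritten.
    samples = [((1000 + 2 * i) & ~1) | bit for i, bit in enumerate(bits)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(8000)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    buf.seek(0)
    return buf


# Round trip on synthetic data (not one of the contest files).
print(extract_lsb_bytes(make_demo_wav(b"hidden!")))
```

On a real file you would pass a path instead of the in-memory buffer, for example `extract_lsb_bytes("gusty-garden-6.wav")`, and then look at the start of the returned bytes.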

Decoding the file gusty-garden-6.wav and taking the initial valid UTF-8 gives the following text:

This is a substitution cipher with the added wrinkle that the substitutions have different lengths. (This actually means there’s some ambiguity, but bx doesn't really occur in the English language, so you should assume it's f.)

After figuring out which groups of letters only appear together, replacing those with single letters, and then using some frequency analysis, it’s possible to decode the message as follows:

The “tiny shortened clues” are tinyURL links. The two links (https://tinyurl.com/flaglink1 and https://tinyurl.com/flaglink2) lead to two Google Docs pages. One of them is a congratulatory message, and the other gives the flag: a_cloud_is_just_someone_elses_computer.

2 Discussion

OK, so that’s a bit direct. How might you find the hidden data? How might you solve the cipher?

Keep in mind that, unfortunately, I can’t give a full picture of this: I never had to solve the problem blind. I can, however, give a sense of the path I imagined leading to a solution, and perhaps some hindsight on why that path was never taken.

The problem can be split into three parts: the steganography, the cipher, and the tinyURL links. I’ll discuss each in turn.

2.1 Part 1: 20 WAV Files

From my conversations in the IRC chatroom for PACTF, I think most people got stuck at this stage. (However, I will note that at least one team got past this point.)

The thing that most people seemed to struggle with was the idea that only one file mattered. I will defend this problem design choice: there are enough problems about breaking steganography where you already know it exists. I wanted to do a problem that required you to find it first.

Many teams soon realized where in the files the changes were: the last bit of every sample. At this point, most teams tried XORing data from the different files together or combining them in some other way. Some teams had the idea of using statistical randomness tests, but I don’t think they used the right ones or interpreted the results correctly.

To fully explicate what I mean, we’ll use a tool called ent, which runs several statistical tests for randomness. We can examine each file using a simple Bash script, and we get the following output:

As you can see by looking through it, gusty-garden-6 is a clear outlier on the chi-squared test, which is consistent with data that is ordered but not necessarily weighted. This effect would be stronger if you limited it to the part of the data that actually changes, but even at the preliminary survey stage, knowing nothing else about the problem, it's clear that the files are not all alike.
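If you don't have ent handy, the same kind of statistic is easy to sketch in Python. The function below compares a byte histogram against a uniform distribution; it is an approximation of the idea behind ent's chi-squared test, not a reimplementation of ent, and the name `chi_squared_bytes` is made up for this example.

```python
from collections import Counter


def chi_squared_bytes(data):
    """Chi-squared statistic of the byte histogram against a uniform distribution."""
    if not data:
        raise ValueError("need at least one byte")
    expected = len(data) / 256
    counts = Counter(data)
    # Sum over all 256 possible byte values, including ones that never occur.
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


# Perfectly uniform bytes score 0; a single repeated byte scores very high.
print(chi_squared_bytes(bytes(range(256)) * 4))
print(chi_squared_bytes(b"a" * 256))
```

Running this over each file (or, better, over the extracted last-bit stream of each file) and sorting by the result is one way to look for the odd one out.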

Of course, if you came up with the idea to convert to UTF-8 (which, again, is literally what comes up if you google “wav steganography”), as long as you glanced over the right part of the problem you’d instantly notice the anomalous text and would know to begin from there. Either way, I felt this step was reasonable, and some people did get here.
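"Taking the initial valid UTF-8" can be done mechanically. Here is a small sketch that keeps only the longest prefix of a byte string that decodes cleanly; the function name `valid_utf8_prefix` and the sample bytes are invented for illustration.

```python
def valid_utf8_prefix(data):
    """Return the longest leading part of `data` that is valid UTF-8, as text."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that failed to decode.
        return data[:err.start].decode("utf-8")


# Made-up example: readable text followed by bytes that are not valid UTF-8.
print(valid_utf8_prefix("héllo".encode("utf-8") + b"\xff\xfe junk"))
```

Random data will sometimes begin with a few valid bytes by chance, so what you are looking for is a file whose prefix is both long and readable.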

2.2 Part 2: Substitution Cipher

This is a part I have far less data on, because I haven’t talked with anyone who got to this section of the problem. Therefore, I’ll limit my assumptions about how teams might have solved it and just summarize why I think it’s solvable.

Once you disambiguate some homoglyphs, it’s really clear that certain digraphs and trigraphs appear with complete uniformity: many characters only ever appear as part of one particular group, and the others are pretty unambiguous. By progressively identifying these groups and replacing each one with an arbitrary English letter, you can transform the text into a basic letter-for-letter substitution cipher with relative surety. (Additionally, small mistakes are unlikely to jeopardize the problem.)
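One way to spot those groups is to look for characters that are always followed by the same character. The sketch below does that and then collapses a chosen group into a single placeholder letter. The sample string is invented (only the bx group comes from the actual problem), and the names `fixed_successors` and `collapse` are hypothetical.

```python
from collections import defaultdict


def fixed_successors(text):
    """Map each character to its follower, if it only ever has one follower."""
    followers = defaultdict(set)
    for a, b in zip(text, text[1:]):
        followers[a].add(b)
    return {a: next(iter(bs)) for a, bs in followers.items() if len(bs) == 1}


def collapse(text, groups):
    """Replace each multi-character group with its placeholder, longest group first."""
    for group in sorted(groups, key=len, reverse=True):
        text = text.replace(group, groups[group])
    return text


sample = "bxa bxc abx"  # invented ciphertext, for illustration only
print(fixed_successors(sample))
print(collapse(sample, {"bx": "F"}))
```

On a sample this short you also get coincidental hits, so treat the output as a list of candidates to check by eye rather than as a final answer.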

Once you get to this point, you can use frequency analysis to get a head start. However, the easiest way to finish it off once you get started is to recognize the poem I quote verbatim in the middle of the ciphertext. Any part of this that is distinctive is enough to Google the rest. Once you get that, you’re done: you get enough letters to solve all of the important text.
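For the frequency-analysis head start, a first guess can be generated by lining up the ciphertext's letters, most common first, against a typical English frequency order. This is only a rough sketch: the order in `ENGLISH_ORDER` is one commonly cited approximation, the function name is made up, and on a short message the guess will be wrong for most letters beyond the first few.

```python
from collections import Counter

# Approximate English letter order, most to least frequent (varies by corpus).
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"


def rank_guess(ciphertext):
    """Pair cipher letters with English letters by frequency rank."""
    counts = Counter(c for c in ciphertext.lower() if c.isalpha())
    ranked = [c for c, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))


# Invented toy input: x is most common, then y, then z, then q.
print(rank_guess("xxxxyyyzzq"))
```

From there you refine by hand, which is exactly where recognizing the poem saves a lot of time.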

2.3 Part 3: URLs and Flag

This part was originally going to be a bunch of random text, with the solution being to interpret them as Google Drive IDs (given the “don’t be evil” and the theme of clouds, I think this was reasonable). The problem was that the characters GDrive uses aren’t always ones that occur often in English text, and so you’d have a tough time finding the actual links. I used tinyURL to fix this issue and then added the “tiny shortened” part to give you a little kick in the right direction. As far as I know, no one got here, but I don’t think this part would have been that challenging if they did.

3 Conclusion

To be honest, I was surprised at the difficulty teams experienced in solving this problem. However, I hope any teams that were stymied by it don’t think it was necessarily unfair, just difficult. I think each step in this solution could be catalyzed by logical clues, and nothing required pure trial-and-error. I think the difficulty delta from the rest of PACTF might have exacerbated the difficulties teams faced: perhaps people were expecting a problem closer to the round 1 hard problems. Additionally, it’s a fairly non-standard CTF problem. I’d argue that works to the problem’s advantage, and from what I’ve seen people who haven’t done too many CTFs seemed to like it more, but that’s anecdotal. If you disagree, or want to share anything else, feel free to contact me! Hope you enjoyed the CTF as a whole! Also, if you like problems like this, make sure to check out PACTF 2019 in many months’ time.
