Using a PIE binary as a Shared Library — HCSC-2020 CTF Writeup

István Tóth
Oct 23 · 8 min read

The challenge “Baseline test” was a great reverse engineering challenge with hard difficulty at the Hungarian Cyber Security Challenge 2020 CTF Qualifiers hosted by the National Cyber-Security Center of Hungary on the platform Avatao Next.

Image for post
Image for post

The challenge

Points: 300
Difficulty: hard

Answer some simple questions.

Instructions

The baseline test is an examination designed to measure any emotional deviance. In addition to the original test, this one has a second part to challenge rationality. Answer every question to fetch the flag.

Accessing the challenge was provided by SSH (cmdline was included in the challenge description), and in the container there was a SUID binary called “test” what was actually the challenge. Running the binary test starts asking questions:

Image for post
Image for post

The test binary should be downloaded to local machine for reverse engineering analysis (for example with scp).

If you give wrong answers or give the right answer but not fast enough, it goes back to the beginning. Only the fast right answers work. After passing all of the questions, the SUID binary reads and outputs the file /srv/flag.txt which is readable only by root. (For explanation see below.)

The first round of questions

Image for post
Image for post

Without going deeper in the details, the function FUN_001010f0() is the main() function, FUN_00101450() manages the questions, the first parameter is a pointer to the function which provides the questions and the second one is the timeout for one answer in seconds.

The question provider functions provide a question and an answer using the following arguments:

  1. The number of the question (0, 1, 2, …, 7)
  2. Pointer to the answer string
  3. Length of the answer string

The function FUN_001013f0() provides the first round of questions with a short timeout of 1 second. These are basic, trivially reversible static Q-A pairs: the answer for the first question is “Cells.” and the answer for the further 7 questions is “Interlinked.”.

NOTE: for getting this much cheaper, without any static reversing methods, dynamic analysis may help. Just attach ltrace to the running test process:

Image for post
Image for post

And now the right answer can be fetched from the strcmp calls after entering the answer, or it can be fetched from the snprintf calls even before answering.

Ok, this was really easy, but the interesting (and harder) part is the following one (where static reversing is unavoidable).

Interacting with the binary

My favorite method interacting with such binaries is using the CTF toolkit pwntools (of course ;) ).

For running the binary (locally) and controlling the input/output works super easily and comfortable this way (Python snippet answers for the first question):

If you want to interact with remote binary through SSH, just change the c = process("./test") line to this:

Now with the proper answers, we could reach the 2nd part (and switch to interactive mode with c.interactive()):

Image for post
Image for post

The 2nd part

Dynamic analysis with ltrace:

Image for post
Image for post

This means the following Q-A pairs (and it can be continued):

+=========================================+================+
| Question (hash) | Answer (plain) |
+=========================================+================+
| 2f24579ea2305a2d6d15c666d90e761f | 411-722-287 |
+-----------------------------------------+----------------+
| 4892eea2909a8faca148ca2fb1312c3e | 823-168-034 |
+-----------------------------------------+----------------+

Let’s go back to static analysis with Ghidra.

Image for post
Image for post

The answers are random integers (sourced from /dev/urandom) separated into three 3-digits block. The questions are calculated by FUN_00101680() with these random answers as input.

What does FUN_00101680() do? It is very similar to the MD5 hash calculation, but slightly different (the shift amounts seem to be correct, the sines constants seem to be ok, but the init values seem to be different and maybe there are some other differences as well).

How can we answer the questions? If we could calculate the hash value for a string (like the “test” binary does), we could brute force the plaintext for the target hash by trying all of the XXX-XXX-XXX patterned variations (where X is an arbitrary digit). This means brute forcing 1.000.000.000 variations.

This is not impossible (in the case of MD5-like operations). Benchmarking MD5 calculations (with hashcat) on my Intel(R) Core(TM) i5–7300U CPU:

Image for post
Image for post

This means <2.24 secs for 1.000.000.000 MD5 hashes (we are in the 5 secs timeframe ;) ). But this method requires some modification to the optimized MD5 code, because the one in our binary is not standard MD5.

There is another option which requires some preparation but does not require strict reversing and implementing custom optimized algorithms.

Use the function in the binary itself

In theory, this is possible, because our binary (fortunately) is a PIE (position independent executable), which behaves very similar to shared libraries.

Thanks to the awesome work of Romain Thomas, security engineer of Quarkslab, LIEF project (Library to Instrument Executable Formats) helps doing the above in practice.

LIEF project also has a detailed tutorial about transforming an ELF executable into a library. In a nutshell, all we need to do is just add the required function to the list of exported functions (because it is empty if the binary is not compiled as a library).

Let’s see how to do it in practice. First, we need the latest unreleased version of LIEF, because we need to remove the DF_1_PIE flag from the transformed binary/library in order to bypass the dlopen() restriction in recent libc (≥2.29) versions, and DF_1_PIE support is only present in the unreleased master branch (so a little bit time-consuming (~60 mins on my average pc) compile is necessary; if it is unacceptable, use an older libc and install a release binary version using simple pip install):

git clone https://github.com/lief-project/LIEF
python3 -m venv venv
. ./venv/bin/activate
pip install .

Confirming that the binary is a PIE executable:

Image for post
Image for post

Now let’s add the required function (FUN_00101680) to the export list. In Python:

Now libtest.so is ready, and FUN_00101680() can be called (as md5_custom()). PoC code for calling md5_custom():

custom_md5.c

Note, that there is some negation operands and byte swapping before outputting the hash, this comes from the reversing process, it was there just to cause some trouble for the CTF players :) ).

Compiling to executable:

gcc custom_md5.c -o custom_md5 -ldl

And now testing by the above Q-A pairs (look at the beginning of The 2nd part section):

Image for post
Image for post

Great, the results are the same hashes as above. So this simply seems to work.

But unfortunately this is not an optimized version, brute forcing 1.000.000.000 hashes with this one will almost surely exceeds the 5 secs barrier.

Fortunately there is a time-memory trade-off solution to overcome this: using precomputed rainbow tables.

Rainbow table

rainbow.c

Compiling is the same:

gcc rainbow.c -o rainbow -ldl

The size of the generated table is very large (and not optimized), it takes about 45 GB space, so it should be saved on disk (to a file called rainbow.txt, computation and dumping took <20 mins on my pc):

./rainbow > rainbow.txt

The resulted rainbow.txt with 1.000.000.000 lines is:

34a14e65171d97079f631b9dce0d307e:000-000-000
279872821e4cd80dc5c79e6a247a637f:000-000-001
7797d3ac947f16c7311ca103309b5609:000-000-002
126d68f1dccf202eabf3eb77c4dafb1e:000-000-003
a9194f2b865cb1c155e9b310cf7d651d:000-000-004
...
...
...
feaf5e97a52937e6de30e5a40eee1cad:999-999-996
9ffe1659209465ebd7ed08a8ca773858:999-999-997
061e3c8b0ca20f035291c0a48f2b21f4:999-999-998
f81ccfa495e9856bdf2d74145884d86b:999-999-999

Searching a hash in this text file is really slow, so let’s optimize it by some structured indexing. Using the file based SQLite database is a simple but effective solution for this task.

Create a DB and a table, then import the created dump as a csv (takes about 20 mins again, and about +55 GB disk space is needed):

sqlite3 rainbow.db
sqlite> CREATE TABLE hashes(hash TEXT, text TEXT);
sqlite> .separator :
sqlite> .import rainbow.txt hashes

And at last, create an index for highly effective lookup on hashes (takes about 20 mins again, and about +45 GB disk space is needed; however, the rainbow.txt file is deletable now, if you are getting out of disk space):

sqlite> CREATE INDEX hash ON hashes(hash);

Now the lookup on hashes should work extremely fast:

Image for post
Image for post

Also, don’t forget to quit (avoiding file syncing problems on db file write):

sqlite> .quit

Now everything is ready for action.

Putting it all together

Here it is (also available in my ctfs writeup github repo, solve.py):

Here it is in action:

Image for post
Image for post
Full solution for the challenge

Look at that hash solving answer times, faster than 5 secs by orders of magnitude :)

Image for post
Image for post
Hash lookup is super-fast

In summary, I enjoyed this challenge because of the elegant and robust method of PIE binary to Shared Library conversion, and calling the hash calculation function directly (without reimplementing and/or reversing it strictly), but there should be also other solutions as well, e.g. identifying and implementing an optimized version of the hash function, and brute-forcing without the need of precomputed tables (5 secs could be enough for that).

If you have other solutions and/or any questions, comments are welcome (here or at my twitter).

InfoSec Write-ups

A collection of write-ups from the best hackers in the…

Sign up for Infosec Writeups

By InfoSec Write-ups

Newsletter from Infosec Writeups Take a look

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information about our privacy practices.

Check your inbox
Medium sent you an email at to complete your subscription.

István Tóth

Written by

IT Security Expert, Penetration Testing, Red Teaming | OSCP | CRT(E|O) | @RingZer0_CTF 1st (for 2yrs), RCEH | HackTheBox Top10 | RPISEC MBE | Flare-On completer

InfoSec Write-ups

A collection of write-ups from the best hackers in the world on topics ranging from bug bounties and CTFs to vulnhub machines, hardware challenges and real life encounters. In a nutshell, we are the largest InfoSec publication on Medium. Maintained by Hackrew

István Tóth

Written by

IT Security Expert, Penetration Testing, Red Teaming | OSCP | CRT(E|O) | @RingZer0_CTF 1st (for 2yrs), RCEH | HackTheBox Top10 | RPISEC MBE | Flare-On completer

InfoSec Write-ups

A collection of write-ups from the best hackers in the world on topics ranging from bug bounties and CTFs to vulnhub machines, hardware challenges and real life encounters. In a nutshell, we are the largest InfoSec publication on Medium. Maintained by Hackrew