Shrinking Executables With papaw

Dima Krasner
Apr 8, 2020

First of all, what is a packer? It’s a tool that takes an executable (a program) and outputs one that is functionally equivalent, but smaller or harder to reverse-engineer (for example, by competitors). It does so by combining a tiny program, the “stub”, with a compressed or obfuscated form of the input executable. When the end user runs the output executable, the stub extracts the embedded input executable and runs it.
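To make the idea concrete, here is a minimal sketch of the packing step in Python. The container layout (stub, then compressed payload, then a small trailer) and the names `MAGIC`, `TRAILER`, `pack` and `unpack` are assumptions made up for illustration; this is not papaw's actual format or code.

```python
import lzma
import struct

MAGIC = b"PACK"                  # hypothetical marker, not papaw's real format
TRAILER = struct.Struct("<4sQ")  # magic + length of the compressed payload


def pack(stub: bytes, executable: bytes) -> bytes:
    """Append the compressed executable and a trailer to the stub."""
    payload = lzma.compress(executable)
    return stub + payload + TRAILER.pack(MAGIC, len(payload))


def unpack(packed: bytes) -> bytes:
    """What a stub would do at startup: locate the payload and decompress it."""
    magic, length = TRAILER.unpack(packed[-TRAILER.size:])
    if magic != MAGIC:
        raise ValueError("no packed payload found")
    end = len(packed) - TRAILER.size
    return lzma.decompress(packed[end - length:end])
```

A real stub would go one step further and run the extracted executable; the sketch stops at decompression.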

papaw is a packer written in C (and C only), so it works on any CPU architecture supported by Linux. Some other packers shrink their stubs by writing parts of them in assembly, hand-tuned for small size. papaw does not, so its stub is comparatively big (about 10–30K).

To compensate for the stub’s large size, papaw uses LZMA, a general-purpose compression algorithm that yields a good compression ratio. As a result, in many real-world scenarios, papaw produces smaller output than many other packers: when the input executable is large (for example, a Go or Electron application), the high compression ratio more than makes up for the larger stub. For example, the VSCodium 1.41.1 executable for x86_64 drops from 108MB to 34MB, while UPX 3.94 (with compression level 9) outputs a larger, 42MB executable.
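The VSCodium figures above can be turned into ratios with a few lines of Python. The helper name `packed_fraction` is made up for this illustration, and the sizes are the rounded megabyte values quoted in the text.

```python
def packed_fraction(original_mb: float, packed_mb: float) -> float:
    """Return the packed size as a fraction of the original size."""
    return packed_mb / original_mb


papaw = packed_fraction(108, 34)
upx = packed_fraction(108, 42)

print(f"papaw: {papaw:.1%} of the original size")  # papaw: 31.5% of the original size
print(f"UPX:   {upx:.1%} of the original size")    # UPX:   38.9% of the original size
print(f"difference: {42 - 34} MB")                 # difference: 8 MB
```

An 8MB difference dwarfs a stub of 10–30K, which is why the stub size stops mattering for large inputs.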

papaw also shines with binaries for RISC CPUs with fixed-width instructions, like MIPS, and binaries for CPUs without floating-point capabilities. Both tend to be large and highly compressible due to frequent repetition of identical code blocks. Also, LZMA decompression is fast and does not consume much RAM when done in a single pass, as papaw does. Moreover, executables packed by papaw can be deleted once they start running. For these reasons, papaw is highly suitable for long-running applications targeting embedded devices that have very little (or zero) storage space.
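As a rough illustration of single-pass decompression, the sketch below uses Python's standard `lzma` module to decompress a stream front to back, one bounded chunk at a time. It is not papaw's code (papaw is written in C), and the function name `decompress_stream` and the chunk size are assumptions chosen for the example.

```python
import lzma


def decompress_stream(src, dst, chunk_size=64 * 1024):
    """Decompress src into dst in one forward pass, in bounded chunks."""
    decompressor = lzma.LZMADecompressor()
    total = 0
    while not decompressor.eof:
        if decompressor.needs_input:
            chunk = src.read(chunk_size)
            if not chunk:
                raise EOFError("compressed stream ended early")
        else:
            chunk = b""  # drain output still buffered in the decompressor
        # max_length caps how much output is produced per call
        out = decompressor.decompress(chunk, max_length=chunk_size)
        dst.write(out)
        total += len(out)
    return total
```

The input is never rewound and at most one chunk of output is held at a time, so memory use does not grow with the size of the executable.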

In addition to the advantages of speed and size, papaw implements several anti-debugging tricks. For example, it asks the operating system not to move the extracted executable from RAM to swap under low-memory conditions. This prevents researchers from extracting the executable simply by exhausting all available memory, then reading it from swap using raw disk access.
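On Linux, one way to make such a request is the `mlock()` system call, which asks the kernel to keep a range of memory resident. Whether papaw uses `mlock()`, `mlockall()` or another mechanism is an assumption here; the sketch below only demonstrates the general technique, calling the C library through `ctypes`, with a hypothetical helper named `lock_in_ram`.

```python
import ctypes
import mmap


def lock_in_ram(size: int = mmap.PAGESIZE):
    """Map an anonymous buffer and ask the kernel to keep it out of swap.

    Returns (buffer, locked). locked is False if the kernel refused,
    for example because the RLIMIT_MEMLOCK limit is too low.
    """
    buf = mmap.mmap(-1, size)  # anonymous mapping, stands in for the extracted executable
    libc = ctypes.CDLL(None, use_errno=True)
    view = (ctypes.c_char * size).from_buffer(buf)
    rc = libc.mlock(ctypes.byref(view), ctypes.c_size_t(size))
    del view  # release the buffer export so buf can be closed later
    return buf, rc == 0
```

The call can fail for unprivileged processes with a low locked-memory limit, so the helper reports the outcome instead of assuming success.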

Finally, papaw is provided under the terms of the permissive MIT license, which allows its use with proprietary software, even if the papaw source code has been modified. To make papaw easier to use, its releases include pre-built stub binaries for the most popular CPU architectures, statically linked against the permissively licensed musl C runtime library.

In conclusion, although it is a young and little-known project, papaw can be an extremely useful tool when developing Linux software for environments where storage is expensive or scarce.

See papaw on GitHub! If you’re using Go, you might be interested in go-papaw.

Originally published on LinkedIn.
